mirror of
https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-08-01 06:07:08 +00:00
Merge v1.0.0 (#682)
Co-authored-by: Kallinteris Andreas <30759571+Kallinteris-Andreas@users.noreply.github.com> Co-authored-by: Jet <38184875+jjshoots@users.noreply.github.com> Co-authored-by: Omar Younis <42100908+younik@users.noreply.github.com>
docs/introduction/basic_usage.md
---
layout: "contents"
title: Basic Usage
firstpage:
---

# Basic Usage

```{eval-rst}
.. py:currentmodule:: gymnasium

Gymnasium is a project that provides an API for all single-agent reinforcement learning environments, and includes implementations of common environments: CartPole, Pendulum, Mountain Car, MuJoCo, Atari, and more.

The API contains four key functions that this basic usage guide will introduce you to: :meth:`make`, :meth:`Env.reset`, :meth:`Env.step` and :meth:`Env.render`. At the core of Gymnasium is :class:`Env`, a high-level Python class representing a Markov decision process (MDP) from reinforcement learning theory (this is not a perfect reconstruction, and is missing several components of MDPs). Within Gymnasium, environments (MDPs) are implemented as :class:`Env` classes, and :class:`Wrapper` classes provide helpful utilities to change the actions passed to the environment and to modify the observations, rewards, and termination or truncation conditions passed back to the user.
```

## Initializing Environments

```{eval-rst}
.. py:currentmodule:: gymnasium

Initializing environments is very easy in Gymnasium and can be done via the :meth:`make` function:
```

```python
import gymnasium as gym
env = gym.make('CartPole-v1')
```

```{eval-rst}
.. py:currentmodule:: gymnasium

This will return an :class:`Env` for users to interact with. To see all environments you can create, use :meth:`pprint_registry`. Furthermore, :meth:`make` provides a number of additional arguments for specifying keywords to the environment, adding more or fewer wrappers, etc.
```

## Interacting with the Environment

The classic "agent-environment loop" pictured below is a simplified representation of reinforcement learning that Gymnasium implements.

```{image} /_static/diagrams/AE_loop.png
:width: 50%
:align: center
:class: only-light
```

```{image} /_static/diagrams/AE_loop_dark.png
:width: 50%
:align: center
:class: only-dark
```

This loop is implemented using the following Gymnasium code:

```python
import gymnasium as gym
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()
```

The output should look something like this:

```{figure} https://user-images.githubusercontent.com/15806078/153222406-af5ce6f0-4696-4a24-a683-46ad4939170c.gif
:width: 50%
:align: center
```

### Explaining the code

```{eval-rst}
.. py:currentmodule:: gymnasium

First, an environment is created using :meth:`make` with an additional keyword ``"render_mode"`` that specifies how the environment should be visualised.

.. py:currentmodule:: gymnasium.Env

See :meth:`render` for details on the default meaning of different render modes. In this example, we use the ``"LunarLander"`` environment, where the agent controls a spaceship that needs to land safely.

After initializing the environment, we :meth:`reset` the environment to get the first observation of the environment. To initialize the environment with a particular random seed or options (see the environment documentation for possible values), use the ``seed`` or ``options`` parameters with :meth:`reset`.

Next, the agent performs an action in the environment with :meth:`step`. This can be imagined as moving a robot or pressing a button on a game's controller, causing a change within the environment. As a result, the agent receives a new observation from the updated environment along with a reward for taking the action. Such a reward could, for instance, be positive for destroying an enemy or negative for moving into lava. One such action-observation exchange is referred to as a **timestep**.

However, after some timesteps, the environment may end; this is called the terminal state. For instance, the robot may have crashed, or the agent may have succeeded in completing a task, and the environment will need to stop, as the agent cannot continue. In Gymnasium, whether the environment has terminated is returned by :meth:`step`. Similarly, we may also want the environment to end after a fixed number of timesteps; in this case, the environment issues a truncated signal. If either of ``terminated`` or ``truncated`` is ``True``, then :meth:`reset` should be called next to restart the environment.
```

## Action and observation spaces

```{eval-rst}
.. py:currentmodule:: gymnasium.Env

Every environment specifies the format of valid actions and observations with the :attr:`action_space` and :attr:`observation_space` attributes. This is helpful for knowing both the expected input and output of the environment, as all valid actions and observations should be contained within their respective spaces.

In the example above, we sampled random actions via ``env.action_space.sample()`` instead of using an agent policy that maps observations to actions, which is what users will want to do. See one of the agent tutorials for an example of creating and training an agent policy.

.. py:currentmodule:: gymnasium

Every environment should have the attributes :attr:`Env.action_space` and :attr:`Env.observation_space`, both of which should be instances of classes that inherit from :class:`spaces.Space`. Gymnasium has support for the majority of possible spaces users might need:

.. py:currentmodule:: gymnasium.spaces

- :class:`Box`: describes an n-dimensional continuous space. It's a bounded space where we can define the upper and lower limits that describe the valid values our observations can take.
- :class:`Discrete`: describes a discrete space where ``{0, 1, ..., n-1}`` are the possible values our observation or action can take. Values can be shifted to ``{a, a+1, ..., a+n-1}`` using an optional argument.
- :class:`Dict`: represents a dictionary of simple spaces.
- :class:`Tuple`: represents a tuple of simple spaces.
- :class:`MultiBinary`: creates an n-shaped binary space. The argument ``n`` can be a number or a list of numbers.
- :class:`MultiDiscrete`: consists of a series of :class:`Discrete` action spaces with a different number of actions in each element.

For example usage of spaces, see their `documentation </api/spaces>`_ along with the `utility functions </api/spaces/utils>`_. There are also a couple of more niche spaces: :class:`Graph`, :class:`Sequence` and :class:`Text`.
```

## Modifying the environment

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers

Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly. Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can also be chained to combine their effects. Most environments that are generated via ``gymnasium.make`` will already be wrapped by default with the :class:`TimeLimitV0`, :class:`OrderEnforcingV0` and :class:`PassiveEnvCheckerV0` wrappers.

In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment, along with (possibly optional) parameters, to the wrapper's constructor:
```

```python
>>> import gymnasium as gym
>>> from gymnasium.wrappers import FlattenObservation
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> wrapped_env = FlattenObservation(env)
>>> wrapped_env.observation_space.shape
(27648,)
```

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers

Gymnasium already provides many commonly used wrappers for you. Some examples:

- :class:`TimeLimitV0`: Issues a truncated signal if a maximum number of timesteps has been exceeded (or the base environment has issued a truncated signal).
- :class:`ClipActionV0`: Clips the action such that it lies in the action space (of type ``Box``).
- :class:`RescaleActionV0`: Rescales actions to lie in a specified interval.
- :class:`TimeAwareObservationV0`: Adds information about the timestep index to the observation. In some cases, this is helpful to ensure that transitions are Markovian.
```

For a full list of the wrappers implemented in Gymnasium, see [wrappers](/api/wrappers).

```{eval-rst}
.. py:currentmodule:: gymnasium.Env

If you have a wrapped environment, and you want to get the unwrapped environment underneath all the layers of wrappers (so that you can manually call a function or change some underlying aspect of the environment), you can use the :attr:`unwrapped` attribute. If the environment is already a base environment, the :attr:`unwrapped` attribute will just return itself.
```

```python
>>> wrapped_env
<FlattenObservation<TimeLimit<OrderEnforcing<PassiveEnvChecker<CarRacing<CarRacing-v2>>>>>>
>>> wrapped_env.unwrapped
<gymnasium.envs.box2d.car_racing.CarRacing object at 0x7f04efcb8850>
```

## More information

* [Making a Custom environment using the Gymnasium API](/tutorials/gymnasium_basics/environment_creation/)
* [Training an agent to play blackjack](/tutorials/training_agents/blackjack_tutorial)
* [Compatibility with OpenAI Gym](/introduction/gym_compatibility)
docs/introduction/gym_compatibility.md
---
layout: "contents"
title: Compatibility With Gym
---

# Compatibility with Gym

Gymnasium provides a number of compatibility methods for a range of environment implementations.

## Loading OpenAI Gym environments

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers

For environments that are registered solely in OpenAI Gym and not in Gymnasium, Gymnasium v0.26.3 and above allows importing them through either a special environment or a wrapper. The ``"GymV26Environment-v0"`` environment was introduced in Gymnasium v0.26.3, and allows importing Gym environments through the ``env_name`` argument along with other relevant environment kwargs. To perform the conversion through a wrapper, the environment itself can be passed to the wrapper :class:`EnvCompatibility` through the ``env`` kwarg.
```

An example of this is Atari 0.8.0, which does not have a native gymnasium implementation.

```python
import gymnasium as gym

env = gym.make("GymV26Environment-v0", env_id="ALE/Pong-v5")
```

## Gym v0.21 Environment Compatibility

```{eval-rst}
.. py:currentmodule:: gymnasium

A number of environments have not been updated to the recent Gym changes, in particular since v0.21. This update is significant for the introduction of the ``termination`` and ``truncation`` signatures in favour of the previously used ``done``. To allow backward compatibility, Gym and Gymnasium v0.26+ include an ``apply_api_compatibility`` kwarg when calling :meth:`make` that automatically converts a v0.21 API compliant environment to one that is compatible with v0.26+.
```

```python
import gym

env = gym.make("OldV21Env-v0", apply_api_compatibility=True)
```

Additionally, Gymnasium provides specialist compatibility environments that, given an ``env_id``, will call ``gym.make``.

```python
import gymnasium

env = gymnasium.make("GymV21Environment-v0", env_id="CartPole-v1", render_mode="human")
# or
env = gymnasium.make("GymV21Environment-v0", env=OldV21Env())
```

## Step API Compatibility

```{eval-rst}
If environments implement the (old) done step API, Gymnasium provides both a function (:meth:`gymnasium.utils.step_api_compatibility.convert_to_terminated_truncated_step_api`) and a wrapper (:class:`gymnasium.wrappers.StepAPICompatibility`) that will convert an environment with the old step API (using ``done``) to the new step API (using ``termination`` and ``truncation``).
```
docs/introduction/migration-guide.md
---
layout: "contents"
title: Migration Guide
---

# v0.21 to v0.26 Migration Guide

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers

Gymnasium is a fork of `OpenAI Gym v0.26 <https://github.com/openai/gym/releases/tag/0.26.2>`_, which introduced a large breaking change from `Gym v0.21 <https://github.com/openai/gym/releases/tag/v0.21.0>`_. In this guide, we briefly outline the API changes from Gym v0.21 - for which a number of tutorials have been written - to Gym v0.26. For environments still stuck on the v0.21 API, users can use the :class:`EnvCompatibility` wrapper to convert them to a v0.26 compliant API.
For more information, see the `guide </content/gym_compatibility>`_.
```

## Example code for v0.21

```python
import gym
env = gym.make("LunarLander-v2", options={})
env.seed(123)
observation = env.reset()

done = False
while not done:
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, done, info = env.step(action)

    env.render(mode="human")

env.close()
```

## Example code for v0.26

```python
import gym
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset(seed=123, options={})

done = False
while not done:
    action = env.action_space.sample()  # agent policy that uses the observation and info
    observation, reward, terminated, truncated, info = env.step(action)

    done = terminated or truncated

env.close()
```

## Seed and random number generator

```{eval-rst}
.. py:currentmodule:: gymnasium.Env

``Env.seed()`` has been removed from the Gym v0.26 environments in favour of ``Env.reset(seed=seed)``. This allows the seed to be changed only on environment reset. ``seed`` was removed because some environments use emulators whose random number generator cannot be changed within an episode; it must be set at the beginning of a new episode. We are aware of cases where controlling the random number generator is important; in these cases, if the environment uses the built-in random number generator, users can set the seed manually through the attribute :attr:`np_random`.

Gymnasium v0.26 changed to using ``numpy.random.Generator`` instead of a custom random number generator. This means that several functions such as ``randint`` were removed in favour of ``integers``. While some environments might use an external random number generator, we recommend using the attribute :attr:`np_random` that wrappers and external users can access and utilise.
```

## Environment Reset

```{eval-rst}
In v0.26, :meth:`reset` takes two optional parameters and returns an additional value. This contrasts with v0.21, where it took no parameters and returned only the observation. The two parameters are ``seed``, for setting the random number generator, and ``options``, which allows additional data to be passed to the environment on reset. For example, in classic control, the ``options`` parameter now allows users to modify the range of the state bounds. See the original `PR <https://github.com/openai/gym/pull/2921>`_ for more details.

:meth:`reset` additionally returns ``info``, similar to the ``info`` returned by :meth:`step`. This is important because ``info`` can include metrics or a valid action mask that is used or saved for the next step.

To update older environments, we highly recommend calling ``super().reset(seed=seed)`` on the first line of :meth:`reset`. This will automatically update :attr:`np_random` with the seed value.
```

## Environment Step

```{eval-rst}
In v0.21, the type definition of :meth:`step` was ``tuple[ObsType, SupportsFloat, bool, dict[str, Any]]``, representing the next observation, the reward from the step, whether the episode is done, and additional info from the step. Due to reproducibility issues that will be expanded on in a blog post soon, we have changed the type definition to ``tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]``, adding an extra boolean value. The older ``done`` value has been split into ``terminated`` and ``truncated``. These changes were introduced in Gym `v0.26 <https://github.com/openai/gym/releases/tag/0.26.0>`_ (turned off by default in `v25 <https://github.com/openai/gym/releases/tag/0.25.0>`_).

For users wishing to update, in most cases, replacing ``done`` with ``terminated`` and ``truncated=False`` in :meth:`step` should address most issues. However, environments that have reasons for episode truncation rather than termination should read through the associated `PR <https://github.com/openai/gym/pull/2752>`_. For users looping through an environment, they should use ``done = terminated or truncated`` as shown in the example code above. For training libraries, the primary difference is to change ``done`` to ``terminated``, indicating whether bootstrapping should or shouldn't happen.
```

## TimeLimit Wrapper

```{eval-rst}
In v0.21, the :class:`TimeLimit` wrapper added an extra key ``TimeLimit.truncated`` to the ``info`` dictionary whenever the agent reached the time limit without reaching a terminal state.

In v0.26, this information is instead communicated through the ``truncated`` return value described in the previous section, which is ``True`` if the agent reaches the time limit, whether or not it reaches a terminal state. The old dictionary entry is equivalent to ``truncated and not terminated``.
```

## Environment Render

```{eval-rst}
In v0.26, a new render API was introduced such that the render mode is fixed at initialisation, as some environments don't allow on-the-fly render mode changes. Therefore, users should now specify the :attr:`render_mode` within ``gym.make`` as shown in the v0.26 example code above.

For a more complete explanation of the changes, please refer to this `summary <https://younis.dev/blog/render-api/>`_.
```

## Removed code

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers

* ``GoalEnv`` - This was removed; users needing it should reimplement the environment or use Gymnasium Robotics, which contains an implementation of this environment.
* ``from gym.envs.classic_control import rendering`` - This was removed in favour of users implementing their own rendering systems. Gymnasium environments are coded using pygame.
* Robotics environments - The robotics environments have been moved to the `Gymnasium Robotics <https://robotics.farama.org/>`_ project.
* ``Monitor`` wrapper - This wrapper was replaced with two separate wrappers, :class:`RecordVideo` and :class:`RecordEpisodeStatistics`.
```