Updates the Env, Wrapper and Vector API documentation (#48)

This commit is contained in:
Mark Towers
2022-10-12 15:58:01 +01:00
committed by GitHub
parent 9070295cc9
commit 2a7ebc4271
122 changed files with 1412 additions and 1179 deletions

View File

@@ -1,85 +0,0 @@
# Core
## gymnasium.Env
```{eval-rst}
.. autofunction:: gymnasium.Env.step
```
```{eval-rst}
.. autofunction:: gymnasium.Env.reset
```
```{eval-rst}
.. autofunction:: gymnasium.Env.render
```
### Attributes
```{eval-rst}
.. autoattribute:: gymnasium.Env.action_space
This attribute gives the format of valid actions. It is of datatype `Space` provided by Gymnasium. For example, if the action space is of type `Discrete` and gives the value `Discrete(2)`, this means there are two valid discrete actions: 0 & 1.
.. code::
>>> env.action_space
Discrete(2)
>>> env.observation_space
Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
```
```{eval-rst}
.. autoattribute:: gymnasium.Env.observation_space
This attribute gives the format of valid observations. It is of datatype :class:`Space` provided by Gymnasium. For example, if the observation space is of type :class:`Box` and the shape of the object is ``(4,)``, this denotes a valid observation will be an array of 4 numbers. We can check the box bounds as well with attributes.
.. code::
>>> env.observation_space.high
array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38], dtype=float32)
>>> env.observation_space.low
array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38], dtype=float32)
```
```{eval-rst}
.. autoattribute:: gymnasium.Env.reward_range
This attribute is a tuple corresponding to min and max possible rewards. Default range is set to ``(-inf,+inf)``. You can set it if you want a narrower range.
```
### Additional Methods
```{eval-rst}
.. autofunction:: gymnasium.Env.close
```
```{eval-rst}
.. autofunction:: gymnasium.Env.seed
```
## gymnasium.Wrapper
```{eval-rst}
.. autoclass:: gymnasium.Wrapper
```
## gymnasium.ObservationWrapper
```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
```
## gymnasium.RewardWrapper
```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
```
## gymnasium.ActionWrapper
```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
```

docs/api/env.md Normal file
View File

@@ -0,0 +1,79 @@
---
title: Env
---
# Env
## gymnasium.Env
```{eval-rst}
.. autoclass:: gymnasium.Env
```
### Methods
```{eval-rst}
.. autofunction:: gymnasium.Env.step
.. autofunction:: gymnasium.Env.reset
.. autofunction:: gymnasium.Env.render
```
### Attributes
```{eval-rst}
.. autoattribute:: gymnasium.Env.action_space
The Space object corresponding to valid actions; all valid actions should be contained within the space. For example, if the action space is of type `Discrete` and gives the value `Discrete(2)`, this means there are two valid discrete actions: 0 & 1.
.. code::
>>> env.action_space
Discrete(2)
>>> env.observation_space
Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
.. autoattribute:: gymnasium.Env.observation_space
The Space object corresponding to valid observations; all valid observations should be contained within the space. For example, if the observation space is of type :class:`Box` and the shape of the object is ``(4,)``, this denotes a valid observation will be an array of 4 numbers. We can check the box bounds as well with attributes.
.. code::
>>> env.observation_space.high
array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38], dtype=float32)
>>> env.observation_space.low
array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38], dtype=float32)
.. autoattribute:: gymnasium.Env.metadata
The metadata of the environment, containing rendering modes, rendering fps, etc.
.. autoattribute:: gymnasium.Env.render_mode
The render mode of the environment, determined at initialisation.
.. autoattribute:: gymnasium.Env.reward_range
A tuple corresponding to the minimum and maximum possible rewards for an agent over an episode. The default reward range is set to :math:`(-\infty,+\infty)`.
.. autoattribute:: gymnasium.Env.spec
The ``EnvSpec`` of the environment, normally set during :py:meth:`gymnasium.make`.
```
### Additional Methods
```{eval-rst}
.. autofunction:: gymnasium.Env.close
.. autoproperty:: gymnasium.Env.unwrapped
.. autoproperty:: gymnasium.Env.np_random
```
### Implementing environments
```{eval-rst}
.. py:currentmodule:: gymnasium
When implementing an environment, the :meth:`Env.reset` and :meth:`Env.step` functions must be created, describing the
dynamics of the environment.
For more information see the environment creation tutorial.
```
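For illustration, a minimal sketch of such an environment might look like the following (the `GuessTheNumber` class and its dynamics are invented for this example):
```python
import gymnasium as gym
from gymnasium import spaces


class GuessTheNumber(gym.Env):
    """Toy environment: guess a hidden integer between 0 and 9."""

    def __init__(self):
        self.observation_space = spaces.Discrete(2)  # 0 = wrong, 1 = correct
        self.action_space = spaces.Discrete(10)      # the guess

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._target = int(self.np_random.integers(10))
        return 0, {}  # initial observation, info

    def step(self, action):
        correct = action == self._target
        # observation, reward, terminated, truncated, info
        return int(correct), float(correct), bool(correct), False, {}
```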

docs/api/registry.md Normal file
View File

@@ -0,0 +1,31 @@
---
title: Registry
---
# Registry
Gymnasium allows users to automatically load environments, pre-wrapped with several important wrappers.
Environments can also be created through Python imports.
## Make
```{eval-rst}
.. autofunction:: gymnasium.make
```
## Register
```{eval-rst}
.. autofunction:: gymnasium.register
```
## All registered environments
To find all the registered Gymnasium environments, use `gymnasium.envs.registry.keys()`.
This will not include environments registered only in OpenAI Gym; however, these can still be loaded by `gymnasium.make`.
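For example (a minimal sketch; the exact IDs printed depend on the installed packages):
```python
import gymnasium as gym

# List a few registered environment IDs
print(sorted(gym.envs.registry.keys())[:3])

# Any registered ID can be instantiated with make
env = gym.make("CartPole-v1")
```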
## Spec
```{eval-rst}
.. autofunction:: gymnasium.spec
```

View File

@@ -1,3 +1,8 @@
---
title: Spaces
---
# Spaces
```{toctree}
@@ -5,37 +10,38 @@
spaces/fundamental
spaces/composite
spaces/utils
spaces/vector_utils
```
```{eval-rst}
.. autoclass:: gymnasium.spaces.Space
```
## General Functions
## Attributes
```{eval-rst}
.. autoproperty:: gymnasium.spaces.space.Space.shape
.. property:: Space.dtype
Return the data type of this space.
```
## Methods
Each space implements the following functions:
```{eval-rst}
.. autofunction:: gymnasium.spaces.Space.sample
.. autofunction:: gymnasium.spaces.Space.contains
.. autoproperty:: gymnasium.spaces.Space.shape
.. property:: gymnasium.spaces.Space.dtype
Return the data type of this space.
.. autofunction:: gymnasium.spaces.Space.seed
.. autofunction:: gymnasium.spaces.Space.to_jsonable
.. autofunction:: gymnasium.spaces.Space.from_jsonable
.. autofunction:: gymnasium.spaces.space.Space.sample
.. autofunction:: gymnasium.spaces.space.Space.contains
.. autofunction:: gymnasium.spaces.space.Space.seed
.. autofunction:: gymnasium.spaces.space.Space.to_jsonable
.. autofunction:: gymnasium.spaces.space.Space.from_jsonable
```
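For example, a short sketch of these methods on a `Box` space:
```python
from gymnasium.spaces import Box

space = Box(low=-1.0, high=1.0, shape=(3,))
space.seed(0)                 # seed the space's internal random generator
sample = space.sample()       # a random element of the space
assert space.contains(sample)

# to_jsonable/from_jsonable round-trip a *list* of samples
restored = space.from_jsonable(space.to_jsonable([sample]))
assert space.contains(restored[0])
```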
## Fundamental Spaces
Gymnasium has a number of fundamental spaces that are used as building blocks for more complex spaces.
```{eval-rst}
.. currentmodule:: gymnasium.spaces
@@ -48,6 +54,8 @@ Each space implements the following functions:
## Composite Spaces
Often environment spaces require joining fundamental spaces together for vectorised environments, separate agents or readability of the space.
```{eval-rst}
* :py:class:`Dict` - Supports a dictionary of keys and subspaces, used for a fixed number of unordered spaces
* :py:class:`Tuple` - Supports a tuple of subspaces, used for a fixed number of ordered spaces
@@ -57,9 +65,29 @@ Each space implements the following functions:
## Utils
Gymnasium contains a number of helpful utility functions for flattening and unflattening spaces.
This can be important for passing information to neural networks.
```{eval-rst}
* :py:func:`utils.flatdim` - The number of dimensions the flattened space will contain
* :py:func:`utils.flatten_space` - Flattens a space; the `flatten`ed samples of the original space are contained within it
* :py:func:`utils.flatten` - Flattens a sample of a space so that it is contained within the flattened version of the space
* :py:func:`utils.unflatten` - The reverse of the `flatten` function
```
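For example (a minimal sketch using an illustrative `Dict` space):
```python
from gymnasium.spaces import Box, Dict, Discrete, utils

space = Dict({"position": Box(-1.0, 1.0, (2,)), "gear": Discrete(3)})
print(utils.flatdim(space))               # 5: two box dims + three one-hot dims
flat_space = utils.flatten_space(space)   # Box(5,)

x = space.sample()
flat_x = utils.flatten(space, x)          # 1-D array contained in flat_space
assert flat_space.contains(flat_x)
x_again = utils.unflatten(space, flat_x)  # recovers the original sample
```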
## Vector Utils
When vectorizing environments, it is necessary to modify the observation and action spaces for the new batched space sizes.
Therefore, Gymnasium provides a number of additional functions for using spaces with vectorized environments.
```{eval-rst}
.. currentmodule:: gymnasium
* :py:class:`vector.utils.batch_space`
* :py:class:`vector.utils.concatenate`
* :py:class:`vector.utils.iterate`
* :py:class:`vector.utils.create_empty_array`
* :py:class:`vector.utils.create_shared_memory`
* :py:class:`vector.utils.read_from_shared_memory`
* :py:class:`vector.utils.write_to_shared_memory`
```
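For example (a minimal sketch; the printed shapes follow from the spaces constructed below):
```python
from gymnasium.spaces import Box, Discrete
from gymnasium.vector.utils import batch_space, iterate

batch_space(Discrete(2), n=3)                      # MultiDiscrete([2 2 2])
batched = batch_space(Box(-1.0, 1.0, (4,)), n=2)   # Box(-1.0, 1.0, (2, 4), float32)
for item in iterate(batched, batched.sample()):
    print(item.shape)                              # (4,), once per sub-environment
```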

View File

@@ -5,7 +5,8 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Dict
.. automethod:: sample
.. automethod:: gymnasium.spaces.Dict.sample
.. automethod:: gymnasium.spaces.Dict.seed
```
## Tuple
@@ -13,7 +14,8 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Tuple
.. automethod:: sample
.. automethod:: gymnasium.spaces.Tuple.sample
.. automethod:: gymnasium.spaces.Tuple.seed
```
## Sequence
@@ -21,7 +23,8 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Sequence
.. automethod:: sample
.. automethod:: gymnasium.spaces.Sequence.sample
.. automethod:: gymnasium.spaces.Sequence.seed
```
## Graph
@@ -29,5 +32,6 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Graph
.. automethod:: sample
.. automethod:: gymnasium.spaces.Graph.sample
.. automethod:: gymnasium.spaces.Graph.seed
```
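For example (a minimal sketch with illustrative subspaces):
```python
from gymnasium.spaces import Box, Dict, Discrete, Tuple

space = Dict({"position": Box(-1.0, 1.0, (2,)), "color": Discrete(3)})
space.seed(42)
obs = space.sample()   # dict-like sample with a "color" int and a "position" array

pair = Tuple((Discrete(2), Box(-1.0, 1.0, (1,))))
pair.sample()          # e.g. (1, array([0.3], dtype=float32))
```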

View File

@@ -9,24 +9,25 @@ title: Fundamental Spaces
```{eval-rst}
.. autoclass:: gymnasium.spaces.Box
.. automethod:: is_bounded
.. automethod:: sample
.. automethod:: gymnasium.spaces.Box.sample
.. automethod:: gymnasium.spaces.Box.seed
.. automethod:: gymnasium.spaces.Box.is_bounded
```
## Discrete
```{eval-rst}
.. autoclass:: gymnasium.spaces.Discrete
.. automethod:: sample
.. automethod:: gymnasium.spaces.Discrete.sample
.. automethod:: gymnasium.spaces.Discrete.seed
```
## MultiBinary
```{eval-rst}
.. autoclass:: gymnasium.spaces.MultiBinary
.. automethod:: sample
.. automethod:: gymnasium.spaces.MultiBinary.sample
.. automethod:: gymnasium.spaces.MultiBinary.seed
```
## MultiDiscrete
@@ -34,7 +35,8 @@ title: Fundamental Spaces
```{eval-rst}
.. autoclass:: gymnasium.spaces.MultiDiscrete
.. automethod:: sample
.. automethod:: gymnasium.spaces.MultiDiscrete.sample
.. automethod:: gymnasium.spaces.MultiDiscrete.seed
```
## Text
@@ -42,5 +44,6 @@ title: Fundamental Spaces
```{eval-rst}
.. autoclass:: gymnasium.spaces.Text
.. automethod:: sample
.. automethod:: gymnasium.spaces.Text.sample
.. automethod:: gymnasium.spaces.Text.seed
```
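For example (a minimal sketch; the sampled values are random):
```python
import numpy as np
from gymnasium.spaces import Box, Discrete, MultiBinary, MultiDiscrete, Text

Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32).sample()
Discrete(3).sample()           # one of 0, 1, 2
MultiBinary(5).sample()        # e.g. array([0, 1, 1, 0, 1], dtype=int8)
MultiDiscrete([3, 2]).sample()
Text(max_length=5).sample()    # a random string of 1 to 5 characters
```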

View File

@@ -6,10 +6,7 @@ title: Utils
```{eval-rst}
.. autofunction:: gymnasium.spaces.utils.flatdim
.. autofunction:: gymnasium.spaces.utils.flatten_space
.. autofunction:: gymnasium.spaces.utils.flatten
.. autofunction:: gymnasium.spaces.utils.unflatten
```

View File

@@ -0,0 +1,20 @@
---
title: Vector Utils
---
# Spaces Vector Utils
```{eval-rst}
.. autofunction:: gymnasium.vector.utils.batch_space
.. autofunction:: gymnasium.vector.utils.concatenate
.. autofunction:: gymnasium.vector.utils.iterate
```
## Shared Memory Utils
```{eval-rst}
.. autofunction:: gymnasium.vector.utils.create_empty_array
.. autofunction:: gymnasium.vector.utils.create_shared_memory
.. autofunction:: gymnasium.vector.utils.read_from_shared_memory
.. autofunction:: gymnasium.vector.utils.write_to_shared_memory
```
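For example (a minimal sketch using `create_empty_array` and `concatenate`):
```python
from gymnasium.spaces import Box
from gymnasium.vector.utils import concatenate, create_empty_array

space = Box(-1.0, 1.0, (4,))
out = create_empty_array(space, n=3)       # zeros with shape (3, 4)
samples = [space.sample() for _ in range(3)]
out = concatenate(space, samples, out)     # stacks the samples into `out`
```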

View File

@@ -7,32 +7,29 @@ title: Utils
## Visualization
```{eval-rst}
.. autoclass:: gymnasium.utils.play.PlayableGame
.. automethod:: process_event
.. autofunction:: gymnasium.utils.play.play
.. autoclass:: gymnasium.utils.play.PlayPlot
.. automethod:: callback
.. autofunction:: gymnasium.utils.play.display_arr
.. autofunction:: gymnasium.utils.play.play
.. autoclass:: gymnasium.utils.play.PlayableGame
.. automethod:: process_event
```
## Save Rendering Videos
```{eval-rst}
.. autofunction:: gymnasium.utils.save_video.capped_cubic_video_schedule
.. autofunction:: gymnasium.utils.save_video.save_video
.. autofunction:: gymnasium.utils.save_video.capped_cubic_video_schedule
```
## Old to New Step API Compatibility
```{eval-rst}
.. autofunction:: gymnasium.utils.step_api_compatibility.step_api_compatibility
.. autofunction:: gymnasium.utils.step_api_compatibility.convert_to_terminated_truncated_step_api
.. autofunction:: gymnasium.utils.step_api_compatibility.convert_to_done_step_api
.. autofunction:: gymnasium.utils.step_api_compatibility.step_api_compatibility
```
## Seeding
@@ -43,16 +40,6 @@ title: Utils
## Environment Checking
### Invasive
```{eval-rst}
.. autofunction:: gymnasium.utils.env_checker.check_env
.. autofunction:: gymnasium.utils.env_checker.data_equivalence
.. autofunction:: gymnasium.utils.env_checker.check_reset_seed
.. autofunction:: gymnasium.utils.env_checker.check_reset_options
.. autofunction:: gymnasium.utils.env_checker.check_reset_return_info_deprecation
.. autofunction:: gymnasium.utils.env_checker.check_seed_deprecation
.. autofunction:: gymnasium.utils.env_checker.check_reset_return_type
.. autofunction:: gymnasium.utils.env_checker.check_space_limit
```

View File

@@ -4,15 +4,26 @@ title: Vector
# Vector
```{eval-rst}
.. autofunction:: gymnasium.vector.make
```
## VectorEnv
## gymnasium.vector.VectorEnv
```{eval-rst}
.. attribute:: gymnasium.vector.VectorEnv.action_space
.. autoclass:: gymnasium.vector.VectorEnv
```
### Methods
```{eval-rst}
.. automethod:: gymnasium.vector.VectorEnv.reset
.. automethod:: gymnasium.vector.VectorEnv.step
.. automethod:: gymnasium.vector.VectorEnv.close
```
### Attributes
```{eval-rst}
.. attribute:: action_space
The (batched) action space. The input actions of `step` must be valid elements of `action_space`.::
@@ -20,7 +31,7 @@ title: Vector
>>> envs.action_space
MultiDiscrete([2 2 2])
.. attribute:: gymnasium.vector.VectorEnv.observation_space
.. attribute:: observation_space
The (batched) observation space. The observations returned by `reset` and `step` are valid elements of `observation_space`.::
@@ -28,7 +39,7 @@ title: Vector
>>> envs.observation_space
Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)
.. attribute:: gymnasium.vector.VectorEnv.single_action_space
.. attribute:: single_action_space
The action space of an environment copy.::
@@ -36,55 +47,29 @@ title: Vector
>>> envs.single_action_space
Discrete(2)
.. attribute:: gymnasium.vector.VectorEnv.single_observation_space
.. attribute:: single_observation_space
The observation space of an environment copy.::
>>> envs = gymnasium.vector.make("CartPole-v1", num_envs=3)
>>> envs.single_observation_space
Box([-4.8 ...], [4.8 ...], (4,), float32)
```
### Reset
## Making Vector Environments
```{eval-rst}
.. automethod:: gymnasium.vector.VectorEnv.reset
```
```python
>>> import gymnasium as gym
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
(array([[-0.02240574, -0.03439831, -0.03904812, 0.02810693],
[ 0.01586068, 0.01929009, 0.02394426, 0.04016077],
[-0.01314174, 0.03893502, -0.02400815, 0.0038326 ]],
dtype=float32), {})
```
```{eval-rst}
.. autofunction:: gymnasium.vector.make
```
### Step
## Async Vector Env
```{eval-rst}
.. automethod:: gymnasium.vector.VectorEnv.step
```
```python
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
>>> actions = np.array([1, 0, 1])
>>> observations, rewards, termination, truncation, infos = envs.step(actions)
>>> observations
array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
[ 0.00788269, -0.17490888, 0.03393489, 0.31735462],
[ 0.04918966, 0.19421194, 0.02938497, -0.29495203]],
dtype=float32)
>>> rewards
array([1., 1., 1.])
>>> termination
array([False, False, False])
>>> truncation
array([False, False, False])
>>> infos
{}
```
```{eval-rst}
.. autoclass:: gymnasium.vector.AsyncVectorEnv
```
## Sync Vector Env
```{eval-rst}
.. autoclass:: gymnasium.vector.SyncVectorEnv
```
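For example (a minimal sketch using `SyncVectorEnv` directly; `CartPole-v1` is used purely for illustration):
```python
import numpy as np
import gymnasium as gym

envs = gym.vector.SyncVectorEnv(
    [lambda: gym.make("CartPole-v1") for _ in range(3)]
)
observations, infos = envs.reset(seed=42)   # observations.shape == (3, 4)
observations, rewards, terminations, truncations, infos = envs.step(
    np.array([1, 0, 1])
)
envs.close()
```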

View File

@@ -1,196 +1,136 @@
---
title: Wrappers
lastpage:
title: Wrapper
---
# Wrappers
Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
with (possibly optional) parameters to the wrapper's constructor:
```python
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RescaleAction
>>> base_env = gym.make("BipedalWalker-v3")
>>> base_env.action_space
Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
>>> wrapped_env.action_space
Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
```
You can access the environment underneath the **first** wrapper by using
the `.env` attribute:
```python
>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.env
<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
```
```{toctree}
:hidden:
wrappers/misc_wrappers
wrappers/action_wrappers
wrappers/observation_wrappers
wrappers/reward_wrappers
```
If you want to get to the environment underneath **all** of the layers of wrappers,
you can use the `.unwrapped` attribute.
If the environment is already a bare environment, the `.unwrapped` attribute will just return itself.
## gymnasium.Wrapper
```python
>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.unwrapped
<gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
```
```{eval-rst}
.. autoclass:: gymnasium.Wrapper
```
There are three common things you might want a wrapper to do:
- Transform actions before applying them to the base environment
- Transform observations that are returned by the base environment
- Transform rewards that are returned by the base environment
### Methods
```{eval-rst}
.. autofunction:: gymnasium.Wrapper.step
.. autofunction:: gymnasium.Wrapper.reset
.. autofunction:: gymnasium.Wrapper.close
```
Such wrappers can be easily implemented by inheriting from `ActionWrapper`, `ObservationWrapper`, or `RewardWrapper` and implementing the
respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the `Wrapper` class directly.
The code that is presented in the following sections can also be found in
the [gym-examples](https://github.com/Farama-Foundation/gym-examples) repository.
### Attributes
```{eval-rst}
.. autoproperty:: gymnasium.Wrapper.action_space
.. autoproperty:: gymnasium.Wrapper.observation_space
.. autoproperty:: gymnasium.Wrapper.reward_range
.. autoproperty:: gymnasium.Wrapper.spec
.. autoproperty:: gymnasium.Wrapper.metadata
.. autoproperty:: gymnasium.Wrapper.np_random
.. autoproperty:: gymnasium.Wrapper.unwrapped
```
## ActionWrapper
If you would like to apply a function to the action before passing it to the base environment,
you can simply inherit from `ActionWrapper` and overwrite the method `action` to implement that transformation.
The transformation defined in that method must take values in the base environment's action space.
However, its domain might differ from the original action space. In that case, you need to specify the new
action space of the wrapper by setting `self.action_space` in the `__init__` method of your wrapper.
Let's say you have an environment with an action space of type `Box`, but you would
only like to use a finite subset of actions. Then, you might want to implement the following wrapper:
```python
import gymnasium as gym
import numpy as np
from gymnasium.spaces import Discrete


class DiscreteActions(gym.ActionWrapper):
    def __init__(self, env, disc_to_cont):
        super().__init__(env)
        self.disc_to_cont = disc_to_cont
        self.action_space = Discrete(len(disc_to_cont))

    def action(self, act):
        return self.disc_to_cont[act]


if __name__ == "__main__":
    env = gym.make("LunarLanderContinuous-v2")
    wrapped_env = DiscreteActions(env, [np.array([1, 0]), np.array([-1, 0]),
                                        np.array([0, 1]), np.array([0, -1])])
    print(wrapped_env.action_space)  # Discrete(4)
```
## Gymnasium Wrappers
Gymnasium provides a number of commonly used wrappers, listed below. More information about each wrapper can be found on the page for its wrapper type.
```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers
.. list-table::
:header-rows: 1
* - Name
- Type
- Description
* - :class:`AtariPreprocessing`
- Misc Wrapper
- Implements the common preprocessing applied to Atari environments
* - :class:`AutoResetWrapper`
- Misc Wrapper
- The wrapped environment will automatically reset when the terminated or truncated state is reached.
* - :class:`ClipAction`
- Action Wrapper
- Clip the continuous action to the valid bound specified by the environment's `action_space`
* - :class:`EnvCompatibility`
- Misc Wrapper
- Provides compatibility for environments written in the OpenAI Gym v0.21 API to look like Gymnasium environments
* - :class:`FilterObservation`
- Observation Wrapper
- Filters a dictionary observation space to only the requested keys
* - :class:`FlattenObservation`
- Observation Wrapper
- An Observation wrapper that flattens the observation
* - :class:`FrameStack`
- Observation Wrapper
- An observation wrapper that stacks the observations in a rolling manner.
* - :class:`GrayScaleObservation`
- Observation Wrapper
- Convert the image observation from RGB to gray scale.
* - :class:`HumanRendering`
- Misc Wrapper
- Allows human-like rendering for environments that support "rgb_array" rendering
* - :class:`NormalizeObservation`
- Observation Wrapper
- This wrapper will normalize observations s.t. each coordinate is centered with unit variance.
* - :class:`NormalizeReward`
- Reward Wrapper
- This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance.
* - :class:`OrderEnforcing`
- Misc Wrapper
- This will produce an error if `step` or `render` is called before `reset`
* - :class:`PixelObservationWrapper`
- Observation Wrapper
- Augment observations with pixel values obtained via `render`, which can be added to or replace the environment's observation.
* - :class:`RecordEpisodeStatistics`
- Misc Wrapper
- This will keep track of cumulative rewards and episode lengths returning them at the end.
* - :class:`RecordVideo`
- Misc Wrapper
- This wrapper will record videos of rollouts.
* - :class:`RenderCollection`
- Misc Wrapper
- Enables list versions of render modes, i.e. "rgb_array_list" for "rgb_array", such that the renderings for each step are saved in a list until `render` is called.
* - :class:`RescaleAction`
- Action Wrapper
- Rescales the continuous action space of the environment to a range \[`min_action`, `max_action`], where `min_action` and `max_action` are numpy arrays or floats.
* - :class:`ResizeObservation`
- Observation Wrapper
- This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple `shape`.
* - :class:`StepAPICompatibility`
- Misc Wrapper
- Modifies an environment step function from (old) done to the (new) termination / truncation API.
* - :class:`TimeAwareObservation`
- Observation Wrapper
- Augment the observation with current time step in the trajectory (by appending it to the observation).
* - :class:`TimeLimit`
- Misc Wrapper
- This wrapper will emit a truncated signal if the specified number of steps is exceeded in an episode.
* - :class:`TransformObservation`
- Observation Wrapper
- This wrapper will apply a function to observations
* - :class:`TransformReward`
- Reward Wrapper
- This wrapper will apply a function to rewards
* - :class:`VectorListInfo`
- Misc Wrapper
- This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
```
Among others, Gymnasium provides the action wrappers `ClipAction` and `RescaleAction`.
## ObservationWrapper
If you would like to apply a function to the observation that is returned by the base environment before passing
it to learning code, you can simply inherit from `ObservationWrapper` and overwrite the method `observation` to
implement that transformation. The transformation defined in that method must be defined on the base environment's
observation space. However, it may take values in a different space. In that case, you need to specify the new
observation space of the wrapper by setting `self.observation_space` in the `__init__` method of your wrapper.
For example, you might have a 2D navigation task where the environment returns dictionaries as observations with keys `"agent_position"`
and `"target_position"`. A common thing to do might be to throw away some degrees of freedom and only consider
the position of the target relative to the agent, i.e. `observation["target_position"] - observation["agent_position"]`.
For this, you could implement an observation wrapper like this:
```python
import gymnasium as gym
import numpy as np
from gymnasium.spaces import Box


class RelativePosition(gym.ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)

    def observation(self, obs):
        return obs["target_position"] - obs["agent_position"]
```
Among others, Gymnasium provides the observation wrapper `TimeAwareObservation`, which adds information about the index of the timestep
to the observation.
## RewardWrapper
If you would like to apply a function to the reward that is returned by the base environment before passing
it to learning code, you can simply inherit from `RewardWrapper` and overwrite the method `reward` to
implement that transformation. This transformation might change the reward range; to specify the reward range of
your wrapper, you can simply define `self.reward_range` in `__init__`.
Let us look at an example: Sometimes (especially when we do not have control over the reward because it is intrinsic), we want to clip the reward
to a range to gain some numerical stability. To do that, we could, for instance, implement the following wrapper:
```python
import numpy as np
import gymnasium as gym


class ClipReward(gym.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        return np.clip(reward, self.min_reward, self.max_reward)
```
## AutoResetWrapper
Some users may want a wrapper which will automatically reset its wrapped environment when the wrapped environment reaches the done state. An advantage of this wrapper is that it will never produce undefined behavior, as standard gymnasium environments do when stepping beyond the done state.
When calling step causes `self.env.step()` to return `(terminated or truncated)=True`,
`self.env.reset()` is called,
and the return format of `self.step()` is as follows:
```python
new_obs, final_reward, final_terminated, final_truncated, info
```
`new_obs` is the first observation after calling `self.env.reset()`.
`final_reward` is the reward after calling `self.env.step()`, prior to calling `self.env.reset()`.
The expression `(final_terminated or final_truncated)` is always `True`.
`info` is a dict containing all the keys from the info dict returned by
the call to `self.env.reset()`, with two additional keys: `final_observation`,
containing the observation returned by the last call to `self.env.step()`,
and `final_info`, containing the info dict returned by the last call
to `self.env.step()`.
If `(terminated or truncated)` is not true when `self.env.step()` is called, `self.step()` returns
```python
obs, reward, terminated, truncated, info
```
as normal.
The AutoResetWrapper is not applied by default when calling `gymnasium.make()`, but can be applied by setting the optional `autoreset` argument to `True`:
```python
env = gym.make("CartPole-v1", autoreset=True)
```
The AutoResetWrapper can also be applied using its constructor:
```python
env = gym.make("CartPole-v1")
env = AutoResetWrapper(env)
```
```{note}
When using the AutoResetWrapper to collect rollouts, note
that when `self.env.step()` returns `done`, a
new observation from after calling `self.env.reset()` is returned
by `self.step()` alongside the terminal reward and done state from the
previous episode. If you need the terminal state from the previous
episode, you need to retrieve it via the `final_observation` key
in the info dict. Make sure you know what you're doing if you
use this wrapper!
```
## General Wrappers
## Implementing a custom wrapper
Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
reward based on data in `info` or change the rendering behavior).
Such wrappers can be implemented by inheriting from `Wrapper`.
- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
@@ -204,6 +144,8 @@ initialization of the environment. However, *Reacher* does not allow you to do t
of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
```python
import gymnasium as gym
class ReacherRewardWrapper(gym.Wrapper):
def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
super().__init__(env)
@@ -221,29 +163,4 @@ class ReacherRewardWrapper(gym.Wrapper):
```{note}
It is *not* sufficient to use a `RewardWrapper` in this case!
```
## Available Wrappers
| Name | Type | Arguments | Description |
|------|------|-----------|-------------|
| `AtariPreprocessing` | `gymnasium.Wrapper` | `env: gymnasium.Env`, `noop_max: int = 30`, `frame_skip: int = 4`, `screen_size: int = 84`, `terminal_on_life_loss: bool = False`, `grayscale_obs: bool = True`, `grayscale_newaxis: bool = False`, `scale_obs: bool = False` | Implements the best practices from Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents" but will be deprecated soon. |
| `AutoResetWrapper` | `gymnasium.Wrapper` | `env` | The wrapped environment will automatically reset when the done state is reached. Make sure you read the documentation before using this wrapper! |
| `ClipAction` | `gymnasium.ActionWrapper` | `env` | Clip the continuous action to the valid bound specified by the environment's `action_space` |
| `FilterObservation` | `gymnasium.ObservationWrapper` | `env`, `filter_keys=None` | If you have an environment that returns dictionaries as observations, but you would like to only keep a subset of the entries, you can use this wrapper. `filter_keys` should be an iterable that contains the keys that are kept in the new observation. If it is `None`, all keys will be kept and the wrapper has no effect. |
| `FlattenObservation` | `gymnasium.ObservationWrapper` | `env` | Observation wrapper that flattens the observation |
| `FrameStack` | `gymnasium.ObservationWrapper` | `env`, `num_stack`, `lz4_compress=False` | Observation wrapper that stacks the observations in a rolling manner. For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. Observations will be objects of type `LazyFrames`. This object can be cast to a numpy array via `np.asarray(obs)`. You can also access single frames or slices via the usual `__getitem__` syntax. If `lz4_compress` is set to true, the `LazyFrames` object will compress the frames internally (losslessly). The first observation (i.e. the one returned by `reset`) will consist of `num_stack` repetitions of the first frame. |
| `GrayScaleObservation` | `gymnasium.ObservationWrapper` | `env`, `keep_dim=False` | Convert the image observation from RGB to gray scale. By default, the resulting observation will be 2-dimensional. If `keep_dim` is set to true, a singleton dimension will be added (i.e. the observations are of shape AxBx1). |
| `NormalizeReward` | `gymnasium.Wrapper` | `env`, `gamma=0.99`, `epsilon=1e-8` | This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance. `epsilon` is a stability parameter and `gamma` is the discount factor that is used in the exponential moving average. The exponential moving average will have variance `(1 - gamma)**2`. The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently. |
| `NormalizeObservation` | `gymnasium.Wrapper` | `env`, `epsilon=1e-8` | This wrapper will normalize observations s.t. each coordinate is centered with unit variance. The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently. `epsilon` is a stability parameter that is used when scaling the observations. |
| `OrderEnforcing` | `gymnasium.Wrapper` | `env` | This will produce an error if `step` is called before an initial `reset` |
| `PixelObservationWrapper` | `gymnasium.ObservationWrapper` | `env`, `pixels_only=True`, `render_kwargs=None`, `pixel_keys=("pixels",)` | Augment observations by pixel values obtained via `render`. You can specify whether the original observations should be discarded entirely or be augmented by setting `pixels_only`. Also, you can provide keyword arguments for `render`. |
| `RecordEpisodeStatistics` | `gymnasium.Wrapper` | `env`, `deque_size=100` | This will keep track of cumulative rewards and episode lengths. At the end of an episode, the statistics of the episode will be added to `info`. Moreover, the rewards and episode lengths are stored in buffers that can be accessed via `wrapped_env.return_queue` and `wrapped_env.length_queue` respectively. The size of these buffers can be set via `deque_size`. |
| `RecordVideo` | `gymnasium.Wrapper` | `env`, `video_folder: str`, `episode_trigger: Callable[[int], bool] = None`, `step_trigger: Callable[[int], bool] = None`, `video_length: int = 0`, `name_prefix: str = "rl-video"` | This wrapper will record videos of rollouts. The results will be saved in the folder specified via `video_folder`. You can specify a prefix for the filenames via `name_prefix`. Usually, you only want to record the environment intermittently, say every hundredth episode. To allow this, you can pass `episode_trigger` or `step_trigger`. At most one of these should be passed. These functions will accept an episode index or step index, respectively. They should return a boolean that indicates whether a recording should be started at this point. If neither `episode_trigger`, nor `step_trigger` is passed, a default `episode_trigger` will be used. By default, the recording will be stopped once a done signal has been emitted by the environment. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for `video_length`. |
| `RescaleAction` | `gymnasium.ActionWrapper` | `env`, `min_action`, `max_action` | Rescales the continuous action space of the environment to a range \[`min_action`, `max_action`], where `min_action` and `max_action` are numpy arrays or floats. |
| `ResizeObservation` | `gymnasium.ObservationWrapper` | `env`, `shape` | This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple `shape`. The argument `shape` may also be an integer. In that case, the observation is scaled to a square of side length `shape`. |
| `TimeAwareObservation` | `gymnasium.ObservationWrapper` | `env` | Augment the observation with current time step in the trajectory (by appending it to the observation). This can be useful to ensure that things stay Markov. Currently it only works with one-dimensional observation spaces. |
| `TimeLimit` | `gymnasium.Wrapper` | `env`, `max_episode_steps=None` | Probably the most useful wrapper in Gymnasium. This wrapper will emit a done signal if the specified number of steps is exceeded in an episode. In order to be able to distinguish termination and truncation, you need to check `info`. If it does not contain the key `"TimeLimit.truncated"`, the environment did not reach the timelimit. Otherwise, `info["TimeLimit.truncated"]` will be true if the episode was terminated because of the time limit. |
| `TransformObservation` | `gymnasium.ObservationWrapper` | `env`, `f` | This wrapper will apply `f` to observations |
| `TransformReward` | `gymnasium.RewardWrapper` | `env`, `f` | This wrapper will apply `f` to rewards |
| `VectorListInfo` | `gymnasium.Wrapper` | `env` | This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the _i-th_ dictionary contains info of the _i-th_ environment. If using other wrappers that perform operations on info, like `RecordEpisodeStatistics`, this needs to be the outermost wrapper. |

View File

@@ -0,0 +1,22 @@
# Action Wrappers
## Action Wrapper
```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
.. autofunction:: gymnasium.ActionWrapper.action
```
## Clip Action
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ClipAction
```
## Rescale Action
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RescaleAction
```
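For example (a minimal sketch; `BipedalWalker-v3` is used purely for illustration):
```python
import gymnasium as gym
from gymnasium.wrappers import ClipAction, RescaleAction

env = gym.make("BipedalWalker-v3")
env = RescaleAction(env, min_action=0.0, max_action=1.0)  # Box(0.0, 1.0, (4,))
env = ClipAction(env)  # out-of-bounds actions are clipped before being applied
```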

View File

@@ -0,0 +1,68 @@
# Misc Wrappers
## Atari Preprocessing
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AtariPreprocessing
```
## Autoreset
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AutoResetWrapper
```
## Compatibility
```{eval-rst}
.. autoclass:: gymnasium.wrappers.EnvCompatibility
.. autoclass:: gymnasium.wrappers.StepAPICompatibility
```
## Passive Environment Checker
```{eval-rst}
.. autoclass:: gymnasium.wrappers.PassiveEnvChecker
```
## Human Rendering
```{eval-rst}
.. autoclass:: gymnasium.wrappers.HumanRendering
```
## Order Enforcing
```{eval-rst}
.. autoclass:: gymnasium.wrappers.OrderEnforcing
```
## Record Episode Statistics
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
```
## Record Video
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordVideo
```
## Render Collection
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RenderCollection
```
## Time Limit
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeLimit
```
## Vector List Info
```{eval-rst}
.. autoclass:: gymnasium.wrappers.VectorListInfo
```
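For example (a minimal sketch chaining two of these wrappers; the step limit is arbitrary):
```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, TimeLimit

env = gym.make("CartPole-v1")
env = TimeLimit(env, max_episode_steps=200)  # truncate episodes after 200 steps
env = RecordEpisodeStatistics(env)           # adds "episode" stats to `info`
```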

View File

@@ -0,0 +1,62 @@
# Observation Wrappers
## Observation Wrapper
```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
.. autofunction:: gymnasium.ObservationWrapper.observation
```
## Transform Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
```
## Filter Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FilterObservation
```
## Flatten Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FlattenObservation
```
## Framestack Observations
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FrameStack
```
## Gray Scale Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.GrayScaleObservation
```
## Normalize Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeObservation
```
## Pixel Observation Wrapper
```{eval-rst}
.. autoclass:: gymnasium.wrappers.PixelObservationWrapper
```
## Resize Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ResizeObservation
```
## Time Aware Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeAwareObservation
```
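For example (a minimal sketch chaining several observation wrappers on an image environment; the shapes in the comments follow from the chosen parameters):
```python
import gymnasium as gym
from gymnasium.wrappers import FrameStack, GrayScaleObservation, ResizeObservation

env = gym.make("CarRacing-v2")        # Box(0, 255, (96, 96, 3), uint8)
env = ResizeObservation(env, 64)      # -> (64, 64, 3)
env = GrayScaleObservation(env)       # -> (64, 64)
env = FrameStack(env, num_stack=4)    # -> LazyFrames of shape (4, 64, 64)
```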

View File

@@ -0,0 +1,22 @@
# Reward Wrappers
## Reward Wrapper
```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
.. autofunction:: gymnasium.RewardWrapper.reward
```
## Transform Reward
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformReward
```
## Normalize Reward
```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeReward
```
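For example (a minimal sketch; the scaling factor is arbitrary):
```python
import gymnasium as gym
from gymnasium.wrappers import NormalizeReward, TransformReward

env = gym.make("CartPole-v1")
env = TransformReward(env, lambda r: 0.1 * r)  # scale every reward
env = NormalizeReward(env, gamma=0.99)         # normalize by running variance
```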