Updated Wrapper docs (#173)

This commit is contained in:
Markus Krimmel
2022-12-03 13:46:11 +01:00
committed by GitHub
parent 4b7f941db3
commit 851b2f4be6
8 changed files with 218 additions and 249 deletions

View File

@@ -12,6 +12,11 @@ wrappers/observation_wrappers
wrappers/reward_wrappers
```
```{eval-rst}
.. automodule:: gymnasium.wrappers
```
## gymnasium.Wrapper
```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
.. autoproperty:: gymnasium.Wrapper.spec
.. autoproperty:: gymnasium.Wrapper.metadata
.. autoproperty:: gymnasium.Wrapper.np_random
.. attribute:: gymnasium.Wrapper.env

   The environment (one level underneath) that this wrapper wraps.
   This may itself be a wrapped environment.
   To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
.. autoproperty:: gymnasium.Wrapper.unwrapped
```
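For example, the difference between `env` and `unwrapped` is visible in this short sketch (assuming Box2D is installed):

```python
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

wrapped_env = RescaleAction(gym.make("BipedalWalker-v3"), min_action=0, max_action=1)
print(wrapped_env.env)        # one wrapper layer removed, e.g. <TimeLimit<OrderEnforcing<...>>>
print(wrapped_env.unwrapped)  # the bare BipedalWalker environment underneath all wrappers
```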
@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
* - :class:`VectorListInfo`
- Misc Wrapper
- This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
```
## Implementing a custom wrapper
Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
reward based on data in `info` or change the rendering behavior).
Such wrappers can be implemented by inheriting from `gymnasium.Wrapper`.
- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.
Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
of several terms: for instance, there might be a term that rewards the agent for completing the task and another that
penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
```python
import gymnasium as gym


class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Recombine the individual reward terms reported in `info` with custom weights
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info
```
```{note}
It is *not* sufficient to use a `RewardWrapper` in this case!
```
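A short usage sketch (the environment id `Reacher-v4` and the weights are illustrative values; MuJoCo environments require `pip install gymnasium[mujoco]`):

```python
env = ReacherRewardWrapper(gym.make("Reacher-v4"), reward_dist_weight=1.0, reward_ctrl_weight=0.1)
```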

View File

@@ -1,22 +1,16 @@
# Action Wrappers
## Base Class
```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
.. automethod:: gymnasium.ActionWrapper.action
```
## Available Action Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ClipAction
.. autoclass:: gymnasium.wrappers.RescaleAction
```

View File

@@ -1,68 +1,15 @@
# Misc Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AtariPreprocessing
.. autoclass:: gymnasium.wrappers.AutoResetWrapper
.. autoclass:: gymnasium.wrappers.EnvCompatibility
.. autoclass:: gymnasium.wrappers.StepAPICompatibility
.. autoclass:: gymnasium.wrappers.PassiveEnvChecker
.. autoclass:: gymnasium.wrappers.HumanRendering
.. autoclass:: gymnasium.wrappers.OrderEnforcing
.. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
.. autoclass:: gymnasium.wrappers.RecordVideo
.. autoclass:: gymnasium.wrappers.RenderCollection
.. autoclass:: gymnasium.wrappers.TimeLimit
.. autoclass:: gymnasium.wrappers.VectorListInfo
```

View File

@@ -1,62 +1,23 @@
# Observation Wrappers
## Base Class
```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
.. automethod:: gymnasium.ObservationWrapper.observation
```
## Available Observation Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
.. autoclass:: gymnasium.wrappers.FilterObservation
.. autoclass:: gymnasium.wrappers.FlattenObservation
.. autoclass:: gymnasium.wrappers.FrameStack
.. autoclass:: gymnasium.wrappers.GrayScaleObservation
.. autoclass:: gymnasium.wrappers.NormalizeObservation
.. autoclass:: gymnasium.wrappers.PixelObservationWrapper
.. autoclass:: gymnasium.wrappers.ResizeObservation
.. autoclass:: gymnasium.wrappers.TimeAwareObservation
```

View File

@@ -1,22 +1,17 @@
# Reward Wrappers
## Base Class
```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
.. automethod:: gymnasium.RewardWrapper.reward
```
## Available Reward Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformReward
.. autoclass:: gymnasium.wrappers.NormalizeReward
```

View File

@@ -0,0 +1,137 @@
"""
Implementing Custom Wrappers
============================
In this tutorial we will describe how to implement your own custom wrappers.
Wrappers are a great way to add functionality to your environments in a modular way,
and using them will save you a lot of boilerplate code.
We will show how to create a wrapper by
- Inheriting from :class:`gymnasium.ObservationWrapper`
- Inheriting from :class:`gymnasium.ActionWrapper`
- Inheriting from :class:`gymnasium.RewardWrapper`
- Inheriting from :class:`gymnasium.Wrapper`
Before following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.
"""
# %%
# Inheriting from :class:`gymnasium.ObservationWrapper`
# -----------------------------------------------------
# Observation wrappers are useful if you want to apply some function to the observations that are returned
# by an environment. If you implement an observation wrapper, you only need to define this transformation
# by implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to
# update the observation space if the transformation changes the shape of observations (e.g. by transforming
# dictionaries into numpy arrays, as in the following example).
#
# Imagine you have a 2D navigation task where the environment returns dictionaries as observations with
# keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
# freedom and only consider the position of the target relative to the agent, i.e.
# ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
# observation wrapper like this:
import numpy as np

import gymnasium as gym
from gymnasium import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper
from gymnasium.spaces import Box, Discrete


class RelativePosition(ObservationWrapper):
    def __init__(self, env):
        super().__init__(env)
        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)

    def observation(self, obs):
        # Return the target's position relative to the agent, matching the keys above
        return obs["target_position"] - obs["agent_position"]
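# %%
# As a quick sanity check, we can wrap a minimal stand-in environment (written here purely
# for illustration; it is not part of Gymnasium) and verify that the wrapper indeed returns
# relative positions:

from gymnasium.spaces import Dict


class DummyNavigationEnv(gym.Env):
    """A hypothetical 2D navigation environment with dictionary observations."""

    def __init__(self):
        self.observation_space = Dict(
            {
                "agent_position": Box(low=-10, high=10, shape=(2,)),
                "target_position": Box(low=-10, high=10, shape=(2,)),
            }
        )
        self.action_space = Discrete(4)

    def _get_obs(self):
        return {
            "agent_position": np.zeros(2, dtype=np.float32),
            "target_position": np.ones(2, dtype=np.float32),
        }

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._get_obs(), {}

    def step(self, action):
        return self._get_obs(), 0.0, False, False, {}


if __name__ == "__main__":
    wrapped = RelativePosition(DummyNavigationEnv())
    print(wrapped.reset()[0])  # [1. 1.] -- the target position relative to the agent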
# %%
# Inheriting from :class:`gymnasium.ActionWrapper`
# ------------------------------------------------
# Action wrappers can be used to apply a transformation to actions before applying them to the environment.
# If you implement an action wrapper, you need to define that transformation by implementing
# :meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation
# by updating the action space of the wrapper.
#
# Let's say you have an environment with an action space of type :class:`gymnasium.spaces.Box`, but you would
# only like to use a finite subset of actions. Then, you might want to implement the following wrapper:
class DiscreteActions(ActionWrapper):
    def __init__(self, env, disc_to_cont):
        super().__init__(env)
        self.disc_to_cont = disc_to_cont
        self.action_space = Discrete(len(disc_to_cont))

    def action(self, act):
        # Translate the discrete action index into the corresponding continuous action
        return self.disc_to_cont[act]


if __name__ == "__main__":
    env = gym.make("LunarLanderContinuous-v2")
    wrapped_env = DiscreteActions(
        env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]
    )
    print(wrapped_env.action_space)  # Discrete(4)
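    # A quick sanity check (a sketch; Box2D is required, ``pip install gymnasium[box2d]``):
    # the discrete action 0 is translated back to the continuous action np.array([1, 0])
    # before it reaches the base environment.
    obs, info = wrapped_env.reset(seed=42)
    obs, reward, terminated, truncated, info = wrapped_env.step(0)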
# %%
# Inheriting from :class:`gymnasium.RewardWrapper`
# ------------------------------------------------
# Reward wrappers are used to transform the reward that is returned by an environment.
# As for the previous wrappers, you need to specify that transformation by implementing the
# :meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.
#
# Let us look at an example: Sometimes (especially when we do not have control over the reward
# because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
# To do that, we could, for instance, implement the following wrapper:
from typing import SupportsFloat


class ClipReward(RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, r: SupportsFloat) -> SupportsFloat:
        # Clip the raw reward into [min_reward, max_reward]
        return np.clip(r, self.min_reward, self.max_reward)
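# %%
# A short usage sketch (reusing ``LunarLanderContinuous-v2`` from above, which requires
# Box2D; the clipping bounds are illustrative values):
if __name__ == "__main__":
    clipped_env = ClipReward(gym.make("LunarLanderContinuous-v2"), min_reward=-1.0, max_reward=1.0)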
# %%
# Inheriting from :class:`gymnasium.Wrapper`
# ------------------------------------------
# Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
# reward based on data in ``info`` or change the rendering behavior).
# Such wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.
#
# - You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively
# - You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively
# - You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.
# If you do this, you can access the environment that was passed
# to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.
#
# Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
# of several terms: for instance, there might be a term that rewards the agent for completing the task and another
# that penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
# initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual
# terms of the reward are returned in ``info``, so let us build a wrapper for Reacher that allows us to weight those terms:
class ReacherRewardWrapper(Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Recombine the individual reward terms reported in info with custom weights
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info
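# %%
# A short usage sketch (requires the MuJoCo extra, ``pip install gymnasium[mujoco]``;
# the weights are illustrative values):
if __name__ == "__main__":
    env = ReacherRewardWrapper(
        gym.make("Reacher-v4"), reward_dist_weight=1.0, reward_ctrl_weight=0.1
    )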

View File

@@ -236,58 +236,16 @@ WrapperActType = TypeVar("WrapperActType")
class Wrapper(Env[WrapperObsType, WrapperActType]):
"""Wraps a :class:`gymnasium.Env` to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.
This class is the base class of all wrappers to change the behavior of the underlying environment.
Wrappers that inherit from this class can modify the :attr:`action_space`, :attr:`observation_space`,
:attr:`reward_range` and :attr:`metadata` attributes, without changing the underlying environment's attributes.
Moreover, the behavior of the :meth:`step` and :meth:`reset` methods can be changed by these wrappers.
Some attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the wrapper's environment
(i.e. to the corresponding attributes of :attr:`env`).
Note:
    If you inherit from :class:`Wrapper`, don't forget to call ``super().__init__(env)``
"""
def __init__(self, env: Env[ObsType, ActType]):
@@ -425,7 +383,10 @@ class Wrapper(Env[WrapperObsType, WrapperActType]):
@property
def unwrapped(self) -> Env[ObsType, ActType]:
"""Returns the base environment of the wrapper."""
"""Returns the base environment of the wrapper.
This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
"""
return self.env.unwrapped
@@ -438,20 +399,6 @@ class ObservationWrapper(Wrapper[WrapperObsType, ActType]):
reflected by the :attr:`env` observation space. Otherwise, you need to specify the new observation space of the
wrapper by setting :attr:`self.observation_space` in the :meth:`__init__` method of your wrapper.
Among others, Gymnasium provides the observation wrapper :class:`TimeAwareObservation`, which adds information about the
index of the timestep to the observation.
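For example (a short sketch):

>>> import gymnasium as gym
>>> from gymnasium.wrappers import TimeAwareObservation
>>> env = TimeAwareObservation(gym.make("CartPole-v1"))
>>> obs, info = env.reset()  # obs now carries an extra entry with the current timestep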
"""
@@ -494,20 +441,6 @@ class RewardWrapper(Wrapper[ObsType, ActType]):
:meth:`reward` to implement that transformation.
This transformation might change the :attr:`reward_range`; to specify the :attr:`reward_range` of your wrapper,
you can simply define :attr:`self.reward_range` in :meth:`__init__`.
"""
def __init__(self, env: Env[ObsType, ActType]):
@@ -543,24 +476,6 @@ class ActionWrapper(Wrapper[ObsType, WrapperActType]):
In that case, you need to specify the new action space of the wrapper by setting :attr:`self.action_space` in
the :meth:`__init__` method of your wrapper.
Among others, Gymnasium provides the action wrappers :class:`ClipAction` and :class:`RescaleAction` for clipping and rescaling actions.
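For example (a short sketch), out-of-range actions can be passed safely to the base environment:

>>> import gymnasium as gym
>>> import numpy as np
>>> from gymnasium.wrappers import ClipAction
>>> env = ClipAction(gym.make("MountainCarContinuous-v0"))
>>> _ = env.reset(seed=42)
>>> _ = env.step(np.array([10.0]))  # the action is clipped into the valid range before stepping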
"""

View File

@@ -1,4 +1,51 @@
"""Module of wrapper classes."""
"""Module of wrapper classes.
Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
also be chained to combine their effects.
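For example (a short sketch, assuming Box2D is installed), two action wrappers can be combined:

>>> import gymnasium as gym
>>> from gymnasium.wrappers import ClipAction, RescaleAction
>>> env = ClipAction(RescaleAction(gym.make("BipedalWalker-v3"), min_action=0, max_action=1))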
Most environments that are generated via :meth:`gymnasium.make` will already be wrapped by default.
In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
with (possibly optional) parameters to the wrapper's constructor.
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RescaleAction
>>> base_env = gym.make("BipedalWalker-v3")
>>> base_env.action_space
Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
>>> wrapped_env.action_space
Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
You can access the environment underneath the **first** wrapper by using the :attr:`gymnasium.Wrapper.env` attribute.
As the :class:`gymnasium.Wrapper` class inherits from :class:`gymnasium.Env`, :attr:`gymnasium.Wrapper.env` can itself be another wrapper.
>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.env
<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
If you want to get to the environment underneath **all** of the layers of wrappers, you can use the
:attr:`gymnasium.Wrapper.unwrapped` attribute.
If the environment is already a bare environment, :attr:`gymnasium.Wrapper.unwrapped` will just return the environment itself.
>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.unwrapped
<gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
There are three common things you might want a wrapper to do:
- Transform actions before applying them to the base environment
- Transform observations that are returned by the base environment
- Transform rewards that are returned by the base environment
Such wrappers can be easily implemented by inheriting from :class:`gymnasium.ActionWrapper`,
:class:`gymnasium.ObservationWrapper`, or :class:`gymnasium.RewardWrapper` and implementing the respective transformation.
If you need a wrapper to do more complicated tasks, you can inherit from the :class:`gymnasium.Wrapper` class directly.
If you'd like to implement your own custom wrapper, check out `the corresponding tutorial <../../tutorials/implementing_custom_wrappers>`_.
"""
from gymnasium.wrappers.atari_preprocessing import AtariPreprocessing
from gymnasium.wrappers.autoreset import AutoResetWrapper
from gymnasium.wrappers.clip_action import ClipAction