diff --git a/docs/api/wrappers.md b/docs/api/wrappers.md
index a1ef35a16..4ea61e22b 100644
--- a/docs/api/wrappers.md
+++ b/docs/api/wrappers.md
@@ -12,6 +12,11 @@ wrappers/observation_wrappers
 wrappers/reward_wrappers
 ```
 
+```{eval-rst}
+.. automodule:: gymnasium.wrappers
+
+```
+
 ## gymnasium.Wrapper
 
 ```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
 .. autoproperty:: gymnasium.Wrapper.spec
 .. autoproperty:: gymnasium.Wrapper.metadata
 .. autoproperty:: gymnasium.Wrapper.np_random
+.. attribute:: gymnasium.Wrapper.env
+
+    The environment (one level underneath) that this wrapper wraps.
+
+    This may itself be a wrapped environment.
+    To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
+
 .. autoproperty:: gymnasium.Wrapper.unwrapped
 ```
 
@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
    * - :class:`VectorListInfo`
      - Misc Wrapper
      - This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
-```
-
-## Implementing a custom wrapper
-
-Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
-reward based on data in `info` or change the rendering behavior).
-Such wrappers can be implemented by inheriting from Misc Wrapper.
-
-- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
-- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
-- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
-to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.
-
-Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
-of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
-penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
-initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
-of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
-
-```python
-import gymnasium as gym
-
-class ReacherRewardWrapper(gym.Wrapper):
-    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
-        super().__init__(env)
-        self.reward_dist_weight = reward_dist_weight
-        self.reward_ctrl_weight = reward_ctrl_weight
-
-    def step(self, action):
-        obs, _, terminated, truncated, info = self.env.step(action)
-        reward = (
-            self.reward_dist_weight * info["reward_dist"]
-            + self.reward_ctrl_weight * info["reward_ctrl"]
-        )
-        return obs, reward, terminated, truncated, info
-```
-
-```{note}
-It is *not* sufficient to use a `RewardWrapper` in this case!
 ```
\ No newline at end of file
diff --git a/docs/api/wrappers/action_wrappers.md b/docs/api/wrappers/action_wrappers.md
index c8fa8d8ff..00b81a4ce 100644
--- a/docs/api/wrappers/action_wrappers.md
+++ b/docs/api/wrappers/action_wrappers.md
@@ -1,22 +1,16 @@
 # Action Wrappers
 
-## Action Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.ActionWrapper
 
-    .. autofunction:: gymnasium.ActionWrapper.action
+    .. automethod:: gymnasium.ActionWrapper.action
 ```
 
-## Clip Action
-
+## Available Action Wrappers
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.ClipAction
-```
-
-## Rescale Action
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RescaleAction
 ```
diff --git a/docs/api/wrappers/misc_wrappers.md b/docs/api/wrappers/misc_wrappers.md
index 065db00f9..011392289 100644
--- a/docs/api/wrappers/misc_wrappers.md
+++ b/docs/api/wrappers/misc_wrappers.md
@@ -1,68 +1,15 @@
 # Misc Wrappers
-
-## Atari Preprocessing
-
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.AtariPreprocessing
-```
-
-## Autoreset
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.AutoResetWrapper
-```
-
-## Compatibility
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.EnvCompatibility
 .. autoclass:: gymnasium.wrappers.StepAPICompatibility
-```
-
-## Passive Environment Checker
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.PassiveEnvChecker
-```
-
-## Human Rendering
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.HumanRendering
-```
-
-## Order Enforcing
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.OrderEnforcing
-```
-
-## Record Episode Statistics
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
-```
-
-## Record Video
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RecordVideo
-```
-
-## Render Collection
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RenderCollection
-```
-
-## Time Limit
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TimeLimit
-```
-
-## Vector List Info
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.VectorListInfo
 ```
diff --git a/docs/api/wrappers/observation_wrappers.md b/docs/api/wrappers/observation_wrappers.md
index 33bf97b2d..14238fc68 100644
--- a/docs/api/wrappers/observation_wrappers.md
+++ b/docs/api/wrappers/observation_wrappers.md
@@ -1,62 +1,23 @@
 # Observation Wrappers
 
-## Observation Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.ObservationWrapper
-.. autofunction:: gymnasium.ObservationWrapper.observation
+
+    .. automethod:: gymnasium.ObservationWrapper.observation
 ```
 
-## Transform Observation
+## Available Observation Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TransformObservation
-```
-
-## Filter Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FilterObservation
-```
-
-## Flatten Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FlattenObservation
-```
-
-## Framestack Observations
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FrameStack
-```
-
-## Gray Scale Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.GrayScaleObservation
-```
-
-## Normalize Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.NormalizeObservation
-```
-
-## Pixel Observation Wrapper
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.PixelObservationWrapper
-```
-
-## Resize Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.ResizeObservation
-```
-
-## Time Aware Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TimeAwareObservation
 ```
diff --git a/docs/api/wrappers/reward_wrappers.md b/docs/api/wrappers/reward_wrappers.md
index d590f892c..45d0476dd 100644
--- a/docs/api/wrappers/reward_wrappers.md
+++ b/docs/api/wrappers/reward_wrappers.md
@@ -1,22 +1,17 @@
 # Reward Wrappers
 
-## Reward Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.RewardWrapper
 
-    .. autofunction:: gymnasium.RewardWrapper.reward
+    .. automethod:: gymnasium.RewardWrapper.reward
 ```
 
-## Transform Reward
+## Available Reward Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TransformReward
-```
-
-## Normalize Reward
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.NormalizeReward
 ```
diff --git a/docs/tutorials/implementing_custom_wrappers.py b/docs/tutorials/implementing_custom_wrappers.py
new file mode 100644
index 000000000..f0eb9a344
--- /dev/null
+++ b/docs/tutorials/implementing_custom_wrappers.py
@@ -0,0 +1,137 @@
+"""
+Implementing Custom Wrappers
+============================
+
+In this tutorial we will describe how to implement your own custom wrappers.
+Wrappers are a great way to add functionality to your environments in a modular way.
+This will save you a lot of boilerplate code.
+
+We will show how to create a wrapper by
+
+- Inheriting from :class:`gymnasium.ObservationWrapper`
+- Inheriting from :class:`gymnasium.ActionWrapper`
+- Inheriting from :class:`gymnasium.RewardWrapper`
+- Inheriting from :class:`gymnasium.Wrapper`
+
+Before following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.
+"""
+
+# %%
+# Inheriting from :class:`gymnasium.ObservationWrapper`
+# -----------------------------------------------------
+# Observation wrappers are useful if you want to apply some function to the observations that are returned
+# by an environment. If you implement an observation wrapper, you only need to define this transformation
+# by implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to
+# update the observation space, if the transformation changes the shape of observations (e.g. by transforming
+# dictionaries into numpy arrays, as in the following example).
+#
+# Imagine you have a 2D navigation task where the environment returns dictionaries as observations with
+# keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
+# freedom and only consider the position of the target relative to the agent, i.e.
+# ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
+# observation wrapper like this:
+
+import numpy as np
+
+import gymnasium as gym
+from gymnasium import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper
+from gymnasium.spaces import Box, Discrete
+
+
+class RelativePosition(ObservationWrapper):
+    def __init__(self, env):
+        super().__init__(env)
+        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
+
+    def observation(self, obs):
+        return obs["target"] - obs["agent"]
+
+
+# %%
+# Inheriting from :class:`gymnasium.ActionWrapper`
+# ------------------------------------------------
+# Action wrappers can be used to apply a transformation to actions before applying them to the environment.
+# If you implement an action wrapper, you need to define that transformation by implementing
+# :meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation
+# by updating the action space of the wrapper.
+#
+# Let’s say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
+# to use a finite subset of actions. Then, you might want to implement the following wrapper:
+
+
+class DiscreteActions(ActionWrapper):
+    def __init__(self, env, disc_to_cont):
+        super().__init__(env)
+        self.disc_to_cont = disc_to_cont
+        self.action_space = Discrete(len(disc_to_cont))
+
+    def action(self, act):
+        return self.disc_to_cont[act]
+
+
+if __name__ == "__main__":
+    env = gym.make("LunarLanderContinuous-v2")
+    wrapped_env = DiscreteActions(
+        env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]
+    )
+    print(wrapped_env.action_space)  # Discrete(4)
+
+
+# %%
+# Inheriting from :class:`gymnasium.RewardWrapper`
+# ------------------------------------------------
+# Reward wrappers are used to transform the reward that is returned by an environment.
+# As for the previous wrappers, you need to specify that transformation by implementing the
+# :meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.
+#
+# Let us look at an example: Sometimes (especially when we do not have control over the reward
+# because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
+# To do that, we could, for instance, implement the following wrapper:
+
+from typing import SupportsFloat
+
+
+class ClipReward(RewardWrapper):
+    def __init__(self, env, min_reward, max_reward):
+        super().__init__(env)
+        self.min_reward = min_reward
+        self.max_reward = max_reward
+        self.reward_range = (min_reward, max_reward)
+
+    def reward(self, r: SupportsFloat) -> SupportsFloat:
+        return np.clip(r, self.min_reward, self.max_reward)
+
+
+# %%
+# Inheriting from :class:`gymnasium.Wrapper`
+# ------------------------------------------
+# Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
+# reward based on data in ``info`` or change the rendering behavior).
+# Such wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.
+#
+# - You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively
+# - You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively
+# - You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.
+#   If you do this, you can access the environment that was passed
+#   to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.
+#
+# Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
+# of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
+# penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
+# initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
+# of the reward are returned in ``info``, so let us build a wrapper for Reacher that allows us to weight those terms:
+
+
+class ReacherRewardWrapper(Wrapper):
+    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
+        super().__init__(env)
+        self.reward_dist_weight = reward_dist_weight
+        self.reward_ctrl_weight = reward_ctrl_weight
+
+    def step(self, action):
+        obs, _, terminated, truncated, info = self.env.step(action)
+        reward = (
+            self.reward_dist_weight * info["reward_dist"]
+            + self.reward_ctrl_weight * info["reward_ctrl"]
+        )
+        return obs, reward, terminated, truncated, info
diff --git a/gymnasium/core.py b/gymnasium/core.py
index de7005dec..15a2ab9b4 100644
--- a/gymnasium/core.py
+++ b/gymnasium/core.py
@@ -236,58 +236,16 @@ WrapperActType = TypeVar("WrapperActType")
 class Wrapper(Env[WrapperObsType, WrapperActType]):
     """Wraps a :class:`gymnasium.Env` to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.
 
-    This class is the base class of all wrappers to change the behavior of the underlying environment allowing
-    modification to the :attr:`action_space`, :attr:`observation_space`, :attr:`reward_range` and :attr:`metadata`
-    that doesn't change the underlying environment attributes.
+    This class is the base class of all wrappers to change the behavior of the underlying environment.
+    Wrappers that inherit from this class can modify the :attr:`action_space`, :attr:`observation_space`,
+    :attr:`reward_range` and :attr:`metadata` attributes, without changing the underlying environment's attributes.
+    Moreover, the behavior of the :meth:`step` and :meth:`reset` methods can be changed by these wrappers.
 
-    In addition, for several attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the
-    wrapper's environment.
-
-    Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
-    Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
-    also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
-
-    In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
-    with (possibly optional) parameters to the wrapper's constructor.
-
-    >>> import gymnasium as gym
-    >>> from gymnasium.wrappers import RescaleAction
-    >>> base_env = gym.make("BipedalWalker-v3")
-    >>> base_env.action_space
-    Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
-    >>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
-    >>> wrapped_env.action_space
-    Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
-
-    You can access the environment underneath the **first** wrapper by using the :attr:`env` attribute.
-    As the :class:`Wrapper` class inherits from :class:`Env` then :attr:`env` can be another wrapper.
-
-    >>> wrapped_env
-    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
-    >>> wrapped_env.env
-    <TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
-
-    If you want to get to the environment underneath **all** of the layers of wrappers, you can use the `.unwrapped` attribute.
-    If the environment is already a bare environment, the `.unwrapped` attribute will just return itself.
-
-    >>> wrapped_env
-    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
-    >>> wrapped_env.unwrapped
-    <gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x...>
-
-    There are three common things you might want a wrapper to do:
-
-    - Transform actions before applying them to the base environment
-    - Transform observations that are returned by the base environment
-    - Transform rewards that are returned by the base environment
-
-    Such wrappers can be easily implemented by inheriting from `ActionWrapper`, `ObservationWrapper`, or `RewardWrapper` and implementing the
-    respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the `Wrapper` class directly.
-    The code that is presented in the following sections can also be found in
-    the [gym-examples](https://github.com/Farama-Foundation/gym-examples) repository
+    Some attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the wrapper's environment
+    (i.e. to the corresponding attributes of :attr:`env`).
 
     Note:
-        Don't forget to call ``super().__init__(env)``
+        If you inherit from :class:`Wrapper`, don't forget to call ``super().__init__(env)``
     """
 
     def __init__(self, env: Env[ObsType, ActType]):
@@ -425,7 +383,10 @@ class Wrapper(Env[WrapperObsType, WrapperActType]):
 
     @property
     def unwrapped(self) -> Env[ObsType, ActType]:
-        """Returns the base environment of the wrapper."""
+        """Returns the base environment of the wrapper.
+
+        This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
+        """
         return self.env.unwrapped
 
 
@@ -438,20 +399,6 @@ class ObservationWrapper(Wrapper[WrapperObsType, ActType]):
     reflected by the :attr:`env` observation space. Otherwise, you need to specify the new observation space of the
     wrapper by setting :attr:`self.observation_space` in the :meth:`__init__` method of your wrapper.
 
-    For example, you might have a 2D navigation task where the environment returns dictionaries as observations with
-    keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
-    freedom and only consider the position of the target relative to the agent, i.e.
-    ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
-    observation wrapper like this::
-
-        class RelativePosition(gym.ObservationWrapper):
-            def __init__(self, env):
-                super().__init__(env)
-                self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
-
-            def observation(self, obs):
-                return obs["target"] - obs["agent"]
-
     Among others, Gymnasium provides the observation wrapper :class:`TimeAwareObservation`, which adds information about the
     index of the timestep to the observation.
     """
@@ -494,20 +441,6 @@ class RewardWrapper(Wrapper[ObsType, ActType]):
     :meth:`reward` to implement that transformation. This transformation might change the :attr:`reward_range`; to specify
     the :attr:`reward_range` of your wrapper, you can simply define :attr:`self.reward_range` in :meth:`__init__`.
-
-    Let us look at an example: Sometimes (especially when we do not have control over the reward
-    because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
-    To do that, we could, for instance, implement the following wrapper::
-
-        class ClipReward(gym.RewardWrapper):
-            def __init__(self, env, min_reward, max_reward):
-                super().__init__(env)
-                self.min_reward = min_reward
-                self.max_reward = max_reward
-                self.reward_range = (min_reward, max_reward)
-
-            def reward(self, r: SupportsFloat) -> SupportsFloat:
-                return np.clip(r, self.min_reward, self.max_reward)
     """
 
     def __init__(self, env: Env[ObsType, ActType]):
@@ -543,24 +476,6 @@ class ActionWrapper(Wrapper[ObsType, WrapperActType]):
     In that case, you need to specify the new action space of the wrapper by setting :attr:`self.action_space` in
     the :meth:`__init__` method of your wrapper.
 
-    Let’s say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
-    to use a finite subset of actions. Then, you might want to implement the following wrapper::
-
-        class DiscreteActions(gym.ActionWrapper):
-            def __init__(self, env, disc_to_cont):
-                super().__init__(env)
-                self.disc_to_cont = disc_to_cont
-                self.action_space = Discrete(len(disc_to_cont))
-
-            def action(self, act):
-                return self.disc_to_cont[act]
-
-        if __name__ == "__main__":
-            env = gym.make("LunarLanderContinuous-v2")
-            wrapped_env = DiscreteActions(env, [np.array([1,0]), np.array([-1,0]),
-                                                np.array([0,1]), np.array([0,-1])])
-            print(wrapped_env.action_space) #Discrete(4)
-
     Among others, Gymnasium provides the action wrappers :class:`ClipAction` and :class:`RescaleAction` for clipping and rescaling
     actions.
     """
diff --git a/gymnasium/wrappers/__init__.py b/gymnasium/wrappers/__init__.py
index 152dc4a22..b8b863af1 100644
--- a/gymnasium/wrappers/__init__.py
+++ b/gymnasium/wrappers/__init__.py
@@ -1,4 +1,51 @@
-"""Module of wrapper classes."""
+"""Module of wrapper classes.
+
+Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
+Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
+also be chained to combine their effects.
+Most environments that are generated via :meth:`gymnasium.make` will already be wrapped by default.
+
+In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
+with (possibly optional) parameters to the wrapper's constructor.
+
+    >>> import gymnasium as gym
+    >>> from gymnasium.wrappers import RescaleAction
+    >>> base_env = gym.make("BipedalWalker-v3")
+    >>> base_env.action_space
+    Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
+    >>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
+    >>> wrapped_env.action_space
+    Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
+
+You can access the environment underneath the **first** wrapper by using the :attr:`gymnasium.Wrapper.env` attribute.
+As the :class:`gymnasium.Wrapper` class inherits from :class:`gymnasium.Env`, :attr:`gymnasium.Wrapper.env` can be another wrapper.
+
+    >>> wrapped_env
+    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
+    >>> wrapped_env.env
+    <TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
+
+If you want to get to the environment underneath **all** of the layers of wrappers, you can use the
+:attr:`gymnasium.Wrapper.unwrapped` attribute.
+If the environment is already a bare environment, the :attr:`gymnasium.Wrapper.unwrapped` attribute will just return itself.
+
+    >>> wrapped_env
+    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
+    >>> wrapped_env.unwrapped
+    <gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x...>
+
+There are three common things you might want a wrapper to do:
+
+- Transform actions before applying them to the base environment
+- Transform observations that are returned by the base environment
+- Transform rewards that are returned by the base environment
+
+Such wrappers can be easily implemented by inheriting from :class:`gymnasium.ActionWrapper`,
+:class:`gymnasium.ObservationWrapper`, or :class:`gymnasium.RewardWrapper` and implementing the respective transformation.
+If you need a wrapper to do more complicated tasks, you can inherit from the :class:`gymnasium.Wrapper` class directly.
+
+If you'd like to implement your own custom wrapper, check out `the corresponding tutorial <../../tutorials/implementing_custom_wrappers>`_.
+"""
 from gymnasium.wrappers.atari_preprocessing import AtariPreprocessing
 from gymnasium.wrappers.autoreset import AutoResetWrapper
 from gymnasium.wrappers.clip_action import ClipAction
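As a quick usage sketch of the wrapper-chaining behaviour described in the new `gymnasium.wrappers` docstring above — illustrative only, and assuming `gymnasium[box2d]` is installed so that `BipedalWalker-v3` can be created:

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RescaleAction

# Stack wrappers on a base environment; each layer keeps the next one in `.env`,
# while `.unwrapped` skips every layer and returns the bare environment.
env = gym.make("BipedalWalker-v3")
env = RescaleAction(env, min_action=0, max_action=1)  # actions are now expected in [0, 1]
env = RecordEpisodeStatistics(env)  # adds episode return/length statistics to `info`

print(env.action_space)  # Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
print(env.env)           # the RescaleAction layer one level underneath
print(env.unwrapped)     # the bare BipedalWalker environment
```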