mirror of https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-07-31 13:54:31 +00:00
Updated Wrapper docs (#173)
@@ -12,6 +12,11 @@ wrappers/observation_wrappers
 wrappers/reward_wrappers
 ```
 
+```{eval-rst}
+.. automodule:: gymnasium.wrappers
+
+```
+
 ## gymnasium.Wrapper
 
 ```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
 .. autoproperty:: gymnasium.Wrapper.spec
 .. autoproperty:: gymnasium.Wrapper.metadata
 .. autoproperty:: gymnasium.Wrapper.np_random
+.. attribute:: gymnasium.Wrapper.env
+
+    The environment (one level underneath) wrapped by this wrapper.
+
+    This may itself be a wrapped environment.
+    To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
+
 .. autoproperty:: gymnasium.Wrapper.unwrapped
 ```
 
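The new `env` attribute and the `unwrapped` property differ only in how far down they go: `env` steps one level, `unwrapped` descends through every layer. A minimal sketch of that relationship (assuming a standard install where `CartPole-v1` is registered):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics

env = RecordEpisodeStatistics(gym.make("CartPole-v1"))

# Walking the .env chain one level at a time ends at the same bare
# environment that .unwrapped reaches in a single step.
layer = env
while isinstance(layer, gym.Wrapper):
    layer = layer.env
assert layer is env.unwrapped
```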
@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
     * - :class:`VectorListInfo`
       - Misc Wrapper
       - This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
-```
-
-
-## Implementing a custom wrapper
-
-Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
-reward based on data in `info` or change the rendering behavior).
-Such wrappers can be implemented by inheriting from Misc Wrapper.
-
-- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
-- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
-- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
-to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.
-
-Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
-of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
-penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
-initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
-of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
-
-```python
-import gymnasium as gym
-
-
-class ReacherRewardWrapper(gym.Wrapper):
-    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
-        super().__init__(env)
-        self.reward_dist_weight = reward_dist_weight
-        self.reward_ctrl_weight = reward_ctrl_weight
-
-    def step(self, action):
-        obs, _, terminated, truncated, info = self.env.step(action)
-        reward = (
-            self.reward_dist_weight * info["reward_dist"]
-            + self.reward_ctrl_weight * info["reward_ctrl"]
-        )
-        return obs, reward, terminated, truncated, info
-```
-
-```{note}
-It is *not* sufficient to use a `RewardWrapper` in this case!
-```
 ```
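The removed `ReacherRewardWrapper` above is applied like any other wrapper; a hypothetical usage sketch (the `Reacher-v4` ID and an installed MuJoCo extra are assumptions here):

```python
import gymnasium as gym

# ReacherRewardWrapper as defined in the removed section above.
env = ReacherRewardWrapper(
    gym.make("Reacher-v4"), reward_dist_weight=1.0, reward_ctrl_weight=0.1
)
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# reward is now the weighted sum of info["reward_dist"] and info["reward_ctrl"].
```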
@@ -1,22 +1,16 @@
 # Action Wrappers
 
-## Action Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.ActionWrapper
 
-.. autofunction:: gymnasium.ActionWrapper.action
+.. automethod:: gymnasium.ActionWrapper.action
 ```
 
-## Clip Action
+## Available Action Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.ClipAction
-```
-
-## Rescale Action
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RescaleAction
 ```
 
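The two action wrappers compose naturally; a minimal sketch of chaining them (assuming the Box2D extra so that `BipedalWalker-v3` is available):

```python
import gymnasium as gym
from gymnasium.wrappers import ClipAction, RescaleAction

env = gym.make("BipedalWalker-v3")                    # actions live in [-1, 1]^4
env = RescaleAction(env, min_action=0, max_action=1)  # agent may now emit [0, 1]^4
env = ClipAction(env)                                 # out-of-range actions are clipped
```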
@@ -1,68 +1,15 @@
 # Misc Wrappers
 
-## Atari Preprocessing
-
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.AtariPreprocessing
-```
-
-## Autoreset
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.AutoResetWrapper
-```
-
-## Compatibility
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.EnvCompatibility
 .. autoclass:: gymnasium.wrappers.StepAPICompatibility
-```
-
-## Passive Environment Checker
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.PassiveEnvChecker
-```
-
-## Human Rendering
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.HumanRendering
-```
-
-## Order Enforcing
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.OrderEnforcing
-```
-
-## Record Episode Statistics
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
-```
-
-## Record Video
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RecordVideo
-```
-
-## Render Collection
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RenderCollection
-```
-
-## Time Limit
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TimeLimit
-```
-
-## Vector List Info
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.VectorListInfo
 ```
 
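Most of these wrappers stack freely; a minimal sketch combining `TimeLimit` with `RecordEpisodeStatistics` (assuming `CartPole-v1`):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, TimeLimit

env = TimeLimit(gym.make("CartPole-v1"), max_episode_steps=100)  # truncate after 100 steps
env = RecordEpisodeStatistics(env)  # adds an "episode" entry to info when an episode ends

obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
print(info["episode"])  # cumulative reward "r", length "l", elapsed time "t"
```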
@@ -1,62 +1,23 @@
 # Observation Wrappers
 
-## Observation Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.ObservationWrapper
-.. autofunction:: gymnasium.ObservationWrapper.observation
+
+.. automethod:: gymnasium.ObservationWrapper.observation
 ```
 
-## Transform Observation
+## Available Observation Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TransformObservation
-```
-
-## Filter Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FilterObservation
-```
-
-## Flatten Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FlattenObservation
-```
-
-## Framestack Observations
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FrameStack
-```
-
-## Gray Scale Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.GrayScaleObservation
-```
-
-## Normalize Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.NormalizeObservation
-```
-
-## Pixel Observation Wrapper
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.PixelObservationWrapper
-```
-
-## Resize Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.ResizeObservation
-```
-
-## Time Aware Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TimeAwareObservation
 ```
 
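Several of these wrappers are commonly chained to preprocess pixel observations; a sketch (the Box2D extra and the `CarRacing-v2` ID are assumptions):

```python
import gymnasium as gym
from gymnasium.wrappers import FrameStack, GrayScaleObservation, ResizeObservation

env = gym.make("CarRacing-v2")          # (96, 96, 3) RGB observations
env = GrayScaleObservation(env)         # -> (96, 96)
env = ResizeObservation(env, shape=84)  # -> (84, 84)
env = FrameStack(env, num_stack=4)      # -> rolling window of the last 4 frames
print(env.observation_space.shape)      # expected: (4, 84, 84)
```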
@@ -1,22 +1,17 @@
 
 # Reward Wrappers
 
-## Reward Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.RewardWrapper
 
-.. autofunction:: gymnasium.RewardWrapper.reward
+.. automethod:: gymnasium.RewardWrapper.reward
 ```
 
-## Transform Reward
+## Available Reward Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TransformReward
-```
-
-## Normalize Reward
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.NormalizeReward
 ```
 
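A minimal sketch of `TransformReward` (assuming `CartPole-v1`); `NormalizeReward` is applied the same way:

```python
import gymnasium as gym
from gymnasium.wrappers import TransformReward

env = TransformReward(gym.make("CartPole-v1"), lambda r: 0.01 * r)  # scale every reward
obs, info = env.reset(seed=0)
_, reward, _, _, _ = env.step(env.action_space.sample())
print(reward)  # 0.01 rather than CartPole's usual per-step reward of 1.0
```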
docs/tutorials/implementing_custom_wrappers.py (new file, +137 lines)
@@ -0,0 +1,137 @@
+"""
+Implementing Custom Wrappers
+============================
+
+In this tutorial we will describe how to implement your own custom wrappers.
+Wrappers are a great way to add functionality to your environments in a modular way.
+This will save you a lot of boilerplate code.
+
+We will show how to create a wrapper by
+
+- Inheriting from :class:`gymnasium.ObservationWrapper`
+- Inheriting from :class:`gymnasium.ActionWrapper`
+- Inheriting from :class:`gymnasium.RewardWrapper`
+- Inheriting from :class:`gymnasium.Wrapper`
+
+Before following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.
+"""
+
+# %%
+# Inheriting from :class:`gymnasium.ObservationWrapper`
+# -----------------------------------------------------
+# Observation wrappers are useful if you want to apply some function to the observations that are returned
+# by an environment. If you implement an observation wrapper, you only need to define this transformation
+# by implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to
+# update the observation space, if the transformation changes the shape of observations (e.g. by transforming
+# dictionaries into numpy arrays, as in the following example).
+#
+# Imagine you have a 2D navigation task where the environment returns dictionaries as observations with
+# keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
+# freedom and only consider the position of the target relative to the agent, i.e.
+# ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
+# observation wrapper like this:
+
+import numpy as np
+from gymnasium import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper
+
+import gymnasium as gym
+from gymnasium.spaces import Box, Discrete
+
+
+class RelativePosition(ObservationWrapper):
+    def __init__(self, env):
+        super().__init__(env)
+        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
+
+    def observation(self, obs):
+        return obs["target"] - obs["agent"]
+
+
+# %%
+# Inheriting from :class:`gymnasium.ActionWrapper`
+# ------------------------------------------------
+# Action wrappers can be used to apply a transformation to actions before applying them to the environment.
+# If you implement an action wrapper, you need to define that transformation by implementing
+# :meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation
+# by updating the action space of the wrapper.
+#
+# Let's say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
+# to use a finite subset of actions. Then, you might want to implement the following wrapper:
+
+
+class DiscreteActions(ActionWrapper):
+    def __init__(self, env, disc_to_cont):
+        super().__init__(env)
+        self.disc_to_cont = disc_to_cont
+        self.action_space = Discrete(len(disc_to_cont))
+
+    def action(self, act):
+        return self.disc_to_cont[act]
+
+
+if __name__ == "__main__":
+    env = gym.make("LunarLanderContinuous-v2")
+    wrapped_env = DiscreteActions(
+        env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]
+    )
+    print(wrapped_env.action_space)  # Discrete(4)
+
+
+# %%
+# Inheriting from :class:`gymnasium.RewardWrapper`
+# ------------------------------------------------
+# Reward wrappers are used to transform the reward that is returned by an environment.
+# As for the previous wrappers, you need to specify that transformation by implementing the
+# :meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.
+#
+# Let us look at an example: Sometimes (especially when we do not have control over the reward
+# because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
+# To do that, we could, for instance, implement the following wrapper:

+
+from typing import SupportsFloat
+
+
+class ClipReward(RewardWrapper):
+    def __init__(self, env, min_reward, max_reward):
+        super().__init__(env)
+        self.min_reward = min_reward
+        self.max_reward = max_reward
+        self.reward_range = (min_reward, max_reward)
+
+    def reward(self, r: SupportsFloat) -> SupportsFloat:
+        return np.clip(r, self.min_reward, self.max_reward)
+
+
+# %%
+# Inheriting from :class:`gymnasium.Wrapper`
+# ------------------------------------------
+# Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
+# reward based on data in ``info`` or change the rendering behavior).
+# Such wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.
+#
+# - You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively
+# - You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively
+# - You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.
+#   If you do this, you can access the environment that was passed
+#   to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.
+#
+# Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
+# of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
+# penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
+# initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
+# of the reward are returned in ``info``, so let us build a wrapper for Reacher that allows us to weight those terms:
+
+
+class ReacherRewardWrapper(Wrapper):
+    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
+        super().__init__(env)
+        self.reward_dist_weight = reward_dist_weight
+        self.reward_ctrl_weight = reward_ctrl_weight
+
+    def step(self, action):
+        obs, _, terminated, truncated, info = self.env.step(action)
+        reward = (
+            self.reward_dist_weight * info["reward_dist"]
+            + self.reward_ctrl_weight * info["reward_ctrl"]
+        )
+        return obs, reward, terminated, truncated, info
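The tutorial's wrappers are used like the built-in ones; a hypothetical quick check for the `ClipReward` class from the listing above (assuming `CartPole-v1`):

```python
import gymnasium as gym

# ClipReward as defined in the tutorial file above.
env = ClipReward(gym.make("CartPole-v1"), min_reward=-1.0, max_reward=0.5)
obs, info = env.reset(seed=0)
_, reward, _, _, _ = env.step(env.action_space.sample())
assert reward <= 0.5  # CartPole's per-step reward of 1.0 is clipped to 0.5
```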
@@ -236,58 +236,16 @@ WrapperActType = TypeVar("WrapperActType")
 class Wrapper(Env[WrapperObsType, WrapperActType]):
     """Wraps a :class:`gymnasium.Env` to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.
 
-    This class is the base class of all wrappers to change the behavior of the underlying environment allowing
-    modification to the :attr:`action_space`, :attr:`observation_space`, :attr:`reward_range` and :attr:`metadata`
-    that doesn't change the underlying environment attributes.
+    This class is the base class of all wrappers to change the behavior of the underlying environment.
+    Wrappers that inherit from this class can modify the :attr:`action_space`, :attr:`observation_space`,
+    :attr:`reward_range` and :attr:`metadata` attributes, without changing the underlying environment's attributes.
+    Moreover, the behavior of the :meth:`step` and :meth:`reset` methods can be changed by these wrappers.
 
-    In addition, for several attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the
-    wrapper's environment.
+    Some attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the wrapper's environment
+    (i.e. to the corresponding attributes of :attr:`env`).
 
-    Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
-    Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
-    also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
-
-    In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
-    with (possibly optional) parameters to the wrapper's constructor.
-
-    >>> import gymnasium as gym
-    >>> from gymnasium.wrappers import RescaleAction
-    >>> base_env = gym.make("BipedalWalker-v3")
-    >>> base_env.action_space
-    Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
-    >>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
-    >>> wrapped_env.action_space
-    Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
-
-    You can access the environment underneath the **first** wrapper by using the :attr:`env` attribute.
-    As the :class:`Wrapper` class inherits from :class:`Env` then :attr:`env` can be another wrapper.
-
-    >>> wrapped_env
-    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
-    >>> wrapped_env.env
-    <TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
-
-    If you want to get to the environment underneath **all** of the layers of wrappers, you can use the `.unwrapped` attribute.
-    If the environment is already a bare environment, the `.unwrapped` attribute will just return itself.
-
-    >>> wrapped_env
-    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
-    >>> wrapped_env.unwrapped
-    <gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
-
-    There are three common things you might want a wrapper to do:
-
-    - Transform actions before applying them to the base environment
-    - Transform observations that are returned by the base environment
-    - Transform rewards that are returned by the base environment
-
-    Such wrappers can be easily implemented by inheriting from `ActionWrapper`, `ObservationWrapper`, or `RewardWrapper` and implementing the
-    respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the `Wrapper` class directly.
-    The code that is presented in the following sections can also be found in
-    the [gym-examples](https://github.com/Farama-Foundation/gym-examples) repository
-
     Note:
-        Don't forget to call ``super().__init__(env)``
+        If you inherit from :class:`Wrapper`, don't forget to call ``super().__init__(env)``
     """
 
     def __init__(self, env: Env[ObsType, ActType]):
@@ -425,7 +383,10 @@ class Wrapper(Env[WrapperObsType, WrapperActType]):
 
     @property
     def unwrapped(self) -> Env[ObsType, ActType]:
-        """Returns the base environment of the wrapper."""
+        """Returns the base environment of the wrapper.
+
+        This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
+        """
         return self.env.unwrapped
 
 
@@ -438,20 +399,6 @@ class ObservationWrapper(Wrapper[WrapperObsType, ActType]):
     reflected by the :attr:`env` observation space. Otherwise, you need to specify the new observation space of the
     wrapper by setting :attr:`self.observation_space` in the :meth:`__init__` method of your wrapper.
 
-    For example, you might have a 2D navigation task where the environment returns dictionaries as observations with
-    keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
-    freedom and only consider the position of the target relative to the agent, i.e.
-    ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
-    observation wrapper like this::
-
-        class RelativePosition(gym.ObservationWrapper):
-            def __init__(self, env):
-                super().__init__(env)
-                self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
-
-            def observation(self, obs):
-                return obs["target"] - obs["agent"]
-
     Among others, Gymnasium provides the observation wrapper :class:`TimeAwareObservation`, which adds information about the
     index of the timestep to the observation.
     """
@@ -494,20 +441,6 @@ class RewardWrapper(Wrapper[ObsType, ActType]):
     :meth:`reward` to implement that transformation.
     This transformation might change the :attr:`reward_range`; to specify the :attr:`reward_range` of your wrapper,
     you can simply define :attr:`self.reward_range` in :meth:`__init__`.
 
-    Let us look at an example: Sometimes (especially when we do not have control over the reward
-    because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
-    To do that, we could, for instance, implement the following wrapper::
-
-        class ClipReward(gym.RewardWrapper):
-            def __init__(self, env, min_reward, max_reward):
-                super().__init__(env)
-                self.min_reward = min_reward
-                self.max_reward = max_reward
-                self.reward_range = (min_reward, max_reward)
-
-            def reward(self, r: SupportsFloat) -> SupportsFloat:
-                return np.clip(r, self.min_reward, self.max_reward)
-
     """
 
     def __init__(self, env: Env[ObsType, ActType]):
@@ -543,24 +476,6 @@ class ActionWrapper(Wrapper[ObsType, WrapperActType]):
     In that case, you need to specify the new action space of the wrapper by setting :attr:`self.action_space` in
     the :meth:`__init__` method of your wrapper.
 
-    Let's say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
-    to use a finite subset of actions. Then, you might want to implement the following wrapper::
-
-        class DiscreteActions(gym.ActionWrapper):
-            def __init__(self, env, disc_to_cont):
-                super().__init__(env)
-                self.disc_to_cont = disc_to_cont
-                self.action_space = Discrete(len(disc_to_cont))
-
-            def action(self, act):
-                return self.disc_to_cont[act]
-
-        if __name__ == "__main__":
-            env = gym.make("LunarLanderContinuous-v2")
-            wrapped_env = DiscreteActions(env, [np.array([1, 0]), np.array([-1, 0]),
-                                                np.array([0, 1]), np.array([0, -1])])
-            print(wrapped_env.action_space)  # Discrete(4)
-
     Among others, Gymnasium provides the action wrappers :class:`ClipAction` and :class:`RescaleAction` for clipping and rescaling actions.
     """
 
@@ -1,4 +1,51 @@
-"""Module of wrapper classes."""
+"""Module of wrapper classes.
+
+Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
+Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
+also be chained to combine their effects.
+Most environments that are generated via :meth:`gymnasium.make` will already be wrapped by default.
+
+In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
+with (possibly optional) parameters to the wrapper's constructor.
+
+>>> import gymnasium as gym
+>>> from gymnasium.wrappers import RescaleAction
+>>> base_env = gym.make("BipedalWalker-v3")
+>>> base_env.action_space
+Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
+>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
+>>> wrapped_env.action_space
+Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
+
+You can access the environment underneath the **first** wrapper by using the :attr:`gymnasium.Wrapper.env` attribute.
+As the :class:`gymnasium.Wrapper` class inherits from :class:`gymnasium.Env` then :attr:`gymnasium.Wrapper.env` can be another wrapper.
+
+>>> wrapped_env
+<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
+>>> wrapped_env.env
+<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
+
+If you want to get to the environment underneath **all** of the layers of wrappers, you can use the
+:attr:`gymnasium.Wrapper.unwrapped` attribute.
+If the environment is already a bare environment, the :attr:`gymnasium.Wrapper.unwrapped` attribute will just return itself.
+
+>>> wrapped_env
+<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
+>>> wrapped_env.unwrapped
+<gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
+
+There are three common things you might want a wrapper to do:
+
+- Transform actions before applying them to the base environment
+- Transform observations that are returned by the base environment
+- Transform rewards that are returned by the base environment
+
+Such wrappers can be easily implemented by inheriting from :class:`gymnasium.ActionWrapper`,
+:class:`gymnasium.ObservationWrapper`, or :class:`gymnasium.RewardWrapper` and implementing the respective transformation.
+If you need a wrapper to do more complicated tasks, you can inherit from the :class:`gymnasium.Wrapper` class directly.
+
+If you'd like to implement your own custom wrapper, check out `the corresponding tutorial <../../tutorials/implementing_custom_wrappers>`_.
+"""
 from gymnasium.wrappers.atari_preprocessing import AtariPreprocessing
 from gymnasium.wrappers.autoreset import AutoResetWrapper
 from gymnasium.wrappers.clip_action import ClipAction