mirror of https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-07-31 13:54:31 +00:00
Updated Wrapper docs (#173)
@@ -12,6 +12,11 @@ wrappers/observation_wrappers
 wrappers/reward_wrappers
 ```
 
+```{eval-rst}
+.. automodule:: gymnasium.wrappers
+
+```
+
 ## gymnasium.Wrapper
 
 ```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
 .. autoproperty:: gymnasium.Wrapper.spec
 .. autoproperty:: gymnasium.Wrapper.metadata
 .. autoproperty:: gymnasium.Wrapper.np_random
+.. attribute:: gymnasium.Wrapper.env
+
+    The environment (one level underneath) wrapped by this wrapper.
+
+    This may itself be a wrapped environment.
+    To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
+
 .. autoproperty:: gymnasium.Wrapper.unwrapped
 ```
 
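The new `env` attribute and the `unwrapped` property differ only in how far down they go: `env` steps one level, `unwrapped` descends through every layer. A minimal sketch of that relationship (assuming a standard install where `CartPole-v1` is registered):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics

env = RecordEpisodeStatistics(gym.make("CartPole-v1"))

# Walking the .env chain one level at a time ends at the same bare
# environment that .unwrapped reaches in a single step.
layer = env
while isinstance(layer, gym.Wrapper):
    layer = layer.env
assert layer is env.unwrapped
```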
@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
     * - :class:`VectorListInfo`
       - Misc Wrapper
       - This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains info of the i-th environment.
-```
-
-
-## Implementing a custom wrapper
-
-Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
-reward based on data in `info` or change the rendering behavior).
-Such wrappers can be implemented by inheriting from Misc Wrapper.
-
-- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
-- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
-- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
-to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.
-
-Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
-of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
-penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
-initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
-of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
-
-```python
-import gymnasium as gym
-
-
-class ReacherRewardWrapper(gym.Wrapper):
-    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
-        super().__init__(env)
-        self.reward_dist_weight = reward_dist_weight
-        self.reward_ctrl_weight = reward_ctrl_weight
-
-    def step(self, action):
-        obs, _, terminated, truncated, info = self.env.step(action)
-        reward = (
-            self.reward_dist_weight * info["reward_dist"]
-            + self.reward_ctrl_weight * info["reward_ctrl"]
-        )
-        return obs, reward, terminated, truncated, info
-```
-
-```{note}
-It is *not* sufficient to use a `RewardWrapper` in this case!
-```
 ```
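The removed `ReacherRewardWrapper` above is applied like any other wrapper; a hypothetical usage sketch (the `Reacher-v4` ID and an installed MuJoCo extra are assumptions here):

```python
import gymnasium as gym

# ReacherRewardWrapper as defined in the removed section above.
env = ReacherRewardWrapper(
    gym.make("Reacher-v4"), reward_dist_weight=1.0, reward_ctrl_weight=0.1
)
obs, info = env.reset(seed=42)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# reward is now the weighted sum of info["reward_dist"] and info["reward_ctrl"].
```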
@@ -1,22 +1,16 @@
 # Action Wrappers
 
-## Action Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.ActionWrapper
 
-.. autofunction:: gymnasium.ActionWrapper.action
+.. automethod:: gymnasium.ActionWrapper.action
 ```
 
-## Clip Action
+## Available Action Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.ClipAction
-```
-
-## Rescale Action
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RescaleAction
 ```
 
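The two action wrappers compose naturally; a minimal sketch of chaining them (assuming the Box2D extra so that `BipedalWalker-v3` is available):

```python
import gymnasium as gym
from gymnasium.wrappers import ClipAction, RescaleAction

env = gym.make("BipedalWalker-v3")                    # actions live in [-1, 1]^4
env = RescaleAction(env, min_action=0, max_action=1)  # agent may now emit [0, 1]^4
env = ClipAction(env)                                 # out-of-range actions are clipped
```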
@@ -1,68 +1,15 @@
 # Misc Wrappers
 
-## Atari Preprocessing
-
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.AtariPreprocessing
-```
-
-## Autoreset
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.AutoResetWrapper
-```
-
-## Compatibility
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.EnvCompatibility
 .. autoclass:: gymnasium.wrappers.StepAPICompatibility
-```
-
-## Passive Environment Checker
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.PassiveEnvChecker
-```
-
-## Human Rendering
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.HumanRendering
-```
-
-## Order Enforcing
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.OrderEnforcing
-```
-
-## Record Episode Statistics
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
-```
-
-## Record Video
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RecordVideo
-```
-
-## Render Collection
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.RenderCollection
-```
-
-## Time Limit
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TimeLimit
-```
-
-## Vector List Info
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.VectorListInfo
 ```
 
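Most of these wrappers stack freely; a minimal sketch combining `TimeLimit` with `RecordEpisodeStatistics` (assuming `CartPole-v1`):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, TimeLimit

env = TimeLimit(gym.make("CartPole-v1"), max_episode_steps=100)  # truncate after 100 steps
env = RecordEpisodeStatistics(env)  # adds an "episode" entry to info when an episode ends

obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
print(info["episode"])  # cumulative reward "r", length "l", elapsed time "t"
```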
@@ -1,62 +1,23 @@
 # Observation Wrappers
 
-## Observation Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.ObservationWrapper
-.. autofunction:: gymnasium.ObservationWrapper.observation
+
+.. automethod:: gymnasium.ObservationWrapper.observation
 ```
 
-## Transform Observation
+## Available Observation Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TransformObservation
-```
-
-## Filter Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FilterObservation
-```
-
-## Flatten Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FlattenObservation
-```
-
-## Framestack Observations
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.FrameStack
-```
-
-## Gray Scale Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.GrayScaleObservation
-```
-
-## Normalize Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.NormalizeObservation
-```
-
-## Pixel Observation Wrapper
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.PixelObservationWrapper
-```
-
-## Resize Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.ResizeObservation
-```
-
-## Time Aware Observation
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TimeAwareObservation
 ```
 
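Several of these wrappers are commonly chained to preprocess pixel observations; a sketch (the Box2D extra and the `CarRacing-v2` ID are assumptions):

```python
import gymnasium as gym
from gymnasium.wrappers import FrameStack, GrayScaleObservation, ResizeObservation

env = gym.make("CarRacing-v2")          # (96, 96, 3) RGB observations
env = GrayScaleObservation(env)         # -> (96, 96)
env = ResizeObservation(env, shape=84)  # -> (84, 84)
env = FrameStack(env, num_stack=4)      # -> rolling window of the last 4 frames
print(env.observation_space.shape)      # expected: (4, 84, 84)
```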
@@ -1,22 +1,17 @@
 
 # Reward Wrappers
 
-## Reward Wrapper
+## Base Class
 
 ```{eval-rst}
 .. autoclass:: gymnasium.RewardWrapper
 
-.. autofunction:: gymnasium.RewardWrapper.reward
+.. automethod:: gymnasium.RewardWrapper.reward
 ```
 
-## Transform Reward
+## Available Reward Wrappers
 
 ```{eval-rst}
 .. autoclass:: gymnasium.wrappers.TransformReward
-```
-
-## Normalize Reward
-
-```{eval-rst}
 .. autoclass:: gymnasium.wrappers.NormalizeReward
 ```
 
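A minimal sketch of `TransformReward` (assuming `CartPole-v1`); `NormalizeReward` is applied the same way:

```python
import gymnasium as gym
from gymnasium.wrappers import TransformReward

env = TransformReward(gym.make("CartPole-v1"), lambda r: 0.01 * r)  # scale every reward
obs, info = env.reset(seed=0)
_, reward, _, _, _ = env.step(env.action_space.sample())
print(reward)  # 0.01 rather than CartPole's usual per-step reward of 1.0
```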
docs/tutorials/implementing_custom_wrappers.py (new file, +137 lines)
@@ -0,0 +1,137 @@
+"""
+Implementing Custom Wrappers
+============================
+
+In this tutorial we will describe how to implement your own custom wrappers.
+Wrappers are a great way to add functionality to your environments in a modular way.
+This will save you a lot of boilerplate code.
+
+We will show how to create a wrapper by
+
+- Inheriting from :class:`gymnasium.ObservationWrapper`
+- Inheriting from :class:`gymnasium.ActionWrapper`
+- Inheriting from :class:`gymnasium.RewardWrapper`
+- Inheriting from :class:`gymnasium.Wrapper`
+
+Before following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.
+"""
+
+# %%
+# Inheriting from :class:`gymnasium.ObservationWrapper`
+# -----------------------------------------------------
+# Observation wrappers are useful if you want to apply some function to the observations that are returned
+# by an environment. If you implement an observation wrapper, you only need to define this transformation
+# by implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to
+# update the observation space, if the transformation changes the shape of observations (e.g. by transforming
+# dictionaries into numpy arrays, as in the following example).
+#
+# Imagine you have a 2D navigation task where the environment returns dictionaries as observations with
+# keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
+# freedom and only consider the position of the target relative to the agent, i.e.
+# ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
+# observation wrapper like this:
+
+import numpy as np
+from gymnasium import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper
+
+import gymnasium as gym
+from gymnasium.spaces import Box, Discrete
+
+
+class RelativePosition(ObservationWrapper):
+    def __init__(self, env):
+        super().__init__(env)
+        self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
+
+    def observation(self, obs):
+        return obs["target"] - obs["agent"]
+
+
+# %%
+# Inheriting from :class:`gymnasium.ActionWrapper`
+# ------------------------------------------------
+# Action wrappers can be used to apply a transformation to actions before applying them to the environment.
+# If you implement an action wrapper, you need to define that transformation by implementing
+# :meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation
+# by updating the action space of the wrapper.
+#
+# Let's say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
+# to use a finite subset of actions. Then, you might want to implement the following wrapper:
+
+
+class DiscreteActions(ActionWrapper):
+    def __init__(self, env, disc_to_cont):
+        super().__init__(env)
+        self.disc_to_cont = disc_to_cont
+        self.action_space = Discrete(len(disc_to_cont))
+
+    def action(self, act):
+        return self.disc_to_cont[act]
+
+
+if __name__ == "__main__":
+    env = gym.make("LunarLanderContinuous-v2")
+    wrapped_env = DiscreteActions(
+        env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]
+    )
+    print(wrapped_env.action_space)  # Discrete(4)
+
+
+# %%
+# Inheriting from :class:`gymnasium.RewardWrapper`
+# ------------------------------------------------
+# Reward wrappers are used to transform the reward that is returned by an environment.
+# As for the previous wrappers, you need to specify that transformation by implementing the
+# :meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.
+#
+# Let us look at an example: Sometimes (especially when we do not have control over the reward
+# because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
+# To do that, we could, for instance, implement the following wrapper:

+
+from typing import SupportsFloat
+
+
+class ClipReward(RewardWrapper):
+    def __init__(self, env, min_reward, max_reward):
+        super().__init__(env)
+        self.min_reward = min_reward
+        self.max_reward = max_reward
+        self.reward_range = (min_reward, max_reward)
+
+    def reward(self, r: SupportsFloat) -> SupportsFloat:
+        return np.clip(r, self.min_reward, self.max_reward)
+
+
+# %%
+# Inheriting from :class:`gymnasium.Wrapper`
+# ------------------------------------------
+# Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
+# reward based on data in ``info`` or change the rendering behavior).
+# Such wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.
+#
+# - You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively
+# - You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively
+# - You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.
+#   If you do this, you can access the environment that was passed
+#   to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.
+#
+# Let's also take a look at an example for this case. Most MuJoCo environments return a reward that consists
+# of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
+# penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
+# initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
+# of the reward are returned in ``info``, so let us build a wrapper for Reacher that allows us to weight those terms:
+
+
+class ReacherRewardWrapper(Wrapper):
+    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
+        super().__init__(env)
+        self.reward_dist_weight = reward_dist_weight
+        self.reward_ctrl_weight = reward_ctrl_weight
+
+    def step(self, action):
+        obs, _, terminated, truncated, info = self.env.step(action)
+        reward = (
+            self.reward_dist_weight * info["reward_dist"]
+            + self.reward_ctrl_weight * info["reward_ctrl"]
+        )
+        return obs, reward, terminated, truncated, info
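The tutorial's wrappers are used like the built-in ones; a hypothetical quick check for the `ClipReward` class from the listing above (assuming `CartPole-v1`):

```python
import gymnasium as gym

# ClipReward as defined in the tutorial file above.
env = ClipReward(gym.make("CartPole-v1"), min_reward=-1.0, max_reward=0.5)
obs, info = env.reset(seed=0)
_, reward, _, _, _ = env.step(env.action_space.sample())
assert reward <= 0.5  # CartPole's per-step reward of 1.0 is clipped to 0.5
```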
@@ -236,58 +236,16 @@ WrapperActType = TypeVar("WrapperActType")
 class Wrapper(Env[WrapperObsType, WrapperActType]):
     """Wraps a :class:`gymnasium.Env` to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.
 
-    This class is the base class of all wrappers to change the behavior of the underlying environment allowing
-    modification to the :attr:`action_space`, :attr:`observation_space`, :attr:`reward_range` and :attr:`metadata`
-    that doesn't change the underlying environment attributes.
+    This class is the base class of all wrappers to change the behavior of the underlying environment.
+    Wrappers that inherit from this class can modify the :attr:`action_space`, :attr:`observation_space`,
+    :attr:`reward_range` and :attr:`metadata` attributes, without changing the underlying environment's attributes.
+    Moreover, the behavior of the :meth:`step` and :meth:`reset` methods can be changed by these wrappers.
 
-    In addition, for several attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the
-    wrapper's environment.
+    Some attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the wrapper's environment
+    (i.e. to the corresponding attributes of :attr:`env`).
 
-    Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
-    Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
-    also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
-
-    In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
-    with (possibly optional) parameters to the wrapper's constructor.
-
-    >>> import gymnasium as gym
-    >>> from gymnasium.wrappers import RescaleAction
-    >>> base_env = gym.make("BipedalWalker-v3")
-    >>> base_env.action_space
-    Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
-    >>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
-    >>> wrapped_env.action_space
-    Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
-
-    You can access the environment underneath the **first** wrapper by using the :attr:`env` attribute.
-    As the :class:`Wrapper` class inherits from :class:`Env` then :attr:`env` can be another wrapper.
-
-    >>> wrapped_env
-    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
-    >>> wrapped_env.env
-    <TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
-
-    If you want to get to the environment underneath **all** of the layers of wrappers, you can use the `.unwrapped` attribute.
-    If the environment is already a bare environment, the `.unwrapped` attribute will just return itself.
-
-    >>> wrapped_env
-    <RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
-    >>> wrapped_env.unwrapped
-    <gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
-
-    There are three common things you might want a wrapper to do:
-
-    - Transform actions before applying them to the base environment
-    - Transform observations that are returned by the base environment
-    - Transform rewards that are returned by the base environment
-
-    Such wrappers can be easily implemented by inheriting from `ActionWrapper`, `ObservationWrapper`, or `RewardWrapper` and implementing the
-    respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the `Wrapper` class directly.
-    The code that is presented in the following sections can also be found in
-    the [gym-examples](https://github.com/Farama-Foundation/gym-examples) repository
-
     Note:
-        Don't forget to call ``super().__init__(env)``
+        If you inherit from :class:`Wrapper`, don't forget to call ``super().__init__(env)``
     """
 
     def __init__(self, env: Env[ObsType, ActType]):
@@ -425,7 +383,10 @@ class Wrapper(Env[WrapperObsType, WrapperActType]):
 
     @property
     def unwrapped(self) -> Env[ObsType, ActType]:
-        """Returns the base environment of the wrapper."""
+        """Returns the base environment of the wrapper.
+
+        This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
+        """
         return self.env.unwrapped
 
 
@@ -438,20 +399,6 @@ class ObservationWrapper(Wrapper[WrapperObsType, ActType]):
     reflected by the :attr:`env` observation space. Otherwise, you need to specify the new observation space of the
     wrapper by setting :attr:`self.observation_space` in the :meth:`__init__` method of your wrapper.
 
-    For example, you might have a 2D navigation task where the environment returns dictionaries as observations with
-    keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
-    freedom and only consider the position of the target relative to the agent, i.e.
-    ``observation["target_position"] - observation["agent_position"]``. For this, you could implement an
-    observation wrapper like this::
-
-        class RelativePosition(gym.ObservationWrapper):
-            def __init__(self, env):
-                super().__init__(env)
-                self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
-
-            def observation(self, obs):
-                return obs["target"] - obs["agent"]
-
     Among others, Gymnasium provides the observation wrapper :class:`TimeAwareObservation`, which adds information about the
     index of the timestep to the observation.
     """
@@ -494,20 +441,6 @@ class RewardWrapper(Wrapper[ObsType, ActType]):
     :meth:`reward` to implement that transformation.
     This transformation might change the :attr:`reward_range`; to specify the :attr:`reward_range` of your wrapper,
     you can simply define :attr:`self.reward_range` in :meth:`__init__`.
 
-    Let us look at an example: Sometimes (especially when we do not have control over the reward
-    because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
-    To do that, we could, for instance, implement the following wrapper::
-
-        class ClipReward(gym.RewardWrapper):
-            def __init__(self, env, min_reward, max_reward):
-                super().__init__(env)
-                self.min_reward = min_reward
-                self.max_reward = max_reward
-                self.reward_range = (min_reward, max_reward)
-
-            def reward(self, r: SupportsFloat) -> SupportsFloat:
-                return np.clip(r, self.min_reward, self.max_reward)
-
     """
 
     def __init__(self, env: Env[ObsType, ActType]):
@@ -543,24 +476,6 @@ class ActionWrapper(Wrapper[ObsType, WrapperActType]):
     In that case, you need to specify the new action space of the wrapper by setting :attr:`self.action_space` in
     the :meth:`__init__` method of your wrapper.
 
-    Let's say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like
-    to use a finite subset of actions. Then, you might want to implement the following wrapper::
-
-        class DiscreteActions(gym.ActionWrapper):
-            def __init__(self, env, disc_to_cont):
-                super().__init__(env)
-                self.disc_to_cont = disc_to_cont
-                self.action_space = Discrete(len(disc_to_cont))
-
-            def action(self, act):
-                return self.disc_to_cont[act]
-
-        if __name__ == "__main__":
-            env = gym.make("LunarLanderContinuous-v2")
-            wrapped_env = DiscreteActions(env, [np.array([1, 0]), np.array([-1, 0]),
-                                                np.array([0, 1]), np.array([0, -1])])
-            print(wrapped_env.action_space)  # Discrete(4)
-
     Among others, Gymnasium provides the action wrappers :class:`ClipAction` and :class:`RescaleAction` for clipping and rescaling actions.
     """
 
@@ -1,4 +1,51 @@
-"""Module of wrapper classes."""
+"""Module of wrapper classes.
+
+Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
+Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
+also be chained to combine their effects.
+Most environments that are generated via :meth:`gymnasium.make` will already be wrapped by default.
+
+In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
+with (possibly optional) parameters to the wrapper's constructor.
+
+>>> import gymnasium as gym
+>>> from gymnasium.wrappers import RescaleAction
+>>> base_env = gym.make("BipedalWalker-v3")
+>>> base_env.action_space
+Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
+>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
+>>> wrapped_env.action_space
+Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
+
+You can access the environment underneath the **first** wrapper by using the :attr:`gymnasium.Wrapper.env` attribute.
+As the :class:`gymnasium.Wrapper` class inherits from :class:`gymnasium.Env` then :attr:`gymnasium.Wrapper.env` can be another wrapper.
+
+>>> wrapped_env
+<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
+>>> wrapped_env.env
+<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
+
+If you want to get to the environment underneath **all** of the layers of wrappers, you can use the
+:attr:`gymnasium.Wrapper.unwrapped` attribute.
+If the environment is already a bare environment, the :attr:`gymnasium.Wrapper.unwrapped` attribute will just return itself.
+
+>>> wrapped_env
+<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
+>>> wrapped_env.unwrapped
+<gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
+
+There are three common things you might want a wrapper to do:
+
+- Transform actions before applying them to the base environment
+- Transform observations that are returned by the base environment
+- Transform rewards that are returned by the base environment
+
+Such wrappers can be easily implemented by inheriting from :class:`gymnasium.ActionWrapper`,
+:class:`gymnasium.ObservationWrapper`, or :class:`gymnasium.RewardWrapper` and implementing the respective transformation.
+If you need a wrapper to do more complicated tasks, you can inherit from the :class:`gymnasium.Wrapper` class directly.
+
+If you'd like to implement your own custom wrapper, check out `the corresponding tutorial <../../tutorials/implementing_custom_wrappers>`_.
+"""
 from gymnasium.wrappers.atari_preprocessing import AtariPreprocessing
 from gymnasium.wrappers.autoreset import AutoResetWrapper
 from gymnasium.wrappers.clip_action import ClipAction