Updated Wrapper docs (#173)

Markus Krimmel
2022-12-03 13:46:11 +01:00
committed by GitHub
parent 4b7f941db3
commit 851b2f4be6
8 changed files with 218 additions and 249 deletions


@@ -12,6 +12,11 @@ wrappers/observation_wrappers
wrappers/reward_wrappers
```
```{eval-rst}
.. automodule:: gymnasium.wrappers
```
## gymnasium.Wrapper
```{eval-rst}
@@ -35,6 +40,13 @@ wrappers/reward_wrappers
.. autoproperty:: gymnasium.Wrapper.spec
.. autoproperty:: gymnasium.Wrapper.metadata
.. autoproperty:: gymnasium.Wrapper.np_random
.. attribute:: gymnasium.Wrapper.env
The environment (one level underneath) that this wrapper wraps.
This may itself be a wrapped environment.
To obtain the environment underneath all layers of wrappers, use :attr:`gymnasium.Wrapper.unwrapped`.
.. autoproperty:: gymnasium.Wrapper.unwrapped
```
@@ -124,43 +136,4 @@ wrapper in the page on the wrapper type
* - :class:`VectorListInfo`
- Misc Wrapper
- This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries, where the i-th dictionary contains the info of the i-th environment.
```
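The dict-to-list conversion performed by `VectorListInfo` can be sketched in plain Python. The key names and the `_key` mask convention below are illustrative assumptions, not the wrapper's exact implementation:

```python
import numpy as np


def dict_info_to_list(info, num_envs):
    """Convert vectorized info {key: array} into one dict per sub-environment."""
    list_info = [{} for _ in range(num_envs)]
    for key, values in info.items():
        if key.startswith("_"):
            continue  # assumed "_key" entries are per-env presence masks, used below
        mask = info.get(f"_{key}", np.ones(num_envs, dtype=bool))
        for i in range(num_envs):
            if mask[i]:
                list_info[i][key] = values[i]
    return list_info


info = {"episode_len": np.array([7, 0]), "_episode_len": np.array([True, False])}
print(dict_info_to_list(info, 2))  # only the first env's dict receives the key
```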
## Implementing a custom wrapper
Sometimes you might need to implement a wrapper that performs some more complicated modifications (e.g. modifying the
reward based on data in `info` or changing the rendering behavior).
Such wrappers can be implemented by inheriting from `gymnasium.Wrapper`.
- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
- You can override `step`, `render`, `close` etc. If you do this, you can access the environment that was passed
to your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute `self.env`.
Let's take a look at an example for this case. Most MuJoCo environments return a reward that consists
of several terms: for instance, there might be a term that rewards the agent for completing the task and one that
penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
initialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms
of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
```python
import gymnasium as gym


class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Recombine the reward from the components that Reacher reports in `info`.
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info
```
```{note}
It is *not* sufficient to use a `RewardWrapper` in this case, since the new reward depends on `info`, which the `reward` method of `RewardWrapper` does not receive!
```


@@ -1,22 +1,16 @@
# Action Wrappers
## Action Wrapper
## Base Class
```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
.. autofunction:: gymnasium.ActionWrapper.action
.. automethod:: gymnasium.ActionWrapper.action
```
## Clip Action
## Available Action Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ClipAction
```
## Rescale Action
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RescaleAction
```


@@ -1,68 +1,15 @@
# Misc Wrappers
## Atari Preprocessing
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AtariPreprocessing
```
## Autoreset
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AutoResetWrapper
```
## Compatibility
```{eval-rst}
.. autoclass:: gymnasium.wrappers.EnvCompatibility
.. autoclass:: gymnasium.wrappers.StepAPICompatibility
```
## Passive Environment Checker
```{eval-rst}
.. autoclass:: gymnasium.wrappers.PassiveEnvChecker
```
## Human Rendering
```{eval-rst}
.. autoclass:: gymnasium.wrappers.HumanRendering
```
## Order Enforcing
```{eval-rst}
.. autoclass:: gymnasium.wrappers.OrderEnforcing
```
## Record Episode Statistics
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
```
## Record Video
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordVideo
```
## Render Collection
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RenderCollection
```
## Time Limit
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeLimit
```
## Vector List Info
```{eval-rst}
.. autoclass:: gymnasium.wrappers.VectorListInfo
```


@@ -1,62 +1,23 @@
# Observation Wrappers
## Observation Wrapper
## Base Class
```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
.. autofunction:: gymnasium.ObservationWrapper.observation
.. automethod:: gymnasium.ObservationWrapper.observation
```
## Transform Observation
## Available Observation Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
```
## Filter Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FilterObservation
```
## Flatten Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FlattenObservation
```
## Framestack Observations
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FrameStack
```
## Gray Scale Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.GrayScaleObservation
```
## Normalize Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeObservation
```
## Pixel Observation Wrapper
```{eval-rst}
.. autoclass:: gymnasium.wrappers.PixelObservationWrapper
```
## Resize Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ResizeObservation
```
## Time Aware Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeAwareObservation
```


@@ -1,22 +1,17 @@
# Reward Wrappers
## Reward Wrapper
## Base Class
```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
.. autofunction:: gymnasium.RewardWrapper.reward
.. automethod:: gymnasium.RewardWrapper.reward
```
## Transform Reward
## Available Reward Wrappers
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformReward
```
## Normalize Reward
```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeReward
```