<h1>Wrappers<a class="headerlink" href="#wrappers" title="Permalink to this heading">#</a></h1>
<p>Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
also be chained to combine their effects. Most environments that are generated via <code class="docutils literal notranslate"><span class="pre">gymnasium.make</span></code> will already be wrapped by default.</p>
<p>In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
with (possibly optional) parameters to the wrapper’s constructor:</p>
<p>If you want to get to the environment underneath <strong>all</strong> of the layers of wrappers,
you can use the <code class="docutils literal notranslate"><span class="pre">.unwrapped</span></code> attribute.
If the environment is already a bare environment, the <code class="docutils literal notranslate"><span class="pre">.unwrapped</span></code> attribute will just return itself.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span class="go">&lt;gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0&gt;</span>
</pre></div>
</div>
<p>There are three common things you might want a wrapper to do:</p>
<ul class="simple">
<li><p>Transform actions before applying them to the base environment</p></li>
<li><p>Transform observations that are returned by the base environment</p></li>
<li><p>Transform rewards that are returned by the base environment</p></li>
</ul>
<p>Such wrappers can be easily implemented by inheriting from <code class="docutils literal notranslate"><span class="pre">ActionWrapper</span></code>, <code class="docutils literal notranslate"><span class="pre">ObservationWrapper</span></code>, or <code class="docutils literal notranslate"><span class="pre">RewardWrapper</span></code> and implementing the
respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the <code class="docutils literal notranslate"><span class="pre">Wrapper</span></code> class directly.
The code that is presented in the following sections can also be found in the Gymnasium repository.</p>
<section id="actionwrapper">
<h2>ActionWrapper<a class="headerlink" href="#actionwrapper" title="Permalink to this heading">#</a></h2>
<p>If you would like to apply a function to the action before passing it to the base environment,
you can simply inherit from <code class="docutils literal notranslate"><span class="pre">ActionWrapper</span></code> and overwrite the method <code class="docutils literal notranslate"><span class="pre">action</span></code> to implement that transformation.
The transformation defined in that method must take values in the base environment’s action space.
However, its domain might differ from the original action space. In that case, you need to specify the new
action space of the wrapper by setting <code class="docutils literal notranslate"><span class="pre">self.action_space</span></code> in the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> method of your wrapper.</p>
<p>Let’s say you have an environment with an action space of type <code class="docutils literal notranslate"><span class="pre">Box</span></code>, but you would
only like to use a finite subset of actions. Then, you might want to implement the following wrapper:</p>
<p>Among others, Gymnasium provides the action wrappers <code class="docutils literal notranslate"><span class="pre">ClipAction</span></code> and <code class="docutils literal notranslate"><span class="pre">RescaleAction</span></code>.</p>
</section>
<section id="observationwrapper">
<h2>ObservationWrapper<a class="headerlink" href="#observationwrapper" title="Permalink to this heading">#</a></h2>
<p>If you would like to apply a function to the observation that is returned by the base environment before passing
it to learning code, you can simply inherit from <code class="docutils literal notranslate"><span class="pre">ObservationWrapper</span></code> and overwrite the method <code class="docutils literal notranslate"><span class="pre">observation</span></code> to
implement that transformation. The transformation defined in that method must be defined on the base environment’s
observation space. However, it may take values in a different space. In that case, you need to specify the new
observation space of the wrapper by setting <code class="docutils literal notranslate"><span class="pre">self.observation_space</span></code> in the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> method of your wrapper.</p>
<p>For example, you might have a 2D navigation task where the environment returns dictionaries as observations with keys <code class="docutils literal notranslate"><span class="pre">"agent_position"</span></code>
and <code class="docutils literal notranslate"><span class="pre">"target_position"</span></code>. A common thing to do might be to throw away some degrees of freedom and only consider
the position of the target relative to the agent, i.e. <code class="docutils literal notranslate"><span class="pre">observation["target_position"]</span> <span class="pre">-</span> <span class="pre">observation["agent_position"]</span></code>.
For this, you could implement an observation wrapper like this:</p>
<p>Among others, Gymnasium provides the observation wrapper <code class="docutils literal notranslate"><span class="pre">TimeAwareObservation</span></code>, which adds information about the index of the timestep
to the observation.</p>
</section>
<section id="rewardwrapper">
<h2>RewardWrapper<a class="headerlink" href="#rewardwrapper" title="Permalink to this heading">#</a></h2>
<p>If you would like to apply a function to the reward that is returned by the base environment before passing
it to learning code, you can simply inherit from <code class="docutils literal notranslate"><span class="pre">RewardWrapper</span></code> and overwrite the method <code class="docutils literal notranslate"><span class="pre">reward</span></code> to
implement that transformation. This transformation might change the reward range; to specify the reward range of
your wrapper, you can simply define <code class="docutils literal notranslate"><span class="pre">self.reward_range</span></code> in <code class="docutils literal notranslate"><span class="pre">__init__</span></code>.</p>
<p>Let us look at an example: Sometimes (especially when we do not have control over the reward because it is intrinsic), we want to clip the reward
to a range to gain some numerical stability. To do that, we could, for instance, implement the following wrapper:</p>
</section>
<section id="autoresetwrapper">
<h2>AutoResetWrapper<a class="headerlink" href="#autoresetwrapper" title="Permalink to this heading">#</a></h2>
<p>Some users may want a wrapper that automatically resets the wrapped environment when the wrapped environment reaches the done state. An advantage of this wrapper is that it will never produce the undefined behavior that standard Gymnasium environments exhibit when stepping beyond the done state.</p>
<p>When a call to <code class="docutils literal notranslate"><span class="pre">self.env.step()</span></code> returns <code class="docutils literal notranslate"><span class="pre">done=True</span></code>, <code class="docutils literal notranslate"><span class="pre">self.env.reset()</span></code> is called automatically, and <code class="docutils literal notranslate"><span class="pre">self.step()</span></code> returns <code class="docutils literal notranslate"><span class="pre">(new_obs,</span> <span class="pre">terminal_reward,</span> <span class="pre">terminal_done,</span> <span class="pre">info)</span></code>, where:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">new_obs</span></code> is the first observation after calling <code class="docutils literal notranslate"><span class="pre">self.env.reset()</span></code>,</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">terminal_reward</span></code> is the reward after calling <code class="docutils literal notranslate"><span class="pre">self.env.step()</span></code>, prior to calling <code class="docutils literal notranslate"><span class="pre">self.env.reset()</span></code>,</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">terminal_done</span></code> is always <code class="docutils literal notranslate"><span class="pre">True</span></code>,</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">info</span></code> is a dict containing all the keys from the info dict returned by
the call to <code class="docutils literal notranslate"><span class="pre">self.env.reset()</span></code>, with two additional keys: <code class="docutils literal notranslate"><span class="pre">terminal_observation</span></code>,
containing the observation returned by the last call to <code class="docutils literal notranslate"><span class="pre">self.env.step()</span></code>,
and <code class="docutils literal notranslate"><span class="pre">terminal_info</span></code>, containing the info dict returned by the last call
to <code class="docutils literal notranslate"><span class="pre">self.env.step()</span></code>.</p></li>
</ul>
<p>If <code class="docutils literal notranslate"><span class="pre">done</span></code> is not true when <code class="docutils literal notranslate"><span class="pre">self.env.step()</span></code> is called, <code class="docutils literal notranslate"><span class="pre">self.step()</span></code> returns <code class="docutils literal notranslate"><span class="pre">obs</span></code>, <code class="docutils literal notranslate"><span class="pre">reward</span></code>, <code class="docutils literal notranslate"><span class="pre">done</span></code>, and <code class="docutils literal notranslate"><span class="pre">info</span></code> as usual.</p>
<p>The AutoResetWrapper is not applied by default when calling <code class="docutils literal notranslate"><span class="pre">gymnasium.make()</span></code>, but can be applied by setting the optional <code class="docutils literal notranslate"><span class="pre">autoreset</span></code> argument to <code class="docutils literal notranslate"><span class="pre">True</span></code>:</p>
<p>When using the AutoResetWrapper to collect rollouts, note
that when <code class="docutils literal notranslate"><span class="pre">self.env.step()</span></code> returns <code class="docutils literal notranslate"><span class="pre">done</span></code>, a
new observation from after calling <code class="docutils literal notranslate"><span class="pre">self.env.reset()</span></code> is returned
by <code class="docutils literal notranslate"><span class="pre">self.step()</span></code> alongside the terminal reward and done state from the
previous episode. If you need the terminal state from the previous
episode, you need to retrieve it via the <code class="docutils literal notranslate"><span class="pre">terminal_observation</span></code> key
in the info dict. Make sure you know what you’re doing if you
use this wrapper!</p>
</section>
<section id="general-wrappers">
<h2>General Wrappers<a class="headerlink" href="#general-wrappers" title="Permalink to this heading">#</a></h2>
<p>Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
reward based on data in <code class="docutils literal notranslate"><span class="pre">info</span></code> or change the rendering behavior).
Such wrappers can be implemented by inheriting from <code class="docutils literal notranslate"><span class="pre">Wrapper</span></code>.</p>
<ul class="simple">
<li><p>You can set a new action or observation space by defining <code class="docutils literal notranslate"><span class="pre">self.action_space</span></code> or <code class="docutils literal notranslate"><span class="pre">self.observation_space</span></code> in <code class="docutils literal notranslate"><span class="pre">__init__</span></code>, respectively</p></li>
<li><p>You can set new metadata and reward range by defining <code class="docutils literal notranslate"><span class="pre">self.metadata</span></code> and <code class="docutils literal notranslate"><span class="pre">self.reward_range</span></code> in <code class="docutils literal notranslate"><span class="pre">__init__</span></code>, respectively</p></li>
<li><p>You can override <code class="docutils literal notranslate"><span class="pre">step</span></code>, <code class="docutils literal notranslate"><span class="pre">render</span></code>, <code class="docutils literal notranslate"><span class="pre">close</span></code> etc. If you do this, you can access the environment that was passed
to your wrapper (which <em>still</em> might be wrapped in some other wrapper) by accessing the attribute <code class="docutils literal notranslate"><span class="pre">self.env</span></code>.</p></li>
</ul>
<p>Let’s also take a look at an example for this case. Most MuJoCo environments return a reward that consists
of different terms: For instance, there might be a term that rewards the agent for completing the task and one term that
penalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during
initialization of the environment. However, <em>Reacher</em> does not allow you to do this! Nevertheless, all individual terms
of the reward are returned in <code class="docutils literal notranslate"><span class="pre">info</span></code>, so let us build a wrapper for Reacher that allows us to weight those terms:</p>
</section>
<section id="available-wrappers">
<h2>Available Wrappers<a class="headerlink" href="#available-wrappers" title="Permalink to this heading">#</a></h2>
<table class="docutils align-default">
<thead>
<tr><th class="head"><p>Name</p></th>
<th class="head"><p>Description</p></th>
</tr>
</thead>
<tbody>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">AtariPreprocessing</span></code></p></td>
<td><p>Implements the best practices from Machado et al. (2018), “Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents” but will be deprecated soon.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">AutoResetWrapper</span></code></p></td>
<td><p>The wrapped environment will automatically reset when the done state is reached. Make sure you read the documentation before using this wrapper!</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">ClipAction</span></code></p></td>
<td><p>Clip the continuous action to the valid bound specified by the environment’s <code class="docutils literal notranslate"><span class="pre">action_space</span></code></p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">FilterObservation</span></code></p></td>
<td><p>If you have an environment that returns dictionaries as observations, but you would like to only keep a subset of the entries, you can use this wrapper. <code class="docutils literal notranslate"><span class="pre">filter_keys</span></code> should be an iterable that contains the keys that are kept in the new observation. If it is <code class="docutils literal notranslate"><span class="pre">None</span></code>, all keys will be kept and the wrapper has no effect.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">FrameStack</span></code></p></td>
<td><p>Observation wrapper that stacks the observations in a rolling manner. For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. Observations will be objects of type <code class="docutils literal notranslate"><span class="pre">LazyFrames</span></code>. This object can be cast to a numpy array via <code class="docutils literal notranslate"><span class="pre">np.asarray(obs)</span></code>. You can also access single frames or slices via the usual <code class="docutils literal notranslate"><span class="pre">__getitem__</span></code> syntax. If <code class="docutils literal notranslate"><span class="pre">lz4_compress</span></code> is set to true, the <code class="docutils literal notranslate"><span class="pre">LazyFrames</span></code> object will compress the frames internally (losslessly). The first observation (i.e. the one returned by <code class="docutils literal notranslate"><span class="pre">reset</span></code>) will consist of <code class="docutils literal notranslate"><span class="pre">num_stack</span></code> repetitions of the first frame.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">GrayScaleObservation</span></code></p></td>
<td><p>Convert the image observation from RGB to gray scale. By default, the resulting observation will be 2-dimensional. If <code class="docutils literal notranslate"><span class="pre">keep_dim</span></code> is set to true, a singleton dimension will be added (i.e. the observations are of shape AxBx1).</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">NormalizeReward</span></code></p></td>
<td><p>This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance. <code class="docutils literal notranslate"><span class="pre">epsilon</span></code> is a stability parameter and <code class="docutils literal notranslate"><span class="pre">gamma</span></code> is the discount factor that is used in the exponential moving average. The exponential moving average will have variance <code class="docutils literal notranslate"><span class="pre">(1</span> <span class="pre">-</span> <span class="pre">gamma)**2</span></code>. The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">NormalizeObservation</span></code></p></td>
<td><p>This wrapper will normalize observations s.t. each coordinate is centered with unit variance. The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently. <code class="docutils literal notranslate"><span class="pre">epsilon</span></code> is a stability parameter that is used when scaling the observations.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">OrderEnforcing</span></code></p></td>
<td><p>This will produce an error if <code class="docutils literal notranslate"><span class="pre">step</span></code> is called before an initial <code class="docutils literal notranslate"><span class="pre">reset</span></code></p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">PixelObservationWrapper</span></code></p></td>
<td><p>Augment observations by pixel values obtained via <code class="docutils literal notranslate"><span class="pre">render</span></code>. You can specify whether the original observations should be discarded entirely or be augmented by setting <code class="docutils literal notranslate"><span class="pre">pixels_only</span></code>. Also, you can provide keyword arguments for <code class="docutils literal notranslate"><span class="pre">render</span></code>.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">RecordEpisodeStatistics</span></code></p></td>
<td><p>This will keep track of cumulative rewards and episode lengths. At the end of an episode, the statistics of the episode will be added to <code class="docutils literal notranslate"><span class="pre">info</span></code>. Moreover, the rewards and episode lengths are stored in buffers that can be accessed via <code class="docutils literal notranslate"><span class="pre">wrapped_env.return_queue</span></code> and <code class="docutils literal notranslate"><span class="pre">wrapped_env.length_queue</span></code> respectively. The size of these buffers can be set via <code class="docutils literal notranslate"><span class="pre">deque_size</span></code>.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">RecordVideo</span></code></p></td>
<td><p>This wrapper will record videos of rollouts. The results will be saved in the folder specified via <code class="docutils literal notranslate"><span class="pre">video_folder</span></code>. You can specify a prefix for the filenames via <code class="docutils literal notranslate"><span class="pre">name_prefix</span></code>. Usually, you only want to record the environment intermittently, say every hundredth episode. To allow this, you can pass <code class="docutils literal notranslate"><span class="pre">episode_trigger</span></code> or <code class="docutils literal notranslate"><span class="pre">step_trigger</span></code>. At most one of these should be passed. These functions will accept an episode index or step index, respectively. They should return a boolean that indicates whether a recording should be started at this point. If neither <code class="docutils literal notranslate"><span class="pre">episode_trigger</span></code> nor <code class="docutils literal notranslate"><span class="pre">step_trigger</span></code> is passed, a default <code class="docutils literal notranslate"><span class="pre">episode_trigger</span></code> will be used. By default, the recording will be stopped once a done signal has been emitted by the environment. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for <code class="docutils literal notranslate"><span class="pre">video_length</span></code>.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">RescaleAction</span></code></p></td>
<td><p>Rescales the continuous action space of the environment to a range [<code class="docutils literal notranslate"><span class="pre">min_action</span></code>, <code class="docutils literal notranslate"><span class="pre">max_action</span></code>], where <code class="docutils literal notranslate"><span class="pre">min_action</span></code> and <code class="docutils literal notranslate"><span class="pre">max_action</span></code> are numpy arrays or floats.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">ResizeObservation</span></code></p></td>
<td><p>This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple <code class="docutils literal notranslate"><span class="pre">shape</span></code>. The argument <code class="docutils literal notranslate"><span class="pre">shape</span></code> may also be an integer. In that case, the observation is scaled to a square of side length <code class="docutils literal notranslate"><span class="pre">shape</span></code>.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">TimeAwareObservation</span></code></p></td>
<td><p>Augment the observation with the current time step in the trajectory (by appending it to the observation). This can be useful to ensure that things stay Markov. Currently it only works with one-dimensional observation spaces.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">TimeLimit</span></code></p></td>
<td><p>Probably the most useful wrapper in Gymnasium. This wrapper will emit a done signal if the specified number of steps is exceeded in an episode. In order to be able to distinguish termination and truncation, you need to check <code class="docutils literal notranslate"><span class="pre">info</span></code>. If it does not contain the key <code class="docutils literal notranslate"><span class="pre">"TimeLimit.truncated"</span></code>, the environment did not reach the time limit. Otherwise, <code class="docutils literal notranslate"><span class="pre">info["TimeLimit.truncated"]</span></code> will be true if the episode was terminated because of the time limit.</p></td>
</tr>
<tr><td><p><code class="docutils literal notranslate"><span class="pre">VectorListInfo</span></code></p></td>
<td><p>This wrapper will convert the info of a vectorized environment from the <code class="docutils literal notranslate"><span class="pre">dict</span></code> format to a <code class="docutils literal notranslate"><span class="pre">list</span></code> of dictionaries where the <em>i-th</em> dictionary contains info of the <em>i-th</em> environment. If using other wrappers that perform operations on info, like <code class="docutils literal notranslate"><span class="pre">RecordEpisodeStatistics</span></code>, this needs to be the outermost wrapper.</p></td>
</tr>
</tbody>
</table>