{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Implementing Custom Wrappers\n\nIn this tutorial we will describe how to implement your own custom wrappers.\nWrappers are a great way to add functionality to your environments in a modular way.\nThis will save you a lot of boilerplate code.\n\nWe will show how to create a wrapper by\n\n- Inheriting from :class:`gymnasium.ObservationWrapper`\n- Inheriting from :class:`gymnasium.ActionWrapper`\n- Inheriting from :class:`gymnasium.RewardWrapper`\n- Inheriting from :class:`gymnasium.Wrapper`\n\nBefore following this tutorial, make sure to check out the docs of the :mod:`gymnasium.wrappers` module.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inheriting from :class:`gymnasium.ObservationWrapper`\nObservation wrappers are useful if you want to apply some function to the observations that are returned\nby an environment. If you implement an observation wrapper, you only need to define this transformation\nby implementing the :meth:`gymnasium.ObservationWrapper.observation` method. Moreover, you should remember to\nupdate the observation space, if the transformation changes the shape of observations (e.g. by transforming\ndictionaries into numpy arrays, as in the following example).\n\nImagine you have a 2D navigation task where the environment returns dictionaries as observations with\nkeys ``\"agent_position\"`` and ``\"target_position\"``. A common thing to do might be to throw away some degrees of\nfreedom and only consider the position of the target relative to the agent, i.e.\n``observation[\"target_position\"] - observation[\"agent_position\"]``. For this, you could implement an\nobservation wrapper like this:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\nfrom gym import ActionWrapper, ObservationWrapper, RewardWrapper, Wrapper\n\nimport gymnasium as gym\nfrom gymnasium.spaces import Box, Discrete\n\n\nclass RelativePosition(ObservationWrapper):\n def __init__(self, env):\n super().__init__(env)\n self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)\n\n def observation(self, obs):\n return obs[\"target\"] - obs[\"agent\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inheriting from :class:`gymnasium.ActionWrapper`\nAction wrappers can be used to apply a transformation to actions before applying them to the environment.\nIf you implement an action wrapper, you need to define that transformation by implementing\n:meth:`gymnasium.ActionWrapper.action`. Moreover, you should specify the domain of that transformation\nby updating the action space of the wrapper.\n\nLet\u2019s say you have an environment with action space of type :class:`gymnasium.spaces.Box`, but you would only like\nto use a finite subset of actions. Then, you might want to implement the following wrapper:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class DiscreteActions(ActionWrapper):\n def __init__(self, env, disc_to_cont):\n super().__init__(env)\n self.disc_to_cont = disc_to_cont\n self.action_space = Discrete(len(disc_to_cont))\n\n def action(self, act):\n return self.disc_to_cont[act]\n\n\nif __name__ == \"__main__\":\n env = gym.make(\"LunarLanderContinuous-v2\")\n wrapped_env = DiscreteActions(\n env, [np.array([1, 0]), np.array([-1, 0]), np.array([0, 1]), np.array([0, -1])]\n )\n print(wrapped_env.action_space) # Discrete(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inheriting from :class:`gymnasium.RewardWrapper`\nReward wrappers are used to transform the reward that is returned by an environment.\nAs for the previous wrappers, you need to specify that transformation by implementing the\n:meth:`gymnasium.RewardWrapper.reward` method. Also, you might want to update the reward range of the wrapper.\n\nLet us look at an example: Sometimes (especially when we do not have control over the reward\nbecause it is intrinsic), we want to clip the reward to a range to gain some numerical stability.\nTo do that, we could, for instance, implement the following wrapper:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from typing import SupportsFloat\n\n\nclass ClipReward(RewardWrapper):\n def __init__(self, env, min_reward, max_reward):\n super().__init__(env)\n self.min_reward = min_reward\n self.max_reward = max_reward\n self.reward_range = (min_reward, max_reward)\n\n def reward(self, r: SupportsFloat) -> SupportsFloat:\n return np.clip(r, self.min_reward, self.max_reward)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inheriting from :class:`gymnasium.Wrapper`\nSometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the\nreward based on data in ``info`` or change the rendering behavior).\nSuch wrappers can be implemented by inheriting from :class:`gymnasium.Wrapper`.\n\n- You can set a new action or observation space by defining ``self.action_space`` or ``self.observation_space`` in ``__init__``, respectively\n- You can set new metadata and reward range by defining ``self.metadata`` and ``self.reward_range`` in ``__init__``, respectively\n- You can override :meth:`gymnasium.Wrapper.step`, :meth:`gymnasium.Wrapper.render`, :meth:`gymnasium.Wrapper.close` etc.\nIf you do this, you can access the environment that was passed\nto your wrapper (which *still* might be wrapped in some other wrapper) by accessing the attribute :attr:`env`.\n\nLet's also take a look at an example for this case. Most MuJoCo environments return a reward that consists\nof different terms: For instance, there might be a term that rewards the agent for completing the task and one term that\npenalizes large actions (i.e. energy usage). Usually, you can pass weight parameters for those terms during\ninitialization of the environment. However, *Reacher* does not allow you to do this! Nevertheless, all individual terms\nof the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class ReacherRewardWrapper(Wrapper):\n def __init__(self, env, reward_dist_weight, reward_ctrl_weight):\n super().__init__(env)\n self.reward_dist_weight = reward_dist_weight\n self.reward_ctrl_weight = reward_ctrl_weight\n\n def step(self, action):\n obs, _, terminated, truncated, info = self.env.step(action)\n reward = (\n self.reward_dist_weight * info[\"reward_dist\"]\n + self.reward_ctrl_weight * info[\"reward_ctrl\"]\n )\n return obs, reward, terminated, truncated, info" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 0 }