# fmt: off
"""
Make your own custom environment
================================

This tutorial shows how to create a new environment and links to relevant useful wrappers, utilities and tests included in Gymnasium.

Setup
-----

Recommended solution
~~~~~~~~~~~~~~~~~~~~

1. Install ``pipx`` following the `pipx documentation <https://pypa.github.io/pipx/installation/>`_.
2. Then install Copier:

   .. code:: console

      pipx install copier

Alternative solutions
~~~~~~~~~~~~~~~~~~~~~

Install Copier with Pip or Conda:

.. code:: console

    pip install copier

or

.. code:: console

    conda install -c conda-forge copier

Generate your environment
-------------------------

You can check that ``Copier`` has been correctly installed by running the following command, which should output a version number:

.. code:: console

    copier --version

Then you can just run the following command and replace the string ``path/to/directory`` by the path to the directory where you want to create your new project.

.. code:: console

    copier copy https://github.com/Farama-Foundation/gymnasium-env-template.git "path/to/directory"

Answer the questions, and when it's finished you should get a project structure like the following:

.. code:: sh

    .
    ├── gymnasium_env
    │   ├── envs
    │   │   ├── grid_world.py
    │   │   └── __init__.py
    │   ├── __init__.py
    │   └── wrappers
    │       ├── clip_reward.py
    │       ├── discrete_actions.py
    │       ├── __init__.py
    │       ├── reacher_weighted_reward.py
    │       └── relative_position.py
    ├── LICENSE
    ├── pyproject.toml
    └── README.md

Subclassing gymnasium.Env
-------------------------

Before learning how to create your own environment you should check out
`the documentation of Gymnasium’s API </api/env>`__.
To illustrate the process of subclassing ``gymnasium.Env``, we will
implement a very simplistic game, called ``GridWorldEnv``. We will write
the code for our custom environment in
``gymnasium_env/envs/grid_world.py``. The environment
consists of a 2-dimensional square grid of fixed size (specified via the
``size`` parameter during construction). The agent can move vertically
or horizontally between grid cells in each timestep. The goal of the
agent is to navigate to a target on the grid that has been placed
randomly at the beginning of the episode.

- Observations provide the location of the target and agent.
- There are 4 actions in our environment, corresponding to the
  movements “right”, “up”, “left”, and “down”.
- The episode ends (termination) as soon as the agent has navigated to
  the grid cell where the target is located.
- Rewards are binary and sparse, meaning that the immediate reward is
  always zero, unless the agent has reached the target, in which case it is 1.

An episode in this environment (with ``size=5``) might look like this:

.. image:: /_static/videos/tutorials/environment-creation-example-episode.gif
   :width: 400
   :alt: Example episode of the custom environment

where the blue dot is the agent and the red square represents the
target.

Let us look at the source code of ``GridWorldEnv`` piece by piece:
"""
# %%
# Declaration and Initialization
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Our custom environment will inherit from the abstract class
# ``gymnasium.Env``. You shouldn’t forget to add the ``metadata``
# attribute to your class. There, you should specify the render-modes that
# are supported by your environment (e.g., ``"human"``, ``"rgb_array"``,
# ``"ansi"``) and the framerate at which your environment should be
# rendered. Every environment should support ``None`` as render-mode; you
# don’t need to add it in the metadata. In ``GridWorldEnv``, we will
# support the modes “rgb_array” and “human” and render at 4 FPS.
#
# The ``__init__`` method of our environment will accept the integer
# ``size``, which determines the size of the square grid. We will set up
# some variables for rendering and define ``self.observation_space`` and
# ``self.action_space``. In our case, observations should provide
# information about the location of the agent and target on the
# 2-dimensional grid. We will choose to represent observations in the form
# of dictionaries with keys ``"agent"`` and ``"target"``. An observation
# may look like ``{"agent": array([1, 0]), "target": array([0, 3])}``.
# Since we have 4 actions in our environment (“right”, “up”, “left”,
# “down”), we will use ``Discrete(4)`` as an action space. Here is the
# declaration of ``GridWorldEnv`` and the implementation of ``__init__``:
# gymnasium_env/envs/grid_world.py
from enum import Enum
import numpy as np
import pygame
import gymnasium as gym
from gymnasium import spaces
class Actions(Enum):
    RIGHT = 0
    UP = 1
    LEFT = 2
    DOWN = 3
class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 4}

    def __init__(self, render_mode=None, size=5):
        self.size = size  # The size of the square grid
        self.window_size = 512  # The size of the PyGame window

        # Observations are dictionaries with the agent's and the target's location.
        # Each location is encoded as an element of {0, ..., `size`-1}^2, i.e. MultiDiscrete([size, size]).
        self.observation_space = spaces.Dict(
            {
                "agent": spaces.Box(0, size - 1, shape=(2,), dtype=int),
                "target": spaces.Box(0, size - 1, shape=(2,), dtype=int),
            }
        )
        self._agent_location = np.array([-1, -1], dtype=int)
        self._target_location = np.array([-1, -1], dtype=int)

        # We have 4 actions, corresponding to "right", "up", "left", "down"
        self.action_space = spaces.Discrete(4)

        """
        The following dictionary maps abstract actions from `self.action_space` to
        the direction we will walk in if that action is taken.
        i.e. 0 corresponds to "right", 1 to "up" etc.
        """
        self._action_to_direction = {
            Actions.RIGHT.value: np.array([1, 0]),
            Actions.UP.value: np.array([0, 1]),
            Actions.LEFT.value: np.array([-1, 0]),
            Actions.DOWN.value: np.array([0, -1]),
        }
        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode
"""
If human - rendering is used , ` self . window ` will be a reference
to the window that we draw to . ` self . clock ` will be a clock that is used
to ensure that the environment is rendered at the correct framerate in
human - mode . They will remain ` None ` until human - mode is used for the
first time .
"""
self . window = None
self . clock = None
# %%
# Constructing Observations From Environment States
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Since we will need to compute observations both in ``reset`` and
# ``step``, it is often convenient to have a (private) method ``_get_obs``
# that translates the environment’s state into an observation. However,
# this is not mandatory and you may as well compute observations in
# ``reset`` and ``step`` separately:
    def _get_obs(self):
        return {"agent": self._agent_location, "target": self._target_location}
# %%
# We can also implement a similar method for the auxiliary information
# that is returned by ``step`` and ``reset``. In our case, we would like
# to provide the Manhattan distance between the agent and the target:
    def _get_info(self):
        return {
            "distance": np.linalg.norm(
                self._agent_location - self._target_location, ord=1
            )
        }
# %%
# Oftentimes, info will also contain some data that is only available
2023-12-08 12:46:40 +00:00
# inside the ``step`` method (e.g., individual reward terms). In that case,
2022-11-10 12:18:57 +00:00
# we would have to update the dictionary that is returned by ``_get_info``
# in ``step``.
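#
# For example (a hypothetical sketch, not part of ``GridWorldEnv``), ``step``
# could extend the dictionary returned by ``_get_info`` with an extra term
# before returning it:
#
# .. code:: python
#
#     info = self._get_info()
#     info["moved_into_wall"] = moved_into_wall  # hypothetical flag computed in ``step``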
# %%
# Reset
# ~~~~~
#
# The ``reset`` method will be called to initiate a new episode. You may
# assume that the ``step`` method will not be called before ``reset`` has
# been called. Moreover, ``reset`` should be called whenever the episode
# has ended. Users may pass the ``seed`` keyword to ``reset`` to
# initialize any random number generator that is used by the environment
# to a deterministic state. It is recommended to use the random number
# generator ``self.np_random`` that is provided by the environment’s base
# class, ``gymnasium.Env``. If you only use this RNG, you do not need to
# worry much about seeding, *but you need to remember to call
# ``super().reset(seed=seed)``* to make sure that ``gymnasium.Env``
# correctly seeds the RNG. Once this is done, we can randomly set the
# state of our environment. In our case, we randomly choose the agent’s
# location and randomly sample target positions until they do not
# coincide with the agent’s position.
#
# The ``reset`` method should return a tuple of the initial observation
# and some auxiliary information. We can use the methods ``_get_obs`` and
# ``_get_info`` that we implemented earlier for that:
    def reset(self, seed=None, options=None):
        # We need the following line to seed self.np_random
        super().reset(seed=seed)

        # Choose the agent's location uniformly at random
        self._agent_location = self.np_random.integers(0, self.size, size=2, dtype=int)

        # We will sample the target's location randomly until it does not coincide with the agent's location
        self._target_location = self._agent_location
        while np.array_equal(self._target_location, self._agent_location):
            self._target_location = self.np_random.integers(
                0, self.size, size=2, dtype=int
            )

        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            self._render_frame()

        return observation, info
# %%
# Step
# ~~~~
#
# The ``step`` method usually contains most of the logic of your
# environment. It accepts an ``action``, computes the state of the
# environment after applying that action and returns the 5-tuple
# ``(observation, reward, terminated, truncated, info)``. See
# :meth:`gymnasium.Env.step`. Once the new state of the environment has
# been computed, we can check whether it is a terminal state and we set
# ``terminated`` accordingly. Since we are using sparse binary rewards in
# ``GridWorldEnv``, computing ``reward`` is trivial once we know
# ``terminated``. To gather ``observation`` and ``info``, we can again make
# use of ``_get_obs`` and ``_get_info``:
    def step(self, action):
        # Map the action (element of {0,1,2,3}) to the direction we walk in
        direction = self._action_to_direction[action]
        # We use `np.clip` to make sure we don't leave the grid
        self._agent_location = np.clip(
            self._agent_location + direction, 0, self.size - 1
        )

        # An episode is done iff the agent has reached the target
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1 if terminated else 0  # Binary sparse rewards
        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            self._render_frame()

        return observation, reward, terminated, False, info
# %%
# Rendering
# ~~~~~~~~~
#
# Here, we are using PyGame for rendering. A similar approach to rendering
# is used in many environments that are included with Gymnasium and you
# can use it as a skeleton for your own environments:
    def render(self):
        if self.render_mode == "rgb_array":
            return self._render_frame()
    def _render_frame(self):
        if self.window is None and self.render_mode == "human":
            pygame.init()
            pygame.display.init()
            self.window = pygame.display.set_mode(
                (self.window_size, self.window_size)
            )
        if self.clock is None and self.render_mode == "human":
            self.clock = pygame.time.Clock()

        canvas = pygame.Surface((self.window_size, self.window_size))
        canvas.fill((255, 255, 255))
        pix_square_size = (
            self.window_size / self.size
        )  # The size of a single grid square in pixels

        # First we draw the target
        pygame.draw.rect(
            canvas,
            (255, 0, 0),
            pygame.Rect(
                pix_square_size * self._target_location,
                (pix_square_size, pix_square_size),
            ),
        )
        # Now we draw the agent
        pygame.draw.circle(
            canvas,
            (0, 0, 255),
            (self._agent_location + 0.5) * pix_square_size,
            pix_square_size / 3,
        )

        # Finally, add some gridlines
        for x in range(self.size + 1):
            pygame.draw.line(
                canvas,
                0,
                (0, pix_square_size * x),
                (self.window_size, pix_square_size * x),
                width=3,
            )
            pygame.draw.line(
                canvas,
                0,
                (pix_square_size * x, 0),
                (pix_square_size * x, self.window_size),
                width=3,
            )

        if self.render_mode == "human":
            # The following line copies our drawings from `canvas` to the visible window
            self.window.blit(canvas, canvas.get_rect())
            pygame.event.pump()
            pygame.display.update()

            # We need to ensure that human-rendering occurs at the predefined framerate.
            # The following line will automatically add a delay to keep the framerate stable.
            self.clock.tick(self.metadata["render_fps"])
        else:  # rgb_array
            return np.transpose(
                np.array(pygame.surfarray.pixels3d(canvas)), axes=(1, 0, 2)
            )
# %%
# Close
# ~~~~~
#
# The ``close`` method should close any open resources that were used by
# the environment. In many cases, you don’t actually have to bother to
# implement this method. However, in our example ``render_mode`` may be
# ``"human"`` and we might need to close the window that has been opened:
    def close(self):
        if self.window is not None:
            pygame.display.quit()
            pygame.quit()
# %%
# In other environments ``close`` might also close files that were opened
# or release other resources. You shouldn’t interact with the environment
# after having called ``close``.
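#
# For example (a hypothetical sketch, not part of ``GridWorldEnv``), an
# environment that writes transitions to a log file opened in ``__init__``
# might release it like this:
#
# .. code:: python
#
#     def close(self):
#         if self._log_file is not None:  # hypothetical file handle
#             self._log_file.close()
#             self._log_file = None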
# %%
# Registering Envs
# ----------------
#
# In order for the custom environments to be detected by Gymnasium, they
# must be registered as follows. We will choose to put this code in
# ``gymnasium_env/__init__.py``.
#
# .. code:: python
#
# from gymnasium.envs.registration import register
#
# register(
# id="gymnasium_env/GridWorld-v0",
# entry_point="gymnasium_env.envs:GridWorldEnv",
# )
# %%
# The environment ID consists of three components, two of which are
# optional: an optional namespace (here: ``gymnasium_env``), a mandatory
# name (here: ``GridWorld``) and an optional but recommended version
# (here: v0). It might have also been registered as ``GridWorld-v0`` (the
# recommended approach), ``GridWorld`` or ``gymnasium_env/GridWorld``, and
# the appropriate ID should then be used during environment creation.
#
# The keyword argument ``max_episode_steps=300`` will ensure that
# GridWorld environments that are instantiated via ``gymnasium.make`` will
# be wrapped in a ``TimeLimit`` wrapper (see `the wrapper
# documentation </api/wrappers>`__ for more information). The episode will
# then end if the agent has reached the target *or* 300 steps have been
# executed in the current episode. To distinguish truncation and
# termination, you can check the ``truncated`` flag returned by ``step``.
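#
# As a minimal sketch (re-registering the environment from above with a time
# limit), the registration and a rollout that tells the two end-of-episode
# signals apart could look like this:
#
# .. code:: python
#
#     import gymnasium
#     from gymnasium.envs.registration import register
#
#     register(
#         id="gymnasium_env/GridWorld-v0",
#         entry_point="gymnasium_env.envs:GridWorldEnv",
#         max_episode_steps=300,
#     )
#
#     env = gymnasium.make("gymnasium_env/GridWorld-v0")
#     observation, info = env.reset(seed=42)
#     terminated = truncated = False
#     while not (terminated or truncated):
#         observation, reward, terminated, truncated, info = env.step(
#             env.action_space.sample()
#         )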
#
# Apart from ``id`` and ``entry_point``, you may pass the following
# additional keyword arguments to ``register``:
#
# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
# | Name | Type | Default | Description |
# +======================+===========+===========+===============================================================================================================+
# | ``reward_threshold`` | ``float`` | ``None`` | The reward threshold before the task is considered solved |
# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
# | ``nondeterministic`` | ``bool`` | ``False`` | Whether this environment is non-deterministic even after seeding |
# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
# | ``max_episode_steps``| ``int`` | ``None`` | The maximum number of steps that an episode can consist of. If not ``None``, a ``TimeLimit`` wrapper is added |
# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
# | ``order_enforce`` | ``bool`` | ``True`` | Whether to wrap the environment in an ``OrderEnforcing`` wrapper |
# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
# | ``kwargs`` | ``dict`` | ``{}`` | The default kwargs to pass to the environment class |
# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
#
# Most of these keywords (except for ``max_episode_steps``,
# ``order_enforce`` and ``kwargs``) do not alter the behavior of
# environment instances but merely provide some extra information about
# your environment. After registration, our custom ``GridWorldEnv``
# environment can be created with
# ``env = gymnasium.make('gymnasium_env/GridWorld-v0')``.
#
# ``gymnasium_env/envs/__init__.py`` should have:
#
# .. code:: python
#
# from gymnasium_env.envs.grid_world import GridWorldEnv
#
# If your environment is not registered, you may optionally pass a module
# to import, that would register your environment before creating it like
# this - ``env = gymnasium.make('module:Env-v0')``, where ``module``
# contains the registration code. For the GridWorld env, the registration
# code is run by importing ``gymnasium_env`` so if it were not possible to
# import gymnasium_env explicitly, you could register while making by
# ``env = gymnasium.make('gymnasium_env:gymnasium_env/GridWorld-v0')``. This
# is especially useful when you’re allowed to pass only the environment ID
# into a third-party codebase (e.g. a learning library). This lets you
# register your environment without needing to edit the library’s source
# code.
# %%
# Creating a Package
# ------------------
#
# The last step is to structure our code as a Python package. This
# involves configuring ``pyproject.toml``. A minimal example of how
# to do so is as follows:
#
# .. code:: toml
#
# [build-system]
# requires = ["hatchling"]
# build-backend = "hatchling.build"
#
# [project]
# name = "gymnasium_env"
# version = "0.0.1"
# dependencies = [
# "gymnasium",
# "pygame==2.1.3",
# "pre-commit",
# ]
#
# Creating Environment Instances
# ------------------------------
#
# Now you can install your package locally with:
#
# .. code:: console
#
# pip install -e .
#
# And you can create an instance of the environment via:
#
# .. code:: python
#
# # run_gymnasium_env.py
#
# import gymnasium
# import gymnasium_env
# env = gymnasium.make('gymnasium_env/GridWorld-v0')
#
# You can also pass keyword arguments of your environment’s constructor to
# ``gymnasium.make`` to customize the environment. In our case, we could
# do:
#
# .. code:: python
#
# env = gymnasium.make('gymnasium_env/GridWorld-v0', size=10)
#
# Sometimes, you may find it more convenient to skip registration and call
# the environment’s constructor yourself. Some may find this approach more
# pythonic and environments that are instantiated like this are also
# perfectly fine (but remember to add wrappers as well!).
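#
# For instance (a minimal sketch, assuming the package from this tutorial is
# installed), direct instantiation with a manually applied ``TimeLimit``
# wrapper might look like this:
#
# .. code:: python
#
#     from gymnasium.wrappers import TimeLimit
#     from gymnasium_env.envs import GridWorldEnv
#
#     env = TimeLimit(GridWorldEnv(render_mode="human", size=10), max_episode_steps=300)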
#
# Using Wrappers
# --------------
#
# Oftentimes, we want to use different variants of a custom environment,
# or we want to modify the behavior of an environment that is provided by
# Gymnasium or some other party. Wrappers allow us to do this without
# changing the environment implementation or adding any boilerplate code.
# Check out the `wrapper documentation </api/wrappers/>`__ for details on
# how to use wrappers and instructions for implementing your own. In our
# example, observations cannot be used directly in learning code because
# they are dictionaries. However, we don’t actually need to touch our
# environment implementation to fix this! We can simply add a wrapper on
# top of environment instances to flatten observations into a single
# array:
#
# .. code:: python
#
# import gymnasium
# import gymnasium_env
# from gymnasium.wrappers import FlattenObservation
#
# env = gymnasium.make('gymnasium_env/GridWorld-v0')
# wrapped_env = FlattenObservation(env)
# print(wrapped_env.reset()) # E.g. [3 0 3 3], {}
#
# Wrappers have the big advantage that they make environments highly
# modular. For instance, instead of flattening the observations from
# GridWorld, you might only want to look at the relative position of the
# target and the agent. In the section on
# `ObservationWrappers </api/wrappers/observation_wrappers/#observation-wrappers>`__ we have
# implemented a wrapper that does this job. This wrapper is also available
# in ``gymnasium_env/wrappers/relative_position.py``:
#
# .. code:: python
#
# import gymnasium
# import gymnasium_env
# from gymnasium_env.wrappers import RelativePosition
#
# env = gymnasium.make('gymnasium_env/GridWorld-v0')
# wrapped_env = RelativePosition(env)
# print(wrapped_env.reset()) # E.g. [-3 3], {}
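#
# As a rough sketch (the actual implementation in
# ``gymnasium_env/wrappers/relative_position.py`` may differ in its details),
# such an ``ObservationWrapper`` could look like this:
#
# .. code:: python
#
#     import gymnasium
#     from gymnasium import spaces
#
#     class RelativePosition(gymnasium.ObservationWrapper):
#         def __init__(self, env):
#             super().__init__(env)
#             size = env.unwrapped.size  # grid size of the wrapped GridWorldEnv
#             # Relative position of the target w.r.t. the agent; each
#             # component lies in [-(size - 1), size - 1].
#             self.observation_space = spaces.Box(-size + 1, size - 1, shape=(2,), dtype=int)
#
#         def observation(self, obs):
#             return obs["target"] - obs["agent"]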