Mirror of https://github.com/Farama-Foundation/Gymnasium.git (synced 2025-07-31 13:54:31 +00:00)

Commit: Updating tutorials (#63)
@@ -15,7 +15,7 @@ repos:
      hooks:
        - id: flake8
          args:
-           - '--per-file-ignores=*/__init__.py:F401 gymnasium/envs/registration.py:E704'
+           - '--per-file-ignores=*/__init__.py:F401 gymnasium/envs/registration.py:E704 docs/tutorials/*.py:E402'
            - --ignore=E203,W503,E741
            - --max-complexity=30
            - --max-line-length=456
@@ -8,7 +8,7 @@ If you are modifying a non-environment page or an atari environment page, please

### Editing an environment page

If you are editing an Atari environment, directly edit the Markdown file in this repository.

Otherwise, fork Gymnasium and edit the docstring in the environment's Python file. Then, pip install your Gymnasium fork and run `docs/scripts/gen_mds.py` in this repo. This will automatically generate a Markdown documentation file for the environment.

@@ -49,3 +49,11 @@ To rebuild the documentation automatically every time a change is made:
cd docs
sphinx-autobuild -b dirhtml . _build
```

+## Writing Tutorials
+
+We use Sphinx-Gallery to build the tutorials inside the `docs/tutorials` directory. Check `docs/tutorials/demo.py` for an example of a tutorial and the [Sphinx-Gallery documentation](https://sphinx-gallery.github.io/stable/syntax.html) for more information.
+
+To convert Jupyter notebooks to Python tutorials you can use [this script](https://gist.github.com/mgoulao/f07f5f79f6cd9a721db8a34bba0a19a7).
+
+If you want Sphinx-Gallery to execute the tutorial (which adds outputs and plots), the file name should start with `run_`. Note that this adds to the build time, so make sure the script doesn't take more than a few seconds to execute.
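For orientation, here is a minimal sketch of what a Sphinx-Gallery tutorial file can look like (the file name `run_example_tutorial.py` is hypothetical; the `run_` prefix only matters if you want the script executed during the build):

```python
# docs/tutorials/run_example_tutorial.py (hypothetical file name)
"""
Example tutorial title
======================

Text in this module docstring becomes the tutorial's introduction.
"""
# %%
# Each ``# %%`` comment block is rendered as a text cell; the code below it is
# shown on the built page (and executed when the file name starts with ``run_``).
import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.action_space)
```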
@@ -41,6 +41,7 @@ extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.githubpages",
    "myst_parser",
+   "furo.gen_tutorials",
]

# Add any paths that contain templates here, relative to this directory.
@@ -91,5 +92,6 @@ html_css_files = []
# -- Generate Tutorials -------------------------------------------------

gen_tutorials.generate(
+    os.path.dirname(__file__),
    os.path.join(os.path.dirname(__file__), "tutorials"),
)
@@ -8,7 +8,7 @@ firstpage:

## Initializing Environments

Initializing environments is very easy in Gymnasium and can be done via:

```python
import gymnasium as gym
@@ -32,11 +32,11 @@ Gymnasium implements the classic "agent-environment loop":
```

The agent performs some actions in the environment (usually by passing some control inputs to the environment, e.g. torque inputs of motors) and observes how the environment's state changes. One such action-observation exchange is referred to as a *timestep*.

The goal in RL is to manipulate the environment in some specific way. For instance, we want the agent to navigate a robot to a specific point in space. If it succeeds in doing this (or makes some progress towards that goal), it will receive a positive reward alongside the observation for this timestep. The reward may also be negative or 0 if the agent did not yet succeed (or did not make any progress). The agent will then be trained to maximize the reward it accumulates over many timesteps.

After some timesteps, the environment may enter a terminal state. For instance, the robot may have crashed, or the agent may have succeeded in completing a task. In that case, we want to reset the environment to a new initial state. The environment issues a terminated signal to the agent if it enters such a terminal state. Sometimes we also want to end the episode after a fixed number of timesteps; in this case, the environment issues a truncated signal.
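A minimal sketch of this loop (assuming a registered environment such as CartPole-v1 and a purely random policy) looks like:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()  # random policy; replace with your agent
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode when a terminal state is reached or the episode is truncated
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```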
@@ -71,41 +71,41 @@ The output should look something like this:

Every environment specifies the format of valid actions by providing an `env.action_space` attribute. Similarly, the format of valid observations is specified by `env.observation_space`. In the example above we sampled random actions via `env.action_space.sample()`. Note that we need to seed the action space separately from the environment to ensure reproducible samples.
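As a brief illustration (CartPole-v1 is used here only as an example of a registered environment), the action space is seeded on its own:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)  # seeds the environment's RNG
env.action_space.seed(42)               # seeds the space's sampler separately
action = env.action_space.sample()      # reproducible random action
```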
### Change in env.step API

Previously, the step method returned only one boolean - `done`. This is being deprecated in favour of returning two booleans, `terminated` and `truncated` (v0.26 onwards).

The `terminated` signal is set to `True` when the core environment terminates inherently because of task completion, failure, etc., i.e. a condition defined in the MDP. The `truncated` signal is set to `True` when the episode ends specifically because of a time limit or a condition not inherent to the environment (not defined in the MDP). It is possible for `terminated=True` and `truncated=True` to occur at the same time when termination and truncation occur at the same step.

This is explained in detail in the `Handling Time Limits` section.
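A short sketch of consuming the two booleans (again, CartPole-v1 is only an example of a registered environment); collapsing them back into a single flag is an option when older code still expects `done`:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

done = terminated or truncated  # only if a single old-style flag is still needed
```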
#### Backward compatibility

Gym will retain support for the old API through compatibility wrappers.

Users can toggle the old API through `make` by setting `apply_api_compatibility=True`.

```python
env = gym.make("CartPole-v1", apply_api_compatibility=True)
```

This can also be done explicitly through a wrapper:

```python
from gymnasium.wrappers import StepAPICompatibility
env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)
```

For more details see the wrappers section.
## Checking API-Conformity

If you have implemented a custom environment and would like to perform a sanity check to make sure that it conforms to the API, you can run:

```python
>>> from gymnasium.utils.env_checker import check_env
@@ -113,8 +113,8 @@ the API, you can run:
```

This function will throw an exception if it seems like your environment does not follow the Gymnasium API. It will also produce warnings if it looks like you made a mistake or do not follow a best practice (e.g. if `observation_space` looks like an image but does not have the right dtype). Warnings can be turned off by passing `warn=False`. By default, `check_env` will not check the `render` method. To change this behavior, you can pass `skip_render_check=False`.
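For illustration, a sketch of invoking the checker with these flags; `CustomEnv` is a hypothetical stand-in for your own environment class:

```python
from gymnasium.utils.env_checker import check_env

env = CustomEnv()  # hypothetical custom environment class
# Keep warnings enabled and also exercise the render method
check_env(env, warn=True, skip_render_check=False)
# Afterwards, create a fresh instance for training rather than reusing `env`
```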

> After running `check_env` on an environment, you should not reuse the instance that was checked, as it may have already
@@ -136,7 +136,7 @@ There are multiple `Space` types available in Gymnasium:

```python
>>> from gymnasium.spaces import Box, Discrete, Dict, Tuple, MultiBinary, MultiDiscrete
>>> import numpy as np
>>>
>>> observation_space = Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32)
>>> observation_space.sample()
@@ -145,11 +145,11 @@ There are multiple `Space` types available in Gymnasium:
>>> observation_space = Discrete(4)
>>> observation_space.sample()
1
>>>
>>> observation_space = Discrete(5, start=-2)
>>> observation_space.sample()
-2
>>>
>>> observation_space = Dict({"position": Discrete(2), "velocity": Discrete(3)})
>>> observation_space.sample()
OrderedDict([('position', 0), ('velocity', 1)])
@@ -170,7 +170,7 @@ OrderedDict([('position', 0), ('velocity', 1)])
## Wrappers

Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly. Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
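A small sketch of chaining wrappers around a base environment (the specific wrappers used here, `TimeLimit` and `RecordEpisodeStatistics` from `gymnasium.wrappers`, are just illustrative choices):

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, TimeLimit

base_env = gym.make("CartPole-v1")
# Chain wrappers: enforce a step limit, then record per-episode statistics
wrapped_env = RecordEpisodeStatistics(TimeLimit(base_env, max_episode_steps=100))

observation, info = wrapped_env.reset(seed=0)
observation, reward, terminated, truncated, info = wrapped_env.step(
    wrapped_env.action_space.sample()
)
```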

In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
@@ -217,7 +217,7 @@ If you have a wrapped environment, and you want to get the unwrapped environment

## Playing within an environment

You can also play the environment using your keyboard, with the `play` function in `gymnasium.utils.play`.

```python
from gymnasium.utils.play import play
play(gymnasium.make('Pong-v0'))
@@ -66,9 +66,6 @@ environments/third_party_environments
:glob:
:caption: Tutorials

-content/environment_creation
-content/vectorising
-content/handling_timelimits
tutorials/*
```
docs/tutorials/environment_creation.py (new file, 509 lines)
@@ -0,0 +1,509 @@
"""
Make your own custom environment
================================

This documentation overviews creating new environments and relevant
useful wrappers, utilities and tests included in Gymnasium designed for
the creation of new environments. You can clone gym-examples to play
with the code that is presented here. We recommend that you use a virtual environment:

.. code:: console

   git clone https://github.com/Farama-Foundation/gym-examples
   cd gym-examples
   python -m venv .env
   source .env/bin/activate
   pip install -e .

Subclassing gymnasium.Env
-------------------------

Before learning how to create your own environment you should check out
`the documentation of Gymnasium’s API </api/core>`__.

We will be concerned with a subset of gym-examples that looks like this:

.. code:: sh

   gym-examples/
     README.md
     setup.py
     gym_examples/
       __init__.py
       envs/
         __init__.py
         grid_world.py
       wrappers/
         __init__.py
         relative_position.py
         reacher_weighted_reward.py
         discrete_action.py
         clip_reward.py

To illustrate the process of subclassing ``gymnasium.Env``, we will
implement a very simplistic game, called ``GridWorldEnv``. We will write
the code for our custom environment in
``gym-examples/gym_examples/envs/grid_world.py``. The environment
consists of a 2-dimensional square grid of fixed size (specified via the
``size`` parameter during construction). The agent can move vertically
or horizontally between grid cells in each timestep. The goal of the
agent is to navigate to a target on the grid that has been placed
randomly at the beginning of the episode.

- Observations provide the location of the target and agent.
- There are 4 actions in our environment, corresponding to the
  movements “right”, “up”, “left”, and “down”.
- A done signal is issued as soon as the agent has navigated to the
  grid cell where the target is located.
- Rewards are binary and sparse, meaning that the immediate reward is
  always zero, unless the agent has reached the target, then it is 1.

An episode in this environment (with ``size=5``) might look like this:

where the blue dot is the agent and the red square represents the
target.

Let us look at the source code of ``GridWorldEnv`` piece by piece:
"""

# %%
# Declaration and Initialization
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Our custom environment will inherit from the abstract class
# ``gymnasium.Env``. You shouldn’t forget to add the ``metadata``
# attribute to your class. There, you should specify the render-modes that
# are supported by your environment (e.g. ``"human"``, ``"rgb_array"``,
# ``"ansi"``) and the framerate at which your environment should be
# rendered. Every environment should support ``None`` as render-mode; you
# don’t need to add it in the metadata. In ``GridWorldEnv``, we will
# support the modes “rgb_array” and “human” and render at 4 FPS.
#
# The ``__init__`` method of our environment will accept the integer
# ``size``, that determines the size of the square grid. We will set up
# some variables for rendering and define ``self.observation_space`` and
# ``self.action_space``. In our case, observations should provide
# information about the location of the agent and target on the
# 2-dimensional grid. We will choose to represent observations in the form
# of dictionaries with keys ``"agent"`` and ``"target"``. An observation
# may look like ``{"agent": array([1, 0]), "target": array([0, 3])}``.
# Since we have 4 actions in our environment (“right”, “up”, “left”,
# “down”), we will use ``Discrete(4)`` as an action space. Here is the
# declaration of ``GridWorldEnv`` and the implementation of ``__init__``:

import numpy as np
import pygame

import gymnasium as gym
from gymnasium import spaces


class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 4}

    def __init__(self, render_mode=None, size=5):
        self.size = size  # The size of the square grid
        self.window_size = 512  # The size of the PyGame window

        # Observations are dictionaries with the agent's and the target's location.
        # Each location is encoded as an element of {0, ..., `size`}^2, i.e. MultiDiscrete([size, size]).
        self.observation_space = spaces.Dict(
            {
                "agent": spaces.Box(0, size - 1, shape=(2,), dtype=int),
                "target": spaces.Box(0, size - 1, shape=(2,), dtype=int),
            }
        )

        # We have 4 actions, corresponding to "right", "up", "left", "down"
        self.action_space = spaces.Discrete(4)

        """
        The following dictionary maps abstract actions from `self.action_space` to
        the direction we will walk in if that action is taken.
        I.e. 0 corresponds to "right", 1 to "up" etc.
        """
        self._action_to_direction = {
            0: np.array([1, 0]),
            1: np.array([0, 1]),
            2: np.array([-1, 0]),
            3: np.array([0, -1]),
        }

        assert render_mode is None or render_mode in self.metadata["render_modes"]
        self.render_mode = render_mode

        """
        If human-rendering is used, `self.window` will be a reference
        to the window that we draw to. `self.clock` will be a clock that is used
        to ensure that the environment is rendered at the correct framerate in
        human-mode. They will remain `None` until human-mode is used for the
        first time.
        """
        self.window = None
        self.clock = None

# %%
# Constructing Observations From Environment States
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Since we will need to compute observations both in ``reset`` and
# ``step``, it is often convenient to have a (private) method ``_get_obs``
# that translates the environment’s state into an observation. However,
# this is not mandatory and you may as well compute observations in
# ``reset`` and ``step`` separately:

    def _get_obs(self):
        return {"agent": self._agent_location, "target": self._target_location}

# %%
# We can also implement a similar method for the auxiliary information
# that is returned by ``step`` and ``reset``. In our case, we would like
# to provide the manhattan distance between the agent and the target:

    def _get_info(self):
        return {
            "distance": np.linalg.norm(
                self._agent_location - self._target_location, ord=1
            )
        }

# %%
# Oftentimes, info will also contain some data that is only available
# inside the ``step`` method (e.g. individual reward terms). In that case,
# we would have to update the dictionary that is returned by ``_get_info``
# in ``step``.
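
# %%
# As a purely illustrative sketch (the extra key below is hypothetical and not part of
# ``GridWorldEnv``), such per-step data could be merged into the info dictionary inside ``step``:
#
# .. code:: python
#
#    info = self._get_info()
#    info["step_penalty"] = -0.01  # hypothetical per-step reward term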

# %%
# Reset
# ~~~~~
#
# The ``reset`` method will be called to initiate a new episode. You may
# assume that the ``step`` method will not be called before ``reset`` has
# been called. Moreover, ``reset`` should be called whenever a done signal
# has been issued. Users may pass the ``seed`` keyword to ``reset`` to
# initialize any random number generator that is used by the environment
# to a deterministic state. It is recommended to use the random number
# generator ``self.np_random`` that is provided by the environment’s base
# class, ``gymnasium.Env``. If you only use this RNG, you do not need to
# worry much about seeding, *but you need to remember to call
# ``super().reset(seed=seed)``* to make sure that ``gymnasium.Env``
# correctly seeds the RNG. Once this is done, we can randomly set the
# state of our environment. In our case, we randomly choose the agent’s
# location and randomly sample target positions until the target does not
# coincide with the agent’s position.
#
# The ``reset`` method should return a tuple of the initial observation
# and some auxiliary information. We can use the methods ``_get_obs`` and
# ``_get_info`` that we implemented earlier for that:

    def reset(self, seed=None, options=None):
        # We need the following line to seed self.np_random
        super().reset(seed=seed)

        # Choose the agent's location uniformly at random
        self._agent_location = self.np_random.integers(0, self.size, size=2, dtype=int)

        # We will sample the target's location randomly until it does not coincide with the agent's location
        self._target_location = self._agent_location
        while np.array_equal(self._target_location, self._agent_location):
            self._target_location = self.np_random.integers(
                0, self.size, size=2, dtype=int
            )

        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            self._render_frame()

        return observation, info

# %%
# Step
# ~~~~
#
# The ``step`` method usually contains most of the logic of your
# environment. It accepts an ``action``, computes the state of the
# environment after applying that action and returns the 5-tuple
# ``(observation, reward, terminated, truncated, info)``. Once the new
# state of the environment has been computed, we can check whether it is a
# terminal state and we set ``terminated`` accordingly. Since we are using
# sparse binary rewards in ``GridWorldEnv``, computing ``reward`` is
# trivial once we know ``terminated``. To gather ``observation`` and
# ``info``, we can again make use of ``_get_obs`` and ``_get_info``:

    def step(self, action):
        # Map the action (element of {0,1,2,3}) to the direction we walk in
        direction = self._action_to_direction[action]
        # We use `np.clip` to make sure we don't leave the grid
        self._agent_location = np.clip(
            self._agent_location + direction, 0, self.size - 1
        )
        # An episode is done iff the agent has reached the target
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1 if terminated else 0  # Binary sparse rewards
        observation = self._get_obs()
        info = self._get_info()

        if self.render_mode == "human":
            self._render_frame()

        return observation, reward, terminated, False, info

# %%
# Rendering
# ~~~~~~~~~
#
# Here, we are using PyGame for rendering. A similar approach to rendering
# is used in many environments that are included with Gymnasium and you
# can use it as a skeleton for your own environments:

    def render(self):
        if self.render_mode == "rgb_array":
            return self._render_frame()

    def _render_frame(self):
        if self.window is None and self.render_mode == "human":
            pygame.init()
            pygame.display.init()
            self.window = pygame.display.set_mode(
                (self.window_size, self.window_size)
            )
        if self.clock is None and self.render_mode == "human":
            self.clock = pygame.time.Clock()

        canvas = pygame.Surface((self.window_size, self.window_size))
        canvas.fill((255, 255, 255))
        pix_square_size = (
            self.window_size / self.size
        )  # The size of a single grid square in pixels

        # First we draw the target
        pygame.draw.rect(
            canvas,
            (255, 0, 0),
            pygame.Rect(
                pix_square_size * self._target_location,
                (pix_square_size, pix_square_size),
            ),
        )
        # Now we draw the agent
        pygame.draw.circle(
            canvas,
            (0, 0, 255),
            (self._agent_location + 0.5) * pix_square_size,
            pix_square_size / 3,
        )

        # Finally, add some gridlines
        for x in range(self.size + 1):
            pygame.draw.line(
                canvas,
                0,
                (0, pix_square_size * x),
                (self.window_size, pix_square_size * x),
                width=3,
            )
            pygame.draw.line(
                canvas,
                0,
                (pix_square_size * x, 0),
                (pix_square_size * x, self.window_size),
                width=3,
            )

        if self.render_mode == "human":
            # The following line copies our drawings from `canvas` to the visible window
            self.window.blit(canvas, canvas.get_rect())
            pygame.event.pump()
            pygame.display.update()

            # We need to ensure that human-rendering occurs at the predefined framerate.
            # The following line will automatically add a delay to keep the framerate stable.
            self.clock.tick(self.metadata["render_fps"])
        else:  # rgb_array
            return np.transpose(
                np.array(pygame.surfarray.pixels3d(canvas)), axes=(1, 0, 2)
            )

# %%
# Close
# ~~~~~
#
# The ``close`` method should close any open resources that were used by
# the environment. In many cases, you don’t actually have to bother to
# implement this method. However, in our example ``render_mode`` may be
# ``"human"`` and we might need to close the window that has been opened:

    def close(self):
        if self.window is not None:
            pygame.display.quit()
            pygame.quit()


# %%
# In other environments ``close`` might also close files that were opened
# or release other resources. You shouldn’t interact with the environment
# after having called ``close``.
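
# %%
# As a quick, purely illustrative sanity check (hypothetical usage, not part of gym-examples),
# the class defined above can be driven directly without registration:
#
# .. code:: python
#
#    env = GridWorldEnv(size=5)
#    observation, info = env.reset(seed=42)
#    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
#    env.close()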

# %%
# Registering Envs
# ----------------
#
# In order for the custom environments to be detected by Gymnasium, they
# must be registered as follows. We will choose to put this code in
# ``gym-examples/gym_examples/__init__.py``.
#
# .. code:: python
#
#    from gymnasium.envs.registration import register
#
#    register(
#        id="gym_examples/GridWorld-v0",
#        entry_point="gym_examples.envs:GridWorldEnv",
#        max_episode_steps=300,
#    )

# %%
# The environment ID consists of three components, two of which are
# optional: an optional namespace (here: ``gym_examples``), a mandatory
# name (here: ``GridWorld``) and an optional but recommended version
# (here: v0). It might have also been registered as ``GridWorld-v0`` (the
# recommended approach), ``GridWorld`` or ``gym_examples/GridWorld``, and
# the appropriate ID should then be used during environment creation.
#
# The keyword argument ``max_episode_steps=300`` will ensure that
# GridWorld environments that are instantiated via ``gymnasium.make`` will
# be wrapped in a ``TimeLimit`` wrapper (see `the wrapper
# documentation </api/wrappers>`__ for more information). A done signal
# will then be produced if the agent has reached the target *or* 300 steps
# have been executed in the current episode. To distinguish truncation and
# termination, you can check ``info["TimeLimit.truncated"]``.
#
# Apart from ``id`` and ``entry_point``, you may pass the following
# additional keyword arguments to ``register``:
#
# .. list-table::
#    :header-rows: 1
#
#    * - Name
#      - Type
#      - Default
#      - Description
#    * - ``reward_threshold``
#      - ``float``
#      - ``None``
#      - The reward threshold before the task is considered solved
#    * - ``nondeterministic``
#      - ``bool``
#      - ``False``
#      - Whether this environment is non-deterministic even after seeding
#    * - ``max_episode_steps``
#      - ``int``
#      - ``None``
#      - The maximum number of steps that an episode can consist of. If not ``None``, a ``TimeLimit`` wrapper is added
#    * - ``order_enforce``
#      - ``bool``
#      - ``True``
#      - Whether to wrap the environment in an ``OrderEnforcing`` wrapper
#    * - ``autoreset``
#      - ``bool``
#      - ``False``
#      - Whether to wrap the environment in an ``AutoResetWrapper``
#    * - ``kwargs``
#      - ``dict``
#      - ``{}``
#      - The default kwargs to pass to the environment class
#
# Most of these keywords (except for ``max_episode_steps``,
# ``order_enforce`` and ``kwargs``) do not alter the behavior of
# environment instances but merely provide some extra information about
# your environment. After registration, our custom ``GridWorldEnv``
# environment can be created with
# ``env = gymnasium.make('gym_examples/GridWorld-v0')``.
#
# ``gym-examples/gym_examples/envs/__init__.py`` should have:
#
# .. code:: python
#
#    from gym_examples.envs.grid_world import GridWorldEnv
#
# If your environment is not registered, you may optionally pass a module
# to import, that would register your environment before creating it like
# this - ``env = gymnasium.make('module:Env-v0')``, where ``module``
# contains the registration code. For the GridWorld env, the registration
# code is run by importing ``gym_examples``, so if it were not possible to
# import gym_examples explicitly, you could register while making by
# ``env = gymnasium.make('gym_examples:gym_examples/GridWorld-v0')``. This
# is especially useful when you’re allowed to pass only the environment ID
# into a third-party codebase (e.g. a learning library). This lets you
# register your environment without needing to edit the library’s source
# code.

# %%
# Creating a Package
# ------------------
#
# The last step is to structure our code as a Python package. This
# involves configuring ``gym-examples/setup.py``. A minimal example of how
# to do so is as follows:
#
# .. code:: python
#
#    from setuptools import setup
#
#    setup(
#        name="gym_examples",
#        version="0.0.1",
#        install_requires=["gymnasium==0.26.0", "pygame==2.1.0"],
#    )
#
# Creating Environment Instances
# ------------------------------
#
# After you have installed your package locally with
# ``pip install -e gym-examples``, you can create an instance of the
# environment via:
#
# .. code:: python
#
#    import gym_examples
#    env = gymnasium.make('gym_examples/GridWorld-v0')
#
# You can also pass keyword arguments of your environment’s constructor to
# ``gymnasium.make`` to customize the environment. In our case, we could
# do:
#
# .. code:: python
#
#    env = gymnasium.make('gym_examples/GridWorld-v0', size=10)
#
# Sometimes, you may find it more convenient to skip registration and call
# the environment’s constructor yourself. Some may find this approach more
# pythonic, and environments that are instantiated like this are also
# perfectly fine (but remember to add wrappers as well!).
#
# Using Wrappers
# --------------
#
# Oftentimes, we want to use different variants of a custom environment,
# or we want to modify the behavior of an environment that is provided by
# Gymnasium or some other party. Wrappers allow us to do this without
# changing the environment implementation or adding any boilerplate code.
# Check out the `wrapper documentation </api/wrappers/>`__ for details on
# how to use wrappers and instructions for implementing your own. In our
# example, observations cannot be used directly in learning code because
# they are dictionaries. However, we don’t actually need to touch our
# environment implementation to fix this! We can simply add a wrapper on
# top of environment instances to flatten observations into a single
# array:
#
# .. code:: python
#
#    import gym_examples
#    from gymnasium.wrappers import FlattenObservation
#
#    env = gymnasium.make('gym_examples/GridWorld-v0')
#    wrapped_env = FlattenObservation(env)
#    print(wrapped_env.reset())     # E.g.  [3 0 3 3], {}
#
# Wrappers have the big advantage that they make environments highly
# modular. For instance, instead of flattening the observations from
# GridWorld, you might only want to look at the relative position of the
# target and the agent. In the section on
# `ObservationWrappers </api/wrappers/#observationwrapper>`__ we have
# implemented a wrapper that does this job. This wrapper is also available
# in gym-examples:
#
# .. code:: python
#
#    import gym_examples
#    from gym_examples.wrappers import RelativePosition
#
#    env = gymnasium.make('gym_examples/GridWorld-v0')
#    wrapped_env = RelativePosition(env)
#    print(wrapped_env.reset())     # E.g.  [-3  3], {}
docs/tutorials/handling_time_limits.py (new file, 80 lines)
@@ -0,0 +1,80 @@
"""
Handling Time Limits
====================

In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The ``done`` signal received (in previous versions of OpenAI Gym < 0.26) from ``env.step`` indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to ``termination`` or ``truncation``.

Termination
-----------

Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment
definition. Examples are - task success, task failure, robot falling down etc. Notably, this also includes episodes
ending in finite-horizon environments due to a time-limit inherent to the environment. Note that to preserve the Markov
property, a representation of the remaining time must be present in the agent's observation in finite-horizon environments.
`(Reference) <https://arxiv.org/abs/1712.00378>`_

Truncation
----------

Truncation refers to the episode ending after an externally defined condition (that is outside the scope of the Markov
Decision Process). This could be a time-limit, a robot going out of bounds etc.

An infinite-horizon environment is an obvious example of where this is needed. We cannot wait forever for the episode
to complete, so we set a practical time-limit after which we forcibly halt the episode. The last state in this case is
not a terminal state, since it has a non-zero transition probability of moving to another state as per the Markov
Decision Process that defines the RL problem. This is also different from time-limits in finite-horizon environments,
as the agent in this case has no idea about this time-limit.
"""

# %%
# Importance in learning code
# ---------------------------
# Bootstrapping (using one or more estimated values of a variable to update estimates of the same variable) is a key
# aspect of Reinforcement Learning. A value function will tell you how much discounted reward you will get from a
# particular state if you follow a given policy. When an episode stops at any given point, by looking at the value of
# the final state, the agent is able to estimate how much discounted reward could have been obtained if the episode had
# continued. This is an example of handling truncation.
#
# More formally, a common example of bootstrapping in RL is updating the estimate of the Q-value function,
#
# .. math::
#     Q_{target}(o_t, a_t) = r_t + \gamma \cdot \max_{a'} Q(o_{t+1}, a')
#
# In classical RL, the new ``Q`` estimate is a weighted average of the previous ``Q`` estimate and ``Q_target``, while in Deep
# Q-Learning, the error between ``Q_target`` and the previous ``Q`` estimate is minimized.
#
# However, at the terminal state, bootstrapping is not done,
#
# .. math::
#     Q_{target}(o_t, a_t) = r_t
#
# This is where the distinction between termination and truncation becomes important. When an episode ends due to
# termination, we don't bootstrap; when it ends due to truncation, we bootstrap.
#
# While using gymnasium environments, the ``done`` signal (default for < v0.26) is frequently used to determine whether to
# bootstrap or not. However, this is incorrect since it does not differentiate between termination and truncation.
#
# A simple example of value functions is shown below. This is an illustrative example and not part of any specific algorithm.
#
# .. code:: python
#
#     # INCORRECT
#     vf_target = rew + gamma * (1 - done) * vf_next_state
#
# This is incorrect in the case of an episode ending due to truncation, where bootstrapping needs to happen but it doesn't.

# %%
# Solution
# --------
#
# From v0.26 onwards, Gymnasium's ``env.step`` API returns both termination and truncation information explicitly.
# In the previous version, truncation information was supplied through the info key ``TimeLimit.truncated``.
# The correct way to handle terminations and truncations now is:
#
# .. code:: python
#
#     # terminated = done and 'TimeLimit.truncated' not in info
#     # This was needed in previous versions.
#
#     vf_target = rew + gamma * (1 - terminated) * vf_next_state
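
# %%
# An illustrative sketch of the same rule inside an environment loop; ``policy`` and ``vf`` (a value-function
# approximator) below are hypothetical stand-ins, not part of this tutorial file:
#
# .. code:: python
#
#     obs, info = env.reset(seed=0)
#     while True:
#         action = policy(obs)  # hypothetical policy
#         next_obs, rew, terminated, truncated, info = env.step(action)
#         # Bootstrap from the next observation unless the episode truly terminated
#         vf_target = rew + gamma * (1 - terminated) * vf(next_obs)
#         if terminated or truncated:
#             break
#         obs = next_obs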