Updating tutorials (#63)

2025-07-31 13:54:31 +00:00 · 2022-10-21 16:36:36 +01:00
parent 8b81b7dcc2
commit 08dacf2f7b
7 changed files with 623 additions and 27 deletions
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -15,7 +15,7 @@ repos:
    hooks:
      - id: flake8
        args:
-          - '--per-file-ignores=*/__init__.py:F401 gymnasium/envs/registration.py:E704'
+          - '--per-file-ignores=*/__init__.py:F401 gymnasium/envs/registration.py:E704 docs/tutorials/*.py:E402'
          - --ignore=E203,W503,E741
          - --max-complexity=30
          - --max-line-length=456
--- a/docs/README.md
+++ b/docs/README.md
@@ -49,3 +49,11 @@ To rebuild the documentation automatically every time a change is made:
 cd docs
 sphinx-autobuild -b dirhtml . _build
 ```
+
+## Writing Tutorials
+
+We use Sphinx-Gallery to build the tutorials inside the `docs/tutorials` directory. Check `docs/tutorials/demo.py` to see an example of a tutorial and [Sphinx-Gallery documentation](https://sphinx-gallery.github.io/stable/syntax.html) for more information.
+
+To convert Jupyer Notebooks to the python tutorials you can use [this script](https://gist.github.com/mgoulao/f07f5f79f6cd9a721db8a34bba0a19a7).
+
+If you want Sphinx-Gallery to execute the tutorial (which adds outputs and plots) then the file name should start with `run_`. Note that this adds to the build time so make sure the script doesn't take more than a few seconds to execute.
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -41,6 +41,7 @@ extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.githubpages",
    "myst_parser",
+    "furo.gen_tutorials",
 ]

 # Add any paths that contain templates here, relative to this directory.
@@ -91,5 +92,6 @@ html_css_files = []
 # -- Generate Tutorials -------------------------------------------------

 gen_tutorials.generate(
+    os.path.dirname(__file__),
    os.path.join(os.path.dirname(__file__), "tutorials"),
 )
--- a/docs/index.md
+++ b/docs/index.md
@@ -66,9 +66,6 @@ environments/third_party_environments
 :glob:
 :caption: Tutorials

-content/environment_creation
-content/vectorising
-content/handling_timelimits
 tutorials/*
 ```

--- a/docs/tutorials/environment_creation.py
+++ b/docs/tutorials/environment_creation.py
@@ -0,0 +1,509 @@
+"""
+Make your own custom environment
+================================
+
+This documentation overviews creating new environments and relevant
+useful wrappers, utilities and tests included in Gymnasium designed for
+the creation of new environments. You can clone gym-examples to play
+with the code that is presented here. We recommend that you use a virtual environment:
+
+.. code:: console
+
+   git clone https://github.com/Farama-Foundation/gym-examples
+   cd gym-examples
+   python -m venv .env
+   source .env/bin/activate
+   pip install -e .
+
+Subclassing gymnasium.Env
+-------------------------
+
+Before learning how to create your own environment you should check out
+`the documentation of Gymnasium’s API </api/core>`__.
+
+We will be concerned with a subset of gym-examples that looks like this:
+
+.. code:: sh
+
+   gym-examples/
+     README.md
+     setup.py
+     gym_examples/
+       __init__.py
+       envs/
+         __init__.py
+         grid_world.py
+       wrappers/
+         __init__.py
+         relative_position.py
+         reacher_weighted_reward.py
+         discrete_action.py
+         clip_reward.py
+
+To illustrate the process of subclassing ``gymnasium.Env``, we will
+implement a very simplistic game, called ``GridWorldEnv``. We will write
+the code for our custom environment in
+``gym-examples/gym_examples/envs/grid_world.py``. The environment
+consists of a 2-dimensional square grid of fixed size (specified via the
+``size`` parameter during construction). The agent can move vertically
+or horizontally between grid cells in each timestep. The goal of the
+agent is to navigate to a target on the grid that has been placed
+randomly at the beginning of the episode.
+
+-  Observations provide the location of the target and agent.
+-  There are 4 actions in our environment, corresponding to the
+   movements “right”, “up”, “left”, and “down”.
+-  A done signal is issued as soon as the agent has navigated to the
+   grid cell where the target is located.
+-  Rewards are binary and sparse, meaning that the immediate reward is
+   always zero, unless the agent has reached the target, then it is 1.
+
+An episode in this environment (with ``size=5``) might look like this:
+
+where the blue dot is the agent and the red square represents the
+target.
+
+Let us look at the source code of ``GridWorldEnv`` piece by piece:
+"""
+
+# %%
+# Declaration and Initialization
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+#
+# Our custom environment will inherit from the abstract class
+# ``gymnasium.Env``. You shouldn’t forget to add the ``metadata``
+# attribute to your class. There, you should specify the render-modes that
+# are supported by your environment (e.g. ``"human"``, ``"rgb_array"``,
+# ``"ansi"``) and the framerate at which your environment should be
+# rendered. Every environment should support ``None`` as render-mode; you
+# don’t need to add it in the metadata. In ``GridWorldEnv``, we will
+# support the modes “rgb_array” and “human” and render at 4 FPS.
+#
+# The ``__init__`` method of our environment will accept the integer
+# ``size``, that determines the size of the square grid. We will set up
+# some variables for rendering and define ``self.observation_space`` and
+# ``self.action_space``. In our case, observations should provide
+# information about the location of the agent and target on the
+# 2-dimensional grid. We will choose to represent observations in the form
+# of dictionaries with keys ``"agent"`` and ``"target"``. An observation
+# may look like ``{"agent": array([1, 0]), "target": array([0, 3])}``.
+# Since we have 4 actions in our environment (“right”, “up”, “left”,
+# “down”), we will use ``Discrete(4)`` as an action space. Here is the
+# declaration of ``GridWorldEnv`` and the implementation of ``__init__``:
+
+import numpy as np
+import pygame
+
+import gymnasium as gym
+from gymnasium import spaces
+
+
+class GridWorldEnv(gym.Env):
+    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 4}
+
+    def __init__(self, render_mode=None, size=5):
+        self.size = size  # The size of the square grid
+        self.window_size = 512  # The size of the PyGame window
+
+        # Observations are dictionaries with the agent's and the target's location.
+        # Each location is encoded as an element of {0, ..., `size`}^2, i.e. MultiDiscrete([size, size]).
+        self.observation_space = spaces.Dict(
+            {
+                "agent": spaces.Box(0, size - 1, shape=(2,), dtype=int),
+                "target": spaces.Box(0, size - 1, shape=(2,), dtype=int),
+            }
+        )
+
+        # We have 4 actions, corresponding to "right", "up", "left", "down"
+        self.action_space = spaces.Discrete(4)
+
+        """
+        The following dictionary maps abstract actions from `self.action_space` to
+        the direction we will walk in if that action is taken.
+        I.e. 0 corresponds to "right", 1 to "up" etc.
+        """
+        self._action_to_direction = {
+            0: np.array([1, 0]),
+            1: np.array([0, 1]),
+            2: np.array([-1, 0]),
+            3: np.array([0, -1]),
+        }
+
+        assert render_mode is None or render_mode in self.metadata["render_modes"]
+        self.render_mode = render_mode
+
+        """
+        If human-rendering is used, `self.window` will be a reference
+        to the window that we draw to. `self.clock` will be a clock that is used
+        to ensure that the environment is rendered at the correct framerate in
+        human-mode. They will remain `None` until human-mode is used for the
+        first time.
+        """
+        self.window = None
+        self.clock = None
+
+    # %%
+    # Constructing Observations From Environment States
+    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    #
+    # Since we will need to compute observations both in ``reset`` and
+    # ``step``, it is often convenient to have a (private) method ``_get_obs``
+    # that translates the environment’s state into an observation. However,
+    # this is not mandatory and you may as well compute observations in
+    # ``reset`` and ``step`` separately:
+
+    def _get_obs(self):
+        return {"agent": self._agent_location, "target": self._target_location}
+
+    # %%
+    # We can also implement a similar method for the auxiliary information
+    # that is returned by ``step`` and ``reset``. In our case, we would like
+    # to provide the manhattan distance between the agent and the target:
+
+    def _get_info(self):
+        return {
+            "distance": np.linalg.norm(
+                self._agent_location - self._target_location, ord=1
+            )
+        }
+
+    # %%
+    # Oftentimes, info will also contain some data that is only available
+    # inside the ``step`` method (e.g. individual reward terms). In that case,
+    # we would have to update the dictionary that is returned by ``_get_info``
+    # in ``step``.
+
+    # %%
+    # Reset
+    # ~~~~~
+    #
+    # The ``reset`` method will be called to initiate a new episode. You may
+    # assume that the ``step`` method will not be called before ``reset`` has
+    # been called. Moreover, ``reset`` should be called whenever a done signal
+    # has been issued. Users may pass the ``seed`` keyword to ``reset`` to
+    # initialize any random number generator that is used by the environment
+    # to a deterministic state. It is recommended to use the random number
+    # generator ``self.np_random`` that is provided by the environment’s base
+    # class, ``gymnasium.Env``. If you only use this RNG, you do not need to
+    # worry much about seeding, *but you need to remember to call
+    # ``super().reset(seed=seed)``* to make sure that ``gymnasium.Env``
+    # correctly seeds the RNG. Once this is done, we can randomly set the
+    # state of our environment. In our case, we randomly choose the agent’s
+    # location and the random sample target positions, until it does not
+    # coincide with the agent’s position.
+    #
+    # The ``reset`` method should return a tuple of the initial observation
+    # and some auxiliary information. We can use the methods ``_get_obs`` and
+    # ``_get_info`` that we implemented earlier for that:
+
+    def reset(self, seed=None, options=None):
+        # We need the following line to seed self.np_random
+        super().reset(seed=seed)
+
+        # Choose the agent's location uniformly at random
+        self._agent_location = self.np_random.integers(0, self.size, size=2, dtype=int)
+
+        # We will sample the target's location randomly until it does not coincide with the agent's location
+        self._target_location = self._agent_location
+        while np.array_equal(self._target_location, self._agent_location):
+            self._target_location = self.np_random.integers(
+                0, self.size, size=2, dtype=int
+            )
+
+        observation = self._get_obs()
+        info = self._get_info()
+
+        if self.render_mode == "human":
+            self._render_frame()
+
+        return observation, info
+
+    # %%
+    # Step
+    # ~~~~
+    #
+    # The ``step`` method usually contains most of the logic of your
+    # environment. It accepts an ``action``, computes the state of the
+    # environment after applying that action and returns the 4-tuple
+    # ``(observation, reward, done, info)``. Once the new state of the
+    # environment has been computed, we can check whether it is a terminal
+    # state and we set ``done`` accordingly. Since we are using sparse binary
+    # rewards in ``GridWorldEnv``, computing ``reward`` is trivial once we
+    # know ``done``. To gather ``observation`` and ``info``, we can again make
+    # use of ``_get_obs`` and ``_get_info``:
+
+    def step(self, action):
+        # Map the action (element of {0,1,2,3}) to the direction we walk in
+        direction = self._action_to_direction[action]
+        # We use `np.clip` to make sure we don't leave the grid
+        self._agent_location = np.clip(
+            self._agent_location + direction, 0, self.size - 1
+        )
+        # An episode is done iff the agent has reached the target
+        terminated = np.array_equal(self._agent_location, self._target_location)
+        reward = 1 if terminated else 0  # Binary sparse rewards
+        observation = self._get_obs()
+        info = self._get_info()
+
+        if self.render_mode == "human":
+            self._render_frame()
+
+        return observation, reward, terminated, False, info
+
+    # %%
+    # Rendering
+    # ~~~~~~~~~
+    #
+    # Here, we are using PyGame for rendering. A similar approach to rendering
+    # is used in many environments that are included with Gymnasium and you
+    # can use it as a skeleton for your own environments:
+
+    def render(self):
+        if self.render_mode == "rgb_array":
+            return self._render_frame()
+
+        def _render_frame(self):
+            if self.window is None and self.render_mode == "human":
+                pygame.init()
+                pygame.display.init()
+                self.window = pygame.display.set_mode(
+                    (self.window_size, self.window_size)
+                )
+            if self.clock is None and self.render_mode == "human":
+                self.clock = pygame.time.Clock()
+
+            canvas = pygame.Surface((self.window_size, self.window_size))
+            canvas.fill((255, 255, 255))
+            pix_square_size = (
+                self.window_size / self.size
+            )  # The size of a single grid square in pixels
+
+            # First we draw the target
+            pygame.draw.rect(
+                canvas,
+                (255, 0, 0),
+                pygame.Rect(
+                    pix_square_size * self._target_location,
+                    (pix_square_size, pix_square_size),
+                ),
+            )
+            # Now we draw the agent
+            pygame.draw.circle(
+                canvas,
+                (0, 0, 255),
+                (self._agent_location + 0.5) * pix_square_size,
+                pix_square_size / 3,
+            )
+
+            # Finally, add some gridlines
+            for x in range(self.size + 1):
+                pygame.draw.line(
+                    canvas,
+                    0,
+                    (0, pix_square_size * x),
+                    (self.window_size, pix_square_size * x),
+                    width=3,
+                )
+                pygame.draw.line(
+                    canvas,
+                    0,
+                    (pix_square_size * x, 0),
+                    (pix_square_size * x, self.window_size),
+                    width=3,
+                )
+
+            if self.render_mode == "human":
+                # The following line copies our drawings from `canvas` to the visible window
+                self.window.blit(canvas, canvas.get_rect())
+                pygame.event.pump()
+                pygame.display.update()
+
+                # We need to ensure that human-rendering occurs at the predefined framerate.
+                # The following line will automatically add a delay to keep the framerate stable.
+                self.clock.tick(self.metadata["render_fps"])
+            else:  # rgb_array
+                return np.transpose(
+                    np.array(pygame.surfarray.pixels3d(canvas)), axes=(1, 0, 2)
+                )
+
+    # %%
+    # Close
+    # ~~~~~
+    #
+    # The ``close`` method should close any open resources that were used by
+    # the environment. In many cases, you don’t actually have to bother to
+    # implement this method. However, in our example ``render_mode`` may be
+    # ``"human"`` and we might need to close the window that has been opened:
+
+    def close(self):
+        if self.window is not None:
+            pygame.display.quit()
+            pygame.quit()
+
+
+# %%
+# In other environments ``close`` might also close files that were opened
+# or release other resources. You shouldn’t interact with the environment
+# after having called ``close``.
+
+# %%
+# Registering Envs
+# ----------------
+#
+# In order for the custom environments to be detected by Gymnasium, they
+# must be registered as follows. We will choose to put this code in
+# ``gym-examples/gym_examples/__init__.py``.
+#
+# .. code:: python
+#
+#   from gymnasium.envs.registration import register
+#
+#   register(
+#        id="gym_examples/GridWorld-v0",
+#        entry_point="gym_examples.envs:GridWorldEnv",
+#        max_episode_steps=300,
+#   )
+
+# %%
+# The environment ID consists of three components, two of which are
+# optional: an optional namespace (here: ``gym_examples``), a mandatory
+# name (here: ``GridWorld``) and an optional but recommended version
+# (here: v0). It might have also been registered as ``GridWorld-v0`` (the
+# recommended approach), ``GridWorld`` or ``gym_examples/GridWorld``, and
+# the appropriate ID should then be used during environment creation.
+#
+# The keyword argument ``max_episode_steps=300`` will ensure that
+# GridWorld environments that are instantiated via ``gymnasium.make`` will
+# be wrapped in a ``TimeLimit`` wrapper (see `the wrapper
+# documentation </api/wrappers>`__ for more information). A done signal
+# will then be produced if the agent has reached the target *or* 300 steps
+# have been executed in the current episode. To distinguish truncation and
+# termination, you can check ``info["TimeLimit.truncated"]``.
+#
+# Apart from ``id`` and ``entrypoint``, you may pass the following
+# additional keyword arguments to ``register``:
+#
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+# | Name                 | Type      | Default   | Description                                                                                                   |
+# +======================+===========+===========+===============================================================================================================+
+# | ``reward_threshold`` | ``float`` | ``None``  | The reward threshold before the task is  considered solved                                                    |
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+# | ``nondeterministic`` | ``bool``  | ``False`` | Whether this environment is non-deterministic even after seeding                                              |
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+# | ``max_episode_steps``| ``int``   | ``None``  | The maximum number of steps that an episode can consist of. If not ``None``, a ``TimeLimit`` wrapper is added |
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+# | ``order_enforce``    | ``bool``  | ``True``  | Whether to wrap the environment in an  ``OrderEnforcing`` wrapper                                             |
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+# | ``autoreset``        | ``bool``  | ``False`` | Whether to wrap the environment in an ``AutoResetWrapper``                                                    |
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+# | ``kwargs``           | ``dict``  | ``{}``    | The default kwargs to pass to the environment class                                                           |
+# +----------------------+-----------+-----------+---------------------------------------------------------------------------------------------------------------+
+#
+# Most of these keywords (except for ``max_episode_steps``,
+# ``order_enforce`` and ``kwargs``) do not alter the behavior of
+# environment instances but merely provide some extra information about
+# your environment. After registration, our custom ``GridWorldEnv``
+# environment can be created with
+# ``env = gymnasium.make('gym_examples/GridWorld-v0')``.
+#
+# ``gym-examples/gym_examples/envs/__init__.py`` should have:
+#
+# .. code:: python
+#
+#    from gym_examples.envs.grid_world import GridWorldEnv
+#
+# If your environment is not registered, you may optionally pass a module
+# to import, that would register your environment before creating it like
+# this - ``env = gymnasium.make('module:Env-v0')``, where ``module``
+# contains the registration code. For the GridWorld env, the registration
+# code is run by importing ``gym_examples`` so if it were not possible to
+# import gym_examples explicitly, you could register while making by
+# ``env = gymnasium.make('gym_examples:gym_examples/GridWorld-v0)``. This
+# is especially useful when you’re allowed to pass only the environment ID
+# into a third-party codebase (eg. learning library). This lets you
+# register your environment without needing to edit the library’s source
+# code.
+
+# %%
+# Creating a Package
+# ------------------
+#
+# The last step is to structure our code as a Python package. This
+# involves configuring ``gym-examples/setup.py``. A minimal example of how
+# to do so is as follows:
+#
+# .. code:: python
+#
+#    from setuptools import setup
+#
+#    setup(
+#        name="gym_examples",
+#        version="0.0.1",
+#        install_requires=["gymnasium==0.26.0", "pygame==2.1.0"],
+#    )
+#
+# Creating Environment Instances
+# ------------------------------
+#
+# After you have installed your package locally with
+# ``pip install -e gym-examples``, you can create an instance of the
+# environment via:
+#
+# .. code:: python
+#
+#    import gym_examples
+#    env = gymnasium.make('gym_examples/GridWorld-v0')
+#
+# You can also pass keyword arguments of your environment’s constructor to
+# ``gymnasium.make`` to customize the environment. In our case, we could
+# do:
+#
+# .. code:: python
+#
+#    env = gymnasium.make('gym_examples/GridWorld-v0', size=10)
+#
+# Sometimes, you may find it more convenient to skip registration and call
+# the environment’s constructor yourself. Some may find this approach more
+# pythonic and environments that are instantiated like this are also
+# perfectly fine (but remember to add wrappers as well!).
+#
+# Using Wrappers
+# --------------
+#
+# Oftentimes, we want to use different variants of a custom environment,
+# or we want to modify the behavior of an environment that is provided by
+# Gymnasium or some other party. Wrappers allow us to do this without
+# changing the environment implementation or adding any boilerplate code.
+# Check out the `wrapper documentation </api/wrappers/>`__ for details on
+# how to use wrappers and instructions for implementing your own. In our
+# example, observations cannot be used directly in learning code because
+# they are dictionaries. However, we don’t actually need to touch our
+# environment implementation to fix this! We can simply add a wrapper on
+# top of environment instances to flatten observations into a single
+# array:
+#
+# .. code:: python
+#
+#    import gym_examples
+#    from gymnasium.wrappers import FlattenObservation
+#
+#    env = gymnasium.make('gym_examples/GridWorld-v0')
+#    wrapped_env = FlattenObservation(env)
+#    print(wrapped_env.reset())     # E.g.  [3 0 3 3], {}
+#
+# Wrappers have the big advantage that they make environments highly
+# modular. For instance, instead of flattening the observations from
+# GridWorld, you might only want to look at the relative position of the
+# target and the agent. In the section on
+# `ObservationWrappers </api/wrappers/#observationwrapper>`__ we have
+# implemented a wrapper that does this job. This wrapper is also available
+# in gym-examples:
+#
+# .. code:: python
+#
+#    import gym_examples
+#    from gym_examples.wrappers import RelativePosition
+#
+#    env = gymnasium.make('gym_examples/GridWorld-v0')
+#    wrapped_env = RelativePosition(env)
+#    print(wrapped_env.reset())     # E.g.  [-3  3], {}
--- a/docs/tutorials/handling_time_limits.py
+++ b/docs/tutorials/handling_time_limits.py
@@ -0,0 +1,80 @@
+"""
+Handling Time Limits
+====================
+
+In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The ``done`` signal received (in previous versions of OpenAI Gym < 0.26) from ``env.step`` indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to ``termination`` or ``truncation``.
+
+Termination
+-----------
+
+Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment
+definition. Examples are - task success, task failure, robot falling down etc. Notably, this also includes episodes
+ending in finite-horizon environments due to a time-limit inherent to the environment. Note that to preserve Markov
+property, a representation of the remaining time must be present in the agent's observation in finite-horizon environments.
+`(Reference) <https://arxiv.org/abs/1712.00378>`_
+
+Truncation
+----------
+
+Truncation refers to the episode ending after an externally defined condition (that is outside the scope of the Markov
+Decision Process). This could be a time-limit, a robot going out of bounds etc.
+
+An infinite-horizon environment is an obvious example of where this is needed. We cannot wait forever for the episode
+to complete, so we set a practical time-limit after which we forcibly halt the episode. The last state in this case is
+not a terminal state since it has a non-zero transition probability of moving to another state as per the Markov
+Decision Process that defines the RL problem. This is also different from time-limits in finite horizon environments
+as the agent in this case has no idea about this time-limit.
+"""
+
+# %%
+# Importance in learning code
+# ---------------------------
+# Bootstrapping (using one or more estimated values of a variable to update estimates of the same variable) is a key
+# aspect of Reinforcement Learning. A value function will tell you how much discounted reward you will get from a
+# particular state if you follow a given policy. When an episode stops at any given point, by looking at the value of
+# the final state, the agent is able to estimate how much discounted reward could have been obtained if the episode has
+# continued. This is an example of handling truncation.
+#
+# More formally, a common example of bootstrapping in RL is updating the estimate of the Q-value function,
+#
+# .. math::
+#   Q_{target}(o_t, a_t) = r_t + \gamma . \max_a(Q(o_{t+1}, a_{t+1}))
+#
+#
+# In classical RL, the new ``Q`` estimate is a weighted average of the previous ``Q`` estimate and ``Q_target`` while in Deep
+# Q-Learning, the error between ``Q_target`` and the previous ``Q`` estimate is minimized.
+#
+# However, at the terminal state, bootstrapping is not done,
+#
+# .. math::
+#   Q_{target}(o_t, a_t) = r_t
+#
+# This is where the distinction between termination and truncation becomes important. When an episode ends due to
+# termination we don't bootstrap, when it ends due to truncation, we bootstrap.
+#
+# While using gymnasium environments, the ``done`` signal (default for < v0.26) is frequently used to determine whether to
+# bootstrap or not. However, this is incorrect since it does not differentiate between termination and truncation.
+#
+# A simple example of value functions is shown below. This is an illustrative example and not part of any specific algorithm.
+#
+# .. code:: python
+#
+#   # INCORRECT
+#   vf_target = rew + gamma * (1 - done) * vf_next_state
+#
+# This is incorrect in the case of episode ending due to a truncation, where bootstrapping needs to happen but it doesn't.
+
+# %%
+# Solution
+# ----------
+#
+# From v0.26 onwards, Gymnasium's ``env.step`` API returns both termination and truncation information explicitly.
+# In the previous version truncation information was supplied through the info key ``TimeLimit.truncated``.
+# The correct way to handle terminations and truncations now is,
+#
+# .. code:: python
+#
+#   # terminated = done and 'TimeLimit.truncated' not in info
+#   # This was needed in previous versions.
+#
+#   vf_target = rew + gamma * (1 - terminated) * vf_next_state