Fix and Update Basic Usage's and Core page (#41)
committed by GitHub
parent 62732993b8
commit c2e2df2164
@@ -11,7 +11,7 @@ Initializing environments is very easy in Gymnasium and can be done via:

```python
import gymnasium as gym
-env = gym.make('CartPole-v0')
+env = gym.make('CartPole-v1')
```

## Interacting with the Environment
@@ -38,7 +38,7 @@ alongside the observation for this timestep. The reward may also be negative or
The agent will then be trained to maximize the reward it accumulates over many timesteps.

After some timesteps, the environment may enter a terminal state. For instance, the robot may have crashed, or the agent may have succeeded in completing a task. In that case, we want to reset the environment to a new initial state. The environment issues a terminated signal to the agent if it enters such a terminal state. Sometimes we also want to end the episode after a fixed number of timesteps, in this case, the environment issues a truncated signal.
-This is a new change in API (v0.26 onwards). Earlier a common done signal was issued for an episode ending via any means. This is now changed in favour of issuing two signals - terminated and truncated.
+This is a new change in API (v0.26 onwards). Earlier a commonly done signal was issued for an episode ending via any means. This is now changed in favour of issuing two signals - terminated and truncated.

Let's see what the agent-environment loop looks like in Gymnasium.
This example will run an instance of `LunarLander-v2` environment for 1000 timesteps. Since we pass `render_mode="human"`, you should see a window pop up rendering the environment.
@@ -60,7 +60,7 @@ for _ in range(1000):
env.close()
```
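The hunk above shows only the tail of this example. Putting it together with the surrounding text (`LunarLander-v2`, `render_mode="human"`, 1000 timesteps, and the terminated/truncated signals described earlier), the full loop would look roughly like the following sketch; it is an illustration consistent with the text, not the exact lines omitted from the diff:

```python
import gymnasium as gym

# Create the environment described in the text; "human" rendering opens a window.
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    # Sample a random action; a trained agent would instead pick one from a policy.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Reset once the episode ends, whether by termination or truncation.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```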

-The output should look something like this
+The output should look something like this:

```{figure} https://user-images.githubusercontent.com/15806078/153222406-af5ce6f0-4696-4a24-a683-46ad4939170c.gif
:width: 50%
@@ -93,8 +93,8 @@ env = gym.make("CartPole-v1", apply_api_compatibility=True)
```
This can also be done explicitly through a wrapper:
```python
-from gymasium.wrappers import StepCompatibility
-env = StepCompatibility(CustomEnv(), output_truncation_bool=False)
+from gymnasium.wrappers import StepAPICompatibility
+env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)
```
For more details see the wrappers section.
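As a rough usage sketch (not part of the diff): assuming `output_truncation_bool=False` makes the wrapped `step()` return the old-style single `done` flag, and with `CustomEnv` standing in for the placeholder used above, the wrapped environment would be stepped like this:

```python
from gymnasium.wrappers import StepAPICompatibility

# CustomEnv is the placeholder environment name used in the snippet above.
env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)

obs, info = env.reset()
# Assumption: with output_truncation_bool=False the wrapper emits the old
# four-value step signature with a single done flag instead of separate
# terminated/truncated booleans.
obs, reward, done, info = env.step(env.action_space.sample())
```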

@@ -131,7 +131,8 @@ There are multiple `Space` types available in Gymnasium:

```python
>>> from gymnasium.spaces import Box, Discrete, Dict, Tuple, MultiBinary, MultiDiscrete
>>>
>>> import numpy as np
>>>
>>> observation_space = Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32)
>>> observation_space.sample()
[ 1.6952509 -0.4399011 -0.7981693]
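>>> # (Illustrative continuation, not part of the diff.) Sampling from two of
>>> # the other space types listed above, reusing the imports already shown:
>>> action_space = Discrete(3)
>>> action_space.sample()  # an integer drawn from {0, 1, 2}
>>> nested_space = Dict({"position": Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32), "velocity": Discrete(5)})
>>> nested_space.sample()  # a dict-like sample with "position" and "velocity" entries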

@@ -217,7 +218,7 @@ play(gymnasium.make('Pong-v0'))
This opens a window of the environment and allows you to control the agent using your keyboard.

Playing using the keyboard requires a key-action map. This map should have type `dict[tuple[int], int | None]`, which maps the keys pressed to action performed.
-For example, if pressing the keys `w` and `space` at the same time is supposed to perform action `2`, then the `key_to_action` dict should look like:
+For example, if pressing the keys `w` and `space` at the same time is supposed to perform action `2`, then the `key_to_action` dict should look like this:
```python
{
    # ...
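    # The hunk cuts this mapping off here. Based on the prose above, the entry
    # for pressing `w` and `space` together could be written with ord() key
    # codes (an assumption, not shown in the diff):
    (ord("w"), ord(" ")): 2,
    # ...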

@@ -230,16 +231,23 @@ As a more complete example, let's say we wish to play with `CartPole-v0` using o
import gymnasium as gym
import pygame
from gymnasium.utils.play import play

mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}
-play(gymnasium.make("CartPole-v0"), keys_to_action=mapping)
+play(gym.make("CartPole-v1",render_mode="rgb_array"), keys_to_action=mapping)
```
where we obtain the corresponding key ID constants from pygame. If the `key_to_action` argument is not specified, then the default `key_to_action` mapping for that env is used, if provided.

Furthermore, if you wish to plot real time statistics as you play, you can use `gymnasium.utils.play.PlayPlot`. Here's some sample code for plotting the reward for last 5 second of gameplay:
```python
+import gymnasium as gym
+import pygame
+from gymnasium.utils.play import PlayPlot, play

def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):
-    return [rew,]
+    return [rew, ]

plotter = PlayPlot(callback, 30 * 5, ["reward"])
-env = gymnasium.make("Pong-v0")
-play(env, callback=plotter.callback)
+mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}
+env = gym.make("CartPole-v1", render_mode="rgb_array")
+play(env, callback=plotter.callback, keys_to_action=mapping)
```

@@ -51,7 +51,7 @@ class Env(Generic[ObsType, ActType]):
- :attr:`action_space` - The Space object corresponding to valid actions
- :attr:`observation_space` - The Space object corresponding to valid observations
- :attr:`reward_range` - A tuple corresponding to the minimum and maximum possible rewards
-- :attr:`spec` - An environment spec that contains the information used to initialise the environment from `gym.make`
+- :attr:`spec` - An environment spec that contains the information used to initialize the environment from `gymnasium.make`
- :attr:`metadata` - The metadata of the environment, i.e. render modes
- :attr:`np_random` - The random number generator for the environment
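To make the attribute list above concrete, here is a small illustrative snippet (not part of the diff) reading those attributes from an environment created with `gymnasium.make`:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

print(env.action_space)       # e.g. Discrete(2)
print(env.observation_space)  # e.g. a 4-dimensional Box
print(env.reward_range)       # (min, max) achievable reward
print(env.spec.id)            # "CartPole-v1", the spec used by gymnasium.make
print(env.metadata)           # e.g. supported render modes
print(env.np_random)          # the environment's random number generator
```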

@@ -74,7 +74,7 @@ class Env(Generic[ObsType, ActType]):

@property
def np_random(self) -> np.random.Generator:
-"""Returns the environment's internal :attr:`_np_random` that if not set will initialise with a random seed."""
+"""Returns the environment's internal :attr:`_np_random` that if not set will initialize with a random seed."""
if self._np_random is None:
self._np_random, seed = seeding.np_random()
return self._np_random
@@ -99,17 +99,13 @@ class Env(Generic[ObsType, ActType]):
terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.
In this case further step() calls could return undefined results.
truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
-Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
+Typically a timelimit, but could also be used to indicate an agent physically going out of bounds.
Can be used to end the episode prematurely before a `terminal state` is reached.
info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
This might, for instance, contain: metrics that describe the agent's performance state, variables that are
hidden from observations, or individual reward terms that are combined to produce the total reward.
-It also can contain information that distinguishes truncation and termination, however this is deprecated in favour
+It also can contain information that distinguishes truncation and termination, however, this is deprecated in favor
of returning two booleans, and will be removed in a future version.
-done (bool): (Deprecated) A boolean value for if the episode has ended, in which case further :meth:`step` calls will
-return undefined results.
-A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully,
-a certain timelimit was exceeded, or the physics simulation has entered an invalid state.
"""
raise NotImplementedError
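To illustrate the step() contract documented in this hunk, here is a minimal, made-up environment sketch (not part of the diff) that returns the five-value tuple described above:

```python
import gymnasium as gym
from gymnasium import spaces
import numpy as np


class CountingEnv(gym.Env):
    """Toy environment: the state is a counter; the episode terminates at 10."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0, high=np.inf, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0: stay, 1: increment
        self._count = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._count = 0
        return np.array([self._count], dtype=np.float32), {}

    def step(self, action):
        self._count += int(action)
        observation = np.array([self._count], dtype=np.float32)
        reward = 1.0 if action == 1 else 0.0
        terminated = self._count >= 10  # terminal state as defined by the task
        truncated = False               # time limits are usually added by a wrapper
        info = {"count": self._count}
        return observation, reward, terminated, truncated, info
```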

@@ -175,11 +171,7 @@ class Env(Generic[ObsType, ActType]):
raise NotImplementedError

def close(self):
-"""Override close in your subclass to perform any necessary cleanup.
-
-Environments will automatically :meth:`close()` themselves when
-garbage collected or when the program exits.
-"""
+"""Override close in your subclass to perform any necessary cleanup."""
pass

@property
@@ -187,7 +179,7 @@ class Env(Generic[ObsType, ActType]):
"""Returns the base non-wrapped environment.

Returns:
-Env: The base non-wrapped gym.Env instance
+Env: The base non-wrapped gymnasium.Env instance
"""
return self

@@ -349,7 +341,7 @@ class ObservationWrapper(Wrapper):
"""Superclass of wrappers that can modify observations using :meth:`observation` for :meth:`reset` and :meth:`step`.

If you would like to apply a function to the observation that is returned by the base environment before
-passing it to learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
+passing it to the learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
:meth:`observation` to implement that transformation. The transformation defined in that method must be
defined on the base environment’s observation space. However, it may take values in a different space.
In that case, you need to specify the new observation space of the wrapper by setting :attr:`self.observation_space`
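A brief sketch of the pattern this docstring describes (illustrative only, not from the diff): an `ObservationWrapper` that rescales observations and overwrites `self.observation_space` accordingly:

```python
import gymnasium as gym
from gymnasium import spaces
import numpy as np


class ScaleObservation(gym.ObservationWrapper):
    """Rescale Box observations to [0, 1]; assumes the wrapped env has a bounded Box space."""

    def __init__(self, env):
        super().__init__(env)
        self._low = env.observation_space.low
        self._high = env.observation_space.high
        # The transformed observations live in a different space, so we
        # overwrite self.observation_space as the docstring instructs.
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, observation):
        return ((observation - self._low) / (self._high - self._low)).astype(np.float32)
```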

@@ -401,7 +393,7 @@ class RewardWrapper(Wrapper):
because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
To do that, we could, for instance, implement the following wrapper::

-class ClipReward(gymnasium.RewardWrapper):
+class ClipReward(gym.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
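        # The hunk cuts the example off here. A plausible completion
        # (an assumption, not shown in this diff; `np` refers to numpy) would be:
        self.max_reward = max_reward

    def reward(self, reward):
        return np.clip(reward, self.min_reward, self.max_reward)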