Fix and Update Basic Usage's and Core page (#41)
committed by GitHub
parent 62732993b8
commit c2e2df2164
@@ -11,7 +11,7 @@ Initializing environments is very easy in Gymnasium and can be done via:

```python
import gymnasium as gym
-env = gym.make('CartPole-v0')
+env = gym.make('CartPole-v1')
```

## Interacting with the Environment
@@ -38,7 +38,7 @@ alongside the observation for this timestep. The reward may also be negative or
The agent will then be trained to maximize the reward it accumulates over many timesteps.

After some timesteps, the environment may enter a terminal state. For instance, the robot may have crashed, or the agent may have succeeded in completing a task. In that case, we want to reset the environment to a new initial state. The environment issues a terminated signal to the agent if it enters such a terminal state. Sometimes we also want to end the episode after a fixed number of timesteps, in this case, the environment issues a truncated signal.
-This is a new change in API (v0.26 onwards). Earlier a common done signal was issued for an episode ending via any means. This is now changed in favour of issuing two signals - terminated and truncated.
+This is a new change in API (v0.26 onwards). Earlier a commonly done signal was issued for an episode ending via any means. This is now changed in favour of issuing two signals - terminated and truncated.

Let's see what the agent-environment loop looks like in Gymnasium.
This example will run an instance of `LunarLander-v2` environment for 1000 timesteps. Since we pass `render_mode="human"`, you should see a window pop up rendering the environment.
@@ -60,7 +60,7 @@ for _ in range(1000):
env.close()
```
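The hunk above shows only the tail of this example. Putting it together with the surrounding text (`LunarLander-v2`, `render_mode="human"`, 1000 timesteps, and the terminated/truncated signals described earlier), the full loop would look roughly like the following sketch; it is an illustration consistent with the text, not the exact lines omitted from the diff:

```python
import gymnasium as gym

# Create the environment described in the text; "human" rendering opens a window.
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset()

for _ in range(1000):
    # Sample a random action; a trained agent would instead pick one from a policy.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Reset once the episode ends, whether by termination or truncation.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```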

-The output should look something like this
+The output should look something like this:

```{figure} https://user-images.githubusercontent.com/15806078/153222406-af5ce6f0-4696-4a24-a683-46ad4939170c.gif
:width: 50%
@@ -93,8 +93,8 @@ env = gym.make("CartPole-v1", apply_api_compatibility=True)
```
This can also be done explicitly through a wrapper:
```python
-from gymasium.wrappers import StepCompatibility
-env = StepCompatibility(CustomEnv(), output_truncation_bool=False)
+from gymnasium.wrappers import StepAPICompatibility
+env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)
```
For more details see the wrappers section.
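As a rough usage sketch (not part of the diff): assuming `output_truncation_bool=False` makes the wrapped `step()` return the old-style single `done` flag, and with `CustomEnv` standing in for the placeholder used above, the wrapped environment would be stepped like this:

```python
from gymnasium.wrappers import StepAPICompatibility

# CustomEnv is the placeholder environment name used in the snippet above.
env = StepAPICompatibility(CustomEnv(), output_truncation_bool=False)

obs, info = env.reset()
# Assumption: with output_truncation_bool=False the wrapper emits the old
# four-value step signature with a single done flag instead of separate
# terminated/truncated booleans.
obs, reward, done, info = env.step(env.action_space.sample())
```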

@@ -131,7 +131,8 @@ There are multiple `Space` types available in Gymnasium:

```python
>>> from gymnasium.spaces import Box, Discrete, Dict, Tuple, MultiBinary, MultiDiscrete
>>>
>>> import numpy as np
>>>
>>> observation_space = Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32)
>>> observation_space.sample()
[ 1.6952509 -0.4399011 -0.7981693]
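>>> # (Illustrative continuation, not part of the diff.) Sampling from two of
>>> # the other space types listed above, reusing the imports already shown:
>>> action_space = Discrete(3)
>>> action_space.sample()  # an integer drawn from {0, 1, 2}
>>> nested_space = Dict({"position": Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32), "velocity": Discrete(5)})
>>> nested_space.sample()  # a dict-like sample with "position" and "velocity" entries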

@@ -217,7 +218,7 @@ play(gymnasium.make('Pong-v0'))
This opens a window of the environment and allows you to control the agent using your keyboard.

Playing using the keyboard requires a key-action map. This map should have type `dict[tuple[int], int | None]`, which maps the keys pressed to action performed.
-For example, if pressing the keys `w` and `space` at the same time is supposed to perform action `2`, then the `key_to_action` dict should look like:
+For example, if pressing the keys `w` and `space` at the same time is supposed to perform action `2`, then the `key_to_action` dict should look like this:
```python
{
    # ...
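    # The hunk cuts this mapping off here. Based on the prose above, the entry
    # for pressing `w` and `space` together could be written with ord() key
    # codes (an assumption, not shown in the diff):
    (ord("w"), ord(" ")): 2,
    # ...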

@@ -230,16 +231,23 @@ As a more complete example, let's say we wish to play with `CartPole-v0` using o
import gymnasium as gym
import pygame
from gymnasium.utils.play import play

mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}
-play(gymnasium.make("CartPole-v0"), keys_to_action=mapping)
+play(gym.make("CartPole-v1",render_mode="rgb_array"), keys_to_action=mapping)
```
where we obtain the corresponding key ID constants from pygame. If the `key_to_action` argument is not specified, then the default `key_to_action` mapping for that env is used, if provided.

Furthermore, if you wish to plot real time statistics as you play, you can use `gymnasium.utils.play.PlayPlot`. Here's some sample code for plotting the reward for last 5 second of gameplay:
```python
+import gymnasium as gym
+import pygame
+from gymnasium.utils.play import PlayPlot, play

def callback(obs_t, obs_tp1, action, rew, terminated, truncated, info):
-    return [rew,]
+    return [rew, ]

plotter = PlayPlot(callback, 30 * 5, ["reward"])
-env = gymnasium.make("Pong-v0")
-play(env, callback=plotter.callback)
+mapping = {(pygame.K_LEFT,): 0, (pygame.K_RIGHT,): 1}
+env = gym.make("CartPole-v1", render_mode="rgb_array")
+play(env, callback=plotter.callback, keys_to_action=mapping)
```

@@ -51,7 +51,7 @@ class Env(Generic[ObsType, ActType]):
- :attr:`action_space` - The Space object corresponding to valid actions
- :attr:`observation_space` - The Space object corresponding to valid observations
- :attr:`reward_range` - A tuple corresponding to the minimum and maximum possible rewards
-- :attr:`spec` - An environment spec that contains the information used to initialise the environment from `gym.make`
+- :attr:`spec` - An environment spec that contains the information used to initialize the environment from `gymnasium.make`
- :attr:`metadata` - The metadata of the environment, i.e. render modes
- :attr:`np_random` - The random number generator for the environment
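To make the attribute list above concrete, here is a small illustrative snippet (not part of the diff) reading those attributes from an environment created with `gymnasium.make`:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

print(env.action_space)       # e.g. Discrete(2)
print(env.observation_space)  # e.g. a 4-dimensional Box
print(env.reward_range)       # (min, max) achievable reward
print(env.spec.id)            # "CartPole-v1", the spec used by gymnasium.make
print(env.metadata)           # e.g. supported render modes
print(env.np_random)          # the environment's random number generator
```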

@@ -74,7 +74,7 @@ class Env(Generic[ObsType, ActType]):

@property
def np_random(self) -> np.random.Generator:
-"""Returns the environment's internal :attr:`_np_random` that if not set will initialise with a random seed."""
+"""Returns the environment's internal :attr:`_np_random` that if not set will initialize with a random seed."""
if self._np_random is None:
self._np_random, seed = seeding.np_random()
return self._np_random
@@ -99,17 +99,13 @@ class Env(Generic[ObsType, ActType]):
terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.
In this case further step() calls could return undefined results.
truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
-Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
+Typically a timelimit, but could also be used to indicate an agent physically going out of bounds.
Can be used to end the episode prematurely before a `terminal state` is reached.
info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
This might, for instance, contain: metrics that describe the agent's performance state, variables that are
hidden from observations, or individual reward terms that are combined to produce the total reward.
-It also can contain information that distinguishes truncation and termination, however this is deprecated in favour
+It also can contain information that distinguishes truncation and termination, however, this is deprecated in favor
of returning two booleans, and will be removed in a future version.
-done (bool): (Deprecated) A boolean value for if the episode has ended, in which case further :meth:`step` calls will
-return undefined results.
-A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully,
-a certain timelimit was exceeded, or the physics simulation has entered an invalid state.
"""
raise NotImplementedError
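To illustrate the step() contract documented in this hunk, here is a minimal, made-up environment sketch (not part of the diff) that returns the five-value tuple described above:

```python
import gymnasium as gym
from gymnasium import spaces
import numpy as np


class CountingEnv(gym.Env):
    """Toy environment: the state is a counter; the episode terminates at 10."""

    def __init__(self):
        self.observation_space = spaces.Box(low=0, high=np.inf, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0: stay, 1: increment
        self._count = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._count = 0
        return np.array([self._count], dtype=np.float32), {}

    def step(self, action):
        self._count += int(action)
        observation = np.array([self._count], dtype=np.float32)
        reward = 1.0 if action == 1 else 0.0
        terminated = self._count >= 10  # terminal state as defined by the task
        truncated = False               # time limits are usually added by a wrapper
        info = {"count": self._count}
        return observation, reward, terminated, truncated, info
```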

@@ -175,11 +171,7 @@ class Env(Generic[ObsType, ActType]):
raise NotImplementedError

def close(self):
-"""Override close in your subclass to perform any necessary cleanup.
-
-Environments will automatically :meth:`close()` themselves when
-garbage collected or when the program exits.
-"""
+"""Override close in your subclass to perform any necessary cleanup."""
pass

@property
@@ -187,7 +179,7 @@ class Env(Generic[ObsType, ActType]):
"""Returns the base non-wrapped environment.

Returns:
-Env: The base non-wrapped gym.Env instance
+Env: The base non-wrapped gymnasium.Env instance
"""
return self

@@ -349,7 +341,7 @@ class ObservationWrapper(Wrapper):
"""Superclass of wrappers that can modify observations using :meth:`observation` for :meth:`reset` and :meth:`step`.

If you would like to apply a function to the observation that is returned by the base environment before
-passing it to learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
+passing it to the learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
:meth:`observation` to implement that transformation. The transformation defined in that method must be
defined on the base environment’s observation space. However, it may take values in a different space.
In that case, you need to specify the new observation space of the wrapper by setting :attr:`self.observation_space`
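A brief sketch of the pattern this docstring describes (illustrative only, not from the diff): an `ObservationWrapper` that rescales observations and overwrites `self.observation_space` accordingly:

```python
import gymnasium as gym
from gymnasium import spaces
import numpy as np


class ScaleObservation(gym.ObservationWrapper):
    """Rescale Box observations to [0, 1]; assumes the wrapped env has a bounded Box space."""

    def __init__(self, env):
        super().__init__(env)
        self._low = env.observation_space.low
        self._high = env.observation_space.high
        # The transformed observations live in a different space, so we
        # overwrite self.observation_space as the docstring instructs.
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, observation):
        return ((observation - self._low) / (self._high - self._low)).astype(np.float32)
```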

@@ -401,7 +393,7 @@ class RewardWrapper(Wrapper):
because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
To do that, we could, for instance, implement the following wrapper::

-class ClipReward(gymnasium.RewardWrapper):
+class ClipReward(gym.RewardWrapper):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
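        # The hunk cuts the example off here. A plausible completion
        # (an assumption, not shown in this diff; `np` refers to numpy) would be:
        self.max_reward = max_reward

    def reward(self, reward):
        return np.clip(reward, self.min_reward, self.max_reward)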