Updates the Env, Wrapper and Vector API documentation (#48)

This commit is contained in:
Mark Towers
2022-10-12 15:58:01 +01:00
committed by GitHub
parent 9070295cc9
commit 2a7ebc4271
122 changed files with 1412 additions and 1179 deletions


@@ -1,85 +0,0 @@
# Core
## gymnasium.Env
```{eval-rst}
.. autofunction:: gymnasium.Env.step
```
```{eval-rst}
.. autofunction:: gymnasium.Env.reset
```
```{eval-rst}
.. autofunction:: gymnasium.Env.render
```
### Attributes
```{eval-rst}
.. autoattribute:: gymnasium.Env.action_space
This attribute gives the format of valid actions. It is of datatype `Space` provided by Gymnasium. For example, if the action space is of type `Discrete` and gives the value `Discrete(2)`, this means there are two valid discrete actions: 0 & 1.
.. code::
>>> env.action_space
Discrete(2)
>>> env.observation_space
Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
```
```{eval-rst}
.. autoattribute:: gymnasium.Env.observation_space
This attribute gives the format of valid observations. It is of datatype :class:`Space` provided by Gymnasium. For example, if the observation space is of type :class:`Box` and the shape of the object is ``(4,)``, this denotes a valid observation will be an array of 4 numbers. We can check the box bounds as well with attributes.
.. code::
>>> env.observation_space.high
array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38], dtype=float32)
>>> env.observation_space.low
array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38], dtype=float32)
```
```{eval-rst}
.. autoattribute:: gymnasium.Env.reward_range
This attribute is a tuple corresponding to min and max possible rewards. Default range is set to ``(-inf,+inf)``. You can set it if you want a narrower range.
```
### Additional Methods
```{eval-rst}
.. autofunction:: gymnasium.Env.close
```
```{eval-rst}
.. autofunction:: gymnasium.Env.seed
```
## gymnasium.Wrapper
```{eval-rst}
.. autoclass:: gymnasium.Wrapper
```
## gymnasium.ObservationWrapper
```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
```
## gymnasium.RewardWrapper
```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
```
## gymnasium.ActionWrapper
```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
```

docs/api/env.md (new file, 79 lines)

@@ -0,0 +1,79 @@
---
title: Env
---
# Env
## gymnasium.Env
```{eval-rst}
.. autoclass:: gymnasium.Env
```
### Methods
```{eval-rst}
.. autofunction:: gymnasium.Env.step
.. autofunction:: gymnasium.Env.reset
.. autofunction:: gymnasium.Env.render
```
### Attributes
```{eval-rst}
.. autoattribute:: gymnasium.Env.action_space
The Space object corresponding to valid actions; all valid actions should be contained within the space. For example, if the action space is of type `Discrete` and gives the value `Discrete(2)`, this means there are two valid discrete actions: 0 and 1.
.. code::
>>> env.action_space
Discrete(2)
>>> env.observation_space
Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
.. autoattribute:: gymnasium.Env.observation_space
The Space object corresponding to valid observations; all valid observations should be contained within the space. For example, if the observation space is of type :class:`Box` and the shape of the object is ``(4,)``, this denotes that a valid observation will be an array of 4 numbers. We can check the box bounds as well with attributes.
.. code::
>>> env.observation_space.high
array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38], dtype=float32)
>>> env.observation_space.low
array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38], dtype=float32)
.. autoattribute:: gymnasium.Env.metadata
The metadata of the environment containing rendering modes, rendering fps, etc.
.. autoattribute:: gymnasium.Env.render_mode
The render mode of the environment determined at initialisation
.. autoattribute:: gymnasium.Env.reward_range
A tuple corresponding to the minimum and maximum possible rewards for an agent over an episode. The default reward range is set to :math:`(-\infty,+\infty)`.
.. autoattribute:: gymnasium.Env.spec
The ``EnvSpec`` of the environment normally set during :py:meth:`gymnasium.make`
```
### Additional Methods
```{eval-rst}
.. autofunction:: gymnasium.Env.close
.. autoproperty:: gymnasium.Env.unwrapped
.. autoproperty:: gymnasium.Env.np_random
```
### Implementing environments
```{eval-rst}
.. py:currentmodule:: gymnasium
When implementing an environment, the :meth:`Env.reset` and :meth:`Env.step` functions must be created, describing the
dynamics of the environment.
For more information see the environment creation tutorial.
```

docs/api/registry.md (new file, 31 lines)

@@ -0,0 +1,31 @@
---
title: Registry
---
# Registry
Gymnasium allows users to automatically load environments, pre-wrapped with several important wrappers.
Environments can also be created through python imports.
## Make
```{eval-rst}
.. autofunction:: gymnasium.make
```
## Register
```{eval-rst}
.. autofunction:: gymnasium.register
```
## All registered environments
To find all the registered Gymnasium environments, use `gymnasium.envs.registry.keys()`.
This will not include environments registered only in OpenAI Gym; however, those can still be loaded by `gymnasium.make`.
## Spec
```{eval-rst}
.. autofunction:: gymnasium.spec
```


@@ -1,3 +1,8 @@
---
title: Spaces
---
# Spaces
```{toctree}
@@ -5,37 +10,38 @@
spaces/fundamental
spaces/composite
spaces/utils
spaces/vector_utils
```
```{eval-rst}
.. autoclass:: gymnasium.spaces.Space
```
## Attributes
```{eval-rst}
.. autoproperty:: gymnasium.spaces.space.Space.shape
.. property:: Space.dtype
    Return the data type of this space.
```
## Methods
Each space implements the following functions:
```{eval-rst}
.. autofunction:: gymnasium.spaces.space.Space.sample
.. autofunction:: gymnasium.spaces.space.Space.contains
.. autofunction:: gymnasium.spaces.space.Space.seed
.. autofunction:: gymnasium.spaces.space.Space.to_jsonable
.. autofunction:: gymnasium.spaces.space.Space.from_jsonable
```
## Fundamental Spaces
Gymnasium has a number of fundamental spaces that are used as building blocks for more complex spaces.
```{eval-rst}
.. currentmodule:: gymnasium.spaces
@@ -48,6 +54,8 @@ Each space implements the following functions:
## Composite Spaces
Often, environment spaces require joining fundamental spaces together, whether for vectorised environments, separate agents or readability of the space.
```{eval-rst}
* :py:class:`Dict` - Supports a dictionary of keys and subspaces, used for a fixed number of unordered spaces
* :py:class:`Tuple` - Supports a tuple of subspaces, used for a fixed number of ordered spaces
@@ -57,9 +65,29 @@ Each space implements the following functions:
## Utils
Gymnasium contains a number of helpful utility functions for flattening and unflattening spaces.
This can be important for passing information to neural networks.
```{eval-rst}
* :py:class:`utils.flatdim` - The number of dimensions the flattened space will contain
* :py:class:`utils.flatten_space` - Flattens a space, returning the space that `flatten`-ed samples are contained within
* :py:class:`utils.flatten` - Flattens an instance of a space so that it is contained within the flattened version of the space
* :py:class:`utils.unflatten` - The reverse of the `flatten` function
```
## Vector Utils
When vectorizing environments, it is necessary to modify the observation and action spaces to their new batched sizes.
Therefore, Gymnasium provides a number of additional functions for using a space with a vectorized environment.
```{eval-rst}
.. currentmodule:: gymnasium
* :py:class:`vector.utils.batch_space`
* :py:class:`vector.utils.concatenate`
* :py:class:`vector.utils.iterate`
* :py:class:`vector.utils.create_empty_array`
* :py:class:`vector.utils.create_shared_memory`
* :py:class:`vector.utils.read_from_shared_memory`
* :py:class:`vector.utils.write_to_shared_memory`
```


@@ -5,7 +5,8 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Dict
    .. automethod:: gymnasium.spaces.Dict.sample
    .. automethod:: gymnasium.spaces.Dict.seed
```
## Tuple
@@ -13,7 +14,8 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Tuple
    .. automethod:: gymnasium.spaces.Tuple.sample
    .. automethod:: gymnasium.spaces.Tuple.seed
```
## Sequence
@@ -21,7 +23,8 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Sequence
    .. automethod:: gymnasium.spaces.Sequence.sample
    .. automethod:: gymnasium.spaces.Sequence.seed
```
## Graph
@@ -29,5 +32,6 @@
```{eval-rst}
.. autoclass:: gymnasium.spaces.Graph
    .. automethod:: gymnasium.spaces.Graph.sample
    .. automethod:: gymnasium.spaces.Graph.seed
```


@@ -9,24 +9,25 @@ title: Fundamental Spaces
```{eval-rst}
.. autoclass:: gymnasium.spaces.Box
    .. automethod:: gymnasium.spaces.Box.sample
    .. automethod:: gymnasium.spaces.Box.seed
    .. automethod:: gymnasium.spaces.Box.is_bounded
```
## Discrete
```{eval-rst}
.. autoclass:: gymnasium.spaces.Discrete
    .. automethod:: gymnasium.spaces.Discrete.sample
    .. automethod:: gymnasium.spaces.Discrete.seed
```
## MultiBinary
```{eval-rst}
.. autoclass:: gymnasium.spaces.MultiBinary
    .. automethod:: gymnasium.spaces.MultiBinary.sample
    .. automethod:: gymnasium.spaces.MultiBinary.seed
```
## MultiDiscrete
@@ -34,7 +35,8 @@ title: Fundamental Spaces
```{eval-rst}
.. autoclass:: gymnasium.spaces.MultiDiscrete
    .. automethod:: gymnasium.spaces.MultiDiscrete.sample
    .. automethod:: gymnasium.spaces.MultiDiscrete.seed
```
## Text
@@ -42,5 +44,6 @@ title: Fundamental Spaces
```{eval-rst}
.. autoclass:: gymnasium.spaces.Text
    .. automethod:: gymnasium.spaces.Text.sample
    .. automethod:: gymnasium.spaces.Text.seed
```


@@ -6,10 +6,7 @@ title: Utils
```{eval-rst}
.. autofunction:: gymnasium.spaces.utils.flatdim
.. autofunction:: gymnasium.spaces.utils.flatten_space
.. autofunction:: gymnasium.spaces.utils.flatten
.. autofunction:: gymnasium.spaces.utils.unflatten
```


@@ -0,0 +1,20 @@
---
title: Vector Utils
---
# Spaces Vector Utils
```{eval-rst}
.. autofunction:: gymnasium.vector.utils.batch_space
.. autofunction:: gymnasium.vector.utils.concatenate
.. autofunction:: gymnasium.vector.utils.iterate
```
## Shared Memory Utils
```{eval-rst}
.. autofunction:: gymnasium.vector.utils.create_empty_array
.. autofunction:: gymnasium.vector.utils.create_shared_memory
.. autofunction:: gymnasium.vector.utils.read_from_shared_memory
.. autofunction:: gymnasium.vector.utils.write_to_shared_memory
```


@@ -7,32 +7,29 @@ title: Utils
## Visualization
```{eval-rst}
.. autofunction:: gymnasium.utils.play.play
.. autoclass:: gymnasium.utils.play.PlayPlot
    .. automethod:: callback
.. autoclass:: gymnasium.utils.play.PlayableGame
    .. automethod:: process_event
```
## Save Rendering Videos
```{eval-rst}
.. autofunction:: gymnasium.utils.save_video.save_video
.. autofunction:: gymnasium.utils.save_video.capped_cubic_video_schedule
```
## Old to New Step API Compatibility
```{eval-rst}
.. autofunction:: gymnasium.utils.step_api_compatibility.step_api_compatibility
.. autofunction:: gymnasium.utils.step_api_compatibility.convert_to_terminated_truncated_step_api
.. autofunction:: gymnasium.utils.step_api_compatibility.convert_to_done_step_api
```
## Seeding
@@ -43,16 +40,6 @@ title: Utils
## Environment Checking
```{eval-rst}
.. autofunction:: gymnasium.utils.env_checker.check_env
```


@@ -4,15 +4,26 @@ title: Vector
# Vector
## Gymnasium.vector.VectorEnv
```{eval-rst}
.. autoclass:: gymnasium.vector.VectorEnv
```
### Methods
```{eval-rst}
.. automethod:: gymnasium.vector.VectorEnv.reset
.. automethod:: gymnasium.vector.VectorEnv.step
.. automethod:: gymnasium.vector.VectorEnv.close
```
### Attributes
```{eval-rst}
.. attribute:: action_space
    The (batched) action space. The input actions of `step` must be valid elements of `action_space`.::
@@ -20,7 +31,7 @@ title: Vector
        >>> envs.action_space
        MultiDiscrete([2 2 2])
.. attribute:: observation_space
    The (batched) observation space. The observations returned by `reset` and `step` are valid elements of `observation_space`.::
@@ -28,7 +39,7 @@ title: Vector
        >>> envs.observation_space
        Box([[-4.8 ...]], [[4.8 ...]], (3, 4), float32)
.. attribute:: single_action_space
    The action space of an environment copy.::
@@ -36,55 +47,29 @@ title: Vector
        >>> envs.single_action_space
        Discrete(2)
.. attribute:: single_observation_space
    The observation space of an environment copy.::
        >>> envs = gymnasium.vector.make("CartPole-v1", num_envs=3)
        >>> envs.single_observation_space
        Box([-4.8 ...], [4.8 ...], (4,), float32)
```
## Making Vector Environments
```{eval-rst}
.. autofunction:: gymnasium.vector.make
```
## Async Vector Env
```{eval-rst}
.. autoclass:: gymnasium.vector.AsyncVectorEnv
```
## Sync Vector Env
```{eval-rst}
.. autoclass:: gymnasium.vector.SyncVectorEnv
```


@@ -1,196 +1,136 @@
---
title: Wrapper
---
# Wrappers
```{toctree}
:hidden:
wrappers/misc_wrappers
wrappers/action_wrappers
wrappers/observation_wrappers
wrappers/reward_wrappers
```
## gymnasium.Wrapper
```{eval-rst}
.. autoclass:: gymnasium.Wrapper
```
### Methods
```{eval-rst}
.. autofunction:: gymnasium.Wrapper.step
.. autofunction:: gymnasium.Wrapper.reset
.. autofunction:: gymnasium.Wrapper.close
```
### Attributes
```{eval-rst}
.. autoproperty:: gymnasium.Wrapper.action_space
.. autoproperty:: gymnasium.Wrapper.observation_space
.. autoproperty:: gymnasium.Wrapper.reward_range
.. autoproperty:: gymnasium.Wrapper.spec
.. autoproperty:: gymnasium.Wrapper.metadata
.. autoproperty:: gymnasium.Wrapper.np_random
.. autoproperty:: gymnasium.Wrapper.unwrapped
```
## Gymnasium Wrappers

Gymnasium provides a number of commonly used wrappers, listed below. More information about each wrapper can be found
on the page for its wrapper type.

```{eval-rst}
.. py:currentmodule:: gymnasium.wrappers

.. list-table::
    :header-rows: 1

    * - Name
      - Type
      - Description
    * - :class:`AtariPreprocessing`
      - Misc Wrapper
      - Implements the common preprocessing applied to Atari environments
    * - :class:`AutoResetWrapper`
      - Misc Wrapper
      - The wrapped environment will automatically reset when the terminated or truncated state is reached.
    * - :class:`ClipAction`
      - Action Wrapper
      - Clips the continuous action to the valid bounds specified by the environment's `action_space`
    * - :class:`EnvCompatibility`
      - Misc Wrapper
      - Provides compatibility for environments written in the OpenAI Gym v0.21 API to look like Gymnasium environments
    * - :class:`FilterObservation`
      - Observation Wrapper
      - Filters a dictionary observation space down to only the requested keys
    * - :class:`FlattenObservation`
      - Observation Wrapper
      - An observation wrapper that flattens the observation
    * - :class:`FrameStack`
      - Observation Wrapper
      - An observation wrapper that stacks the observations in a rolling manner.
    * - :class:`GrayScaleObservation`
      - Observation Wrapper
      - Converts the image observation from RGB to grayscale.
    * - :class:`HumanRendering`
      - Misc Wrapper
      - Allows human-like rendering for environments that support "rgb_array" rendering
    * - :class:`NormalizeObservation`
      - Observation Wrapper
      - This wrapper will normalize observations such that each coordinate is centered with unit variance.
    * - :class:`NormalizeReward`
      - Reward Wrapper
      - This wrapper will normalize immediate rewards such that their exponential moving average has a fixed variance.
    * - :class:`OrderEnforcing`
      - Misc Wrapper
      - This will produce an error if `step` or `render` is called before `reset`
    * - :class:`PixelObservationWrapper`
      - Observation Wrapper
      - Augments observations with pixel values obtained via `render`, which can be added to or replace the environment's observation.
    * - :class:`RecordEpisodeStatistics`
      - Misc Wrapper
      - This will keep track of cumulative rewards and episode lengths, returning them at the end.
    * - :class:`RecordVideo`
      - Misc Wrapper
      - This wrapper will record videos of rollouts.
    * - :class:`RenderCollection`
      - Misc Wrapper
      - Enables list versions of render modes, i.e. "rgb_array_list" for "rgb_array", such that the renderings for each step are saved in a list until `render` is called.
    * - :class:`RescaleAction`
      - Action Wrapper
      - Rescales the continuous action space of the environment to a range \[`min_action`, `max_action`], where `min_action` and `max_action` are numpy arrays or floats.
    * - :class:`ResizeObservation`
      - Observation Wrapper
      - This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple `shape`.
    * - :class:`StepAPICompatibility`
      - Misc Wrapper
      - Modifies an environment's step function from the (old) done API to the (new) termination / truncation API.
    * - :class:`TimeAwareObservation`
      - Observation Wrapper
      - Augments the observation with the current time step in the trajectory (by appending it to the observation).
    * - :class:`TimeLimit`
      - Misc Wrapper
      - This wrapper will emit a truncated signal if the specified number of steps is exceeded in an episode.
    * - :class:`TransformObservation`
      - Observation Wrapper
      - This wrapper will apply a function to observations
    * - :class:`TransformReward`
      - Reward Wrapper
      - This wrapper will apply a function to rewards
    * - :class:`VectorListInfo`
      - Misc Wrapper
      - This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the i-th dictionary contains the info of the i-th environment.
```
Among others, Gymnasium provides the action wrappers `ClipAction` and `RescaleAction`. ## Implementing a custom wrapper
## ObservationWrapper
If you would like to apply a function to the observation that is returned by the base environment before passing
it to learning code, you can simply inherit from `ObservationWrapper` and overwrite the method `observation` to
implement that transformation. The transformation defined in that method must be defined on the base environment's
observation space. However, it may take values in a different space. In that case, you need to specify the new
observation space of the wrapper by setting `self.observation_space` in the `__init__` method of your wrapper.
For example, you might have a 2D navigation task where the environment returns dictionaries as observations with keys `"agent_position"`
and `"target_position"`. A common thing to do might be to throw away some degrees of freedom and only consider
the position of the target relative to the agent, i.e. `observation["target_position"] - observation["agent_position"]`.
For this, you could implement an observation wrapper like this:
```python
class RelativePosition(gym.ObservationWrapper):
def __init__(self, env):
super().__init__(env)
self.observation_space = Box(shape=(2,), low=-np.inf, high=np.inf)
def observation(self, obs):
return obs["target"] - obs["agent"]
```
Among others, Gymnasium provides the observation wrapper `TimeAwareObservation`, which adds information about the index of the timestep
to the observation.
## RewardWrapper
If you would like to apply a function to the reward that is returned by the base environment before passing
it to learning code, you can simply inherit from `RewardWrapper` and overwrite the method `reward` to
implement that transformation. This transformation might change the reward range; to specify the reward range of
your wrapper, you can simply define `self.reward_range` in `__init__`.
Let us look at an example: Sometimes (especially when we do not have control over the reward because it is intrinsic), we want to clip the reward
to a range to gain some numerical stability. To do that, we could, for instance, implement the following wrapper:
```python
class ClipReward(gym.RewardWrapper):
def __init__(self, env, min_reward, max_reward):
super().__init__(env)
self.min_reward = min_reward
self.max_reward = max_reward
self.reward_range = (min_reward, max_reward)
def reward(self, reward):
return np.clip(reward, self.min_reward, self.max_reward)
```
## AutoResetWrapper
Some users may want a wrapper which will automatically reset its wrapped environment when its wrapped environment reaches the done state. An advantage of this environment is that it will never produce undefined behavior as standard gymnasium environments do when stepping beyond the done state.
When calling step causes `self.env.step()` to return `(terminated or truncated)=True`,
`self.env.reset()` is called,
and the return format of `self.step()` is as follows:
```python
new_obs, final_reward, final_terminated, final_truncated, info
```
`new_obs` is the first observation after calling `self.env.reset()`,
`final_reward` is the reward after calling `self.env.step()`,
prior to calling `self.env.reset()`
The expression `(final_terminated or final_truncated)` is always `True`
`info` is a dict containing all the keys from the info dict returned by
the call to `self.env.reset()`, with additional keys `final_observation`
containing the observation returned by the last call to `self.env.step()`
and `final_info` containing the info dict returned by the last call
to `self.env.step()`.
If `(terminated or truncated)` is not true when `self.env.step()` is called, `self.step()` returns
```python
obs, reward, terminated, truncated, info
```
as normal.
The AutoResetWrapper is not applied by default when calling `gymnasium.make()`, but can be applied by setting the optional `autoreset` argument to `True`:
```python
env = gym.make("CartPole-v1", autoreset=True)
```
The AutoResetWrapper can also be applied using its constructor:
```python
env = gym.make("CartPole-v1")
env = AutoResetWrapper(env)
```
```{note}
When using the AutoResetWrapper to collect rollouts, note
that the when `self.env.step()` returns `done`, a
new observation from after calling `self.env.reset()` is returned
by `self.step()` alongside the terminal reward and done state from the
previous episode . If you need the terminal state from the previous
episode, you need to retrieve it via the the `final_observation` key
in the info dict. Make sure you know what you're doing if you
use this wrapper!
```
## General Wrappers
Sometimes you might need to implement a wrapper that does some more complicated modifications (e.g. modify the
reward based on data in `info` or change the rendering behavior).
Such wrappers can be implemented by inheriting from `Wrapper`.
- You can set a new action or observation space by defining `self.action_space` or `self.observation_space` in `__init__`, respectively
- You can set new metadata and reward range by defining `self.metadata` and `self.reward_range` in `__init__`, respectively
@@ -204,6 +144,8 @@ initialization of the environment. However, *Reacher* does not allow you to do t
of the reward are returned in `info`, so let us build a wrapper for Reacher that allows us to weight those terms:
```python
import gymnasium as gym
class ReacherRewardWrapper(gym.Wrapper):
    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        super().__init__(env)
@@ -221,29 +163,4 @@ class ReacherRewardWrapper(gym.Wrapper):
```{note}
It is *not* sufficient to use a `RewardWrapper` in this case!
```
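The diff above elides the wrapper's `step` method, which recombines the reward terms exposed in `info`. A self-contained sketch of that idea follows, using an invented `FakeReacher` stand-in (its reward values are illustrative) instead of the real Reacher environment:

```python
class FakeReacher:
    """Stand-in for Reacher: reports its reward terms in `info`."""

    def reset(self):
        return 0.0, {}

    def step(self, action):
        info = {"reward_dist": -2.0, "reward_ctrl": -0.5}
        reward = info["reward_dist"] + info["reward_ctrl"]
        return 0.0, reward, False, False, info


class ReacherRewardWrapper:
    """Sketch only: recombines the reward terms with custom weights."""

    def __init__(self, env, reward_dist_weight, reward_ctrl_weight):
        self.env = env
        self.reward_dist_weight = reward_dist_weight
        self.reward_ctrl_weight = reward_ctrl_weight

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Reweight the components that the env reports in `info`.
        reward = (
            self.reward_dist_weight * info["reward_dist"]
            + self.reward_ctrl_weight * info["reward_ctrl"]
        )
        return obs, reward, terminated, truncated, info


env = ReacherRewardWrapper(FakeReacher(), reward_dist_weight=1.0, reward_ctrl_weight=0.0)
obs, info = env.reset()
obs, reward, *_ = env.step(None)
# With these weights only the distance term counts, so reward == -2.0
```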
## Available Wrappers
| Name | Type | Arguments | Description |
|------|------|-----------|-------------|
| `AtariPreprocessing` | `gymnasium.Wrapper` | `env: gymnasium.Env`, `noop_max: int = 30`, `frame_skip: int = 4`, `screen_size: int = 84`, `terminal_on_life_loss: bool = False`, `grayscale_obs: bool = True`, `grayscale_newaxis: bool = False`, `scale_obs: bool = False` | Implements the best practices from Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents" but will be deprecated soon. |
| `AutoResetWrapper` | `gymnasium.Wrapper` | `env` | The wrapped environment will automatically reset when the done state is reached. Make sure you read the documentation before using this wrapper! |
| `ClipAction` | `gymnasium.ActionWrapper` | `env` | Clip the continuous action to the valid bound specified by the environment's `action_space` |
| `FilterObservation` | `gymnasium.ObservationWrapper` | `env`, `filter_keys=None` | If you have an environment that returns dictionaries as observations, but you would like to only keep a subset of the entries, you can use this wrapper. `filter_keys` should be an iterable that contains the keys that are kept in the new observation. If it is `None`, all keys will be kept and the wrapper has no effect. |
| `FlattenObservation` | `gymnasium.ObservationWrapper` | `env` | Observation wrapper that flattens the observation |
| `FrameStack` | `gymnasium.ObservationWrapper` | `env`, `num_stack`, `lz4_compress=False` | Observation wrapper that stacks the observations in a rolling manner. For example, if the number of stacks is 4, then the returned observation contains the most recent 4 observations. Observations will be objects of type `LazyFrames`. This object can be cast to a numpy array via `np.asarray(obs)`. You can also access single frames or slices via the usual `__getitem__` syntax. If `lz4_compress` is set to true, the `LazyFrames` object will compress the frames internally (losslessly). The first observation (i.e. the one returned by `reset`) will consist of `num_stack` repetitions of the first frame. |
| `GrayScaleObservation` | `gymnasium.ObservationWrapper` | `env`, `keep_dim=False` | Convert the image observation from RGB to gray scale. By default, the resulting observation will be 2-dimensional. If `keep_dim` is set to true, a singleton dimension will be added (i.e. the observations are of shape AxBx1). |
| `NormalizeReward` | `gymnasium.Wrapper` | `env`, `gamma=0.99`, `epsilon=1e-8` | This wrapper will normalize immediate rewards s.t. their exponential moving average has a fixed variance. `epsilon` is a stability parameter and `gamma` is the discount factor that is used in the exponential moving average. The exponential moving average will have variance `(1 - gamma)**2`. The scaling depends on past trajectories and rewards will not be scaled correctly if the wrapper was newly instantiated or the policy was changed recently. |
| `NormalizeObservation` | `gymnasium.Wrapper` | `env`, `epsilon=1e-8` | This wrapper will normalize observations s.t. each coordinate is centered with unit variance. The normalization depends on past trajectories and observations will not be normalized correctly if the wrapper was newly instantiated or the policy was changed recently. `epsilon` is a stability parameter that is used when scaling the observations. |
| `OrderEnforcing` | `gymnasium.Wrapper` | `env` | This will produce an error if `step` is called before an initial `reset` |
| `PixelObservationWrapper` | `gymnasium.ObservationWrapper` | `env`, `pixels_only=True`, `render_kwargs=None`, `pixel_keys=("pixels",)` | Augment observations by pixel values obtained via `render`. You can specify whether the original observations should be discarded entirely or be augmented by setting `pixels_only`. Also, you can provide keyword arguments for `render`. |
| `RecordEpisodeStatistics` | `gymnasium.Wrapper` | `env`, `deque_size=100` | This will keep track of cumulative rewards and episode lengths. At the end of an episode, the statistics of the episode will be added to `info`. Moreover, the rewards and episode lengths are stored in buffers that can be accessed via `wrapped_env.return_queue` and `wrapped_env.length_queue` respectively. The size of these buffers can be set via `deque_size`. |
| `RecordVideo` | `gymnasium.Wrapper` | `env`, `video_folder: str`, `episode_trigger: Callable[[int], bool] = None`, `step_trigger: Callable[[int], bool] = None`, `video_length: int = 0`, `name_prefix: str = "rl-video"` | This wrapper will record videos of rollouts. The results will be saved in the folder specified via `video_folder`. You can specify a prefix for the filenames via `name_prefix`. Usually, you only want to record the environment intermittently, say every hundredth episode. To allow this, you can pass `episode_trigger` or `step_trigger`. At most one of these should be passed. These functions will accept an episode index or step index, respectively. They should return a boolean that indicates whether a recording should be started at this point. If neither `episode_trigger` nor `step_trigger` is passed, a default `episode_trigger` will be used. By default, the recording will be stopped once a done signal has been emitted by the environment. However, you can also create recordings of fixed length (possibly spanning several episodes) by passing a strictly positive value for `video_length`. |
| `RescaleAction` | `gymnasium.ActionWrapper` | `env`, `min_action`, `max_action` | Rescales the continuous action space of the environment to a range \[`min_action`, `max_action`], where `min_action` and `max_action` are numpy arrays or floats. |
| `ResizeObservation` | `gymnasium.ObservationWrapper` | `env`, `shape` | This wrapper works on environments with image observations (or more generally observations of shape AxBxC) and resizes the observation to the shape given by the tuple `shape`. The argument `shape` may also be an integer. In that case, the observation is scaled to a square of side length `shape`. |
| `TimeAwareObservation` | `gymnasium.ObservationWrapper` | `env` | Augment the observation with current time step in the trajectory (by appending it to the observation). This can be useful to ensure that things stay Markov. Currently it only works with one-dimensional observation spaces. |
| `TimeLimit` | `gymnasium.Wrapper` | `env`, `max_episode_steps=None` | Probably the most useful wrapper in Gymnasium. This wrapper will emit a done signal if the specified number of steps is exceeded in an episode. In order to be able to distinguish termination and truncation, you need to check `info`. If it does not contain the key `"TimeLimit.truncated"`, the environment did not reach the timelimit. Otherwise, `info["TimeLimit.truncated"]` will be true if the episode was terminated because of the time limit. |
| `TransformObservation` | `gymnasium.ObservationWrapper` | `env`, `f` | This wrapper will apply `f` to observations |
| `TransformReward` | `gymnasium.RewardWrapper` | `env`, `f` | This wrapper will apply `f` to rewards |
| `VectorListInfo` | `gymnasium.Wrapper` | `env` | This wrapper will convert the info of a vectorized environment from the `dict` format to a `list` of dictionaries where the _i-th_ dictionary contains info of the _i-th_ environment. If using other wrappers that perform operations on info, like `RecordEpisodeStatistics`, this needs to be the outermost wrapper. |
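As an illustration of the idea behind `NormalizeReward` described in the table above, here is a rough sketch that divides each reward by a running standard-deviation estimate of the discounted return. The class name and the Welford-style update are invented for illustration; this is not the library's implementation.

```python
import math

class RunningRewardScaler:
    """Rough sketch of reward normalization: divide each reward by a running
    standard-deviation estimate of the discounted return (Welford update)."""

    def __init__(self, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.ret = 0.0    # running discounted return
        # Seed the statistics so the very first reward is not divided by ~0.
        self.count = 1
        self.mean = 0.0
        self.m2 = 1.0

    def scale(self, reward):
        self.ret = self.gamma * self.ret + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        variance = self.m2 / self.count
        return reward / math.sqrt(variance + self.epsilon)


scaler = RunningRewardScaler()
scaled = [scaler.scale(r) for r in [1.0, 1.0, 1.0, 1.0]]
# Each scaled reward stays on the order of 1 as the variance estimate adapts.
```

As the table notes, the scaling depends on past trajectories, so rewards from a freshly instantiated scaler are not yet normalized correctly.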
@@ -0,0 +1,22 @@
# Action Wrappers
## Action Wrapper
```{eval-rst}
.. autoclass:: gymnasium.ActionWrapper
.. autofunction:: gymnasium.ActionWrapper.action
```
## Clip Action
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ClipAction
```
## Rescale Action
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RescaleAction
```
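The rescaling performed by `RescaleAction` is an affine map between two intervals. A sketch, with an invented function name and example bounds, is:

```python
def rescale_action(action, low, high, env_low, env_high):
    """Affinely map `action` from [low, high] onto [env_low, env_high]."""
    fraction = (action - low) / (high - low)
    return env_low + fraction * (env_high - env_low)


# An agent acting in [-1, 1] controlling an env whose bounds are [0, 10]:
mid = rescale_action(0.0, -1.0, 1.0, 0.0, 10.0)       # midpoint -> 5.0
low_end = rescale_action(-1.0, -1.0, 1.0, 0.0, 10.0)  # lower bound -> 0.0
```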
@@ -0,0 +1,68 @@
# Misc Wrappers
## Atari Preprocessing
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AtariPreprocessing
```
## Autoreset
```{eval-rst}
.. autoclass:: gymnasium.wrappers.AutoResetWrapper
```
## Compatibility
```{eval-rst}
.. autoclass:: gymnasium.wrappers.EnvCompatibility
.. autoclass:: gymnasium.wrappers.StepAPICompatibility
```
## Passive Environment Checker
```{eval-rst}
.. autoclass:: gymnasium.wrappers.PassiveEnvChecker
```
## Human Rendering
```{eval-rst}
.. autoclass:: gymnasium.wrappers.HumanRendering
```
## Order Enforcing
```{eval-rst}
.. autoclass:: gymnasium.wrappers.OrderEnforcing
```
## Record Episode Statistics
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordEpisodeStatistics
```
## Record Video
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RecordVideo
```
## Render Collection
```{eval-rst}
.. autoclass:: gymnasium.wrappers.RenderCollection
```
## Time Limit
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeLimit
```
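The truncation behavior of `TimeLimit` can be sketched with stand-in classes (illustrative only, not the library implementation): once the configured number of steps elapses, the wrapper flips `truncated` to true regardless of what the inner env reports.

```python
class NeverEnding:
    """Toy env that never terminates on its own (illustrative stand-in)."""

    def reset(self):
        return 0, {}

    def step(self, action):
        return 0, 0.0, False, False, {}


class TimeLimitSketch:
    """Sketch of `TimeLimit`: set `truncated` once `max_episode_steps` elapse."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


env = TimeLimitSketch(NeverEnding(), max_episode_steps=2)
env.reset()
_, _, terminated, truncated, _ = env.step(None)   # step 1: still running
_, _, terminated, truncated, _ = env.step(None)   # step 2: truncated=True
```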
## Vector List Info
```{eval-rst}
.. autoclass:: gymnasium.wrappers.VectorListInfo
```
@@ -0,0 +1,62 @@
# Observation Wrappers
## Observation Wrapper
```{eval-rst}
.. autoclass:: gymnasium.ObservationWrapper
.. autofunction:: gymnasium.ObservationWrapper.observation
```
## Transform Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformObservation
```
## Filter Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FilterObservation
```
## Flatten Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FlattenObservation
```
## Framestack Observations
```{eval-rst}
.. autoclass:: gymnasium.wrappers.FrameStack
```
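The rolling stack maintained by `FrameStack` can be sketched with a `deque` (the `FrameStacker` class below is invented for illustration; the real wrapper returns `LazyFrames`, not lists):

```python
from collections import deque

class FrameStacker:
    """Sketch of the rolling stack behind `FrameStack` (illustrative only)."""

    def __init__(self, num_stack):
        self.num_stack = num_stack
        self.frames = deque(maxlen=num_stack)

    def reset(self, first_obs):
        # The first stacked observation repeats the initial frame.
        for _ in range(self.num_stack):
            self.frames.append(first_obs)
        return list(self.frames)

    def append(self, obs):
        # A deque with maxlen drops the oldest frame automatically.
        self.frames.append(obs)
        return list(self.frames)


stack = FrameStacker(num_stack=4)
first = stack.reset(0)      # [0, 0, 0, 0]
second = stack.append(1)    # [0, 0, 0, 1]
third = stack.append(2)     # [0, 0, 1, 2]
```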
## Gray Scale Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.GrayScaleObservation
```
## Normalize Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeObservation
```
## Pixel Observation Wrapper
```{eval-rst}
.. autoclass:: gymnasium.wrappers.PixelObservationWrapper
```
## Resize Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.ResizeObservation
```
## Time Aware Observation
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TimeAwareObservation
```
@@ -0,0 +1,22 @@
# Reward Wrappers
## Reward Wrapper
```{eval-rst}
.. autoclass:: gymnasium.RewardWrapper
.. autofunction:: gymnasium.RewardWrapper.reward
```
## Transform Reward
```{eval-rst}
.. autoclass:: gymnasium.wrappers.TransformReward
```
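The idea behind `TransformReward` is simply to apply a function `f` to every reward returned by `step`. A self-contained sketch (stand-in classes, invented for illustration) showing reward clipping, a common stabilization trick:

```python
class TransformRewardSketch:
    """Apply a function `f` to every reward returned by the wrapped env."""

    def __init__(self, env, f):
        self.env = env
        self.f = f

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, self.f(reward), terminated, truncated, info


class ToyEnv:
    """Stand-in env emitting a fixed reward of 10."""

    def reset(self):
        return 0, {}

    def step(self, action):
        return 0, 10.0, False, False, {}


# Clip rewards to [-1, 1].
env = TransformRewardSketch(ToyEnv(), lambda r: max(-1.0, min(1.0, r)))
env.reset()
_, reward, *_ = env.step(None)   # reward == 1.0 after clipping
```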
## Normalize Reward
```{eval-rst}
.. autoclass:: gymnasium.wrappers.NormalizeReward
```
@@ -7,6 +7,7 @@ firstpage:
# Basic Usage
## Initializing Environments
Initializing environments is very easy in Gymnasium and can be done via:
```python
@@ -15,6 +16,7 @@ env = gym.make('CartPole-v1')
```
## Interacting with the Environment
Gymnasium implements the classic "agent-environment loop":
```{image} /_static/diagrams/AE_loop.png
@@ -84,6 +86,7 @@ It is possible for `terminated=True` and `truncated=True` to occur at the same t
This is explained in detail in the `Handling Time Limits` section.
#### Backward compatibility
Gym will retain support for the old API through compatibility wrappers.
Users can toggle the old API through `make` by setting `apply_api_compatibility=True`.
@@ -100,6 +103,7 @@ For more details see the wrappers section.
## Checking API-Conformity
If you have implemented a custom environment and would like to perform a sanity check to make sure that it conforms to
the API, you can run:
@@ -117,6 +121,7 @@ not check the `render` method. To change this behavior, you can pass `skip_rende
been closed!
## Spaces
Spaces are usually used to specify the format of valid actions and observations.
Every environment should have the attributes `action_space` and `observation_space`, both of which should be instances
of classes that inherit from `Space`.
@@ -163,6 +168,7 @@ OrderedDict([('position', 0), ('velocity', 1)])
```
## Wrappers
Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
@@ -210,6 +216,7 @@ If you have a wrapped environment, and you want to get the unwrapped environment
```
## Playing within an environment
You can also play the environment using your keyboard with the `play` function in `gymnasium.utils.play`.
```python
from gymnasium.utils.play import play
@@ -2,6 +2,7 @@
layout: "contents"
title: Environment Creation
---
# Make your own custom environment
This documentation overviews creating new environments and relevant useful wrappers, utilities and tests included in Gymnasium designed for the creation of new environments.
@@ -59,6 +60,7 @@ where the blue dot is the agent and the red square represents the target.
Let us look at the source code of `GridWorldEnv` piece by piece:
### Declaration and Initialization
Our custom environment will inherit from the abstract class `gymnasium.Env`. You shouldn't forget to add the `metadata` attribute to your class.
There, you should specify the render-modes that are supported by your environment (e.g. `"human"`, `"rgb_array"`, `"ansi"`)
and the framerate at which your environment should be rendered. Every environment should support `None` as render-mode; you don't need to add it in the metadata.
@@ -71,6 +73,7 @@ We will choose to represent observations in the form of dictionaries with keys `
may look like `{"agent": array([1, 0]), "target": array([0, 3])}`.
Since we have 4 actions in our environment ("right", "up", "left", "down"), we will use `Discrete(4)` as an action space.
Here is the declaration of `GridWorldEnv` and the implementation of `__init__`:
```python
import gymnasium as gym
from gymnasium import spaces
@@ -125,6 +128,7 @@ class GridWorldEnv(gym.Env):
```
### Constructing Observations From Environment States
Since we will need to compute observations both in `reset` and `step`, it is often convenient to have
a (private) method `_get_obs` that translates the environment's state into an observation. However, this is not mandatory
and you may as well compute observations in `reset` and `step` separately:
@@ -142,6 +146,7 @@ Oftentimes, info will also contain some data that is only available inside the `
terms). In that case, we would have to update the dictionary that is returned by `_get_info` in `step`.
### Reset
The `reset` method will be called to initiate a new episode. You may assume that the `step` method will not
be called before `reset` has been called. Moreover, `reset` should be called whenever a done signal has been issued.
Users may pass the `seed` keyword to `reset` to initialize any random number generator that is used by the environment
@@ -180,6 +185,7 @@ and `_get_info` that we implemented earlier for that:
```
### Step
The `step` method usually contains most of the logic of your environment. It accepts an `action`, computes the state of
the environment after applying that action and returns the 4-tuple `(observation, reward, done, info)`.
Once the new state of the environment has been computed, we can check whether it is a terminal state and we set `done`
@@ -207,6 +213,7 @@ accordingly. Since we are using sparse binary rewards in `GridWorldEnv`, computi
```
### Rendering
Here, we are using PyGame for rendering. A similar approach to rendering is used in many environments that are included
with Gymnasium and you can use it as a skeleton for your own environments:
@@ -279,6 +286,7 @@ with Gymnasium and you can use it as a skeleton for your own environments:
```
### Close
The `close` method should close any open resources that were used by the environment. In many cases,
you don't actually have to bother to implement this method. However, in our example `render_mode` may
be `"human"` and we might need to close the window that has been opened:
@@ -353,6 +361,7 @@ setup(
```
## Creating Environment Instances
After you have installed your package locally with `pip install -e gym-examples`, you can create an instance of the environment via:
```python
@@ -372,6 +381,7 @@ constructor yourself. Some may find this approach more pythonic and environments
also perfectly fine (but remember to add wrappers as well!).
## Using Wrappers
Oftentimes, we want to use different variants of a custom environment, or we want to
modify the behavior of an environment that is provided by Gymnasium or some other party.
Wrappers allow us to do this without changing the environment implementation or adding any boilerplate code.
@@ -1,27 +1,45 @@
# Handling Time Limits
In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are
incorrectly handled. The `done` signal received (in previous versions of OpenAI Gym < 0.26) from `env.step` indicated
whether an episode has ended. However, this signal did not distinguish whether the episode ended due to `termination` or `truncation`.
## Termination
Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment
definition. Examples are task success, task failure, a robot falling down, etc. Notably, this also includes episodes
ending in finite-horizon environments due to a time-limit inherent to the environment. Note that to preserve the Markov
property, a representation of the remaining time must be present in the agent's observation in finite-horizon environments.
[(Reference)](https://arxiv.org/abs/1712.00378)
## Truncation
Truncation refers to the episode ending after an externally defined condition (that is outside the scope of the Markov
Decision Process). This could be a time-limit, a robot going out of bounds, etc.
An infinite-horizon environment is an obvious example of where this is needed. We cannot wait forever for the episode
to complete, so we set a practical time-limit after which we forcibly halt the episode. The last state in this case is
not a terminal state, since it has a non-zero transition probability of moving to another state as per the Markov
Decision Process that defines the RL problem. This is also different from time-limits in finite-horizon environments,
as the agent in this case has no idea about this time-limit.
## Importance in learning code
Bootstrapping (using one or more estimated values of a variable to update estimates of the same variable) is a key
aspect of reinforcement learning. A value function will tell you how much discounted reward you will get from a
particular state if you follow a given policy. When an episode stops at any given point, by looking at the value of
the final state, the agent is able to estimate how much discounted reward could have been obtained if the episode had
continued. This is an example of handling truncation.
More formally, a common example of bootstrapping in RL is updating the estimate of the Q-value function,
```math
Q_{target}(o_t, a_t) = r_t + \gamma \cdot \max_a(Q(o_{t+1}, a_{t+1}))
```
In classical RL, the new `Q` estimate is a weighted average of the previous `Q` estimate and `Q_target`, while in Deep
Q-Learning, the error between `Q_target` and the previous `Q` estimate is minimized.
However, at the terminal state, bootstrapping is not done,
@@ -29,9 +47,11 @@ However, at the terminal state, bootstrapping is not done,
Q_{target}(o_t, a_t) = r_t
```
This is where the distinction between termination and truncation becomes important. When an episode ends due to
termination we don't bootstrap, when it ends due to truncation, we bootstrap.
While using gymnasium environments, the `done` signal (default for < v0.26) is frequently used to determine whether to
bootstrap or not. However, this is incorrect since it does not differentiate between termination and truncation.
A simple example of value functions is shown below. This is an illustrative example and not part of any specific algorithm.
@@ -42,9 +62,11 @@ vf_target = rew + gamma * (1-done)* vf_next_state
This is incorrect in the case of episode ending due to a truncation, where bootstrapping needs to happen but it doesn't.
## Solution
From v0.26 onwards, Gymnasium's `env.step` API returns both termination and truncation information explicitly.
In the previous version truncation information was supplied through the info key `TimeLimit.truncated`.
The correct way to handle terminations and truncations now is,
```python
# terminated = done and 'TimeLimit.truncated' not in info # This was needed in previous versions.
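The corrected bootstrapping rule can be captured in a small helper. This is an illustrative sketch with invented names, not library code: the key point is that only `terminated` suppresses bootstrapping, while truncation does not.

```python
def td_target(reward, vf_next_state, terminated, gamma=0.99):
    """Bootstrap from the next state's value unless the episode *terminated*.

    Truncation alone must not suppress bootstrapping."""
    return reward + gamma * (1.0 - float(terminated)) * vf_next_state


# Episode terminated: no bootstrapping, the target is just the reward.
t_terminated = td_target(1.0, 10.0, terminated=True)    # 1.0
# Episode merely truncated: we still bootstrap from the next state's value.
t_truncated = td_target(1.0, 10.0, terminated=False)    # 1.0 + 0.99 * 10.0 = 10.9
```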
@@ -3,9 +3,10 @@ layout: "contents"
title: Vectorising your environments
---
# Vectorizing your environments
## Vectorized Environments
*Vectorized environments* are environments that run multiple independent copies of the same environment in parallel using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). Vectorized environments take as input a batch of actions, and return a batch of observations. This is particularly useful, for example, when the policy is defined as a neural network that operates over a batch of observations.
Gymnasium provides two types of vectorized environments:
@@ -48,6 +49,7 @@ The function `gymnasium.vector.make` is meant to be used only in basic cases (e.
### Creating a vectorized environment
To create a vectorized environment that runs multiple environment copies, you can wrap your parallel environments inside `gymnasium.vector.SyncVectorEnv` (for sequential execution), or `gymnasium.vector.AsyncVectorEnv` (for parallel execution, with [multiprocessing](https://docs.python.org/3/library/multiprocessing.html)). These vectorized environments take as input a list of callables specifying how the copies are created.
```python
@@ -81,7 +83,9 @@ When using `AsyncVectorEnv` with either the ``spawn`` or ``forkserver`` start me
if __name__ == "__main__":
    envs = gymnasium.vector.make("CartPole-v1", num_envs=3, context="spawn")
```
### Working with vectorized environments
While standard Gymnasium environments take a single action and return a single observation (with a reward, and boolean indicating termination), vectorized environments take a *batch of actions* as input, and return a *batch of observations*, together with an array of rewards and booleans indicating if the episode ended in each environment copy.
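The batching contract can be sketched with toy classes (illustrative only, not the actual Gymnasium implementation; `ToyEnv` and `MiniSyncVectorEnv` are made up for this sketch):

```python
class ToyEnv:
    """A trivial environment: counts steps, 'ends' after 3 steps."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3  # observation, reward, done

class MiniSyncVectorEnv:
    """Sequentially steps several copies and batches the results."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]
    def reset(self):
        return [env.reset() for env in self.envs]
    def step(self, actions):
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        observations, rewards, dones = map(list, zip(*results))
        return observations, rewards, dones

envs = MiniSyncVectorEnv([ToyEnv for _ in range(3)])
envs.reset()
obs, rewards, dones = envs.step([0, 0, 0])  # a batch of actions in, batches out
```

Real vectorized environments additionally batch observations into arrays and handle per-copy resets, but the shape of the interface is the same.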
@@ -192,8 +196,8 @@ If the _dtype_ of the returned info is whether `int`, `float`, `bool` or any _dt
None], dtype=object), '_final_observation': array([False, True, False])}
```
## Observation & Action spaces
Like any Gymnasium environment, vectorized environments contain the two properties `VectorEnv.observation_space` and `VectorEnv.action_space` to specify the observation and action spaces of the environments. Since vectorized environments operate on multiple environment copies, where the actions taken and observations returned by all of the copies are batched together, the observation and action *spaces* are batched as well so that the input actions are valid elements of `VectorEnv.action_space`, and the observations are valid elements of `VectorEnv.observation_space`.
```python
@@ -246,6 +250,7 @@ This is convenient, for example, if you instantiate a policy. In the following e
## Intermediate Usage
### Shared memory
`AsyncVectorEnv` runs each environment copy inside an individual process. At each call to `AsyncVectorEnv.reset` or `AsyncVectorEnv.step`, the observations of all of the parallel environments are sent back to the main process. To avoid expensive transfers of data between processes, especially with large observations (e.g. images), `AsyncVectorEnv` uses shared memory by default (``shared_memory=True``) that processes can write to and read from at minimal cost. This can increase the throughput of the vectorized environment.
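The mechanism can be illustrated with the standard library alone (an illustration of the idea, not Gymnasium's internal code): a worker attaches to a named shared block and writes its observation in place, and the main process reads it without the data ever passing through a pipe.

```python
from multiprocessing import shared_memory

# The main process allocates a block large enough for one (tiny) observation.
shm = shared_memory.SharedMemory(create=True, size=4)

# A worker process would attach by name and write its observation in place.
worker_view = shared_memory.SharedMemory(name=shm.name)
worker_view.buf[:4] = bytes([1, 2, 3, 4])
worker_view.close()

# The main process reads the observation directly from the shared block.
observation = bytes(shm.buf[:4])
shm.close()
shm.unlink()
```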
```python
@@ -263,6 +268,7 @@ This is convenient, for example, if you instantiate a policy. In the following e
```
### Exception handling
Because sometimes things may not go as planned, the exceptions raised in any given environment copy are re-raised in the vectorized environment, even when the copy runs in parallel with `AsyncVectorEnv`. This way, you can choose how to handle these exceptions yourself (with ``try ... except``).
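The re-raising behaviour can be mimicked with the standard library's executors (an analogy, not the `AsyncVectorEnv` internals; `faulty_step` is a made-up stand-in for an environment copy that fails): an exception raised in a worker surfaces in the caller when the result is collected, where an ordinary ``try ... except`` handles it.

```python
from concurrent.futures import ThreadPoolExecutor

def faulty_step(action):
    # Stand-in for an environment copy that fails mid-step.
    raise ValueError("An error occurred.")

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(faulty_step, 0)
    try:
        future.result()  # the worker's exception is re-raised here
        handled = False
    except ValueError:
        handled = True
```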
```python
@@ -291,6 +297,7 @@ ValueError: An error occurred.
## Advanced Usage
### Custom spaces
Vectorized environments will batch actions and observations if they are elements from standard Gymnasium spaces, such as `gymnasium.spaces.Box`, `gymnasium.spaces.Discrete`, or `gymnasium.spaces.Dict`. However, if you create your own environment with a custom action and/or observation space (inheriting from `gymnasium.Space`), the vectorized environment will not attempt to automatically batch the actions/observations, and instead, it will return the raw tuple of elements from all parallel environments.
In the following example, we create a new environment `SMILESEnv`, whose observations are strings representing the [SMILES](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system) notation of a molecular structure, with a custom observation space `SMILES`. The observations returned by the vectorized environment are contained in a tuple of strings.
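The shape of the result can be previewed with a simplified stand-in (this `SMILESLikeEnv` is a toy class invented for this sketch, not the `SMILESEnv` from the example): string observations cannot be stacked into an array, so the copies' observations are simply collected into a tuple.

```python
class SMILESLikeEnv:
    """Toy stand-in: observations are strings, which standard spaces can't batch."""
    def __init__(self, token):
        self.token = token
    def reset(self):
        return self.token

# Without a batching rule, a vectorized wrapper can only gather raw elements.
envs = [SMILESLikeEnv(t) for t in ("C", "O", "N")]
observations = tuple(env.reset() for env in envs)  # raw tuple, no batching
```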


@@ -79,7 +79,7 @@ atari/zaxxon
Atari environments are simulated via the Arcade Learning Environment (ALE) [[1]](#1).
## AutoROM (installing the ROMs)
ALE-py (`pip install gymnasium[atari]`) doesn't include the Atari ROMs, which are necessary to make any of the Atari environments.
To install the Atari ROMs, use `pip install gymnasium[accept-rom-license]`, which will install AutoROM, download the ROMs, and install them in the default location.
@@ -87,7 +87,7 @@ In doing so, you agree to TODO
It is possible to install the ROMs in an alternative location; the [repo](https://github.com/Farama-Foundation/AutoROM) has more information.
## Action Space
The action space is a subset of the following discrete set of legal actions:
@@ -120,17 +120,17 @@ The reduced action space of an Atari environment may depend on the "flavor" of t
the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the
action spaces of default flavor choices.
## Observation Space
The observation issued by an Atari environment may be:
- the RGB image that is displayed to a human player,
- a grayscale version of that image or
- the state of the 128 Bytes of RAM of the console.
## Rewards
The exact reward dynamics depend on the environment and are usually documented in the game's manual. You can
find these manuals on [AtariAge](https://atariage.com/).
## Stochasticity
It was pointed out in [[1]](#1) that Atari games are entirely deterministic. Thus, agents could achieve
state-of-the-art performance by simply memorizing an optimal sequence of actions while completely ignoring observations from the environment.
To avoid this, ALE implements sticky actions: Instead of always simulating the action passed to the environment, there is a small
@@ -142,12 +142,10 @@ a tuple of two positive integers. If `frameskip` is an integer, frame skipping i
repeated `frameskip` many times. Otherwise, if `frameskip` is a tuple, the number of skipped frames is chosen uniformly at
random between `frameskip[0]` (inclusive) and `frameskip[1]` (exclusive) in each environment step.
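The sampling rule described above can be sketched as follows (`sample_frameskip` is a hypothetical helper written for this illustration, not part of ALE-py):

```python
import random

def sample_frameskip(frameskip):
    """Return how many frames the chosen action is repeated for."""
    if isinstance(frameskip, int):
        return frameskip  # deterministic frame skipping
    low, high = frameskip
    return random.randrange(low, high)  # low inclusive, high exclusive

assert sample_frameskip(4) == 4
assert sample_frameskip((2, 5)) in (2, 3, 4)
```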
## Common Arguments
When initializing Atari environments via `gymnasium.make`, you may pass some additional arguments. These work for any
Atari environment. However, legal values for `mode` and `difficulty` depend on the environment.
- **mode**: `int`. Game mode, see [[2]](#2). Legal values depend on the environment and are listed in the table above.
- **difficulty**: `int`. The difficulty of the game, see [[2]](#2). Legal values depend on the environment and are listed in
@@ -171,8 +169,7 @@ action space will be reduced to a subset.
> It is highly recommended to specify `render_mode` during construction instead of calling `env.render()`.
> This will guarantee proper scaling, audio support, and proper framerates.
## Version History and Naming Schemes
All Atari games are available in three versions. They differ in the default settings of the arguments above.
The differences are listed in the following table:
@@ -206,7 +203,7 @@ are in the "ALE" namespace. The suffix "-ram" is still available. Thus, we get t
| ALE/Amidar-v5 | `"rgb"` | `5` | `0.25` | `True` |
| ALE/Amidar-ram-v5 | `"ram"` | `5` | `0.25` | `True` |
## Flavors
Some games allow the user to set a difficulty level and a game mode. Different modes/difficulties may have different
game dynamics and (if a reduced action space is used) different action spaces. We follow the convention of [[2]](#2) and
refer to the combination of difficulty level and game mode as a flavor of a game. The following table shows
@@ -279,7 +276,7 @@ the available modes and difficulty levels for different Atari games:
> Each game also has a valid difficulty for the opposing AI, which has a different range depending on the game. These values can have a range of 0 - n, where n can be found at [the ALE documentation](https://github.com/mgbellemare/Arcade-Learning-Environment/blob/master/docs/games.md)
## References
(#1)=
<a id="1">[1]</a>


@@ -20,18 +20,17 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Adventure-v5")` |
## Description
You must find the enchanted chalice and return it to the golden castle. You can pick up various objects (keys, a sword,
a bridge, or a magnet) and have to fight or outmanoeuvre dragons.
For a more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=1).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -49,8 +48,7 @@ instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Adventure-v5")
@@ -69,7 +67,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Adventure-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -19,11 +19,11 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/AirRaid-v5")` |
## Description
You control a ship that can move sideways. You must protect two buildings (one on the right and one on the left side of the screen) from
flying saucers that are trying to drop bombs on them.
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -39,8 +39,7 @@ flavor looks like this:
| 4 | RIGHTFIRE |
| 5 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -57,8 +56,7 @@ instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/AirRaid-v5")
@@ -77,7 +75,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("AirRaid-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -19,19 +19,18 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Alien-v5")` |
## Description
You are stuck in a maze-like space ship with three aliens. Your goal is to destroy their eggs that are scattered
all over the ship while simultaneously avoiding the aliens (they are trying to kill you). You have a flamethrower that can help you turn
them away in tricky situations. Moreover, you can occasionally collect a power-up (pulsar) that gives you the temporary ability to kill aliens.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=815).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -54,8 +53,7 @@ via `gymnasium.make`.
You score points by destroying eggs, killing aliens, using pulsars, and collecting special prizes. When you are caught
by an alien, you will lose one of your lives. The number of lives you have depends on the game flavor. For a
table of scores corresponding to the different achievements, consult [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=815).
## Arguments
```
env = gymnasium.make("ALE/Alien-v5")
@@ -75,7 +73,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Alien-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -19,12 +19,12 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Amidar-v5")` |
## Description
This game is similar to Pac-Man: You are trying to visit all places on a 2-dimensional grid while simultaneously avoiding
your enemies. You can turn the tables at one point in the game: Your enemies turn into chickens and you can
catch them. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=817).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -44,8 +44,7 @@ flavor looks like this:
| 7 | RIGHTFIRE |
| 8 | LEFTFIRE |
| 9 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -67,8 +66,7 @@ via `gymnasium.make`.
### Rewards
You score points by traversing new parts of the grid. Coloring an entire box in the maze or catching chickens gives extra points.
For a more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=817).
## Arguments
```
env = gymnasium.make("ALE/Amidar-v5")
@@ -87,7 +85,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Amidar-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,11 +18,11 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Assault-v5")` |
## Description
You control a vehicle that can move sideways. A big mother ship circles overhead and continually deploys smaller drones.
You must destroy these enemies and dodge their attacks. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=827).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -38,8 +38,7 @@ flavor looks like this:
| 4 | LEFT |
| 5 | RIGHTFIRE |
| 6 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -56,8 +55,7 @@ instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Assault-v5")
@@ -75,7 +73,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Assault-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Asterix-v5")` |
## Description
You are Asterix and can move horizontally (continuously) and vertically (discretely). Objects
move horizontally across the screen: lyres and other (more useful) objects. Your goal is to guide
Asterix in such a way as to avoid lyres and collect as many other objects as possible. You score points by collecting
@@ -26,7 +26,7 @@ objects and lose a life whenever you collect a lyre. You have three lives availa
many points, you will be awarded additional points.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=3325).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -44,8 +44,7 @@ flavor looks like this:
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -65,8 +64,7 @@ via `gymnasium.make`.
### Rewards
A table of scores awarded for collecting the different objects is provided on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=3325).
## Arguments
```
env = gymnasium.make("ALE/Asterix-v5")
@@ -84,7 +82,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Asterix-v0")`.
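Whichever id is used, interaction follows the updated Gymnasium API this commit documents: `reset()` returns `(observation, info)` and `step()` returns a five-tuple `(observation, reward, terminated, truncated, info)`. The sketch below is self-contained and uses a hypothetical stand-in class rather than a real ALE environment (running the real thing additionally requires `gymnasium`, `ale-py` and the game ROMs).

```python
import random

class DummyAtariEnv:
    """Hypothetical stand-in with the reset()/step() signatures of a Gymnasium env."""

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.t = 0
        return [0] * 128, {}  # observation (RAM-style), info dict

    def step(self, action):
        self.t += 1
        reward = float(self.rng.random() < 0.1)  # sparse dummy reward
        terminated = self.t >= 50                # dummy episode end
        truncated = False
        return [0] * 128, reward, terminated, truncated, {}

env = DummyAtariEnv()
obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(0)
    episode_return += reward

print(env.t)  # 50: length of the dummy episode
```

With a real environment, only the construction line changes, e.g. `env = gymnasium.make("ALE/Asterix-v5")`.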
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,13 +18,13 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Asteroids-v5")` |
## Description
This is a well-known arcade game: You control a spaceship in an asteroid field and must break up asteroids by shooting
them. Once all asteroids are destroyed, you enter a new level and new asteroids will appear. You will occasionally
be attacked by a flying saucer.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=828&itemTypeID=HTMLMANUAL)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -49,8 +49,7 @@ flavor looks like this:
| 13 | UPLEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -72,8 +71,7 @@ via `gymnasium.make`.
You score points for destroying asteroids, satellites and UFOs. The smaller the asteroid, the more points you score
for destroying it.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=828&itemTypeID=HTMLMANUAL).
## Arguments
```
env = gymnasium.make("ALE/Asteroids-v5")
@@ -92,7 +90,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Asteroids-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -19,14 +19,14 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Atlantis-v5")` |
## Description
Your job is to defend the submerged city of Atlantis. Your enemies slowly descend towards the city and you must
destroy them before they reach striking distance. To this end, you control three defense posts.
You lose if your enemies manage to destroy all seven of Atlantis' installations. You may rebuild installations
after you have fought off a wave of enemies and scored a sufficient number of points.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=835).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -39,8 +39,7 @@ flavor looks like this:
| 1 | FIRE |
| 2 | RIGHTFIRE |
| 3 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -63,8 +62,7 @@ via `gymnasium.make`.
You score points for destroying enemies and for keeping installations protected during attack waves. You score more points
if you manage to destroy your enemies with one of the outer defense posts.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=835).
## Arguments
```
env = gymnasium.make("ALE/Atlantis-v5")
@@ -83,7 +81,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Atlantis-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/BankHeist-v5")` |
## Description
You are a bank robber and (naturally) want to rob as many banks as possible. You control your getaway car and must
navigate maze-like cities. The police chase you and will appear whenever you rob a bank. You may destroy police cars
by dropping sticks of dynamite. You can fill up your gas tank by entering a new city.
@@ -26,12 +26,11 @@ At the beginning of the game you have four lives. Lives are lost if you run out
or run over the dynamite you have previously dropped.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=1008).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
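In other words, for this game the reduced and full action sets coincide, so the flag has no effect either way. The sketch below is illustrative only; the count of 18 is the well-known size of the full Atari 2600 action set in ALE, assumed here rather than queried from the library.

```python
# Illustrative only: when the reduced action set equals the full Atari 2600
# set, `full_action_space` does not change the action space.
FULL_ATARI_ACTION_COUNT = 18  # assumed ALE constant, not read at runtime

def n_actions(full_action_space: bool) -> int:
    # Reduced set == full set for this game, so the flag is a no-op.
    return FULL_ATARI_ACTION_COUNT

print(n_actions(False))
```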
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -54,8 +53,7 @@ via `gymnasium.make`.
You score points for robbing banks and destroying police cars. If you rob nine or more banks and then leave the city,
you will score extra points.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=1008).
## Arguments
```
env = gymnasium.make("ALE/BankHeist-v5")
@@ -74,7 +72,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("BankHeist-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -19,20 +19,19 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/BattleZone-v5")` |
## Description
You control a tank and must destroy enemy vehicles. This game is played in a first-person perspective and creates
a 3D illusion. A radar screen shows enemies around you. You start with 5 lives and gain up to 2 extra lives if you reach
a sufficient score.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=859&itemTypeID=HTMLMANUAL)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -54,8 +53,7 @@ via `gymnasium.make`.
You receive points for destroying enemies.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=859&itemTypeID=HTMLMANUAL).
## Arguments
```
env = gymnasium.make("ALE/BattleZone-v5")
@@ -74,7 +72,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("BattleZone-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -25,12 +25,12 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/BeamRider-v5")` |
## Description
You control a spaceship that travels forward at a constant speed. You can only steer it sideways between discrete
positions. Your goal is to destroy enemy ships, avoid their attacks and dodge space debris.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_thumbs.php?SystemID=2600&SoftwareID=860&itemTypeID=MANUAL)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -51,8 +51,7 @@ flavor looks like this:
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -73,8 +72,7 @@ via `gymnasium.make`.
### Rewards
You score points for destroying enemies.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_thumbs.php?SystemID=2600&SoftwareID=860&itemTypeID=MANUAL).
## Arguments
```
env = gymnasium.make("ALE/BeamRider-v5")
@@ -93,7 +91,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("BeamRider-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,19 +18,18 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Berzerk-v5")` |
## Description
You are stuck in a maze with evil robots. You must destroy them and avoid touching the walls of the maze, as this will kill you. You may be awarded extra lives after scoring a sufficient number of points, depending on the game mode.
You may also be chased by an undefeatable enemy, Evil Otto, that you must avoid. Evil Otto does not appear in the default mode.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=866&itemTypeID=HTMLMANUAL)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -51,8 +50,7 @@ via `gymnasium.make`.
### Rewards
You score points for destroying robots.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=866&itemTypeID=HTMLMANUAL).
## Arguments
```
env = gymnasium.make("ALE/Berzerk-v5")
@@ -71,7 +69,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Berzerk-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,13 +18,13 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Bowling-v5")` |
## Description
Your goal is to score as many points as possible in the game of Bowling. A game consists of 10 frames and you have two
tries per frame. Knocking down all pins on the first try is called a "strike". Knocking down all pins on the second roll
is called a "spare". Otherwise, the frame is called "open".
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=879)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -41,8 +41,7 @@ flavor looks like this:
| 5 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -65,8 +64,7 @@ You receive points for knocking down pins. The exact score depends on whether yo
frame. Moreover, the points you score for one frame may depend on the following frames.
You can score up to 300 points in one game (if you manage to roll 12 strikes).
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=879).
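The 300-point maximum follows from standard ten-pin scoring, which can be sketched as below. This is ordinary bowling arithmetic, not the emulator's internal reward logic.

```python
# Standard ten-pin scoring: a strike counts 10 plus the next two rolls,
# a spare counts 10 plus the next roll, an open frame counts its pins.
def bowling_score(rolls):
    score, i = 0, 0
    for _ in range(10):  # ten frames
        if rolls[i] == 10:                    # strike
            score += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:   # spare
            score += 10 + rolls[i + 2]
            i += 2
        else:                                 # open frame
            score += rolls[i] + rolls[i + 1]
            i += 2
    return score

print(bowling_score([10] * 12))  # 300: twelve strikes is a perfect game
```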
## Arguments
```
env = gymnasium.make("ALE/Bowling-v5")
@@ -85,7 +83,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Bowling-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,17 +18,16 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Boxing-v5")` |
## Description
You fight an opponent in a boxing ring. You score points for hitting the opponent. If you score 100 points, your opponent is
knocked out.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=882).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -50,8 +49,7 @@ via `gymnasium.make`.
### Rewards
You score points by landing punches.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=882).
## Arguments
```
env = gymnasium.make("ALE/Boxing-v5")
@@ -70,7 +68,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Boxing-v0")`.
### Version History ## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments. general article on Atari environments.
@@ -18,13 +18,13 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Breakout-v5")` |
## Description
Another famous Atari game. The dynamics are similar to pong: You move a paddle and hit the ball in a brick wall at the
top of the screen. Your goal is to destroy the brick wall. You can try to break through the wall and let the ball
wreak havoc on the other side, all on its own! You have five lives.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=889).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -37,8 +37,7 @@ flavor looks like this:
| 1 | FIRE |
| 2 | RIGHT |
| 3 | LEFT |
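The reduced action table above is just an index-to-meaning lookup. A hypothetical sketch of that mapping (the authoritative mapping comes from the environment itself, not from this dictionary):

```python
# Breakout's reduced action space, transcribed from the table above.
BREAKOUT_ACTIONS = {0: "NOOP", 1: "FIRE", 2: "RIGHT", 3: "LEFT"}

def action_meaning(action):
    """Look up the meaning of a discrete action index."""
    return BREAKOUT_ACTIONS[action]

print(action_meaning(2))  # RIGHT
```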
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -60,8 +59,7 @@ via `gymnasium.make`.
### Rewards
You score points by destroying bricks in the wall. The reward for destroying a brick depends on the color of the brick.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=889).
## Arguments
```
env = gymnasium.make("ALE/Breakout-v5")
```
@@ -80,7 +78,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Breakout-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,13 +18,13 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Carnival-v5")` |
## Description
This is a "shoot 'em up" game. Targets move horizontally across the screen and you must shoot them. You are
in control of a gun that can be moved horizontally. The supply of ammunition is limited and chickens may steal some bullets
from you if you don't hit them in time.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=908).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -40,8 +40,7 @@ flavor looks like this:
| 3 | LEFT |
| 4 | RIGHTFIRE |
| 5 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -64,8 +63,7 @@ via `gymnasium.make`.
You score points by destroying targets. Points (or bullets) may be subtracted if you hit the target when it shows a minus sign.
You will score extra points if it shows a plus sign!
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=908).
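The sign-dependent scoring described above can be sketched as a small helper. The point values here are placeholders for illustration; the real game uses its own per-target score table:

```python
def score_delta(base_points, sign=None):
    """Sketch of Carnival's sign-dependent scoring.

    base_points is a placeholder value for the target; the multiplier
    for a plus sign is a guess, not the game's actual table.
    """
    if sign == "minus":
        return -base_points     # points (or bullets) are subtracted
    if sign == "plus":
        return 2 * base_points  # extra points
    return base_points

print(score_delta(10, "minus"))  # -10
```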
## Arguments
```
env = gymnasium.make("ALE/Carnival-v5")
```
@@ -84,7 +82,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Carnival-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,21 +18,20 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Centipede-v5")` |
## Description
You are an elf and must use your magic wands to fend off spiders, fleas and centipedes. Your goal is to protect mushrooms in
an enchanted forest. If you are bitten by a spider, flea or centipede, you will be temporarily paralyzed and you will
lose a magic wand. The game ends once you have lost all wands. You may receive additional wands after scoring
a sufficient number of points.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=911).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -55,8 +54,7 @@ You score points by hitting centipedes, scorpions, fleas and spiders. Additional
(i.e. after you have lost a wand) for mushrooms that were not destroyed.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=911).
## Arguments
```
env = gymnasium.make("ALE/Centipede-v5")
```
@@ -75,7 +73,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Centipede-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,18 +18,17 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/ChopperCommand-v5")` |
## Description
You control a helicopter and must protect truck convoys. To that end, you need to shoot down enemy aircraft.
A mini-map is displayed at the bottom of the screen.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=921).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -52,8 +51,7 @@ You score points by destroying planes and other helicopters. You score extra poi
of trucks that have survived.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareID=921).
## Arguments
```
env = gymnasium.make("ALE/ChopperCommand-v5")
```
@@ -72,7 +70,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("ChopperCommand-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/CrazyClimber-v5")` |
## Description
You are a climber trying to reach the top of four buildings, while avoiding obstacles like closing
windows and falling objects. When you receive damage (windows closing or objects) you will fall and
lose one life; you have a total of 5 lives before the game ends. At the top of each building, there's
@@ -28,7 +28,7 @@ possible while receiving the least amount of damage.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=113).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -46,8 +46,7 @@ flavor looks like this:
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -67,8 +66,7 @@ via `gymnasium.make`.
### Rewards
A table of scores awarded for completing each row of a building is provided on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=113).
## Arguments
```
env = gymnasium.make("ALE/CrazyClimber-v5")
```
@@ -87,7 +85,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("CrazyClimber-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -19,7 +19,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Defender-v5")` |
## Description
Aliens attack the earth. You control a spaceship and must defend humanity by destroying alien ships and rescuing humanoids.
You have three lives and three smart bombs. You lose a life when you are shot down by an alien spaceship.
Points are scored by destroying enemies and retrieving humans that are being abducted. You have an unlimited number of
@@ -27,13 +27,12 @@ laser missiles.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=128).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -55,8 +54,7 @@ via `gymnasium.make`.
You receive points for destroying enemies, rescuing abducted humans and keeping humans alive.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=128).
## Arguments
```
env = gymnasium.make("ALE/Defender-v5")
```
@@ -75,7 +73,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Defender-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/DemonAttack-v5")` |
## Description
You are facing waves of demons on the ice planet of Krybor. Points are accumulated by destroying
demons. You begin with 3 reserve bunkers, and can increase their number (up to 6) by avoiding enemy
attacks. Each attack wave you survive without any hits grants you a new bunker. Every time an enemy
@@ -27,7 +27,7 @@ the game ends.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=135).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -45,8 +45,7 @@ flavor looks like this:
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -68,8 +67,7 @@ via `gymnasium.make`.
Each enemy you slay gives you points. The amount of points depends on the type of demon and which
wave you are in. A detailed table of scores is provided on [the AtariAge
page](https://atariage.com/manual_html_page.php?SoftwareLabelID=135).
## Arguments
```
env = gymnasium.make("ALE/DemonAttack-v5")
```
@@ -88,7 +86,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("DemonAttack-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,14 +18,14 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/DoubleDunk-v5")` |
## Description
You are playing a 2v2 game of basketball. At the start of each possession, you select between a set
of different plays and then execute them to either score or prevent your rivals from scoring. The
game lasts a set amount of time or until one of the teams reaches a certain score.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=153).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -43,8 +43,7 @@ flavor looks like this:
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -66,8 +65,7 @@ via `gymnasium.make`.
Scores follow the rules of basketball. You can get either 3 or 2 points depending on where you
shoot from. After a defensive foul, a successful shot from the foul line gives you 1 point.
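The scoring rules above can be sketched as a tiny helper; this is an illustration of basketball scoring as described, not anything taken from the environment:

```python
def shot_points(beyond_three_point_line, foul_shot=False):
    """Points awarded for a successful shot under basketball rules."""
    if foul_shot:
        return 1  # free throw from the foul line after a defensive foul
    return 3 if beyond_three_point_line else 2

print(shot_points(True))  # 3
```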
## Arguments
```
env = gymnasium.make("ALE/DoubleDunk-v5")
```
@@ -86,7 +84,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("DoubleDunk-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/ElevatorAction-v5")` |
## Description
You are a secret agent that must retrieve some secret documents and reach the ground level of a
building by going down an elevator/stairs. Once you reach the ground level, you are picked up and
taken to the next level. You are equipped with a gun to defend yourself against enemy agents waiting
@@ -28,7 +28,7 @@ marked with a red door, which contain the secret documents.
This is an unreleased prototype based on the arcade game. Limited documentation can be found on [the AtariAge
page](https://atariage.com/manual_thumbs.php?SoftwareLabelID=1131).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -46,8 +46,7 @@ flavor looks like this:
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -69,8 +68,7 @@ via `gymnasium.make`.
You start with 4 lives and are awarded 100 points for each enemy shot, and 500 points for each
secret document collected (visiting a red door). Each time you get shot you lose one life and the
game ends when you have lost all lives.
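The point values stated above (100 per enemy shot, 500 per secret document) can be sketched as:

```python
# Sketch of Elevator Action's scoring, using the values stated above.
POINTS_PER_ENEMY = 100
POINTS_PER_DOCUMENT = 500

def episode_score(enemies_shot, documents_collected):
    """Total points from enemies shot and red-door documents collected."""
    return enemies_shot * POINTS_PER_ENEMY + documents_collected * POINTS_PER_DOCUMENT

print(episode_score(3, 2))  # 1300
```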
## Arguments
```
env = gymnasium.make("ALE/ElevatorAction-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("ElevatorAction-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Enduro-v5")` |
## Description
You are a racer in the National Enduro, a long-distance endurance race. You must overtake a certain
number of cars each day to stay in the race. The first day you need to pass 200 cars, and 300 for
each following day. The game ends if you do not meet your overtake quota for the day.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=163).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
The reduced action space of the default flavor looks like this:
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
via `gymnasium.make`.
### Rewards
You get 1 point for each vehicle you overtake.
## Arguments
```
env = gymnasium.make("ALE/Enduro-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Enduro-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/FishingDerby-v5")` |
## Description
Your objective is to catch more sunfish than your opponent. But it's not just between you and the other fisherman, as a big, black shark is lurking just below the surface, waiting to steal your catch! Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=182).
### Rewards
legal space by passing the keyword argument `full_action_space=True` to `make`.
The reduced action space of an Atari environment may depend on the flavor of the game. You can specify the flavor by providing
the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the
action spaces of default flavors.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/FishingDerby-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("FishingDerby-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Freeway-v5")` |
## Description
Your objective is to guide your chicken across lane after lane of busy rush hour traffic. You receive a point for every chicken that makes it to the top of the screen after crossing all the lanes of traffic. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_thumbs.php?SoftwareLabelID=192).
### Rewards
legal space by passing the keyword argument `full_action_space=True` to `make`.
The reduced action space of an Atari environment may depend on the flavor of the game. You can specify the flavor by providing
the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the
action spaces of default flavors.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Freeway-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Freeway-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Frostbite-v5")` |
## Description
In Frostbite, the player controls "Frostbite Bailey" who hops back and forth across an Arctic river, changing the color of the ice blocks from white to blue. Each time he does so, a block is added to his igloo. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=199).
### Rewards
legal space by passing the keyword argument `full_action_space=True` to `make`.
The reduced action space of an Atari environment may depend on the flavor of the game. You can specify the flavor by providing
the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the
action spaces of default flavors.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Frostbite-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Frostbite-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Gopher-v5")` |
## Description
The player controls a shovel-wielding farmer who protects a crop of three carrots from a gopher. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=218).
### Rewards
legal space by passing the keyword argument `full_action_space=True` to `make`.
The reduced action space of an Atari environment may depend on the flavor of the game. You can specify the flavor by providing
the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the
action spaces of default flavors.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Gopher-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Gopher-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Gravitar-v5")` |
## Description
The player controls a small blue spacecraft. The game starts in a fictional solar system with several planets to explore. If the player moves his ship into a planet, he will be taken to a side-view landscape. The player has to destroy red bunkers. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=223).
### Rewards
legal space by passing the keyword argument `full_action_space=True` to `make`.
The reduced action space of an Atari environment may depend on the flavor of the game. You can specify the flavor by providing
the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the
action spaces of default flavors.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
instead. The respective observation spaces are
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Gravitar-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Gravitar-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Hero-v5")` |
## Description
You need to rescue miners that are stuck in a mine shaft. You have access to various tools: a propeller backpack that
allows you to fly wherever you want, sticks of dynamite that can be used to blast through walls, a laser beam to kill
vermin, and a raft to float across stretches of lava.
You have a limited amount of power. Once you run out, you lose a life.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=228).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
You score points for shooting critters, rescuing miners, and dynamiting walls.
Extra points are rewarded for any power remaining after rescuing a miner.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=228).
## Arguments
```
env = gymnasium.make("ALE/Hero-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Hero-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/IceHockey-v5")` |
## Description
Your goal is to score as many points as possible in a standard game of Ice Hockey over a 3-minute time period. The ball is usually called "the puck".
There are 32 shot angles ranging from the extreme left to the extreme right. The angles can only aim towards the opponent's goal.
Just as in real hockey, you can pass the puck by shooting it off the sides of the rink. This can be really key when you're in position to score a goal.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=241).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
via `gymnasium.make`.
You score points by shooting the puck into your opponent's goal. Your opponent scores in the same manner.
There are no limits to how many points you can get per game, other than the time limit of 3-minute games.
For more detailed documentation, consult [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=241).
## Arguments
```
env = gymnasium.make("ALE/IceHockey-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("IceHockey-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Jamesbond-v5")` |
## Description
Your mission is to control Mr. Bond's specially designed multipurpose craft to complete a variety of missions.
The craft moves forward with a right motion and slightly back with a left motion.
An up or down motion causes the craft to jump or dive.
You can also fire by either lobbing a bomb to the bottom of the screen or firing a fixed angle shot to the top of the screen.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=250).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
via `gymnasium.make`.
The game ends when you complete the last mission or when you lose the last craft. In either case, you'll receive your final score.
There will be a rating based on your score. The highest rating in NOVICE is 006. The highest rating in AGENT is 007.
For more detailed documentation, consult [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=250).
## Arguments
```
env = gymnasium.make("ALE/Jamesbond-v5")
```
are available. These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Jamesbond-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/JourneyEscape-v5")` |
## Description
You must lead all 5 members of JOURNEY through waves of pesky characters and backstage obstacles to the Scarab Escape Vehicle before time runs out.
You must also protect $50,000 in concert cash from grasping groupies, photographers, and promoters.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=252).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
flavor looks like this:
| Num | Action |
|-----|--------|
| 15 | UPLEFTFIRE |
| 16 | DOWNRIGHTFIRE |
| 17 | DOWNLEFTFIRE |
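The indices in the table map to human-readable action meanings; a sketch using only the rows excerpted above (the dictionary is deliberately partial and the helper name is mine — at runtime, `env.unwrapped.get_action_meanings()` returns the full list):

```python
# Partial index-to-meaning mapping, copied from the table above
# (illustration only; the full reduced action space has 18 entries).
ACTION_MEANINGS = {
    15: "UPLEFTFIRE",
    16: "DOWNRIGHTFIRE",
    17: "DOWNLEFTFIRE",
}

def describe_action(action: int) -> str:
    """Return the meaning of a discrete action index, if known."""
    return ACTION_MEANINGS.get(action, "UNKNOWN")
```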
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
At the start of the game, you will have $50,000 and 60 units of time.
Your end game score will depend on how much time you have remaining and who you encounter along the way.
For more detailed documentation, consult [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=252).
## Arguments

```
env = gymnasium.make("ALE/JourneyEscape-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("JourneyEscape-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# Kangaroo

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Kangaroo-v5")` |
## Description
The object of the game is to score as many points as you can while controlling Mother Kangaroo to rescue her precious baby. You start the game with three lives.
During this rescue mission, Mother Kangaroo encounters many obstacles. You need to help her climb ladders, pick bonus fruit, and throw punches at monkeys.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=923).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
Your score will be shown at the top right corner of the game.
Your end game score will depend on how much time you have remaining and who you encounter along the way.
For more detailed documentation, consult [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=923).
## Arguments

```
env = gymnasium.make("ALE/Kangaroo-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Kangaroo-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# Krull

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Krull-v5")` |
## Description
Your mission is to find and enter the Beast's Black Fortress, rescue Princess Lyssa, and destroy the Beast.
The task is not an easy one, for the location of the Black Fortress changes with each sunrise on Krull.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=267).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
You will receive various scores for each monster you kill.
You can play the game until you have lost all your lives.
For more detailed documentation, consult [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=267).
## Arguments

```
env = gymnasium.make("ALE/Krull-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Krull-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# KungFuMaster

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/KungFuMaster-v5")` |
## Description
You are a Kung-Fu Master fighting your way through the Evil Wizard's temple. Your goal is to rescue Princess Victoria, defeating various enemies along the way. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_thumbs.php?SoftwareLabelID=268).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
|-----|--------|
| 11 | UPLEFTFIRE |
| 12 | DOWNRIGHTFIRE |
| 13 | DOWNLEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments

```
env = gymnasium.make("ALE/KungFuMaster-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("KungFuMaster-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# MontezumaRevenge

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/MontezumaRevenge-v5")` |
## Description
Your goal is to acquire Montezuma's treasure by making your way through a maze of chambers within the emperor's fortress. You must avoid deadly creatures while collecting valuables and tools which can help you escape with the treasure. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=310).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions
will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
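The observation types above imply different observation-space shapes; a sketch of the shapes I'd expect (the 210x160 screen size is an assumption for standard Atari titles — some games differ, so check `env.observation_space` in practice, and the helper is mine):

```python
# Expected observation shapes per obs_type: 128 bytes of console RAM,
# or a grayscale/RGB frame (assumed 210x160 screen).
OBS_SHAPES = {
    "ram": (128,),
    "grayscale": (210, 160),
    "rgb": (210, 160, 3),
}

def expected_shape(obs_type: str) -> tuple:
    """Look up the expected observation shape for an obs_type."""
    return OBS_SHAPES[obs_type]

# Hypothetical usage (requires gymnasium + ale-py):
#   env = gymnasium.make("ALE/MontezumaRevenge-v5", obs_type="ram")
#   assert env.observation_space.shape == expected_shape("ram")
```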
## Arguments

```
env = gymnasium.make("ALE/MontezumaRevenge-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("MontezumaRevenge-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# MsPacman

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/MsPacman-v5")` |
## Description
Your goal is to collect all of the pellets on the screen while avoiding the ghosts.
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
|-----|--------|
| 6 | UPLEFT |
| 7 | DOWNRIGHT |
| 8 | DOWNLEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments

```
env = gymnasium.make("ALE/MsPacman-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("MsPacman-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# NameThisGame

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/NameThisGame-v5")` |
## Description
Your goal is to defend the treasure that you have discovered. You must fight off a shark and an octopus while keeping an eye on your oxygen supply. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=323).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
|-----|--------|
| 4 | RIGHTFIRE |
| 5 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments

```
env = gymnasium.make("ALE/NameThisGame-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("NameThisGame-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# Phoenix

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Phoenix-v5")` |
## Description
Your goal is to reach and shoot the alien pilot. On your way there, you must eliminate waves of war birds while avoiding their bombs. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_thumbs.php?SoftwareLabelID=355).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
|-----|--------|
| 5 | RIGHTFIRE |
| 6 | LEFTFIRE |
| 7 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
respectively. The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments

```
env = gymnasium.make("ALE/Phoenix-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Phoenix-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# Pitfall

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Pitfall-v5")` |
## Description
You control Pitfall Harry and are tasked with collecting all the treasures in a jungle within 20 minutes. You have three lives. The game is over if you collect all the treasures or if you die or if the time runs out.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=360).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
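With the full action set available, the action space is the complete Atari 2600 set of 18 discrete actions; a small sketch encoding that expectation (the helper is mine — in practice, inspect `env.action_space` directly):

```python
FULL_ACTION_SPACE_SIZE = 18  # the complete ALE action set

def uses_full_action_space(n_actions: int) -> bool:
    """Check whether an action-space size matches the full ALE set."""
    return n_actions == FULL_ACTION_SPACE_SIZE

# Hypothetical usage (requires gymnasium + ale-py):
#   env = gymnasium.make("ALE/Pitfall-v5")
#   assert uses_full_action_space(env.action_space.n)
```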
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
### Rewards
You get score points for collecting treasure; you lose points through some misfortunes like falling down a hole.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=360).
## Arguments

```
env = gymnasium.make("ALE/Pitfall-v5")
```

These are no longer supported in v5. In order to obtain equivalent behavior, pass keyword arguments to `gymnasium.make` as outlined in the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Pitfall-v0")`.
## Version History

A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
# Pong

This environment is part of the <a href='..'>Atari environments</a>. Please read that page first for general information.
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Pong-v5")` |
## Description
You control the right paddle and compete against the left paddle, which is controlled by the computer. You each try to keep deflecting the ball away from your goal and into your opponent's goal.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=587).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
flavor looks like this:
| 5 | LEFTFIRE |
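As a quick reference, the reduced action set can be written out as a plain mapping. Index 5 comes from the table above; indices 0-4 are filled in from ALE's standard minimal action set for Pong and should be treated as an assumption, to be verified with `env.unwrapped.get_action_meanings()`:

```python
# Reduced action set for the default Pong flavor (a sketch; indices 0-4
# are assumed from ALE's standard minimal set, index 5 is from the table).
PONG_MINIMAL_ACTIONS = {
    0: "NOOP",
    1: "FIRE",
    2: "RIGHT",
    3: "LEFT",
    4: "RIGHTFIRE",
    5: "LEFTFIRE",
}

# The action space is then Discrete(6): every sampled integer in [0, 6)
# maps to one of these meanings.
print(len(PONG_MINIMAL_ACTIONS))  # 6
```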
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -59,8 +58,7 @@ via `gymnasium.make`.
### Rewards
You get score points for getting the ball to pass the opponent's paddle. You lose points if the ball passes your paddle.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=587).
## Arguments
```
env = gymnasium.make("ALE/Pong-v5")
```
@@ -79,7 +77,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Pong-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,12 +18,12 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Pooyan-v5")` |
## Description
You are a mother pig protecting her piglets (Pooyans) from wolves. In the first scene, you can move up and down a rope. Try to shoot the worker's balloons, while guarding yourself from attacks. If the wolves reach the ground safely, they will get behind and try to eat you. In the second scene, the wolves try to float up. You have to try and stop them using arrows and bait. You die if a wolf eats you, or a stone or rock hits you.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=372)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
@@ -37,8 +37,7 @@ flavor looks like this:
| 3 | DOWN |
| 4 | UPFIRE |
| 5 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -58,8 +57,7 @@ via `gymnasium.make`.
### Rewards
If you hit a balloon, wolf or stone with an arrow you score points.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=372).
## Arguments
```
env = gymnasium.make("ALE/Pooyan-v5")
```
@@ -78,7 +76,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Pooyan-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,16 +18,15 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/PrivateEye-v5")` |
## Description
You control the French Private Eye Pierre Touche. Navigate the city streets, parks, secret passages, dead-ends and one-ways in search of the ringleader, Henri Le Fiend and his gang. You also need to find evidence and stolen goods that are scattered about. There are five cases; complete each case before its statute of limitations expires.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=376)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
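Since the default flavor always exposes the full action space, the environment's action space is `Discrete(18)`. For reference, a sketch of ALE's full Atari 2600 action set follows; the names and their ordering are an assumption here and should be checked against `env.unwrapped.get_action_meanings()` for the installed `ale-py` version:

```python
# Full Atari 2600 action set as conventionally reported by ALE
# (assumed ordering -- verify against get_action_meanings()).
FULL_ACTION_SET = (
    "NOOP", "FIRE", "UP", "RIGHT", "LEFT", "DOWN",
    "UPRIGHT", "UPLEFT", "DOWNRIGHT", "DOWNLEFT",
    "UPFIRE", "RIGHTFIRE", "LEFTFIRE", "DOWNFIRE",
    "UPRIGHTFIRE", "UPLEFTFIRE", "DOWNRIGHTFIRE", "DOWNLEFTFIRE",
)

print(len(FULL_ACTION_SET))  # 18, matching Discrete(18)
```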
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -47,8 +46,7 @@ via `gymnasium.make`.
### Rewards
You score points for completing your tasks, like gathering evidence, nabbing questionable characters or closing cases. You lose points if you get hit or if your auto hits a pothole.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=376).
## Arguments
```
env = gymnasium.make("ALE/PrivateEye-v5")
```
@@ -67,7 +65,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("PrivateEye-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,11 +18,11 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Qbert-v5")` |
## Description
You are Q*bert. Your goal is to change the color of all the cubes on the pyramid to the pyramid's 'destination' color. To do this, you must hop on each cube on the pyramid one at a time while avoiding nasty creatures that lurk there.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=1224&itemTypeID=HTMLMANUAL)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
@@ -37,8 +37,7 @@ flavor looks like this:
| 4 | LEFT |
| 5 | DOWN |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -58,8 +57,7 @@ via `gymnasium.make`.
### Rewards
You score points for changing the color of the cubes to their destination colors or by defeating enemies. You also gain points for completing a level.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SystemID=2600&SoftwareID=1224&itemTypeID=HTMLMANUAL).
## Arguments
```
env = gymnasium.make("ALE/Qbert-v5")
```
@@ -78,7 +76,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Qbert-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Riverraid-v5")` |
## Description
You control a jet that flies over a river: you can move it sideways and fire missiles to destroy enemy objects. Each time an enemy object is destroyed you score points (i.e. rewards).
You lose a jet when you run out of fuel: fly over a fuel depot when you begin to run low.
@@ -29,10 +29,9 @@ The game begins with a squadron of three jets in reserve and you're given an add
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=409)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -62,8 +61,7 @@ Score points are your only reward. You get score points each time you destroy an
| Bridge | 500 |
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=409).
## Arguments
```
env = gymnasium.make("ALE/Riverraid-v5")
```
@@ -82,7 +80,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Riverraid-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/RoadRunner-v5")` |
## Description
You control the Road Runner(TM) in a race; you control the direction to run in and when to jump.
The goal is to outrun Wile E. Coyote(TM) while avoiding the hazards of the desert.
@@ -32,10 +32,9 @@ destroying the coyote.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=412)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -64,8 +63,7 @@ Score points are your only reward. You get score points each time you:
| get the coyote hit by a truck | 1000 |
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=412).
## Arguments
```
env = gymnasium.make("ALE/RoadRunner-v5")
```
@@ -84,7 +82,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("RoadRunner-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Robotank-v5")` |
## Description
You control your Robot Tanks to destroy enemies and avoid enemy fire.
The game ends when all of your Robot Tanks are
@@ -34,10 +34,9 @@ You earn one bonus Robot Tank for every enemy squadron destroyed. The maximum
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=416)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -63,8 +62,7 @@ A small tank appears at the top of your screen for each enemy
destroyed.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=416).
## Arguments
```
env = gymnasium.make("ALE/Robotank-v5")
```
@@ -83,7 +81,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Robotank-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Seaquest-v5")` |
## Description
You control a sub able to move in all directions and fire torpedoes.
The goal is to retrieve as many divers as you
can, while dodging and blasting enemy subs and killer sharks; points will be awarded accordingly.
@@ -37,10 +37,9 @@ to surface, with less than six divers, you lose one diver as well.
Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=424)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -73,8 +72,7 @@ moment you surface. The more oxygen you have left, the more bonus points
you're given.
For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=424).
## Arguments
```
env = gymnasium.make("ALE/Seaquest-v5")
```
@@ -93,7 +91,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Seaquest-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,7 +18,7 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Skiing-v5")` |
## Description
You control a skier who can move sideways.
The goal is to run through all gates (between the poles) in the fastest time.
@@ -29,7 +29,7 @@ and keep going. But you do lose time, so be careful!
Detailed documentation can be found on [the AtariAge page [SLALOM RACING section]](https://atariage.com/manual_html_page.php?SoftwareLabelID=434)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend
@@ -41,8 +41,7 @@ flavor looks like this:
| 0 | NOOP |
| 1 | RIGHT |
| 2 | LEFT |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -64,8 +63,7 @@ via `gymnasium.make`.
Seconds are your only rewards - negative rewards and penalties (e.g. missing a gate) are assigned as additional seconds.
For more detailed documentation, see [the AtariAge page [SLALOM RACING section]](https://atariage.com/manual_html_page.php?SoftwareLabelID=434).
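Because the reward is (negative) elapsed seconds, the episode return is simply the sum of the per-step rewards. Below is a minimal sketch of that accumulation loop. It uses a trivial stand-in class that mimics the `Env` reset/step protocol so it runs without `ale-py`; with Atari ROMs installed you would substitute `gymnasium.make("ALE/Skiing-v5")` for the stub:

```python
import random

class StubSkiing:
    """Hypothetical stand-in that mimics the Env reset/step protocol."""

    def reset(self, seed=None):
        random.seed(seed)
        self.t = 0
        return None, {}  # (observation, info), as in gymnasium's API

    def step(self, action):
        self.t += 1
        reward = -1.0              # each step costs time (seconds), as in Skiing
        terminated = self.t >= 10  # end of the (toy) run
        return None, reward, terminated, False, {}  # obs, r, term, trunc, info

env = StubSkiing()  # with ale-py: gymnasium.make("ALE/Skiing-v5")
obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = random.choice([0, 1, 2])  # NOOP / RIGHT / LEFT, per the table above
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(episode_return)  # -10.0: ten steps at -1 second each
```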
## Arguments
```
env = gymnasium.make("ALE/Skiing-v5")
```
@@ -84,7 +82,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Skiing-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -18,14 +18,13 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Solaris-v5")` |
## Description
You control a spaceship. Blast enemies before they can blast you. You can warp to different sectors. You have to defend Federation planets and destroy Zylon forces. Keep track of your fuel; if you run out, you lose a life. Warp to a Federation planet to refuel. The game ends if all your ships are destroyed or if you reach the Solaris planet. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=450)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
## Observations
### Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console - The 128 Bytes of RAM of the console
- A grayscale image - A grayscale image
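The observation types listed above differ only in shape and channel count; a minimal sketch (assuming the standard 210×160 Atari 2600 screen):

```python
# Sketch of the shapes typically exposed by the three Atari observation
# types (assumption: the standard 210x160 Atari 2600 screen).
OBS_SHAPES = {
    "rgb": (210, 160, 3),     # default: the RGB image shown to human players
    "grayscale": (210, 160),  # single-channel version of the same screen
    "ram": (128,),            # the console's 128 bytes of RAM
}

# A grayscale frame has the same spatial extent as an RGB frame,
# just without the three color channels.
assert OBS_SHAPES["rgb"][:2] == OBS_SHAPES["grayscale"]
```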
@@ -45,8 +44,7 @@ via `gymnasium.make`.
### Rewards
You gain points for destroying enemies, rescuing cadets, making it through a corridor, destroying enemy planets, etc. For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=450).
## Arguments
```
env = gymnasium.make("ALE/Solaris-v5")
@@ -65,7 +63,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Solaris-v0")`.
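The naming convention can be sketched with a small helper (`env_id` is a hypothetical illustration, not part of Gymnasium or ale-py):

```python
# Hypothetical helper illustrating the two environment-id conventions:
# v5 lives in the "ALE" namespace; v0/v4 use the bare game name.
def env_id(game: str, version: int) -> str:
    if version == 5:
        return f"ALE/{game}-v{version}"
    return f"{game}-v{version}"

print(env_id("Solaris", 5))  # ALE/Solaris-v5
print(env_id("Solaris", 0))  # Solaris-v0
```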
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,11 +18,11 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/SpaceInvaders-v5")` |
## Description
Your objective is to destroy the space invaders by shooting your laser cannon at them before they reach the Earth. The game ends when all your lives are lost after taking enemy fire, or when the invaders reach the Earth. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=460)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
@@ -37,8 +37,7 @@ flavor looks like this:
| 4 | RIGHTFIRE |
| 5 | LEFTFIRE |
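The reduced action space behaves like `Discrete(6)`; a hedged sketch of the index-to-action mapping (indices 0–3 are assumptions based on the usual minimal Atari action set, since only rows 4 and 5 appear in the table above):

```python
# Sketch of the default (reduced) Space Invaders action mapping.
# NOTE: entries 0-3 are assumed, not taken from this document.
REDUCED_ACTIONS = {
    0: "NOOP",
    1: "FIRE",
    2: "RIGHT",
    3: "LEFT",
    4: "RIGHTFIRE",
    5: "LEFTFIRE",
}

# A valid action is any integer in range(len(REDUCED_ACTIONS)),
# i.e. what a Discrete(6) space would sample.
assert all(a in REDUCED_ACTIONS for a in range(6))
```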
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -58,8 +57,7 @@ via `gymnasium.make`.
### Rewards
You gain points for destroying space invaders. The invaders in the back rows are worth more points. For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=460).
## Arguments
```
env = gymnasium.make("ALE/SpaceInvaders-v5")
@@ -78,7 +76,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("SpaceInvaders-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,11 +18,11 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/StarGunner-v5")` |
## Description
Stop the alien invasion by shooting down alien saucers and creatures while avoiding bombs. More details can be found on [the Atari Mania page](http://www.atarimania.com/game-atari-2600-vcs-stargunner_16921.html)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
@@ -37,8 +37,7 @@ flavor looks like this:
| 4 | LEFT |
| 5 | DOWN |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -59,8 +58,7 @@ via `gymnasium.make`.
You score points for destroying enemies. You get bonus points for clearing a wave and a level. For more detailed documentation, see [the Atari Mania page](http://www.atarimania.com/game-atari-2600-vcs-stargunner_16921.html).
## Arguments
```
env = gymnasium.make("ALE/StarGunner-v5")
@@ -79,7 +77,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("StarGunner-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,17 +18,16 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Tennis-v5")` |
## Description
You control the orange player playing against a computer-controlled blue player. The game follows the rules of tennis.
The first player to win at least 6 games with a margin of at least two games wins the match. If the score is tied at 6-6, the first player to go 2 games up wins the match. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=555)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
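The match rule in the description can be captured by a tiny predicate (a hypothetical helper for illustration, not part of the environment's API):

```python
# Sketch of the Tennis match rule: a player wins with at least 6 games
# and a margin of at least two. This single condition also covers the
# 6-6 case, where the first player to go 2 games up (8-6) wins.
def wins_match(games_won: int, games_lost: int) -> bool:
    return games_won >= 6 and games_won - games_lost >= 2

assert wins_match(6, 4)      # 6-4: won with a two-game margin
assert not wins_match(6, 5)  # 6-5: margin too small, play on
assert not wins_match(6, 6)  # tied at 6-6, play on
assert wins_match(8, 6)      # first to go 2 games up after 6-6
```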
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -48,8 +47,7 @@ via `gymnasium.make`.
### Rewards
The scoring is as in the sport of tennis, played to one set. For more detailed documentation, see [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=555).
## Arguments
```
env = gymnasium.make("ALE/Tennis-v5")
@@ -68,7 +66,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Tennis-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,11 +18,11 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/TimePilot-v5")` |
## Description
You control an aircraft. Use it to destroy your enemies. As you progress in the game, you encounter enemies with technology that is increasingly from the future. More details can be found on [the Atari Mania page](http://www.atarimania.com/game-atari-2600-vcs-time-pilot_8038.html)
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment.
However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced
number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default
@@ -41,8 +41,7 @@ flavor looks like this:
| 8 | LEFTFIRE |
| 9 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
- A grayscale image
@@ -64,8 +63,7 @@ via `gymnasium.make`.
You score points for destroying enemies, gaining more points for difficult enemies. For more detailed documentation, see [the Atari Mania page](http://www.atarimania.com/game-atari-2600-vcs-time-pilot_8038.html).
## Arguments
```
env = gymnasium.make("ALE/TimePilot-v5")
@@ -83,7 +81,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("TimePilot-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,10 +18,10 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Tutankham-v5")` |
## Description
Your goal is to rack up points by finding treasures in the mazes of the tomb while eliminating its guardians. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_thumbs.php?SoftwareLabelID=572).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
@@ -34,8 +34,7 @@ By default, all actions that can be performed on an Atari 2600 are available in
| 5 | UPFIRE |
| 6 | RIGHTFIRE |
| 7 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
@@ -52,8 +51,7 @@ instead. The respective observation spaces are
The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Tutankham-v5")
@@ -70,7 +68,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Tutankham-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,10 +18,10 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/UpNDown-v5")` |
## Description
Your goal is to steer your baja bugger to collect prizes and eliminate opponents. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=574).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
@@ -32,8 +32,7 @@ By default, all actions that can be performed on an Atari 2600 are available in
| 3 | DOWN |
| 4 | UPFIRE |
| 5 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
@@ -50,8 +49,7 @@ instead. The respective observation spaces are
The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/UpNDown-v5")
@@ -69,7 +67,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("UpNDown-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,13 +18,12 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Venture-v5")` |
## Description
Your goal is to capture the treasure in every chamber of the dungeon while eliminating the monsters. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=576).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. Even if you use v0 or v4 or specify `full_action_space=False` during initialization, all actions will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
@@ -41,8 +40,7 @@ instead. The respective observation spaces are
The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Venture-v5")
@@ -61,7 +59,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("Venture-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,10 +18,10 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/VideoPinball-v5")` |
## Description
Your goal is to keep the ball in play as long as possible and to score as many points as possible. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=588).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
@@ -35,8 +35,7 @@ By default, all actions that can be performed on an Atari 2600 are available in
| 6 | UPFIRE |
| 7 | RIGHTFIRE |
| 8 | LEFTFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
@@ -53,8 +52,7 @@ instead. The respective observation spaces are
The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/VideoPinball-v5")
@@ -73,7 +71,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace, i.e. they are instantiated via `gymnasium.make("VideoPinball-v0")`.
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.


@@ -18,10 +18,10 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/WizardOfWor-v5")` |
## Description
Your goal is to beat the Wizard using your laser and radar scanner. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=598).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. However, if you use v0 or v4 or specify `full_action_space=False` during initialization, only a reduced number of actions (those that are meaningful in this game) are available. The reduced action space may depend on the flavor of the environment (the combination of `mode` and `difficulty`). The reduced action space for the default flavor looks like this:
| Num | Action |
@@ -36,8 +36,7 @@ By default, all actions that can be performed on an Atari 2600 are available in
| 7 | RIGHTFIRE |
| 8 | LEFTFIRE |
| 9 | DOWNFIRE |
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is possible to observe
- The 128 Bytes of RAM of the console
@@ -54,8 +53,7 @@ instead. The respective observation spaces are
The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/WizardOfWor-v5")
@@ -74,7 +72,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("WizardOfWor-v0")`
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -19,13 +19,12 @@ This environment is part of the <a href='..'>Atari environments</a>. Please read
| Observation Low | 0 |
| Import | `gymnasium.make("ALE/Zaxxon-v5")` |
## Description
Your goal is to stop the evil robot Zaxxon and its armies from enslaving the galaxy by piloting your fighter and shooting enemies. Detailed documentation can be found on [the AtariAge page](https://atariage.com/manual_html_page.php?SoftwareLabelID=606).
## Actions
By default, all actions that can be performed on an Atari 2600 are available in this environment. Even if you use v0 or v4 or specify full_action_space=False during initialization, all actions will be available in the default flavor.
## Observations
By default, the environment returns the RGB image that is displayed to human players as an observation. However, it is
possible to observe
- The 128 Bytes of RAM of the console
@@ -42,8 +41,7 @@ instead. The respective observation spaces are
The general article on Atari environments outlines different ways to instantiate corresponding environments
via `gymnasium.make`.
## Arguments
```
env = gymnasium.make("ALE/Zaxxon-v5")
@@ -60,7 +58,7 @@ are available. These are no longer supported in v5. In order to obtain equivalen
the general article on Atari environments.
The versions v0 and v4 are not contained in the "ALE" namespace. I.e. they are instantiated via `gymnasium.make("Zaxxon-v0")`
## Version History
A thorough discussion of the intricate differences between the versions and configurations can be found in the
general article on Atari environments.
@@ -3,7 +3,7 @@ firstpage:
lastpage:
---
# Box2D
```{toctree}
:hidden:
@@ -3,7 +3,7 @@ firstpage:
lastpage:
---
# Classic Control
```{toctree}
:hidden:
@@ -3,21 +3,22 @@ firstpage:
lastpage:
---
# MuJoCo
```{toctree}
:hidden:
mujoco/ant
mujoco/half_cheetah
mujoco/hopper
mujoco/humanoid_standup
mujoco/humanoid
mujoco/inverted_double_pendulum
mujoco/inverted_pendulum
mujoco/reacher
mujoco/swimmer
mujoco/pusher
mujoco/walker2d
```
```{raw} html
@@ -3,7 +3,7 @@ firstpage:
lastpage:
---
# Toy Text
```{toctree}
:hidden:
@@ -4,7 +4,7 @@ firstpage:
lastpage:
---
# Gymnasium is a standard API for reinforcement learning, and a diverse collection of reference environments
```{figure} _static/videos/box2d/lunar_lander_continuous.gif
@@ -39,7 +39,8 @@ content/basic_usage
:hidden:
:caption: API
api/env
api/registry
api/spaces
api/wrappers
api/vector
@@ -50,11 +51,11 @@ api/utils
:hidden:
:caption: Environments
environments/classic_control
environments/box2d
environments/toy_text
environments/mujoco
environments/atari
environments/third_party_environments
```
@@ -73,5 +74,4 @@ content/handling_timelimits
Github <https://github.com/Farama-Foundation/Gymnasium>
Donate <https://farama.org/donations>
```
@@ -33,29 +33,32 @@ RenderFrame = TypeVar("RenderFrame")
class Env(Generic[ObsType, ActType]):
r"""The main Gymnasium class for implementing Reinforcement Learning Agents environments.
The class encapsulates an environment with arbitrary behind-the-scenes dynamics through the :meth:`step` and :meth:`reset` functions.
An environment can be partially or fully observed by single agents. For multi-agent environments, see PettingZoo.
The main API methods that users of this class need to know are:
- :meth:`step` - Updates an environment with actions, returning the next agent observation, the reward for taking that action,
if the environment has terminated or truncated due to the latest action, and information from the environment about the step, i.e. metrics, debug info.
- :meth:`reset` - Resets the environment to an initial state, required before calling step.
Returns the first agent observation for an episode and information, i.e. metrics, debug info.
- :meth:`render` - Renders the environment to help visualise what the agent sees; example modes are "human", "rgb_array", "ansi" for text.
- :meth:`close` - Closes the environment, important when external software is used, i.e. pygame for rendering, databases
Environments have additional attributes for users to understand the implementation:
- :attr:`action_space` - The Space object corresponding to valid actions; all valid actions should be contained within the space.
- :attr:`observation_space` - The Space object corresponding to valid observations; all valid observations should be contained within the space.
- :attr:`reward_range` - A tuple corresponding to the minimum and maximum possible rewards for an agent over an episode.
The default reward range is set to :math:`(-\infty,+\infty)`.
- :attr:`spec` - An environment spec that contains the information used to initialize the environment from :meth:`gymnasium.make`
- :attr:`metadata` - The metadata of the environment, e.g. render modes, render fps
- :attr:`np_random` - The random number generator for the environment. This is automatically assigned during
``super().reset(seed=seed)`` and when accessing ``self.np_random``.
.. seealso:: For modifying or extending environments use the :py:class:`gymnasium.Wrapper` class
"""
# Set this in SOME subclasses
@@ -72,40 +75,41 @@ class Env(Generic[ObsType, ActType]):
# Created
_np_random: Optional[np.random.Generator] = None
def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
"""Run one timestep of the environment's dynamics using the agent actions.
When the end of an episode is reached (``terminated or truncated``), it is necessary to call :meth:`reset` to
reset this environment's state for the next episode.
.. versionchanged:: 0.26
The Step API was changed removing ``done`` in favor of ``terminated`` and ``truncated`` to make it clearer
to users when the environment had terminated or truncated which is critical for reinforcement learning
bootstrapping algorithms.
Args:
action (ActType): an action provided by the agent to update the environment state.
Returns:
observation (ObsType): An element of the environment's :attr:`observation_space` as the next observation due to the agent's action.
An example is a numpy array containing the positions and velocities of the pole in CartPole.
reward (float): The reward as a result of taking the action.
terminated (bool): Whether the agent reaches the terminal state (as defined under the MDP of the task)
which can be positive or negative. An example is reaching the goal state or moving into the lava from
the Sutton and Barto Gridworld. If true, the user needs to call :meth:`reset`.
truncated (bool): Whether the truncation condition outside the scope of the MDP is satisfied.
Typically, this is a timelimit, but could also be used to indicate an agent physically going out of bounds.
Can be used to end the episode prematurely before a terminal state is reached.
If true, the user needs to call :meth:`reset`.
info (dict): Contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
This might, for instance, contain: metrics that describe the agent's performance state, variables that are
hidden from observations, or individual reward terms that are combined to produce the total reward.
In OpenAI Gym <v26, it contains "TimeLimit.truncated" to distinguish truncation and termination,
however this is deprecated in favour of returning terminated and truncated variables.
done (bool): (Deprecated) A boolean value for if the episode has ended, in which case further :meth:`step` calls will
return undefined results. This was removed in OpenAI Gym v26 in favor of terminated and truncated attributes.
A done signal may be emitted for different reasons: Maybe the task underlying the environment was solved successfully,
a certain timelimit was exceeded, or the physics simulation has entered an invalid state.
"""
raise NotImplementedError
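The step contract documented above can be sketched with a minimal stand-in environment (a hypothetical `CountdownEnv`, written here without Gymnasium itself) running the usual agent-environment loop until `terminated or truncated`:

```python
from typing import Any, Dict, Tuple


class CountdownEnv:
    """Toy stand-in following the Env.step/reset contract (not a real Gymnasium env)."""

    def __init__(self, horizon: int = 3):
        self.horizon = horizon
        self.remaining = horizon

    def reset(self) -> Tuple[int, Dict[str, Any]]:
        # reset returns (observation, info), as documented for Env.reset.
        self.remaining = self.horizon
        return self.remaining, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # step returns (observation, reward, terminated, truncated, info).
        self.remaining -= 1
        terminated = self.remaining == 0  # terminal state of the MDP
        truncated = False                 # no time limit outside the MDP here
        return self.remaining, 1.0, terminated, truncated, {}


env = CountdownEnv()
obs, info = env.reset()
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(0)
    total_reward += reward
# After 3 steps the episode terminates with total reward 3.0.
```

The same loop works unchanged against any environment implementing this five-tuple `step` API.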
@@ -115,16 +119,24 @@ class Env(Generic[ObsType, ActType]):
seed: Optional[int] = None,
options: Optional[dict] = None,
) -> Tuple[ObsType, dict]:
"""Resets the environment to an initial internal state, returning an initial observation and info.
This method generates a new starting state, often with some randomness, to ensure that the agent explores the
state space and learns a generalised policy about the environment. This randomness can be controlled
with the ``seed`` parameter; otherwise, if the environment already has a random number generator and
:meth:`reset` is called with ``seed=None``, the RNG is not reset.
Therefore, :meth:`reset` should (in the typical use case) be called with a seed right after initialization and then never again.
For Custom environments, the first line of :meth:`reset` should be ``super().reset(seed=seed)`` which implements
the seeding correctly.
.. versionchanged:: v0.25
The ``return_info`` parameter was removed and now info is expected to be returned.
Args:
seed (optional int): The seed that is used to initialize the environment's PRNG (`np_random`).
If the environment does not already have a PRNG and ``seed=None`` (the default option) is passed,
a seed will be chosen from some source of entropy (e.g. timestamp or /dev/urandom).
However, if the environment already has a PRNG and ``seed=None`` is passed, the PRNG will *not* be reset.
@@ -134,9 +146,8 @@ class Env(Generic[ObsType, ActType]):
options (optional dict): Additional information to specify how the environment is reset (optional,
depending on the specific environment)
Returns:
observation (ObsType): Observation of the initial state. This will be an element of :attr:`observation_space`
(typically a numpy array) and is analogous to the observation returned by :meth:`step`.
info (dictionary): This dictionary contains auxiliary information complementing ``observation``. It should be analogous to
the ``info`` returned by :meth:`step`.
@@ -146,45 +157,76 @@ class Env(Generic[ObsType, ActType]):
self._np_random, seed = seeding.np_random(seed)
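The seeding rules above (seed once right after construction, then reset with `seed=None`) can be sketched with a toy stand-in whose RNG follows the same convention. This is illustrative only: the hypothetical `SeededEnv` uses `random.Random` in place of `np_random`:

```python
import random
from typing import Any, Dict, Optional, Tuple


class SeededEnv:
    """Illustrative stand-in mimicking Env.reset's RNG rules (not a Gymnasium env)."""

    def __init__(self):
        self._rng: Optional[random.Random] = None

    def reset(self, seed: Optional[int] = None) -> Tuple[float, Dict[str, Any]]:
        # Mirrors super().reset(seed=seed): reseed only when a seed is given
        # or no RNG exists yet; seed=None leaves the current RNG untouched.
        if seed is not None or self._rng is None:
            self._rng = random.Random(seed)
        return self._rng.random(), {}


env = SeededEnv()
obs_a, _ = env.reset(seed=42)   # typical use: seed once, right after creation
obs_b, _ = env.reset()          # seed=None: the RNG is *not* reset

env2 = SeededEnv()
obs_c, _ = env2.reset(seed=42)  # same seed => same first observation
```

Two environments reset with the same seed produce identical initial observations, while a later `reset()` without a seed continues the existing random stream.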
def render(self) -> Optional[Union[RenderFrame, List[RenderFrame]]]:
"""Compute the render frames as specified by :attr:`render_mode` during the initialization of the environment.
The environment's :attr:`metadata` render modes (`env.metadata["render_modes"]`) should contain the possible
ways to implement the render modes. In addition, list versions for most render modes are achieved through
`gymnasium.make` which automatically applies a wrapper to collect rendered frames.
Note:
As the :attr:`render_mode` is known during ``__init__``, the objects used to render the environment state
should be initialised in ``__init__``.
By convention, if the :attr:`render_mode` is:
- None (default): no render is computed.
- "human": The environment is continuously rendered in the current display or terminal, usually for human consumption.
This rendering should occur during :meth:`step` and :meth:`render` doesn't need to be called. Returns ``None``.
- "rgb_array": Return a single frame representing the current state of the environment.
A frame is a ``np.ndarray`` with shape ``(x, y, 3)`` representing RGB values for an x-by-y pixel image.
- "ansi": Return a string (``str``) or ``StringIO.StringIO`` containing a terminal-style text representation
for each time step. The text can include newlines and ANSI escape sequences (e.g. for colors).
- "rgb_array_list" and "ansi_list": List based version of render modes are possible (except Human) through the
wrapper, :py:class:`gymnasium.wrappers.RenderCollection` that is automatically applied during ``gymnasium.make(..., render_mode="rgb_array_list")``.
The frames collected are popped after :meth:`render` is called or :meth:`reset`.
Note:
Make sure that your class's :attr:`metadata` ``"render_modes"`` key includes the list of supported modes.
.. versionchanged:: 0.25.0
The render function was changed to no longer accept parameters, rather these parameters should be specified
in the environment initialised, i.e., ``gymnasium.make("CartPole-v1", render_mode="human")``
"""
raise NotImplementedError
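The render-mode convention can be sketched with a hypothetical `ToyGridEnv` (not a real Gymnasium environment): the mode is fixed in `__init__`, `"ansi"` returns a `str`, `"human"` draws to the terminal and returns `None`, and `render_mode=None` renders nothing:

```python
from typing import Optional


class ToyGridEnv:
    """Sketch of the render_mode convention on a 1x4 grid (illustrative only)."""

    metadata = {"render_modes": ["human", "ansi"]}

    def __init__(self, render_mode: Optional[str] = None):
        assert render_mode is None or render_mode in self.metadata["render_modes"]
        # render_mode is known at __init__, so rendering objects are set up here.
        self.render_mode = render_mode
        self.agent_pos = 0

    def render(self) -> Optional[str]:
        if self.render_mode is None:
            return None                # no render is computed
        row = "".join("A" if i == self.agent_pos else "." for i in range(4))
        if self.render_mode == "ansi":
            return row                 # terminal-style text representation
        print(row)                     # "human": draw to the display, return None
        return None


env = ToyGridEnv(render_mode="ansi")
frame = env.render()  # "A..." with the agent in the leftmost cell
```

Real environments would additionally support `"rgb_array"` by returning a `(x, y, 3)` array; the structure is the same.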
def close(self):
"""After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
This is critical for closing rendering windows, database or HTTP connections.
"""
pass
@property
def unwrapped(self) -> "Env":
"""Returns the base non-wrapped environment (i.e., removes all wrappers).
Returns:
Env: The base non-wrapped :class:`gymnasium.Env` instance
"""
return self
@property
def np_random(self) -> np.random.Generator:
"""Returns the environment's internal :attr:`_np_random` that if not set will initialise with a random seed.
Returns:
Instances of `np.random.Generator`
"""
if self._np_random is None:
self._np_random, seed = seeding.np_random()
return self._np_random
@np_random.setter
def np_random(self, value: np.random.Generator):
self._np_random = value
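The lazily initialised generator above can be sketched in isolation. This is a generic sketch of the property pattern, using `random.Random` as a stand-in for `np.random.Generator`:

```python
import random
from typing import Optional


class Seeded:
    """Sketch of the np_random property pattern: create the RNG on first access."""

    _rng: Optional[random.Random] = None

    @property
    def rng(self) -> random.Random:
        if self._rng is None:          # not seeded yet: initialise lazily
            self._rng = random.Random()
        return self._rng

    @rng.setter
    def rng(self, value: random.Random):
        self._rng = value


obj = Seeded()
first = obj.rng    # generator is created on this first access
second = obj.rng   # the same generator is returned afterwards
```

Repeated accesses return the same generator object, so random draws form one continuous stream until the setter replaces it.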
def __str__(self):
"""Returns a string of the environment with the :attr:`spec` id if :attr:`spec` exists.
Returns:
A string identifying the environment
"""
if self.spec is None:
return f"<{type(self).__name__} instance>"
else:
@@ -195,21 +237,67 @@ class Env(Generic[ObsType, ActType]):
return self
def __exit__(self, *args):
"""Support with-statement for the environment and closes the environment."""
self.close()
# propagate exception
return False
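Because `__enter__` returns the environment and `__exit__` calls :meth:`close`, environments can be used as context managers. A minimal sketch of that protocol with a hypothetical `ClosableEnv`:

```python
class ClosableEnv:
    """Sketch of Env's context-manager support: leaving the block closes the env."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Clean up rendering windows, database or HTTP connections here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
        return False   # do not suppress exceptions raised inside the block


with ClosableEnv() as env:
    assert not env.closed  # still open inside the with-block
# Leaving the block calls close(), even if an exception was raised.
```

Returning `False` from `__exit__` is what makes exceptions propagate to the caller after cleanup.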
class Wrapper(Env[ObsType, ActType]):
"""Wraps a :class:`gymnasium.Env` to allow a modular transformation of the :meth:`step` and :meth:`reset` methods.
This class is the base class of all wrappers, used to change the behavior of the underlying environment, allowing
modification of the :attr:`action_space`, :attr:`observation_space`, :attr:`reward_range` and :attr:`metadata`
without changing the underlying environment's attributes.
In addition, several attributes (:attr:`spec`, :attr:`render_mode`, :attr:`np_random`) will point back to the
wrapper's environment.
Wrappers are a convenient way to modify an existing environment without having to alter the underlying code directly.
Using wrappers will allow you to avoid a lot of boilerplate code and make your environment more modular. Wrappers can
also be chained to combine their effects. Most environments that are generated via `gymnasium.make` will already be wrapped by default.
In order to wrap an environment, you must first initialize a base environment. Then you can pass this environment along
with (possibly optional) parameters to the wrapper's constructor.
>>> import gymnasium as gym
>>> from gymnasium.wrappers import RescaleAction
>>> base_env = gym.make("BipedalWalker-v3")
>>> base_env.action_space
Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
>>> wrapped_env = RescaleAction(base_env, min_action=0, max_action=1)
>>> wrapped_env.action_space
Box([0. 0. 0. 0.], [1. 1. 1. 1.], (4,), float32)
You can access the environment underneath the **first** wrapper by using the :attr:`env` attribute.
As the :class:`Wrapper` class inherits from :class:`Env` then :attr:`env` can be another wrapper.
>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.env
<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>
If you want to get to the environment underneath **all** of the layers of wrappers, you can use the `.unwrapped` attribute.
If the environment is already a bare environment, the `.unwrapped` attribute will just return itself.
>>> wrapped_env
<RescaleAction<TimeLimit<OrderEnforcing<BipedalWalker<BipedalWalker-v3>>>>>
>>> wrapped_env.unwrapped
<gymnasium.envs.box2d.bipedal_walker.BipedalWalker object at 0x7f87d70712d0>
There are three common things you might want a wrapper to do:
- Transform actions before applying them to the base environment
- Transform observations that are returned by the base environment
- Transform rewards that are returned by the base environment
Such wrappers can be easily implemented by inheriting from `ActionWrapper`, `ObservationWrapper`, or `RewardWrapper` and implementing the
respective transformation. If you need a wrapper to do more complicated tasks, you can inherit from the `Wrapper` class directly.
The code that is presented in the following sections can also be found in
the [gym-examples](https://github.com/Farama-Foundation/gym-examples) repository
Note:
Don't forget to call ``super().__init__(env)``
"""
def __init__(self, env: Env):
@@ -233,7 +321,7 @@ class Wrapper(Env[ObsType, ActType]):
@property
def spec(self):
"""Returns the :attr:`Env` :attr:`spec` attribute."""
return self.env.spec
@classmethod
@@ -243,7 +331,7 @@ class Wrapper(Env[ObsType, ActType]):
@property
def action_space(self) -> spaces.Space[ActType]:
"""Return the :attr:`Env` :attr:`action_space` unless overwritten, in which case the wrapper :attr:`action_space` is used."""
if self._action_space is None:
return self.env.action_space
return self._action_space
@@ -254,7 +342,7 @@ class Wrapper(Env[ObsType, ActType]):
@property
def observation_space(self) -> spaces.Space:
"""Return the :attr:`Env` :attr:`observation_space` unless overwritten, in which case the wrapper :attr:`observation_space` is used."""
if self._observation_space is None:
return self.env.observation_space
return self._observation_space
@@ -265,7 +353,7 @@ class Wrapper(Env[ObsType, ActType]):
@property
def reward_range(self) -> Tuple[SupportsFloat, SupportsFloat]:
"""Return the :attr:`Env` :attr:`reward_range` unless overwritten, in which case the wrapper :attr:`reward_range` is used."""
if self._reward_range is None:
return self.env.reward_range
return self._reward_range
@@ -276,7 +364,7 @@ class Wrapper(Env[ObsType, ActType]):
@property
def metadata(self) -> dict:
"""Returns the :attr:`Env` :attr:`metadata`."""
if self._metadata is None:
return self.env.metadata
return self._metadata
@@ -287,12 +375,12 @@ class Wrapper(Env[ObsType, ActType]):
@property
def render_mode(self) -> Optional[str]:
"""Returns the :attr:`Env` :attr:`render_mode`."""
return self.env.render_mode
@property
def np_random(self) -> np.random.Generator:
"""Returns the :attr:`Env` :attr:`np_random` attribute."""
return self.env.np_random
@np_random.setter
@@ -306,25 +394,25 @@ class Wrapper(Env[ObsType, ActType]):
)
def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
"""Uses the :meth:`step` of the :attr:`env` that can be overwritten to change the returned data."""
return self.env.step(action)
def reset(self, **kwargs) -> Tuple[ObsType, dict]:
"""Uses the :meth:`reset` of the :attr:`env` that can be overwritten to change the returned data."""
return self.env.reset(**kwargs)
def render(
self, *args, **kwargs
) -> Optional[Union[RenderFrame, List[RenderFrame]]]:
"""Uses the :meth:`render` of the :attr:`env` that can be overwritten to change the returned data."""
return self.env.render(*args, **kwargs)
def close(self):
"""Closes the wrapper and :attr:`env`."""
return self.env.close()
def __str__(self):
"""Returns the wrapper name and the :attr:`env` representation string."""
return f"<{type(self).__name__}{self.env}>"
def __repr__(self):
@@ -340,12 +428,11 @@ class Wrapper(Env[ObsType, ActType]):
class ObservationWrapper(Wrapper):
"""Superclass of wrappers that can modify observations using :meth:`observation` for :meth:`reset` and :meth:`step`.
If you would like to apply a function to only the observation before
passing it to the learning code, you can simply inherit from :class:`ObservationWrapper` and overwrite the method
:meth:`observation` to implement that transformation. The transformation defined in that method must be
reflected by the :attr:`env` observation space. Otherwise, you need to specify the new observation space of the
wrapper by setting :attr:`self.observation_space` in the :meth:`__init__` method of your wrapper.
For example, you might have a 2D navigation task where the environment returns dictionaries as observations with
keys ``"agent_position"`` and ``"target_position"``. A common thing to do might be to throw away some degrees of
@@ -366,17 +453,24 @@ class ObservationWrapper(Wrapper):
""" """
def reset(self, **kwargs): def reset(self, **kwargs):
"""Resets the environment, returning a modified observation using :meth:`self.observation`.""" """Modifies the :attr:`env` after calling :meth:`reset`, returning a modified observation using :meth:`self.observation`."""
obs, info = self.env.reset(**kwargs) obs, info = self.env.reset(**kwargs)
return self.observation(obs), info return self.observation(obs), info
def step(self, action): def step(self, action):
"""Returns a modified observation using :meth:`self.observation` after calling :meth:`env.step`.""" """Modifies the :attr:`env` after calling :meth:`step` using :meth:`self.observation` on the returned observations."""
observation, reward, terminated, truncated, info = self.env.step(action) observation, reward, terminated, truncated, info = self.env.step(action)
return self.observation(observation), reward, terminated, truncated, info return self.observation(observation), reward, terminated, truncated, info
def observation(self, observation): def observation(self, observation):
"""Returns a modified observation.""" """Returns a modified observation.
Args:
observation: The :attr:`env` observation
Returns:
The modified observation
"""
raise NotImplementedError raise NotImplementedError
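The dictionary-observation example above can be sketched end to end. The snippet below is self-contained: the stub classes stand in for ``gymnasium.Env`` and ``gymnasium.ObservationWrapper`` so the transformation itself can be run without the library, and the names ``GridTargetEnv`` and ``RelativePosition`` are illustrative, not part of the API.

```python
import numpy as np


class GridTargetEnv:
    """Toy stand-in env returning dict observations with agent/target positions."""

    def reset(self):
        self.agent = np.array([1.0, 2.0])
        self.target = np.array([4.0, 6.0])
        return {"agent_position": self.agent, "target_position": self.target}, {}


class RelativePosition:
    """Mirrors ObservationWrapper: reset() pipes the env observation through observation()."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return self.observation(obs), info

    def observation(self, obs):
        # Keep only the vector from the agent to the target.
        return obs["target_position"] - obs["agent_position"]


obs, _ = RelativePosition(GridTargetEnv()).reset()
print(obs)  # [3. 4.]
```

With gymnasium installed, ``RelativePosition`` would instead subclass ``gymnasium.ObservationWrapper`` and only define :meth:`observation` (plus the new ``observation_space``).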
@@ -386,14 +480,14 @@ class RewardWrapper(Wrapper):
    If you would like to apply a function to the reward that is returned by the base environment before
    passing it to learning code, you can simply inherit from :class:`RewardWrapper` and overwrite the method
    :meth:`reward` to implement that transformation.
    This transformation might change the :attr:`reward_range`; to specify the :attr:`reward_range` of your wrapper,
    you can simply define :attr:`self.reward_range` in :meth:`__init__`.

    Let us look at an example: Sometimes (especially when we do not have control over the reward
    because it is intrinsic), we want to clip the reward to a range to gain some numerical stability.
    To do that, we could, for instance, implement the following wrapper::

        class ClipReward(gymnasium.RewardWrapper):
            def __init__(self, env, min_reward, max_reward):
                super().__init__(env)
                self.min_reward = min_reward
@@ -405,12 +499,19 @@ class RewardWrapper(Wrapper):
""" """
def step(self, action): def step(self, action):
"""Modifies the reward using :meth:`self.reward` after the environment :meth:`env.step`.""" """Modifies the :attr:`env` :meth:`step` reward using :meth:`self.reward`."""
observation, reward, terminated, truncated, info = self.env.step(action) observation, reward, terminated, truncated, info = self.env.step(action)
return observation, self.reward(reward), terminated, truncated, info return observation, self.reward(reward), terminated, truncated, info
def reward(self, reward): def reward(self, reward):
"""Returns a modified ``reward``.""" """Returns a modified environment ``reward``.
Args:
reward: The :attr:`env` :meth:`step` reward
Returns:
The modified `reward`
"""
raise NotImplementedError raise NotImplementedError
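The ``ClipReward`` example quoted in the docstring is truncated by the diff; a completed, runnable sketch is below. The stub base class reproduces :class:`RewardWrapper`'s ``step``/``reward`` contract so the snippet runs without gymnasium, and ``SpikyEnv`` is a made-up toy environment.

```python
class RewardWrapperStub:
    """Minimal stand-in for gymnasium.RewardWrapper."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        return observation, self.reward(reward), terminated, truncated, info

    def reward(self, reward):
        raise NotImplementedError


class ClipReward(RewardWrapperStub):
    def __init__(self, env, min_reward, max_reward):
        super().__init__(env)
        self.min_reward = min_reward
        self.max_reward = max_reward
        self.reward_range = (min_reward, max_reward)

    def reward(self, reward):
        # Clip the raw reward into [min_reward, max_reward].
        return max(self.min_reward, min(self.max_reward, reward))


class SpikyEnv:
    """Toy env emitting an occasionally huge reward."""

    def step(self, action):
        return None, 250.0, False, False, {}


_, r, *_ = ClipReward(SpikyEnv(), -1.0, 1.0).step(0)
print(r)  # 1.0
```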
@@ -442,18 +543,20 @@ class ActionWrapper(Wrapper):
                np.array([0,1]), np.array([0,-1])])
        print(wrapped_env.action_space)  # Discrete(4)

    Among others, Gymnasium provides the action wrappers :class:`ClipAction` and :class:`RescaleAction` for clipping and rescaling actions.
    """

    def step(self, action):
        """Runs the :attr:`env` :meth:`env.step` using the modified ``action`` from :meth:`self.action`."""
        return self.env.step(self.action(action))

    def action(self, action):
        """Returns a modified action before :meth:`env.step` is called.

        Args:
            action: The original :meth:`step` actions

        Returns:
            The modified actions
        """
        raise NotImplementedError
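The ``DiscreteActions`` pattern shown in the docstring example above (mapping a small ``Discrete`` action set onto continuous ``Box`` actions) can be sketched self-containedly. The stub base class mimics :meth:`ActionWrapper.step`; in real code you would subclass ``gymnasium.ActionWrapper`` and also set ``self.action_space``, and ``EchoEnv`` is purely illustrative.

```python
import numpy as np


class ActionWrapperStub:
    """Minimal stand-in for gymnasium.ActionWrapper."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return self.env.step(self.action(action))

    def action(self, action):
        raise NotImplementedError


class DiscreteActions(ActionWrapperStub):
    def __init__(self, env, disc_to_cont):
        super().__init__(env)
        self.disc_to_cont = disc_to_cont

    def action(self, act):
        # Look up the continuous action for a discrete index.
        return self.disc_to_cont[act]


class EchoEnv:
    """Toy env that returns the action it received as the observation."""

    def step(self, action):
        return action, 0.0, False, False, {}


env = DiscreteActions(EchoEnv(), [np.array([1, 0]), np.array([-1, 0]),
                                  np.array([0, 1]), np.array([0, -1])])
obs, *_ = env.step(2)
print(obs)  # [0 1]
```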
@@ -102,7 +102,7 @@ class ContactDetector(contactListener):
class BipedalWalker(gym.Env, EzPickle):
    """
    ## Description
    This is a simple 4-joint walker robot environment.
    There are two versions:
    - Normal, with slightly uneven terrain.
@@ -117,30 +117,30 @@ class BipedalWalker(gym.Env, EzPickle):
    python gymnasium/envs/box2d/bipedal_walker.py
    ```

    ## Action Space
    Actions are motor speed values in the [-1, 1] range for each of the
    4 joints at both hips and knees.

    ## Observation Space
    State consists of hull angle speed, angular velocity, horizontal speed,
    vertical speed, position of joints and joints angular speed, legs contact
    with ground, and 10 lidar rangefinder measurements. There are no coordinates
    in the state vector.

    ## Rewards
    Reward is given for moving forward, totaling 300+ points up to the far end.
    If the robot falls, it gets -100. Applying motor torque costs a small
    amount of points. A more optimal agent will get a better score.

    ## Starting State
    The walker starts standing at the left end of the terrain with the hull
    horizontal, and both legs in the same position with a slight knee angle.

    ## Episode Termination
    The episode will terminate if the hull gets in contact with the ground or
    if the walker exceeds the right end of the terrain length.

    ## Arguments
    To use the _hardcore_ environment, you need to specify the
    `hardcore=True` argument like below:
    ```python
@@ -148,7 +148,7 @@ class BipedalWalker(gym.Env, EzPickle):
    env = gym.make("BipedalWalker-v3", hardcore=True)
    ```

    ## Version History
    - v3: Returns the closest lidar trace instead of furthest;
        faster video recording
    - v2: Count energy spent
@@ -157,9 +157,9 @@ class BipedalWalker(gym.Env, EzPickle):
    - v0: Initial version

    <!-- ## References -->

    ## Credits
    Created by Oleg Klimov
    """
@@ -106,7 +106,7 @@ class FrictionDetector(contactListener):
class CarRacing(gym.Env, EzPickle):
    """
    ## Description
    The easiest control task to learn from pixels - a top-down
    racing environment. The generated track is random every episode.
@@ -120,31 +120,31 @@ class CarRacing(gym.Env, EzPickle):
    Remember: it's a powerful rear-wheel drive car - don't press the accelerator
    and turn at the same time.

    ## Action Space
    If continuous:
        There are 3 actions: steering (-1 is full left, +1 is full right), gas, and braking.
    If discrete:
        There are 5 actions: do nothing, steer left, steer right, gas, brake.

    ## Observation Space
    A top-down 96x96 RGB image of the car and race track.

    ## Rewards
    The reward is -0.1 every frame and +1000/N for every track tile visited,
    where N is the total number of tiles visited in the track. For example,
    if you have finished in 732 frames, your reward is
    1000 - 0.1*732 = 926.8 points.

    ## Starting State
    The car starts at rest in the center of the road.

    ## Episode Termination
    The episode finishes when all the tiles are visited. The car can also go
    outside the playfield - that is, far off the track, in which case it will
    receive -100 reward and die.

    ## Arguments
    `lap_complete_percent` dictates the percentage of tiles that must be visited by
    the agent before a lap is considered complete.
@@ -154,7 +154,7 @@ class CarRacing(gym.Env, EzPickle):
    Passing `continuous=False` converts the environment to use discrete action space.
    The discrete action space has 5 actions: [do nothing, left, right, gas, brake].

    ## Reset Arguments
    Passing the option `options["randomize"] = True` will change the current colour of the environment on demand.
    Correspondingly, passing the option `options["randomize"] = False` will not change the current colour of the environment.
    `domain_randomize` must be `True` on init for this argument to work.
@@ -173,14 +173,14 @@ class CarRacing(gym.Env, EzPickle):
    env.reset(options={"randomize": False})
    ```

    ## Version History
    - v1: Change track completion logic and add domain randomization (0.24.0)
    - v0: Original version

    ## References
    - Chris Campbell (2014), http://www.iforce2d.net/b2dtut/top-down-car.

    ## Credits
    Created by Oleg Klimov
    """
@@ -76,7 +76,7 @@ class ContactDetector(contactListener):
class LunarLander(gym.Env, EzPickle):
    """
    ## Description
    This environment is a classic rocket trajectory optimization problem.
    According to Pontryagin's maximum principle, it is optimal to fire the
    engine at full throttle or turn it off. This is the reason why this
@@ -95,16 +95,16 @@ class LunarLander(gym.Env, EzPickle):
    <!-- To play yourself, run: -->
    <!-- python examples/agents/keyboard_agent.py LunarLander-v2 -->

    ## Action Space
    There are four discrete actions available: do nothing, fire left
    orientation engine, fire main engine, fire right orientation engine.

    ## Observation Space
    The state is an 8-dimensional vector: the coordinates of the lander in `x` & `y`, its linear
    velocities in `x` & `y`, its angle, its angular velocity, and two booleans
    that represent whether each leg is in contact with the ground or not.

    ## Rewards
    After every step a reward is granted. The total reward of an episode is the
    sum of the rewards for all the steps within that episode.
@@ -120,11 +120,11 @@ class LunarLander(gym.Env, EzPickle):
    An episode is considered a solution if it scores at least 200 points.

    ## Starting State
    The lander starts at the top center of the viewport with a random initial
    force applied to its center of mass.

    ## Episode Termination
    The episode finishes if:
    1) the lander crashes (the lander body gets in contact with the moon);
    2) the lander gets outside of the viewport (`x` coordinate is greater than 1);
@@ -137,7 +137,7 @@ class LunarLander(gym.Env, EzPickle):
    > wakes up. Bodies will also wake up if a joint or contact attached to
    > them is destroyed.

    ## Arguments
    To use the _continuous_ environment, you need to specify the
    `continuous=True` argument like below:
    ```python
@@ -172,16 +172,16 @@ class LunarLander(gym.Env, EzPickle):
    `wind_power` dictates the maximum magnitude of linear wind applied to the craft. The recommended value for `wind_power` is between 0.0 and 20.0.

    `turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for `turbulence_power` is between 0.0 and 2.0.

    ## Version History
    - v2: Count energy spent and in v0.24, added turbulence with wind power and turbulence_power parameters
    - v1: Legs contact with ground added in state vector; contact with ground
        gives +10 reward points, and -10 if they then lose contact; reward
        renormalized to 200; harder initial random push.
    - v0: Initial version

    <!-- ## References -->

    ## Credits
    Created by Oleg Klimov
    """
@@ -26,7 +26,7 @@ __author__ = "Christoph Dann <cdann@cdann.de>"
class AcrobotEnv(Env):
    """
    ## Description
    The Acrobot environment is based on Sutton's work in
    ["Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding"](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html)
@@ -41,7 +41,7 @@ class AcrobotEnv(Env):
    to reach the target height (black horizontal line above system) by applying torque on
    the actuator.

    ## Action Space
    The action is discrete, deterministic, and represents the torque applied on the actuated
    joint between the two links.
@@ -52,7 +52,7 @@ class AcrobotEnv(Env):
    | 1   | apply 0 torque to the actuated joint | torque (N m) |
    | 2   | apply 1 torque to the actuated joint | torque (N m) |

    ## Observation Space
    The observation is a `ndarray` with shape `(6,)` that provides information about the
    two rotational joint angles as well as their angular velocities:
@@ -75,25 +75,25 @@ class AcrobotEnv(Env):
    The angular velocities of `theta1` and `theta2` are bounded at ±4π, and ±9π rad/s respectively.
    A state of `[1, 0, 1, 0, ..., ...]` indicates that both links are pointing downwards.

    ## Rewards
    The goal is to have the free end reach a designated target height in as few steps as possible,
    and as such all steps that do not reach the goal incur a reward of -1.
    Achieving the target height results in termination with a reward of 0. The reward threshold is -100.

    ## Starting State
    Each parameter in the underlying state (`theta1`, `theta2`, and the two angular velocities) is initialized
    uniformly between -0.1 and 0.1. This means both links are pointing downwards with some initial stochasticity.

    ## Episode End
    The episode ends if one of the following occurs:
    1. Termination: The free end reaches the target height, which is constructed as:
    `-cos(theta1) - cos(theta2 + theta1) > 1.0`
    2. Truncation: Episode length is greater than 500 (200 for v0)

    ## Arguments
    No additional arguments are currently supported during construction.
@@ -125,14 +125,14 @@ class AcrobotEnv(Env):
    by setting `book_or_nips = 'nips'`

    ## Version History
    - v1: Maximum number of steps increased from 200 to 500. The observation space for v0 provided direct readings of
    `theta1` and `theta2` in radians, having a range of `[-pi, pi]`. The v1 observation space as described here provides the
    sine and cosine of each angle instead.
    - v0: Initial versions release (1.0.0) (removed from gymnasium for v1)

    ## References
    - Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding.
      In D. Touretzky, M. C. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8).
      MIT Press. https://proceedings.neurips.cc/paper/1995/file/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf
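The termination test quoted in the Episode End section above can be checked numerically: with angles measured from the downward vertical, the free end clears the target height exactly when `-cos(theta1) - cos(theta2 + theta1) > 1.0`.

```python
from math import cos, pi


def acrobot_terminated(theta1, theta2):
    """True when the acrobot's free end is above the target height."""
    return -cos(theta1) - cos(theta2 + theta1) > 1.0


print(acrobot_terminated(0.0, 0.0))  # False: both links hang straight down
print(acrobot_terminated(pi, 0.0))   # True: both links point straight up
```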
@@ -16,7 +16,7 @@ from gymnasium.error import DependencyNotInstalled
class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):
    """
    ## Description
    This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in
    ["Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems"](https://ieeexplore.ieee.org/document/6313077).
@@ -24,7 +24,7 @@ class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):
    The pendulum is placed upright on the cart and the goal is to balance the pole by applying forces
    in the left and right direction on the cart.

    ## Action Space
    The action is a `ndarray` with shape `(1,)` which can take values `{0, 1}` indicating the direction
    of the fixed force the cart is pushed with.
@@ -37,7 +37,7 @@ class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):
    **Note**: The velocity that is reduced or increased by the applied force is not fixed and it depends on the angle
    the pole is pointing. The center of gravity of the pole varies the amount of energy needed to move the cart underneath it

    ## Observation Space
    The observation is a `ndarray` with shape `(4,)` with the values corresponding to the following positions and velocities:
@@ -55,16 +55,16 @@ class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):
    - The pole angle can be observed between `(-.418, .418)` radians (or **±24°**), but the episode terminates
      if the pole angle is not in the range `(-.2095, .2095)` (or **±12°**)

    ## Rewards
    Since the goal is to keep the pole upright for as long as possible, a reward of `+1` for every step taken,
    including the termination step, is allotted. The threshold for rewards is 475 for v1.

    ## Starting State
    All observations are assigned a uniformly random value in `(-0.05, 0.05)`

    ## Episode End
    The episode ends if any one of the following occurs:
@@ -72,7 +72,7 @@ class CartPoleEnv(gym.Env[np.ndarray, Union[int, np.ndarray]]):
    2. Termination: Cart Position is greater than ±2.4 (center of the cart reaches the edge of the display)
    3. Truncation: Episode length is greater than 500 (200 for v0)

    ## Arguments
    ```python
    import gymnasium as gym
@@ -26,7 +26,7 @@ from gymnasium.error import DependencyNotInstalled
class Continuous_MountainCarEnv(gym.Env):
    """
    ## Description
    The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically
    at the bottom of a sinusoidal valley, with the only possible actions being the accelerations
@@ -46,7 +46,7 @@ class Continuous_MountainCarEnv(gym.Env):
    }
    ```

    ## Observation Space
    The observation is a `ndarray` with shape `(2,)` where the elements correspond to the following:
@@ -55,12 +55,12 @@ class Continuous_MountainCarEnv(gym.Env):
    | 0   | position of the car along the x-axis | -Inf | Inf | position (m)   |
    | 1   | velocity of the car                  | -Inf | Inf | velocity (m/s) |

    ## Action Space
    The action is a `ndarray` with shape `(1,)`, representing the directional force applied on the car.
    The action is clipped in the range `[-1,1]` and multiplied by a power of 0.0015.

    ## Transition Dynamics
    Given an action, the mountain car follows the following transition dynamics:
@@ -72,24 +72,24 @@ class Continuous_MountainCarEnv(gym.Env):
    The collisions at either end are inelastic with the velocity set to 0 upon collision with the wall.
    The position is clipped to the range `[-1.2, 0.6]` and velocity is clipped to the range `[-0.07, 0.07]`.

    ## Reward
    A negative reward of *-0.1 * action<sup>2</sup>* is received at each timestep to penalise for
    taking actions of large magnitude. If the mountain car reaches the goal then a positive reward of +100
    is added to the negative reward for that timestep.

    ## Starting State
    The position of the car is assigned a uniform random value in `[-0.6 , -0.4]`.
    The starting velocity of the car is always assigned to 0.

    ## Episode End
    The episode ends if either of the following happens:
    1. Termination: The position of the car is greater than or equal to 0.45 (the goal position on top of the right hill)
    2. Truncation: The length of the episode is 999.

    ## Arguments
    ```python
    import gymnasium as gym
@@ -99,7 +99,7 @@ class Continuous_MountainCarEnv(gym.Env):
    On reset, the `options` parameter allows the user to change the bounds used to determine
    the new random state.

    ## Version History
    * v0: Initial versions release (1.0.0)
    """
@@ -15,7 +15,7 @@ from gymnasium.error import DependencyNotInstalled
class MountainCarEnv(gym.Env):
"""
## Description
The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically
at the bottom of a sinusoidal valley, with the only possible actions being the accelerations
@@ -35,7 +35,7 @@ class MountainCarEnv(gym.Env):
}
```
## Observation Space
The observation is a `ndarray` with shape `(2,)` where the elements correspond to the following:
@@ -44,7 +44,7 @@ class MountainCarEnv(gym.Env):
| 0 | position of the car along the x-axis | -Inf | Inf | position (m) |
| 1 | velocity of the car | -Inf | Inf | position (m) |
## Action Space
There are 3 discrete deterministic actions:
@@ -54,7 +54,7 @@ class MountainCarEnv(gym.Env):
| 1 | Don't accelerate | Inf | position (m) |
| 2 | Accelerate to the right | Inf | position (m) |
## Transition Dynamics:
Given an action, the mountain car follows the following transition dynamics:
@@ -66,24 +66,24 @@ class MountainCarEnv(gym.Env):
upon collision with the wall. The position is clipped to the range `[-1.2, 0.6]` and
velocity is clipped to the range `[-0.07, 0.07]`.
## Reward:
The goal is to reach the flag placed on top of the right hill as quickly as possible, as such the agent is
penalised with a reward of -1 for each timestep.
## Starting State
The position of the car is assigned a uniform random value in *[-0.6 , -0.4]*.
The starting velocity of the car is always assigned to 0.
## Episode End
The episode ends if either of the following happens:
1. Termination: The position of the car is greater than or equal to 0.5 (the goal position on top of the right hill)
2. Truncation: The length of the episode is 200.
## Arguments
```python
import gymnasium as gym
@@ -93,7 +93,7 @@ class MountainCarEnv(gym.Env):
On reset, the `options` parameter allows the user to change the bounds used to determine
the new random state.
## Version History
* v0: Initial versions release (1.0.0)
"""
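The transition dynamics above can be sketched as follows. The constants `FORCE = 0.001` and `GRAVITY = 0.0025` are the classic values for this environment and should be treated as assumptions here, not read from this diff:

```python
import math

FORCE = 0.001      # assumed engine force per unit of (action - 1)
GRAVITY = 0.0025   # assumed gravity term scaling cos(3 * position)

def transition(position: float, velocity: float, action: int):
    """One step of the Mountain Car dynamics for action in {0, 1, 2}."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-0.07, min(0.07, velocity))        # clip velocity
    position += velocity
    position = max(-1.2, min(0.6, position))          # clip position
    if position <= -1.2 and velocity < 0:
        velocity = 0.0  # inelastic collision with the left wall
    return position, velocity
```

Starting at the valley bottom with action 1 ("don't accelerate"), gravity alone pulls the car slightly left, which is what forces the back-and-forth momentum-building strategy.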
@@ -16,7 +16,7 @@ DEFAULT_Y = 1.0
class PendulumEnv(gym.Env):
"""
## Description
The inverted pendulum swingup problem is based on the classic problem in control theory.
The system consists of a pendulum attached at one end to a fixed point, and the other end being free.
@@ -26,13 +26,13 @@ class PendulumEnv(gym.Env):
The diagram below specifies the coordinate system used for the implementation of the pendulum's
dynamic equations.
![Pendulum Coordinate System](/_static/diagrams/pendulum.png)
- `x-y`: cartesian coordinates of the pendulum's end in meters.
- `theta` : angle in radians.
- `tau`: torque in `N m`. Defined as positive _counter-clockwise_.
## Action Space
The action is a `ndarray` with shape `(1,)` representing the torque applied to free end of the pendulum.
@@ -41,7 +41,7 @@ class PendulumEnv(gym.Env):
| 0 | Torque | -2.0 | 2.0 |
## Observation Space
The observation is a `ndarray` with shape `(3,)` representing the x-y coordinates of the pendulum's free
end and its angular velocity.
@@ -52,7 +52,7 @@ class PendulumEnv(gym.Env):
| 1 | y = sin(theta) | -1.0 | 1.0 |
| 2 | Angular Velocity | -8.0 | 8.0 |
## Rewards
The reward function is defined as:
@@ -63,15 +63,15 @@ class PendulumEnv(gym.Env):
*-(pi<sup>2</sup> + 0.1 * 8<sup>2</sup> + 0.001 * 2<sup>2</sup>) = -16.2736044*,
while the maximum reward is zero (pendulum is upright with zero velocity and no torque applied).
## Starting State
The starting state is a random angle in *[-pi, pi]* and a random angular velocity in *[-1,1]*.
## Episode Truncation
The episode truncates at 200 time steps.
## Arguments
- `g`: acceleration of gravity measured in *(m s<sup>-2</sup>)* used to calculate the pendulum dynamics.
The default value is g = 10.0.
@@ -84,7 +84,7 @@ class PendulumEnv(gym.Env):
On reset, the `options` parameter allows the user to change the bounds used to determine
the new random state.
## Version History
* v1: Simplify the math equations, no difference in behavior.
* v0: Initial versions release (1.0.0)
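The reward function above can be written out as a small sketch, assuming the usual normalisation of the angle into *[-pi, pi)*; this is an illustration, not the environment's source:

```python
import math

def angle_normalize(x: float) -> float:
    """Wrap an angle into the interval [-pi, pi)."""
    return ((x + math.pi) % (2 * math.pi)) - math.pi

def pendulum_reward(theta: float, theta_dot: float, torque: float) -> float:
    """r = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2)."""
    return -(angle_normalize(theta) ** 2
             + 0.1 * theta_dot ** 2
             + 0.001 * torque ** 2)
```

Plugging in the extreme state (`theta = pi`, `theta_dot = 8`, `torque = 2`) reproduces the minimum reward of *-16.2736044* quoted above, and the upright resting state gives exactly zero.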
@@ -11,7 +11,7 @@ DEFAULT_CAMERA_CONFIG = {
class AntEnv(MujocoEnv, utils.EzPickle):
"""
## Description
This environment is based on the environment introduced by Schulman,
Moritz, Levine, Jordan and Abbeel in ["High-Dimensional Continuous Control
@@ -22,7 +22,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
torques on the eight hinges connecting the two links of each leg and the torso
(nine parts and eight hinges).
## Action Space
The action space is a `Box(-1, 1, (8,), float32)`. An action represents the torques applied at the hinge joints.
| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
@@ -36,7 +36,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
| 6 | Torque applied on the rotor between the torso and back right hip | -1 | 1 | hip_4 (right_back_leg) | hinge | torque (N m) |
| 7 | Torque applied on the rotor between the back right two links | -1 | 1 | angle_4 (right_back_leg) | hinge | torque (N m) |
## Observation Space
Observations consist of positional values of different body parts of the ant,
followed by the velocities of those individual parts (their derivatives) with all
@@ -98,7 +98,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
when using the Ant environment if you would like to report results with contact forces (if
contact forces are not used in your experiments, you can use version > 2.0).
## Rewards
The reward consists of three parts:
- *healthy_reward*: Every timestep that the ant is healthy (see definition in section "Episode Termination"), it gets a reward of fixed value `healthy_reward`
- *forward_reward*: A reward of moving forward which is measured as
@@ -115,7 +115,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost - contact_cost* and `info` will also contain the individual reward terms.
## Starting State
All observations start in state
(0.0, 0.0, 0.75, 1.0, 0.0 ... 0.0) with a uniform noise in the range
of [-`reset_noise_scale`, `reset_noise_scale`] added to the positional values and standard normal noise
@@ -124,7 +124,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
to be slightly high, thereby indicating a standing up ant. The initial orientation
is designed to make it face forward as well.
## Episode End
The ant is said to be unhealthy if any of the following happens:
1. Any of the state space values is no longer finite
@@ -138,7 +138,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
If `terminate_when_unhealthy=False` is passed, the episode is ended only when 1000 timesteps are exceeded.
## Arguments
No additional arguments are currently supported in v2 and lower.
@@ -166,7 +166,7 @@ class AntEnv(MujocoEnv, utils.EzPickle):
| `reset_noise_scale` | **float** | `0.1` | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| `exclude_current_positions_from_observation`| **bool** | `True`| Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |
## Version History
* v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
* v3: support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
* v2: All continuous control environments now use mujoco_py >= 1.50
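The four-term reward composition described above can be sketched as follows. The weights (`HEALTHY_REWARD = 1.0`, `CTRL_COST_WEIGHT = 0.5`, `CONTACT_COST_WEIGHT = 5e-4`) are the commonly documented defaults and should be treated as assumptions, and the step is assumed healthy:

```python
import numpy as np

HEALTHY_REWARD = 1.0          # assumed default, granted only while healthy
CTRL_COST_WEIGHT = 0.5        # assumed default
CONTACT_COST_WEIGHT = 5e-4    # assumed default

def ant_reward(x_before, x_after, dt, action, contact_forces):
    """reward = healthy_reward + forward_reward - ctrl_cost - contact_cost."""
    forward_reward = (x_after - x_before) / dt
    ctrl_cost = CTRL_COST_WEIGHT * float(np.sum(np.square(action)))
    # External contact forces are clipped before being penalised.
    contact_cost = CONTACT_COST_WEIGHT * float(
        np.sum(np.square(np.clip(contact_forces, -1.0, 1.0))))
    return HEALTHY_REWARD + forward_reward - ctrl_cost - contact_cost
```

Each of these terms is what the docstring says appears individually in `info` after every `step`.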
@@ -13,7 +13,7 @@ DEFAULT_CAMERA_CONFIG = {
class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
"""
## Description
This environment is based on the work by P. Wawrzyński in
["A Cat-Like Robot Real-Time Learning to Run"](http://staff.elka.pw.edu.pl/~pwawrzyn/pub-s/0812_LSCLRR.pdf).
@@ -26,7 +26,7 @@ class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
over the front and back thighs (connecting to the torso), shins
(connecting to the thighs) and feet (connecting to the shins).
## Action Space
The action space is a `Box(-1, 1, (6,), float32)`. An action represents the torques applied between *links*.
| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
@@ -39,7 +39,7 @@ class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
| 5 | Torque applied on the front foot rotor | -1 | 1 | ffoot | hinge | torque (N m) |
## Observation Space
Observations consist of positional values of different body parts of the
cheetah, followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities.
@@ -74,7 +74,7 @@ class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
| 15 | angular velocity of front tip | -Inf | Inf | fshin | hinge | angular velocity (rad/s) |
| 16 | angular velocity of second rotor | -Inf | Inf | ffoot | hinge | angular velocity (rad/s) |
## Rewards
The reward consists of two parts:
- *forward_reward*: A reward of moving forward which is measured
as *`forward_reward_weight` * (x-coordinate before action - x-coordinate after action)/dt*. *dt* is
@@ -89,7 +89,7 @@ class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
The total reward returned is ***reward*** *=* *forward_reward - ctrl_cost* and `info` will also contain the individual reward terms
## Starting State
All observations start in state (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,) with a noise added to the
initial state for stochasticity. As seen before, the first 8 values in the
@@ -98,10 +98,10 @@ class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
normal noise with a mean of 0 and standard deviation of `reset_noise_scale` is added to the
initial velocity values of all zeros.
## Episode End
The episode truncates when the episode length is greater than 1000.
## Arguments
No additional arguments are currently supported in v2 and lower.
@@ -125,7 +125,7 @@ class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
| `reset_noise_scale` | **float** | `0.1` | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| `exclude_current_positions_from_observation` | **bool** | `True` | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |
## Version History
* v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
* v3: support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
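A similar sketch for the HalfCheetah total reward. The weights `FORWARD_REWARD_WEIGHT = 1.0` and `CTRL_COST_WEIGHT = 0.1` are assumed defaults, and the sign convention here rewards motion in the positive x direction (running forward yields positive *forward_reward*):

```python
import numpy as np

FORWARD_REWARD_WEIGHT = 1.0   # assumed default
CTRL_COST_WEIGHT = 0.1        # assumed default

def cheetah_reward(x_before, x_after, dt, action):
    """reward = forward_reward - ctrl_cost, as described above."""
    forward_reward = FORWARD_REWARD_WEIGHT * (x_after - x_before) / dt
    # Quadratic penalty for large torques on the six joints.
    ctrl_cost = CTRL_COST_WEIGHT * float(np.sum(np.square(action)))
    return forward_reward - ctrl_cost
```

With zero torque and 0.05 m of progress in one 0.05 s frame, the reward is just the velocity term; torque on every joint at the limits subtracts 0.1 per joint.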
@@ -14,7 +14,7 @@ DEFAULT_CAMERA_CONFIG = {
class HopperEnv(MujocoEnv, utils.EzPickle):
"""
## Description
This environment is based on the work done by Erez, Tassa, and Todorov in
["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"](http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
@@ -26,7 +26,7 @@ class HopperEnv(MujocoEnv, utils.EzPickle):
forward (right) direction by applying torques on the three hinges
connecting the four body parts.
## Action Space
The action space is a `Box(-1, 1, (3,), float32)`. An action represents the torques applied between *links*
| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
@@ -35,7 +35,7 @@ class HopperEnv(MujocoEnv, utils.EzPickle):
| 1 | Torque applied on the leg rotor | -1 | 1 | leg_joint | hinge | torque (N m) |
| 2 | Torque applied on the foot rotor | -1 | 1 | foot_joint | hinge | torque (N m) |
## Observation Space
Observations consist of positional values of different body parts of the
hopper, followed by the velocities of those individual parts
@@ -66,7 +66,7 @@ class HopperEnv(MujocoEnv, utils.EzPickle):
| 10 | angular velocity of the foot hinge | -Inf | Inf | foot_joint | hinge | angular velocity (rad/s) |
## Rewards
The reward consists of three parts:
- *healthy_reward*: Every timestep that the hopper is healthy (see definition in section "Episode Termination"), it gets a reward of fixed value `healthy_reward`.
- *forward_reward*: A reward of hopping forward which is measured
@@ -82,12 +82,12 @@ class HopperEnv(MujocoEnv, utils.EzPickle):
The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost* and `info` will also contain the individual reward terms
## Starting State
All observations start in state
(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise
in the range of [-`reset_noise_scale`, `reset_noise_scale`] added to the values for stochasticity.
## Episode End
The hopper is said to be unhealthy if any of the following happens:
1. An element of `observation[1:]` (if `exclude_current_positions_from_observation=True`, else `observation[2:]`) is no longer contained in the closed interval specified by the argument `healthy_state_range`
@@ -102,7 +102,7 @@ class HopperEnv(MujocoEnv, utils.EzPickle):
If `terminate_when_unhealthy=False` is passed, the episode is ended only when 1000 timesteps are exceeded.
## Arguments
No additional arguments are currently supported in v2 and lower.
@@ -131,7 +131,7 @@ class HopperEnv(MujocoEnv, utils.EzPickle):
| `reset_noise_scale` | **float** | `5e-3` | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| `exclude_current_positions_from_observation` | **bool** | `True` | Whether or not to omit the x-coordinate from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |
## Version History
* v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
* v3: support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
@@ -20,7 +20,7 @@ def mass_center(model, data):
class HumanoidEnv(MujocoEnv, utils.EzPickle):
"""
## Description
This environment is based on the environment introduced by Tassa, Erez and Todorov
in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"](https://ieeexplore.ieee.org/document/6386025).
@@ -28,7 +28,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
legs and arms. The legs each consist of two links, and so do the arms (representing the knees and
elbows respectively). The goal of the environment is to walk forward as fast as possible without falling over.
## Action Space
The action space is a `Box(-1, 1, (17,), float32)`. An action represents the torques applied at the hinge joints.
| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
@@ -51,7 +51,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
| 15 | Torque applied on the rotor between the torso and left upper arm (coordinate -2) | -0.4 | 0.4 | left_shoulder2 | hinge | torque (N m) |
| 16 | Torque applied on the rotor between the left upper arm and left lower arm | -0.4 | 0.4 | left_elbow | hinge | torque (N m) |
## Observation Space
Observations consist of positional values of different body parts of the Humanoid,
followed by the velocities of those individual parts (their derivatives) with all the
@@ -139,7 +139,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
with contact forces (if contact forces are not used in your experiments, you can use
version > 2.0).
## Rewards
The reward consists of three parts:
- *healthy_reward*: Every timestep that the humanoid is alive (see section Episode Termination for definition), it gets a reward of fixed value `healthy_reward`
- *forward_reward*: A reward of walking forward which is measured as *`forward_reward_weight` *
@@ -157,7 +157,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
The total reward returned is ***reward*** *=* *healthy_reward + forward_reward - ctrl_cost - contact_cost* and `info` will also contain the individual reward terms
## Starting State
All observations start in state
(0.0, 0.0, 1.4, 1.0, 0.0 ... 0.0) with a uniform noise in the range
of [-`reset_noise_scale`, `reset_noise_scale`] added to the positional and velocity values (values in the table)
@@ -165,7 +165,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
selected to be high, thereby indicating a standing up humanoid. The initial
orientation is designed to make it face forward as well.
## Episode End
The humanoid is said to be unhealthy if the z-position of the torso is no longer contained in the
closed interval specified by the argument `healthy_z_range`.
@@ -177,7 +177,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
If `terminate_when_unhealthy=False` is passed, the episode is ended only when 1000 timesteps are exceeded.
## Arguments
No additional arguments are currently supported in v2 and lower.
@@ -205,7 +205,7 @@ class HumanoidEnv(MujocoEnv, utils.EzPickle):
| `reset_noise_scale` | **float** | `1e-2` | Scale of random perturbations of initial position and velocity (see section on Starting State) |
| `exclude_current_positions_from_observation` | **bool** | `True` | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |
## Version History
* v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
* v3: support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. rgb rendering comes from tracking camera (so agent does not run away from screen)


@@ -7,7 +7,7 @@ from gymnasium.spaces import Box
class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
"""
-### Description
+## Description
This environment is based on the environment introduced by Tassa, Erez and Todorov
in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"](https://ieeexplore.ieee.org/document/6386025).
@@ -17,7 +17,7 @@ class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
and then the goal of the environment is to make the humanoid stand up and then keep it standing
by applying torques on the various hinges.
-### Action Space
+## Action Space
The agent takes a 17-element vector for actions.
The action space is a continuous `(action, ...)` all in `[-1, 1]`, where `action`
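A minimal sketch of sampling one valid action of this shape. The bounds of ±0.4 are taken from the per-actuator table in this docstring; treating them as uniform across all 17 actuators is an assumption for illustration, not something queried from the environment itself.

```python
import numpy as np

# Sample a 17-dimensional action within the assumed per-actuator bounds.
rng = np.random.default_rng(0)
low, high = -0.4, 0.4  # assumption: torque limits from the actuator table
action = rng.uniform(low, high, size=17).astype(np.float32)

assert action.shape == (17,)
assert low <= float(action.min()) and float(action.max()) <= high
```

In practice one would sample from `env.action_space` directly, which carries the authoritative bounds and dtype.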
@@ -43,7 +43,7 @@ class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
| 15 | Torque applied on the rotor between the torso and left upper arm (coordinate -2) | -0.4 | 0.4 | left_shoulder2 | hinge | torque (N m) |
| 16 | Torque applied on the rotor between the left upper arm and left lower arm | -0.4 | 0.4 | left_elbow | hinge | torque (N m) |
-### Observation Space
+## Observation Space
The state space consists of positional values of different body parts of the Humanoid,
followed by the velocities of those individual parts (their derivatives) with all the positions ordered before all the velocities.
@@ -125,7 +125,7 @@ class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
when using the Humanoid environment if you would like to report results with contact forces
(if contact forces are not used in your experiments, you can use version > 2.0).
-### Rewards
+## Rewards
The reward consists of three parts:
- *uph_cost*: A reward for moving upward (in an attempt to stand up). This is not a relative
reward which measures how much upward it has moved from the last timestep, but it is an
@@ -143,7 +143,7 @@ class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
The total reward returned is ***reward*** *=* *uph_cost + 1 - quad_ctrl_cost - quad_impact_cost*
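The total-reward formula above can be sketched directly. The component values passed in here are placeholders for illustration only; their actual definitions (weights, clipping of the impact cost, etc.) live in the environment implementation, not in this snippet.

```python
def standup_reward(uph_cost: float, quad_ctrl_cost: float, quad_impact_cost: float) -> float:
    # Mirrors the docstring formula:
    # reward = uph_cost + 1 - quad_ctrl_cost - quad_impact_cost
    return uph_cost + 1.0 - quad_ctrl_cost - quad_impact_cost

# Placeholder component values, purely for illustration.
print(standup_reward(2.0, 0.5, 0.25))  # 2.25
```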
-### Starting State
+## Starting State
All observations start in state
(0.0, 0.0, 0.105, 1.0, 0.0 ... 0.0) with a uniform noise in the range of
[-0.01, 0.01] added to the positional and velocity values (values in the table)
@@ -151,13 +151,13 @@ class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
to be low, thereby indicating a lying-down humanoid. The initial orientation is
designed to make it face forward as well.
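The reset described above can be sketched as a nominal configuration plus uniform noise. The position-vector length of 24 (7 free-joint coordinates + 17 hinge angles) is an assumption made for this sketch; the real values come from the loaded MuJoCo model.

```python
import numpy as np

rng = np.random.default_rng()

# Nominal configuration (0.0, 0.0, 0.105, 1.0, 0.0, ..., 0.0):
init_qpos = np.zeros(24)     # assumed length: 7 free-joint coords + 17 hinges
init_qpos[2] = 0.105         # low torso height: the humanoid starts lying down
init_qpos[3] = 1.0           # identity quaternion, facing forward

# Uniform noise in [-0.01, 0.01] added to each positional value.
qpos0 = init_qpos + rng.uniform(-0.01, 0.01, size=init_qpos.shape)
```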
-### Episode End
+## Episode End
The episode ends when any of the following happens:
1. Truncation: The episode duration reaches 1000 timesteps
2. Termination: Any of the state space values is no longer finite
-### Arguments
+## Arguments
No additional arguments are currently supported.
@@ -174,7 +174,7 @@ class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
env = gym.make('HumanoidStandup-v2')
```
-### Version History
+## Version History
* v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
* v3: support for `gymnasium.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
