---
firstpage:
lastpage:
---
# Atari
A set of Atari 2600 environments simulated through [Stella](https://github.com/stella-emu/stella) and the [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment).
```{toctree}
:hidden:

atari/adventure
atari/air_raid
atari/alien
atari/amidar
atari/assault
atari/asterix
atari/asteroids
atari/atlantis
atari/bank_heist
atari/battle_zone
atari/beam_rider
atari/berzerk
atari/bowling
atari/boxing
atari/breakout
atari/carnival
atari/centipede
atari/chopper_command
atari/crazy_climber
atari/defender
atari/demon_attack
atari/double_dunk
atari/elevator_action
atari/enduro
atari/fishing_derby
atari/freeway
atari/frostbite
atari/gopher
atari/gravitar
atari/hero
atari/ice_hockey
atari/jamesbond
atari/journey_escape
atari/kangaroo
atari/krull
atari/kung_fu_master
atari/montezuma_revenge
atari/ms_pacman
atari/name_this_game
atari/phoenix
atari/pitfall
atari/pong
atari/pooyan
atari/private_eye
atari/qbert
atari/riverraid
atari/road_runner
atari/robotank
atari/seaquest
atari/skiing
atari/solaris
atari/space_invaders
atari/star_gunner
atari/tennis
atari/time_pilot
atari/tutankham
atari/up_n_down
atari/venture
atari/video_pinball
atari/wizard_of_wor
atari/yars_revenge
atari/zaxxon
```
```{raw} html
:file: atari/list.html
```
Atari environments are simulated via the Arcade Learning Environment (ALE) [[1]](#1) through the Stella emulator.
## AutoROM (installing the ROMs)
ALE-py (installed with `pip install gymnasium[atari]`) does not include the Atari ROMs, which are required to create any of the Atari environments.
To install the ROMs, use `pip install gymnasium[accept-rom-license]`, which installs AutoROM, downloads the ROMs, and installs them in the default location.
In doing so, you agree that you own a license to these Atari 2600 ROMs and agree not to distribute them.

The ROMs can also be installed in an alternative location; see [AutoROM](https://github.com/Farama-Foundation/AutoROM) for more information.
## Action Space
Each environment uses a subset of the full action space listed below:

| Num | Action |
|-----|---------------|
| 0 | NOOP |
| 1 | FIRE |
| 2 | UP |
| 3 | RIGHT |
| 4 | LEFT |
| 5 | DOWN |
| 6 | UPRIGHT |
| 7 | UPLEFT |
| 8 | DOWNRIGHT |
| 9 | DOWNLEFT |
| 10 | UPFIRE |
| 11 | RIGHTFIRE |
| 12 | LEFTFIRE |
| 13 | DOWNFIRE |
| 14 | UPRIGHTFIRE |
| 15 | UPLEFTFIRE |
| 16 | DOWNRIGHTFIRE |
| 17 | DOWNLEFTFIRE |

By default, most environments use a smaller subset of the legal actions, excluding any actions that have no effect in the game.
To use all possible actions, pass the keyword argument `full_action_space=True` to `gymnasium.make`.
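As a small illustration of the action table, the action numbers can be mirrored in a Python `IntEnum`. This is only a sketch: the class name `AtariAction` is hypothetical and is not part of Gymnasium or ALE; its values simply restate the table.

```python
from enum import IntEnum


class AtariAction(IntEnum):
    """Hypothetical helper restating the full 18-action table (not a Gymnasium API)."""

    NOOP = 0
    FIRE = 1
    UP = 2
    RIGHT = 3
    LEFT = 4
    DOWN = 5
    UPRIGHT = 6
    UPLEFT = 7
    DOWNRIGHT = 8
    DOWNLEFT = 9
    UPFIRE = 10
    RIGHTFIRE = 11
    LEFTFIRE = 12
    DOWNFIRE = 13
    UPRIGHTFIRE = 14
    UPLEFTFIRE = 15
    DOWNRIGHTFIRE = 16
    DOWNLEFTFIRE = 17


# Look up an action number by name, or a name by number.
fire_right = AtariAction.RIGHTFIRE
name_of_5 = AtariAction(5).name
```

These numbers are meant to line up with the environment's action indices when `full_action_space=True`. With a reduced action space, the environment indexes only its own subset, so the numbering may differ.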
## Observation Space
The observation returned by an Atari environment is one of the following:

1. The RGB image that is displayed to a human player, using `obs_type="rgb"`, with observation space `Box(0, 255, (210, 160, 3), np.uint8)`
2. The grayscale version of the RGB image, using `obs_type="grayscale"`, with observation space `Box(0, 255, (210, 160), np.uint8)`
3. The RAM state (128 bytes) of the console, using `obs_type="ram"`, with observation space `Box(0, 255, (128,), np.uint8)`
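To make the three observation layouts concrete, the sketch below restates their shapes and computes how many `uint8` values one observation holds. The names `OBSERVATION_SHAPES` and `observation_size` are illustrative assumptions, not Gymnasium APIs.

```python
from math import prod

# Shapes restated from the list of observation types; every entry is uint8.
OBSERVATION_SHAPES = {
    "rgb": (210, 160, 3),    # height, width, colour channels
    "grayscale": (210, 160),  # height, width
    "ram": (128,),            # one byte per RAM cell
}


def observation_size(obs_type: str) -> int:
    """Return the number of uint8 values in one observation of the given type."""
    if obs_type not in OBSERVATION_SHAPES:
        raise ValueError(f"unknown obs_type: {obs_type!r}")
    return prod(OBSERVATION_SHAPES[obs_type])
```

For example, `observation_size("rgb")` is `210 * 160 * 3`, while a RAM observation holds only 128 values.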
## Rewards
The exact reward dynamics depend on the environment and are usually documented in the game's manual. You can
find these manuals on [AtariAge](https://atariage.com/).
## Stochasticity
Because the Atari games are entirely deterministic, agents could achieve
state-of-the-art performance by simply memorizing an optimal sequence of actions while completely ignoring observations from the environment.

There are several methods to avoid this:

1. Sticky actions: Instead of always simulating the action passed to the environment, there is a small
probability that the previously executed action is used instead. In the v0 and v5 environments, the probability of
repeating an action is `25%`, while in v4 environments it is `0%`. Users can specify the repeat action
probability by passing `repeat_action_probability` to `make`.
2. Frameskipping: On each environment step, the action can be repeated for a random number of frames. This behavior
may be altered by setting the keyword argument `frameskip` to either a positive integer or
a tuple of two positive integers. If `frameskip` is an integer, frame skipping is deterministic, and in each step the action is
repeated `frameskip` many times. Otherwise, if `frameskip` is a tuple, the number of skipped frames is chosen uniformly at
random between `frameskip[0]` (inclusive) and `frameskip[1]` (exclusive) in each environment step.
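The two mechanisms can be sketched in plain Python. This is a simplified model of the description above, not the emulator's actual implementation; the function names `sample_frameskip` and `apply_sticky_action` are hypothetical.

```python
import random


def sample_frameskip(frameskip, rng):
    """Return how many frames the action is repeated for in one step.

    An int is used as-is (deterministic); a tuple (low, high) is sampled
    uniformly with `low` inclusive and `high` exclusive.
    """
    if isinstance(frameskip, int):
        return frameskip
    low, high = frameskip
    return rng.randrange(low, high)


def apply_sticky_action(action, previous_action, repeat_action_probability, rng):
    """With the given probability, reuse the previous action instead of the new one."""
    if previous_action is not None and rng.random() < repeat_action_probability:
        return previous_action
    return action


rng = random.Random(0)  # seeded so repeated runs behave identically
frames = sample_frameskip((2, 5), rng)         # one of 2, 3, 4
executed = apply_sticky_action(3, 1, 0.25, rng)  # 3 most of the time, sometimes 1
```

Setting `repeat_action_probability=0.0` together with an integer `frameskip` removes both sources of randomness in this model.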
## Common Arguments
When initializing Atari environments via `gymnasium.make`, you may pass some additional arguments. These work for any
Atari environment.

- **mode**: `int`. Game mode, see [[2]](#2). Legal values depend on the environment and are listed in the table in the Flavors section.
- **difficulty**: `int`. The difficulty of the game, see [[2]](#2). Legal values depend on the environment and are listed in
the table in the Flavors section. Together with `mode`, this determines the "flavor" of the game.
- **obs_type**: `str`. This argument determines what observations are returned by the environment. Its values are:
    - "ram": The 128 bytes of RAM are returned
    - "rgb": An RGB rendering of the game is returned
    - "grayscale": A grayscale rendering is returned
- **frameskip**: `int` or a tuple of two `int`s. This argument controls stochastic frame skipping, as described in the section on stochasticity.
- **repeat_action_probability**: `float`. The probability that the previous action is repeated instead of the new one, also called "sticky actions", as described in the section on stochasticity.
- **full_action_space**: `bool`. If set to `True`, the action space consists of all legal actions on the console. Otherwise, the
action space will be reduced to a subset.
- **render_mode**: `str`. Specifies the rendering mode. Its values are:
    - "human": Displays the screen and enables game sounds. This locks emulation to the ROM's specified FPS
    - "rgb_array": Returns the current RGB frame of the environment
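The arguments above can be combined in a single `gymnasium.make` call. The sketch below first checks the arguments against the list and only then creates the environment. The helper names `check_atari_kwargs` and `make_atari` are hypothetical, and the example assumes that `gymnasium[atari]` and the ROMs are installed and that the Atari environments are registered.

```python
KNOWN_ARGUMENTS = {
    "mode", "difficulty", "obs_type", "frameskip",
    "repeat_action_probability", "full_action_space", "render_mode",
}
ALLOWED_VALUES = {
    "obs_type": {"ram", "rgb", "grayscale"},
    "render_mode": {"human", "rgb_array"},
}


def check_atari_kwargs(**kwargs):
    """Validate keyword arguments against the documented list and return them as a dict."""
    unknown = set(kwargs) - KNOWN_ARGUMENTS
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    for name, allowed in ALLOWED_VALUES.items():
        if name in kwargs and kwargs[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return dict(kwargs)


def make_atari(env_id, **kwargs):
    """Create an Atari environment after validating its keyword arguments."""
    # Imported lazily so that the validation above works without gymnasium installed.
    import gymnasium as gym

    return gym.make(env_id, **check_atari_kwargs(**kwargs))
```

A call such as `make_atari("ALE/Amidar-v5", obs_type="grayscale", frameskip=4, repeat_action_probability=0.0)` would then request a grayscale, fully deterministic variant.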
## Version History and Naming Schemes
All Atari games are available in three versions. They differ in the default settings of the arguments above.
The differences are listed in the following table:
| Version | `frameskip=` | `repeat_action_probability=` | `full_action_space=` |
|---------|--------------|------------------------------|----------------------|
| v0 | `(2, 5,)` | `0.25` | `False` |
| v4 | `(2, 5,)` | `0.0` | `False` |
| v5 | `5` | `0.25` | `False` |

> Version v5 follows the best practices outlined in [[2]](#2). We therefore recommend transitioning to v5 and
> customizing the environment using the arguments above, if necessary.

For each Atari game, several different configurations are registered in Gymnasium. The naming schemes are analogous for
v0 and v4. Let us take a look at all variations of Amidar-v0 that are registered with Gymnasium:

| Name                       | `obs_type=` | `frameskip=` | `repeat_action_probability=` |
|----------------------------|-------------|--------------|------------------------------|
| Amidar-v0                  | `"rgb"`     | `(2, 5,)`    | `0.25`                       |
| AmidarDeterministic-v0     | `"rgb"`     | `4`          | `0.0`                        |
| AmidarNoFrameskip-v0       | `"rgb"`     | `1`          | `0.25`                       |
| Amidar-ram-v0              | `"ram"`     | `(2, 5,)`    | `0.25`                       |
| Amidar-ramDeterministic-v0 | `"ram"`     | `4`          | `0.0`                        |
| Amidar-ramNoFrameskip-v0   | `"ram"`     | `1`          | `0.25`                       |

Things change in v5: the suffixes "Deterministic" and "NoFrameskip" are no longer available. Instead, you must specify the
environment configuration via arguments passed to `gymnasium.make`. Moreover, the v5 environments
are in the "ALE" namespace. The suffix "-ram" is still available. Thus, we get the following table:

| Name | `obs_type=` | `frameskip=` | `repeat_action_probability=` |
|-------------------|-------------|--------------|------------------------------|
| ALE/Amidar-v5 | `"rgb"` | `5` | `0.25` |
| ALE/Amidar-ram-v5 | `"ram"` | `5` | `0.25` |
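Since the "Deterministic" and "NoFrameskip" suffixes do not exist in v5, a legacy name has to be translated into a v5 name plus an explicit `frameskip` argument. The sketch below shows one way to do this, using the frameskip values from the v0 table. The function `legacy_to_v5` is a hypothetical helper, not part of Gymnasium, and it deliberately does not translate `repeat_action_probability`, which should be set separately if the old behavior matters.

```python
import re

# Frameskip per legacy suffix, restated from the v0 table (None = no suffix).
LEGACY_FRAMESKIP = {None: (2, 5), "Deterministic": 4, "NoFrameskip": 1}

# Matches names such as "Amidar-v0" or "Amidar-ramDeterministic-v4".
LEGACY_NAME = re.compile(
    r"(?P<game>[A-Za-z]+?)(?P<ram>-ram)?(?P<suffix>Deterministic|NoFrameskip)?-v[04]"
)


def legacy_to_v5(name):
    """Translate a v0/v4 name into a v5 name and the matching frameskip argument."""
    match = LEGACY_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised v0/v4 name: {name!r}")
    env_id = f"ALE/{match['game']}{match['ram'] or ''}-v5"
    return env_id, {"frameskip": LEGACY_FRAMESKIP[match["suffix"]]}


env_id, kwargs = legacy_to_v5("Amidar-ramDeterministic-v0")
```

The returned pair can be passed on as `gymnasium.make(env_id, **kwargs)`, with further arguments added as needed.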
## Flavors
Some games allow the user to set a difficulty level and a game mode. Different modes/difficulties may have different
game dynamics and (if a reduced action space is used) different action spaces. We follow the convention of [[2]](#2) and
refer to the combination of difficulty level and game mode as a flavor of a game. The following table shows
the available modes and difficulty levels for different Atari games:

| Environment | Possible Modes | Default Mode | Possible Difficulties | Default Difficulty |
|------------------|-----------------------------------------------|----------------|-------------------------|----------------------|
| Adventure | [0, 1, 2] | 0 | [0, 1, 2, 3] | 0 |
| AirRaid | [1, ..., 8] | 1 | [0] | 0 |
| Alien | [0, 1, 2, 3] | 0 | [0, 1, 2, 3] | 0 |
| Amidar | [0] | 0 | [0, 3] | 0 |
| Assault | [0] | 0 | [0] | 0 |
| Asterix | [0] | 0 | [0] | 0 |
| Asteroids | [0, ..., 31, 128] | 0 | [0, 3] | 0 |
| Atlantis | [0, 1, 2, 3] | 0 | [0] | 0 |
| Atlantis2 | [0] | 0 | [0] | 0 |
| Backgammon | [0] | 0 | [3] | 0 |
| BankHeist | [0, 4, 8, 12, 16, 20, 24, 28] | 0 | [0, 1, 2, 3] | 0 |
| BasicMath | [5, 6, 7, 8] | 5 | [0, 2, 3] | 0 |
| BattleZone | [1, 2, 3] | 1 | [0] | 0 |
| BeamRider | [0] | 0 | [0, 1] | 0 |
| Berzerk | [1, ..., 9, 16, 17, 18] | 1 | [0] | 0 |
| Blackjack | [0] | 0 | [0, 1, 2, 3] | 0 |
| Bowling | [0, 2, 4] | 0 | [0, 1] | 0 |
| Boxing | [0] | 0 | [0, 1, 2, 3] | 0 |
| Breakout | [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44] | 0 | [0, 1] | 0 |
| Carnival | [0] | 0 | [0] | 0 |
| Casino | [0, 2, 3] | 0 | [0, 1, 2, 3] | 0 |
| Centipede | [22, 86] | 22 | [0] | 0 |
| ChopperCommand | [0, 2] | 0 | [0, 1] | 0 |
| CrazyClimber | [0, 1, 2, 3] | 0 | [0, 1] | 0 |
| Crossbow | [0, 2, 4, 6] | 0 | [0, 1] | 0 |
| Darkchambers | [0] | 0 | [0] | 0 |
| Defender | [1, ..., 9, 16] | 1 | [0, 1] | 0 |
| DemonAttack | [1, 3, 5, 7] | 1 | [0, 1] | 0 |
| DonkeyKong | [0] | 0 | [0] | 0 |
| DoubleDunk | [0, ..., 15] | 0 | [0] | 0 |
| Earthworld | [0] | 0 | [0] | 0 |
| ElevatorAction | [0] | 0 | [0] | 0 |
| Enduro | [0] | 0 | [0] | 0 |
| Entombed | [0] | 0 | [0, 2] | 0 |
| Et | [0, 1, 2] | 0 | [0, 1, 2, 3] | 0 |
| FishingDerby | [0] | 0 | [0, 1, 2, 3] | 0 |
| FlagCapture | [8, 9, 10] | 8 | [0] | 0 |
| Freeway | [0, ..., 7] | 0 | [0, 1] | 0 |
| Frogger | [0, 1, 2] | 0 | [0, 1] | 0 |
| Frostbite | [0, 2] | 0 | [0] | 0 |
| Galaxian | [1, ..., 9] | 1 | [0, 1] | 0 |
| Gopher | [0, 2] | 0 | [0, 1] | 0 |
| Gravitar | [0, 1, 2, 3, 4] | 0 | [0] | 0 |
| Hangman | [0, 1, 2, 3] | 0 | [0, 1] | 0 |
| HauntedHouse | [0, ..., 8] | 0 | [0, 1] | 0 |
| Hero | [0, 1, 2, 3, 4] | 0 | [0] | 0 |
| HumanCannonball | [0, ..., 7] | 0 | [0, 1] | 0 |
| IceHockey | [0, 2] | 0 | [0, 1, 2, 3] | 0 |
| Jamesbond | [0, 1] | 0 | [0] | 0 |
| JourneyEscape | [0] | 0 | [0, 1] | 0 |
| Kaboom | [0] | 0 | [0] | 0 |
| Kangaroo | [0, 1] | 0 | [0] | 0 |
| KeystoneKapers | [0] | 0 | [0] | 0 |
| KingKong | [0, 1, 2, 3] | 0 | [0] | 0 |
| Klax | [0, 1, 2] | 0 | [0] | 0 |
| Koolaid | [0] | 0 | [0] | 0 |
| Krull | [0] | 0 | [0] | 0 |
| KungFuMaster | [0] | 0 | [0] | 0 |
| LaserGates | [0] | 0 | [0] | 0 |
| LostLuggage | [0, 1] | 0 | [0, 1] | 0 |
| MarioBros | [0, 2, 4, 6] | 0 | [0] | 0 |
| MiniatureGolf | [0] | 0 | [0, 1] | 0 |
| MontezumaRevenge | [0] | 0 | [0] | 0 |
| MrDo | [0, 1, 2, 3] | 0 | [0] | 0 |
| MsPacman | [0, 1, 2, 3] | 0 | [0] | 0 |
| NameThisGame | [8, 24, 40] | 8 | [0, 1] | 0 |
| Othello | [0, 1, 2] | 0 | [0, 2] | 0 |
| Pacman | [0, ..., 7] | 0 | [0, 1] | 0 |
| Phoenix | [0] | 0 | [0] | 0 |
| Pitfall | [0] | 0 | [0] | 0 |
| Pitfall2 | [0] | 0 | [0] | 0 |
| Pong | [0, 1] | 0 | [0, 1, 2, 3] | 0 |
| Pooyan | [10, 30, 50, 70] | 10 | [0] | 0 |
| PrivateEye | [0, 1, 2, 3, 4] | 0 | [0, 1, 2, 3] | 0 |
| Qbert | [0] | 0 | [0, 1] | 0 |
| Riverraid | [0] | 0 | [0, 1] | 0 |
| RoadRunner | [0] | 0 | [0] | 0 |
| Robotank | [0] | 0 | [0] | 0 |
| Seaquest | [0] | 0 | [0, 1] | 0 |
| SirLancelot | [0] | 0 | [0] | 0 |
| Skiing | [0] | 0 | [0] | 0 |
| Solaris | [0] | 0 | [0] | 0 |
| SpaceInvaders | [0, ..., 15] | 0 | [0, 1] | 0 |
| SpaceWar | [6, ..., 17] | 6 | [0] | 0 |
| StarGunner | [0, 1, 2, 3] | 0 | [0] | 0 |
| Superman | [0] | 0 | [0, 1, 2, 3] | 0 |
| Surround | [0, 2] | 0 | [0, 1, 2, 3] | 0 |
| Tennis | [0, 2] | 0 | [0, 1, 2, 3] | 0 |
| Tetris | [0] | 0 | [0] | 0 |
| TimePilot | [0] | 0 | [0, 1, 2] | 0 |
| Trondead | [0] | 0 | [0, 1] | 0 |
| Turmoil | [0, ..., 8] | 0 | [0] | 0 |
| Tutankham | [0, 4, 8, 12] | 0 | [0] | 0 |
| UpNDown | [0] | 0 | [0, 1, 2, 3] | 0 |
| Venture | [0] | 0 | [0, 1, 2, 3] | 0 |
| VideoCheckers | [1, ..., 9, 11, ..., 19] | 1 | [0] | 0 |
| VideoPinball | [0, 2] | 0 | [0, 1] | 0 |
| WizardOfWor | [0] | 0 | [0, 1] | 0 |
| WordZapper | [0, ..., 23] | 0 | [0, 1, 2, 3] | 0 |
| YarsRevenge | [0, 32, 64, 96] | 0 | [0, 1] | 0 |
| Zaxxon | [0, 8, 16, 24] | 0 | [0] | 0 |
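
The flavor table can be used to check a `mode`/`difficulty` pair before creating an environment. The sketch below copies three rows from the table into a dictionary. The names `FLAVORS` and `is_valid_flavor` are illustrative assumptions, and a complete version would need every row.

```python
# Excerpt of the flavor table: game -> (possible modes, possible difficulties).
FLAVORS = {
    "Amidar": ([0], [0, 3]),
    "Breakout": (list(range(0, 48, 4)), [0, 1]),  # modes 0, 4, ..., 44
    "Pong": ([0, 1], [0, 1, 2, 3]),
}


def is_valid_flavor(game, mode, difficulty):
    """Return True if the mode/difficulty pair is listed for the given game."""
    modes, difficulties = FLAVORS[game]
    return mode in modes and difficulty in difficulties
```

For example, `is_valid_flavor("Pong", 1, 3)` holds, whereas `is_valid_flavor("Amidar", 0, 1)` does not, because Amidar only lists difficulties 0 and 3.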
## References

(#1)=
<a id="1">[1]</a>
MG Bellemare, Y Naddaf, J Veness, and M Bowling.
"The arcade learning environment: An evaluation platform for general agents."
Journal of Artificial Intelligence Research (2012).

(#2)=
<a id="2">[2]</a>
Machado et al.
"Revisiting the Arcade Learning Environment: Evaluation Protocols
and Open Problems for General Agents"
Journal of Artificial Intelligence Research (2018)
URL: https://jair.org/index.php/jair/article/view/11182