diff --git a/.github/workflows/build-docs.yml b/.github/workflows/build-docs.yml index 7b5dad395..882e39626 100644 --- a/.github/workflows/build-docs.yml +++ b/.github/workflows/build-docs.yml @@ -8,22 +8,22 @@ jobs: docs: name: Generate Website runs-on: ubuntu-latest - + steps: - uses: actions/checkout@v3 - uses: actions/setup-python@v4 with: - python-version: '3.9' + python-version: '3.9' - name: Install dependencies run: pip install -r docs/requirements.txt - + - name: Install Gymnasium run: pip install mujoco && pip install .[atari,accept-rom-license,box2d] - + - name: Build Envs Docs - run: python docs/scripts/gen_mds.py + run: python docs/scripts/gen_mds.py && python docs/scripts/gen_envs_display.py - name: Build run: sphinx-build -b dirhtml -v docs _build diff --git a/docs/.gitignore b/docs/.gitignore index 0f0266979..324bc98c4 100644 --- a/docs/.gitignore +++ b/docs/.gitignore @@ -2,4 +2,13 @@ __pycache__ .vscode/ build/ -_build/ \ No newline at end of file +_build/ + +environments/**/list.html +environments/**/complete_list.html + +environments/box2d/*.md +environments/classic_control/*.md +environments/mujoco/*.md +environments/third_party_environments/*.md +environments/toy_text/*.md \ No newline at end of file diff --git a/docs/environments/atari/index.md b/docs/environments/atari.md similarity index 88% rename from docs/environments/atari/index.md rename to docs/environments/atari.md index ccdc71c6f..2ccb52610 100644 --- a/docs/environments/atari/index.md +++ b/docs/environments/atari.md @@ -9,75 +9,75 @@ A set of Atari 2600 environment simulated through Stella and the Arcade Learning ```{toctree} :hidden: -adventure -air_raid -alien -amidar -assault -asterix -asteroids -atlantis -bank_heist -battle_zone -beam_rider -berzerk -bowling -boxing -breakout -carnival -centipede -chopper_command -crazy_climber -defender -demon_attack -double_dunk -elevator_action -enduro -fishing_derby -freeway -frostbite -gopher -gravitar -hero -ice_hockey -jamesbond 
-journey_escape -kangaroo -krull -kung_fu_master -montezuma_revenge -ms_pacman -name_this_game -phoenix -pitfall -pong -pooyan -private_eye -qbert -riverraid -road_runner -robotank -seaquest -skiing -solaris -space_invaders -star_gunner -tennis -time_pilot -tutankham -up_n_down -venture -video_pinball -wizard_of_wor -yars_revenge -zaxxon +atari/adventure +atari/air_raid +atari/alien +atari/amidar +atari/assault +atari/asterix +atari/asteroids +atari/atlantis +atari/bank_heist +atari/battle_zone +atari/beam_rider +atari/berzerk +atari/bowling +atari/boxing +atari/breakout +atari/carnival +atari/centipede +atari/chopper_command +atari/crazy_climber +atari/defender +atari/demon_attack +atari/double_dunk +atari/elevator_action +atari/enduro +atari/fishing_derby +atari/freeway +atari/frostbite +atari/gopher +atari/gravitar +atari/hero +atari/ice_hockey +atari/jamesbond +atari/journey_escape +atari/kangaroo +atari/krull +atari/kung_fu_master +atari/montezuma_revenge +atari/ms_pacman +atari/name_this_game +atari/phoenix +atari/pitfall +atari/pong +atari/pooyan +atari/private_eye +atari/qbert +atari/riverraid +atari/road_runner +atari/robotank +atari/seaquest +atari/skiing +atari/solaris +atari/space_invaders +atari/star_gunner +atari/tennis +atari/time_pilot +atari/tutankham +atari/up_n_down +atari/venture +atari/video_pinball +atari/wizard_of_wor +atari/yars_revenge +atari/zaxxon ``` ```{raw} html - :file: index.html + :file: atari/list.html ``` -Atari environments are simulated via the Arcade Learning Environment (ALE) [[1]](#1). +Atari environments are simulated via the Arcade Learning Environment (ALE) [[1]](#1). ### AutoROM (installing the ROMs) @@ -113,12 +113,12 @@ The action space is a subset of the following discrete set of legal actions: | 17 | DOWNLEFTFIRE | If you use v0 or v4 and the environment is initialized via `make`, the action space will usually be much smaller since most legal actions don't have -any effect. 
Thus, the enumeration of the actions will differ. The action space can be expanded to the full +any effect. Thus, the enumeration of the actions will differ. The action space can be expanded to the full legal space by passing the keyword argument `full_action_space=True` to `make`. -The reduced action space of an Atari environment may depend on the "flavor" of the game. You can specify the flavor by providing +The reduced action space of an Atari environment may depend on the "flavor" of the game. You can specify the flavor by providing the arguments `difficulty` and `mode` when constructing the environment. This documentation only provides details on the -action spaces of default flavor choices. +action spaces of default flavor choices. ### Observation Space The observation issued by an Atari environment may be: @@ -131,26 +131,26 @@ The exact reward dynamics depend on the environment and are usually documented i find these manuals on [AtariAge](https://atariage.com/). ### Stochasticity -It was pointed out in [[1]](#1) that Atari games are entirely deterministic. Thus, agents could achieve +It was pointed out in [[1]](#1) that Atari games are entirely deterministic. Thus, agents could achieve state of the art performance by simply memorizing an optimal sequence of actions while completely ignoring observations from the environment. To avoid this, ALE implements sticky actions: Instead of always simulating the action passed to the environment, there is a small probability that the previously executed action is used instead. On top of this, Gymnasium implements stochastic frame skipping: In each environment step, the action is repeated for a random -number of frames. This behavior may be altered by setting the keyword argument `frameskip` to either a positive integer or -a tuple of two positive integers. If `frameskip` is an integer, frame skipping is deterministic, and in each step the action is -repeated `frameskip` many times. 
Otherwise, if `frameskip` is a tuple, the number of skipped frames is chosen uniformly at +number of frames. This behavior may be altered by setting the keyword argument `frameskip` to either a positive integer or +a tuple of two positive integers. If `frameskip` is an integer, frame skipping is deterministic, and in each step the action is +repeated `frameskip` many times. Otherwise, if `frameskip` is a tuple, the number of skipped frames is chosen uniformly at random between `frameskip[0]` (inclusive) and `frameskip[1]` (exclusive) in each environment step. ### Common Arguments -When initializing Atari environments via `gymnasium.make`, you may pass some additional arguments. These work for any +When initializing Atari environments via `gymnasium.make`, you may pass some additional arguments. These work for any Atari environment. However, legal values for `mode` and `difficulty` depend on the environment. - **mode**: `int`. Game mode, see [[2]](#2). Legal values depend on the environment and are listed in the table above. -- **difficulty**: `int`. Difficulty of the game, see [[2]](#2). Legal values depend on the environment and are listed in +- **difficulty**: `int`. Difficulty of the game, see [[2]](#2). Legal values depend on the environment and are listed in the table above. Together with `mode`, this determines the "flavor" of the game. - **obs_type**: `str`. This argument determines what observations are returned by the environment. Its values are: @@ -168,7 +168,7 @@ action space will be reduced to a subset. - **render_mode**: `str`. Specifies the rendering mode. Its values are: - human: We'll interactively display the screen and enable game sounds. This will lock emulation to the ROMs specified FPS - rgb_array: we'll return the `rgb` key in step metadata with the current environment RGB frame. -> It is highly recommended to specify `render_mode` during construction instead of calling `env.render()`. 
+> It is highly recommended to specify `render_mode` during construction instead of calling `env.render()`. > This will guarantee proper scaling, audio support, and proper framerates @@ -282,15 +282,15 @@ the available modes and difficulty levels for different Atari games: ### References (#1)= -[1] -MG Bellemare, Y Naddaf, J Veness, and M Bowling. -"The arcade learning environment: An evaluation platform for general agents." -Journal of Artificial Intelligence Research (2012). +[1] +MG Bellemare, Y Naddaf, J Veness, and M Bowling. +"The arcade learning environment: An evaluation platform for general agents." +Journal of Artificial Intelligence Research (2012). (#2)= -[2] -Machado et al. +[2] +Machado et al. "Revisiting the Arcade Learning Environment: Evaluation Protocols -and Open Problems for General Agents" -Journal of Artificial Intelligence Research (2018) -URL: https://jair.org/index.php/jair/article/view/11182 \ No newline at end of file +and Open Problems for General Agents" +Journal of Artificial Intelligence Research (2018) +URL: https://jair.org/index.php/jair/article/view/11182 \ No newline at end of file diff --git a/docs/environments/atari/complete_list.html b/docs/environments/atari/complete_list.html deleted file mode 100644 index 14ea1a2b1..000000000 --- a/docs/environments/atari/complete_list.html +++ /dev/null @@ -1,749 +0,0 @@ - -
- - -
-
- -
-
- Adventure -
-
-
- - - -
-
- -
-
- Air Raid -
-
-
- - - -
-
- -
-
- Alien -
-
-
- - - -
-
- -
-
- Amidar -
-
-
- - - -
-
- -
-
- Assault -
-
-
- - - -
-
- -
-
- Asterix -
-
-
- - - -
-
- -
-
- Asteroids -
-
-
- - - -
-
- -
-
- Atlantis -
-
-
- - - -
-
- -
-
- Bank Heist -
-
-
- - - -
-
- -
-
- Battle Zone -
-
-
- - - -
-
- -
-
- Beam Rider -
-
-
- - - -
-
- -
-
- Berzerk -
-
-
- - - -
-
- -
-
- Bowling -
-
-
- - - -
-
- -
-
- Boxing -
-
-
- - - -
-
- -
-
- Breakout -
-
-
- - - -
-
- -
-
- Carnival -
-
-
- - - -
-
- -
-
- Centipede -
-
-
- - - -
-
- -
-
- Chopper Command -
-
-
- - - -
-
- -
-
- Crazy Climber -
-
-
- - - -
-
- -
-
- Defender -
-
-
- - - -
-
- -
-
- Demon Attack -
-
-
- - - -
-
- -
-
- Double Dunk -
-
-
- - - -
-
- -
-
- Elevator Action -
-
-
- - - -
-
- -
-
- Enduro -
-
-
- - - -
-
- -
-
- Fishing Derby -
-
-
- - - -
-
- -
-
- Freeway -
-
-
- - - -
-
- -
-
- Frostbite -
-
-
- - - -
-
- -
-
- Gopher -
-
-
- - - -
-
- -
-
- Gravitar -
-
-
- - - -
-
- -
-
- Hero -
-
-
- - - -
-
- -
-
- Ice Hockey -
-
-
- - - -
-
- -
-
- Jamesbond -
-
-
- - - -
-
- -
-
- Journey Escape -
-
-
- - - -
-
- -
-
- Kangaroo -
-
-
- - - -
-
- -
-
- Krull -
-
-
- - - -
-
- -
-
- Kung Fu Master -
-
-
- - - -
-
- -
-
- Montezuma Revenge -
-
-
- - - -
-
- -
-
- Ms Pacman -
-
-
- - - -
-
- -
-
- Name This Game -
-
-
- - - -
-
- -
-
- Phoenix -
-
-
- - - -
-
- -
-
- Pitfall -
-
-
- - - -
-
- -
-
- Pong -
-
-
- - - -
-
- -
-
- Pooyan -
-
-
- - - -
-
- -
-
- Private Eye -
-
-
- - - -
-
- -
-
- Qbert -
-
-
- - - -
-
- -
-
- Riverraid -
-
-
- - - -
-
- -
-
- Road Runner -
-
-
- - - -
-
- -
-
- Robotank -
-
-
- - - -
-
- -
-
- Seaquest -
-
-
- - - -
-
- -
-
- Skiing -
-
-
- - - -
-
- -
-
- Solaris -
-
-
- - - -
-
- -
-
- Space Invaders -
-
-
- - - -
-
- -
-
- Star Gunner -
-
-
- - - -
-
- -
-
- Tennis -
-
-
- - - -
-
- -
-
- Time Pilot -
-
-
- - - -
-
- -
-
- Tutankham -
-
-
- - - -
-
- -
-
- Up N Down -
-
-
- - - -
-
- -
-
- Venture -
-
-
- - - -
-
- -
-
- Video Pinball -
-
-
- - - -
-
- -
-
- Wizard Of Wor -
-
-
- - - -
-
- -
-
- Yars Revenge -
-
-
- - - -
-
- -
-
- Zaxxon -
-
-
- -
- - \ No newline at end of file diff --git a/docs/environments/atari/index.html b/docs/environments/atari/index.html deleted file mode 100644 index 70b72988b..000000000 --- a/docs/environments/atari/index.html +++ /dev/null @@ -1,113 +0,0 @@ - -
- - -
-
- -
-
- Adventure -
-
-
- - - -
-
- -
-
- Air Raid -
-
-
- - - -
-
- -
-
- Alien -
-
-
- - - -
-
- -
-
- Amidar -
-
-
- - - -
-
- -
-
- Assault -
-
-
- - - -
-
- -
-
- Asterix -
-
-
- - - -
-
- -
-
- Asteroids -
-
-
- - - -
-
- -
-
- Atlantis -
-
-
- - - -
-
- -
-
- Bank Heist -
-
-
- -
- - \ No newline at end of file diff --git a/docs/environments/box2d/index.md b/docs/environments/box2d.md similarity index 86% rename from docs/environments/box2d/index.md rename to docs/environments/box2d.md index bfffe20a0..2d2de10c4 100644 --- a/docs/environments/box2d/index.md +++ b/docs/environments/box2d.md @@ -8,17 +8,17 @@ lastpage: ```{toctree} :hidden: -bipedal_walker -car_racing -lunar_lander -``` - -```{raw} html - :file: index.html +box2d/bipedal_walker +box2d/car_racing +box2d/lunar_lander ``` - + +```{raw} html + :file: box2d/list.html +``` + These environments all involve toy games based around physics control, using [box2d](https://box2d.org/) based physics and PyGame based rendering. These environments were contributed back in the early days of Gymnasium by Oleg Klimov, and have become popular toy benchmarks ever since. All environments are highly configurable via arguments specified in each environment's documentation. - + The unique dependencies for this set of environments can be installed via: ````bash diff --git a/docs/environments/box2d/.gitkeep b/docs/environments/box2d/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/docs/environments/box2d/index.html b/docs/environments/box2d/index.html deleted file mode 100644 index 1a16f1c17..000000000 --- a/docs/environments/box2d/index.html +++ /dev/null @@ -1,41 +0,0 @@ - -
- - -
-
- -
-
- Bipedal Walker -
-
-
- - - -
-
- -
-
- Car Racing -
-
-
- - - -
-
- -
-
- Lunar Lander -
-
-
- -
- - \ No newline at end of file diff --git a/docs/environments/classic_control/index.md b/docs/environments/classic_control.md similarity index 82% rename from docs/environments/classic_control/index.md rename to docs/environments/classic_control.md index e705db21f..fe56f39cb 100644 --- a/docs/environments/classic_control/index.md +++ b/docs/environments/classic_control.md @@ -8,15 +8,15 @@ lastpage: ```{toctree} :hidden: -acrobot -cart_pole -mountain_car_continuous -mountain_car -pendulum -``` +classic_control/acrobot +classic_control/cart_pole +classic_control/mountain_car_continuous +classic_control/mountain_car +classic_control/pendulum +``` ```{raw} html - :file: index.html + :file: classic_control/list.html ``` The unique dependencies for this set of environments can be installed via: diff --git a/docs/environments/classic_control/.gitkeep b/docs/environments/classic_control/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/docs/environments/classic_control/index.html b/docs/environments/classic_control/index.html deleted file mode 100644 index c96e2ca2b..000000000 --- a/docs/environments/classic_control/index.html +++ /dev/null @@ -1,65 +0,0 @@ - -
- - -
-
- -
-
- Acrobot -
-
-
- - - -
-
- -
-
- Cart Pole -
-
-
- - - -
-
- -
-
- Mountain Car Continuous -
-
-
- - - -
-
- -
-
- Mountain Car -
-
-
- - - -
-
- -
-
- Pendulum -
-
-
- -
- - \ No newline at end of file diff --git a/docs/environments/mujoco/index.md b/docs/environments/mujoco.md similarity index 98% rename from docs/environments/mujoco/index.md rename to docs/environments/mujoco.md index ff2b3247a..e9bf3a11d 100644 --- a/docs/environments/mujoco/index.md +++ b/docs/environments/mujoco.md @@ -21,7 +21,7 @@ walker2d ``` ```{raw} html - :file: index.html + :file: mujoco/list.html ``` MuJoCo stands for Multi-Joint dynamics with Contact. It is a physics engine for faciliatating research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed. diff --git a/docs/environments/mujoco/.gitkeep b/docs/environments/mujoco/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/docs/environments/mujoco/index.html b/docs/environments/mujoco/index.html deleted file mode 100644 index bab47bd06..000000000 --- a/docs/environments/mujoco/index.html +++ /dev/null @@ -1,125 +0,0 @@ - -
- - -
-
- -
-
- Ant -
-
-
- - - -
-
- -
-
- Half Cheetah -
-
-
- - - -
-
- -
-
- Hopper -
-
-
- - - -
-
- -
-
- Humanoid Standup -
-
-
- - - -
-
- -
-
- Humanoid -
-
-
- - - -
-
- -
-
- Inverted Double Pendulum -
-
-
- - - -
-
- -
-
- Inverted Pendulum -
-
-
- - - -
-
- -
-
- Reacher -
-
-
- - - -
-
- -
-
- Swimmer -
-
-
- - - -
-
- -
-
- Walker2D -
-
-
- -
- - \ No newline at end of file diff --git a/docs/environments/third_party_environments/index.md b/docs/environments/third_party_environments.md similarity index 100% rename from docs/environments/third_party_environments/index.md rename to docs/environments/third_party_environments.md diff --git a/docs/environments/toy_text/index.md b/docs/environments/toy_text.md similarity index 71% rename from docs/environments/toy_text/index.md rename to docs/environments/toy_text.md index eb7cd344b..dfd4d06b1 100644 --- a/docs/environments/toy_text/index.md +++ b/docs/environments/toy_text.md @@ -8,18 +8,18 @@ lastpage: ```{toctree} :hidden: -blackjack.md -taxi.md -cliff_walking.md -frozen_lake.md +toy_text/blackjack.md +toy_text/taxi.md +toy_text/cliff_walking.md +toy_text/frozen_lake.md ``` ```{raw} html - :file: index.html + :file: toy_text/list.html ``` -All toy text environments were created by us using native Python libraries such as StringIO. +All toy text environments were created by us using native Python libraries such as StringIO. -These environments are designed to be extremely simple, with small discrete state and action spaces, and hence easy to learn. As a result, they are suitable for debugging implementations of reinforcement learning algorithms. +These environments are designed to be extremely simple, with small discrete state and action spaces, and hence easy to learn. As a result, they are suitable for debugging implementations of reinforcement learning algorithms. All environments are configurable via arguments specified in each environment's documentation. diff --git a/docs/environments/toy_text/.gitkeep b/docs/environments/toy_text/.gitkeep new file mode 100644 index 000000000..e69de29bb diff --git a/docs/environments/toy_text/index.html b/docs/environments/toy_text/index.html deleted file mode 100644 index 84adffbae..000000000 --- a/docs/environments/toy_text/index.html +++ /dev/null @@ -1,29 +0,0 @@ - -
- - -
-
- -
-
- Blackjack -
-
-
- - - -
-
- -
-
- Frozen Lake -
-
-
- -
- - \ No newline at end of file diff --git a/docs/index.md b/docs/index.md index b7b5871a3..405be3369 100644 --- a/docs/index.md +++ b/docs/index.md @@ -26,7 +26,7 @@ for _ in range(1000): if terminated or truncated: observation, info = env.reset() env.close() -``` +``` ```{toctree} :hidden: @@ -50,12 +50,12 @@ api/utils :hidden: :caption: Environments -environments/atari/index -environments/mujoco/index -environments/toy_text/index -environments/classic_control/index -environments/box2d/index -environments/third_party_environments/index +environments/atari +environments/mujoco +environments/toy_text +environments/classic_control +environments/box2d +environments/third_party_environments ``` ```{toctree} diff --git a/docs/scripts/gen_envs_display.py b/docs/scripts/gen_envs_display.py index 0873ac85e..2f79cbb58 100644 --- a/docs/scripts/gen_envs_display.py +++ b/docs/scripts/gen_envs_display.py @@ -1,3 +1,4 @@ +import os import sys all_envs = [ @@ -16,7 +17,7 @@ all_envs = [ "walker2d", ], }, - {"id": "toy_text", "list": ["blackjack", "frozen_lake"]}, + {"id": "toy_text", "list": ["blackjack", "cliff_walking", "frozen_lake", "taxi"]}, {"id": "box2d", "list": ["bipedal_walker", "car_racing", "lunar_lander"]}, { "id": "classic_control", @@ -124,11 +125,13 @@ def generate_page(env, limit=-1, base_path=""): cells = "\n".join(cells[:limit]) more_btn = ( - """ - - """ + """ + + + +""" if not non_limited_page else "" ) @@ -160,16 +163,32 @@ if __name__ == "__main__": envs_path = f"../environments/{type_id}" if len(type_dict["list"]) > 20: page = generate_page(type_dict, limit=9) - fp = open(f"{envs_path}/index.html", "w+", encoding="utf-8") + fp = open( + os.path.join(os.path.dirname(__file__), envs_path, "list.html"), + "w", + encoding="utf-8", + ) fp.write(page) fp.close() page = generate_page(type_dict, base_path="../") - fp = open(f"{envs_path}/complete_list.html", "w+", encoding="utf-8") + fp = open( + os.path.join( + os.path.dirname(__file__), envs_path, 
"complete_list.html" + ), + "w", + encoding="utf-8", + ) fp.write(page) fp.close() - fp = open(f"{envs_path}/complete_list.md", "w+", encoding="utf-8") + fp = open( + os.path.join( + os.path.dirname(__file__), envs_path, "complete_list.md" + ), + "w", + encoding="utf-8", + ) env_name = " ".join(type_id.split("_")).title() fp.write( f"# Complete List - {env_name}\n" @@ -178,6 +197,10 @@ if __name__ == "__main__": fp.close() else: page = generate_page(type_dict) - fp = open(f"{envs_path}/index.html", "w+", encoding="utf-8") + fp = open( + os.path.join(os.path.dirname(__file__), envs_path, "list.html"), + "w", + encoding="utf-8", + ) fp.write(page) fp.close() diff --git a/gymnasium/envs/toy_text/cliffwalking.py b/gymnasium/envs/toy_text/cliffwalking.py index 477380486..f8d70f192 100644 --- a/gymnasium/envs/toy_text/cliffwalking.py +++ b/gymnasium/envs/toy_text/cliffwalking.py @@ -24,8 +24,7 @@ class CliffWalkingEnv(Env): by Sutton and Barto](http://incompleteideas.net/book/bookdraft2018jan1.pdf). With inspiration from: - [https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py] - (https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py) + [https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py](https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py) ### Description The board is a 4x12 matrix, with (using NumPy matrix indexing): @@ -287,5 +286,6 @@ class CliffWalkingEnv(Env): with closing(outfile): return outfile.getvalue() + # Elf and stool from https://franuka.itch.io/rpg-snow-tileset # All other assets by ____
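Why `gen_envs_display.py` now builds its output paths from `os.path.dirname(__file__)`: the workflow runs it from the repository root (`python docs/scripts/gen_envs_display.py`), where the old relative path `../environments/{type_id}` would resolve to a directory outside the repository. Anchoring on the script's own directory makes the output location independent of the working directory. Below is a minimal sketch of that pattern under stated assumptions. `write_env_pages` is a hypothetical helper, not part of the script. It writes the Markdown wrapper to `complete_list.md`, so the generated `complete_list.html` is not overwritten.

```python
import os
import tempfile  # only needed for the usage example below


def write_env_pages(script_dir, type_id, pages):
    """Write generated doc pages for one environment family.

    Hypothetical helper: ``script_dir`` plays the role of
    ``os.path.dirname(__file__)`` in gen_envs_display.py, and ``pages`` maps
    a file name (e.g. "list.html") to its contents.
    Returns the normalized paths that were written, in ``pages`` order.
    """
    envs_dir = os.path.join(script_dir, "..", "environments", type_id)
    written = []
    for name, content in pages.items():
        path = os.path.normpath(os.path.join(envs_dir, name))
        # "w" truncates output from a previous build; "with" closes the file.
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(content)
        written.append(path)
    return written


# Usage: lay out a throwaway docs/ tree and write the three Atari outputs.
with tempfile.TemporaryDirectory() as root:
    script_dir = os.path.join(root, "docs", "scripts")
    os.makedirs(script_dir)
    os.makedirs(os.path.join(root, "docs", "environments", "atari"))
    paths = write_env_pages(
        script_dir,
        "atari",
        {
            "list.html": "<!-- first 9 cells -->",
            "complete_list.html": "<!-- all cells -->",
            "complete_list.md": "# Complete List - Atari\n",
        },
    )
    for p in paths:
        print(os.path.relpath(p, root))
```

Using `with open(...)` instead of the script's explicit `fp.close()` calls is a stylistic choice. It closes the file even if a write fails.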
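The sketch below restates the stochasticity rules described on the reworded Atari page. Sticky actions mean that, with some small probability, the previous action is executed instead of the new one. Frame skipping takes `frameskip` as either an integer or a `(low, high)` tuple, with `low` inclusive and `high` exclusive. This illustrates only the documented semantics and is not ALE's or Gymnasium's implementation. The function names and the `repeat_prob` parameter are hypothetical, since the page only calls the probability "small". Python's `random.Random` stands in for whatever RNG the environment actually uses.

```python
import random
from typing import Tuple, Union


def sample_frameskip(frameskip: Union[int, Tuple[int, int]], rng: random.Random) -> int:
    """Return how many frames the action is repeated for in one step.

    int          -> deterministic: always `frameskip` frames.
    (low, high)  -> uniform integer in [low, high): low inclusive, high exclusive.
    """
    if isinstance(frameskip, int):
        if frameskip < 1:
            raise ValueError("frameskip must be a positive integer")
        return frameskip
    low, high = frameskip
    if not 1 <= low < high:
        raise ValueError("need 1 <= frameskip[0] < frameskip[1]")
    return rng.randrange(low, high)


def sticky_action(action: int, prev_action: int, repeat_prob: float, rng: random.Random) -> int:
    """With probability `repeat_prob` (hypothetical name), replay the previous action."""
    return prev_action if rng.random() < repeat_prob else action


rng = random.Random(0)
print(sample_frameskip(4, rng))  # always 4
print([sample_frameskip((2, 5), rng) for _ in range(10)])  # each value is 2, 3 or 4
```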