Docs automation (#16)

2025-08-30 01:50:19 +00:00 · 2022-09-15 09:49:24 +01:00
parent fdb7045453
commit 4d61477b7c
25 changed files with 43 additions and 2848 deletions
--- a/docs/environments/box2d/bipedal_walker.md
+++ b/docs/environments/box2d/bipedal_walker.md
@@ -1,83 +0,0 @@
---
-AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
-title: Bipedal Walker
-firstpage:
---
-
-# Bipedal Walker
-
-```{figure} ../../_static/videos/box2d/bipedal_walker.gif 
-:width: 200px
-:name: bipedal_walker
-```
-
-This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.
-
-|   |   |
-|---|---|
-| Action Space | Box(-1.0, 1.0, (4,), float32) |
-| Observation Shape | (24,) |
-| Observation High | [3.14 5.   5.   5.   3.14 5.   3.14 5.   5.   3.14 5.   3.14 5.   5.  1.   1.   1.   1.   1.   1.   1.   1.   1.   1.  ] |
-| Observation Low | [-3.14 -5.   -5.   -5.   -3.14 -5.   -3.14 -5.   -0.   -3.14 -5.   -3.14  -5.   -0.   -1.   -1.   -1.   -1.   -1.   -1.   -1.   -1.   -1.   -1.  ] |
-| Import | `gymnasium.make("BipedalWalker-v3")` | 
-
-
-### Description
-This is a simple 4-joint walker robot environment.
-There are two versions:
- Normal, with slightly uneven terrain.
- Hardcore, with ladders, stumps, pitfalls.
-
-To solve the normal version, you need to get 300 points in 1600 time steps.
-To solve the hardcore version, you need 300 points in 2000 time steps.
-
-A heuristic is provided for testing. It's also useful to get demonstrations
-to learn from. To run the heuristic:
-```
-python gymnasium/envs/box2d/bipedal_walker.py
-```
-
-### Action Space
-Actions are motor speed values in the [-1, 1] range for each of the
-4 joints at both hips and knees.
-
-### Observation Space
-State consists of hull angle, hull angular velocity, horizontal speed,
-vertical speed, joint positions and joint angular speeds, leg contact
-with the ground, and 10 lidar rangefinder measurements. There are no coordinates
-in the state vector.
-
-### Rewards
-Reward is given for moving forward, totaling 300+ points up to the far end.
-If the robot falls, it gets -100. Applying motor torque costs a small
-number of points, so a more efficient agent will get a better score.
-
-### Starting State
-The walker starts standing at the left end of the terrain with the hull
-horizontal, and both legs in the same position with a slight knee angle.
-
-### Episode Termination
-The episode terminates if the hull comes into contact with the ground or
-if the walker moves past the right end of the terrain.
-
-### Arguments
-To use the _hardcore_ environment, you need to specify the
-`hardcore=True` argument like below:
-```python
-import gymnasium
-env = gymnasium.make("BipedalWalker-v3", hardcore=True)
-```
-
-### Version History
- v3: returns closest lidar trace instead of furthest;
-    faster video recording
- v2: Count energy spent
- v1: Legs now report contact with ground; motors have higher torque and
-    speed; ground has higher friction; lidar rendered less nervously.
- v0: Initial version
-
-
-<!-- ### References -->
-
-### Credits
-Created by Oleg Klimov
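The Observation Space text in the removed `bipedal_walker.md` above describes a 24-value state whose tail holds 10 lidar readings. A minimal sketch of splitting such a vector, assuming the lidar readings are the last 10 entries (the function name `split_observation` is hypothetical and not part of the library):

```python
from typing import List, Sequence, Tuple

OBS_SIZE = 24     # "Observation Shape" from the table in the removed file
LIDAR_COUNT = 10  # "10 lidar rangefinder measurements"


def split_observation(obs: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a flat observation into (body_state, lidar).

    Assumption: the lidar readings occupy the last LIDAR_COUNT entries.
    """
    if len(obs) != OBS_SIZE:
        raise ValueError(f"expected {OBS_SIZE} values, got {len(obs)}")
    cut = OBS_SIZE - LIDAR_COUNT
    return list(obs[:cut]), list(obs[cut:])


# Example with placeholder values standing in for a real observation.
body, lidar = split_observation([0.0] * OBS_SIZE)
print(len(body), len(lidar))
```

The remaining 14 values cover the hull, joint, and leg-contact terms listed in the same section.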
--- a/docs/environments/box2d/car_racing.md
+++ b/docs/environments/box2d/car_racing.md
@@ -1,97 +0,0 @@
---
-AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
-title: Car Racing
---
-
-# Car Racing
-
-```{figure} ../../_static/videos/box2d/car_racing.gif 
-:width: 200px
-:name: car_racing
-```
-
-This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.
-
-|   |   |
-|---|---|
-| Action Space | Box([-1.  0.  0.], 1.0, (3,), float32) |
-| Observation Shape | (96, 96, 3) |
-| Observation High | 255 |
-| Observation Low | 0 |
-| Import | `gymnasium.make("CarRacing-v2")` | 
-
-
-### Description
-The easiest control task to learn from pixels - a top-down
-racing environment. The generated track is random every episode.
-
-Some indicators are shown at the bottom of the window along with the
-state RGB buffer. From left to right: true speed, four ABS sensors,
-steering wheel position, and gyroscope.
-To play yourself (it's rather fast for humans), type:
-```
-python gymnasium/envs/box2d/car_racing.py
-```
-Remember: it's a powerful rear-wheel drive car - don't press the accelerator
-and turn at the same time.
-
-### Action Space
-If continuous:
-    There are 3 actions: steering (-1 is full left, +1 is full right), gas, and braking.
-If discrete:
-    There are 5 actions: do nothing, steer left, steer right, gas, brake.
-
-### Observation Space
-State consists of 96x96 pixels.
-
-### Rewards
-The reward is -0.1 every frame and +1000/N for every track tile visited,
-where N is the total number of tiles in the track. For example,
-if you have visited every tile and finished in 732 frames, your reward is
-1000 - 0.1*732 = 926.8 points.
-
-### Starting State
-The car starts at rest in the center of the road.
-
-### Episode Termination
-The episode finishes when all of the tiles are visited. It also ends if the
-car goes outside of the playfield (that is, far off the track), in which
-case the car receives a reward of -100.
-
-### Arguments
-`lap_complete_percent` dictates the percentage of tiles that must be visited by
-the agent before a lap is considered complete.
-
-Passing `domain_randomize=True` enables the domain randomized variant of the environment.
-In this scenario, the background and track colours are different on every reset.
-
-Passing `continuous=False` converts the environment to use a discrete action space.
-The discrete action space has 5 actions: [do nothing, left, right, gas, brake].
-
-### Reset Arguments
-Passing the option `options["randomize"] = True` will change the current colour of the environment on demand.
-Correspondingly, passing the option `options["randomize"] = False` will not change the current colour of the environment.
-`domain_randomize` must be `True` on init for this argument to work.
-Example usage:
-```py
-import gymnasium
-env = gymnasium.make("CarRacing-v2", domain_randomize=True)
-# normal reset, this changes the colour scheme by default
-env.reset()
-
-# reset with colour scheme change
-env.reset(options={"randomize": True})
-
-# reset with no colour scheme change
-env.reset(options={"randomize": False})
-```
-
-### Version History
- v1: Change track completion logic and add domain randomization (0.24.0)
- v0: Original version
-
-### References
- Chris Campbell (2014), http://www.iforce2d.net/b2dtut/top-down-car.
-
-### Credits
-Created by Oleg Klimov
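The reward rule in the removed `car_racing.md` above can be checked with a few lines of arithmetic. This is only a sketch of the stated formula, not the environment's implementation; the function name `car_racing_return` and the tile count of 300 are hypothetical.

```python
def car_racing_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Episode return under the documented rule:
    -0.1 per frame, +1000/N per visited tile (N = total_tiles)."""
    if total_tiles <= 0:
        raise ValueError("total_tiles must be positive")
    tile_reward = 1000.0 * tiles_visited / total_tiles
    frame_penalty = 0.1 * frames
    return tile_reward - frame_penalty


# The documented example: every tile visited, finished in 732 frames.
# The tile count (300) is a made-up value; it cancels out when all tiles are visited.
print(round(car_racing_return(frames=732, tiles_visited=300, total_tiles=300), 1))  # 926.8
```

The -100 penalty for leaving the playfield is not modelled here.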
--- a/docs/environments/box2d/lunar_lander.md
+++ b/docs/environments/box2d/lunar_lander.md
@@ -1,125 +0,0 @@
---
-AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
-title: Lunar Lander
-lastpage:
---
-
-# Lunar Lander
-
-```{figure} ../../_static/videos/box2d/lunar_lander.gif 
-:width: 200px
-:name: lunar_lander
-```
-
-This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.
-
-|   |   |
-|---|---|
-| Action Space | Discrete(4) |
-| Observation Shape | (8,) |
-| Observation High | [1.5  1.5  5.   5.   3.14 5.   1.   1.  ] |
-| Observation Low | [-1.5  -1.5  -5.   -5.   -3.14 -5.   -0.   -0.  ] |
-| Import | `gymnasium.make("LunarLander-v2")` | 
-
-
-### Description
-This environment is a classic rocket trajectory optimization problem.
-According to Pontryagin's maximum principle, it is optimal to fire the
-engine at full throttle or turn it off. This is the reason why this
-environment has discrete actions: engine on or off.
-
-There are two environment versions: discrete or continuous.
-The landing pad is always at coordinates (0,0). The coordinates are the
-first two numbers in the state vector.
-Landing outside of the landing pad is possible. Fuel is infinite, so an agent
-can learn to fly and then land on its first attempt.
-
-To see a heuristic landing, run:
-```
-python gymnasium/envs/box2d/lunar_lander.py
-```
-<!-- To play yourself, run: -->
-<!-- python examples/agents/keyboard_agent.py LunarLander-v2 -->
-
-### Action Space
-There are four discrete actions available: do nothing, fire left
-orientation engine, fire main engine, fire right orientation engine.
-
-### Observation Space
-The state is an 8-dimensional vector: the coordinates of the lander in `x` & `y`, its linear
-velocities in `x` & `y`, its angle, its angular velocity, and two booleans
-that represent whether each leg is in contact with the ground or not.
-
-### Rewards
-Reward for moving from the top of the screen to the landing pad and coming
-to rest is about 100-140 points.
-If the lander moves away from the landing pad, it loses reward.
-If the lander crashes, it receives an additional -100 points. If it comes
-to rest, it receives an additional +100 points. Each leg with ground
-contact is +10 points.
-Firing the main engine is -0.3 points each frame. Firing the side engine
-is -0.03 points each frame. Solved is 200 points.
-
-### Starting State
-The lander starts at the top center of the viewport with a random initial
-force applied to its center of mass.
-
-### Episode Termination
-The episode finishes if:
-1) the lander crashes (the lander body gets in contact with the moon);
-2) the lander gets outside of the viewport (the absolute value of the `x` coordinate is greater than 1);
-3) the lander is not awake. From the [Box2D docs](https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61),
-    a body which is not awake is a body which doesn't move and doesn't
-    collide with any other body:
-> When Box2D determines that a body (or group of bodies) has come to rest,
-> the body enters a sleep state which has very little CPU overhead. If a
-> body is awake and collides with a sleeping body, then the sleeping body
-> wakes up. Bodies will also wake up if a joint or contact attached to
-> them is destroyed.
-
-### Arguments
-To use the _continuous_ environment, you need to specify the
-`continuous=True` argument like below:
-```python
-import gymnasium
-env = gymnasium.make(
-    "LunarLander-v2",
-    continuous=True,  # default: False
-    gravity=-10.0,  # default
-    enable_wind=False,  # default
-    wind_power=15.0,  # default
-    turbulence_power=1.5,  # default
-)
-```
-If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the
-action space will be `Box(-1, +1, (2,), dtype=np.float32)`.
-The first coordinate of an action determines the throttle of the main engine, while the second
-coordinate specifies the throttle of the lateral boosters.
-Given an action `np.array([main, lateral])`, the main engine will be turned off completely if
-`main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the
-main engine doesn't work with less than 50% power).
-Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left
-booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely
-from 50% at `|lateral| = 0.5` to 100% at `|lateral| = 1`.
-
-`gravity` sets the gravitational constant, which must lie between -12 and 0.
-
-If `enable_wind=True` is passed, there will be wind effects applied to the lander.
-The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`.
-`k` is set to 0.01.
-`C` is sampled randomly between -9999 and 9999.
-
-`wind_power` dictates the maximum magnitude of linear wind applied to the craft. The recommended value for `wind_power` is between 0.0 and 20.0.
-`turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for `turbulence_power` is between 0.0 and 2.0.
-
-### Version History
- v2: Count energy spent; in v0.24, added wind and turbulence with the `wind_power` and `turbulence_power` parameters
- v1: Leg contact with ground added to the state vector; contact with ground
-    gives +10 reward points, and -10 if contact is then lost; reward
-    renormalized to 200; harder initial random push.
- v0: Initial version
-
-<!-- ### References -->
-
-### Credits
-Created by Oleg Klimov
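The Arguments section of the removed `lunar_lander.md` above describes how continuous actions map to engine throttle and how the wind signal is generated. The sketch below restates those rules in plain Python. It is an illustration of the documented behaviour, not the environment's code: the function names are hypothetical, the boundary cases `lateral = ±0.5` are treated as "no fire", and scaling the wind signal by `wind_power` is an assumption based on the phrase "maximum magnitude".

```python
import math
from typing import Tuple


def main_throttle(main: float) -> float:
    """Main engine: off for main < 0, else 50%..100% affinely over [0, 1]."""
    if main < 0.0:
        return 0.0
    return 0.5 + 0.5 * min(main, 1.0)


def lateral_throttle(lateral: float) -> Tuple[str, float]:
    """Lateral boosters: idle inside the dead zone, else (side, throttle).

    Throttle runs from 50% at |lateral| = 0.5 to 100% at |lateral| = 1,
    which over that interval is simply |lateral|.
    """
    magnitude = min(abs(lateral), 1.0)
    if magnitude <= 0.5:  # assumption: exactly +/-0.5 does not fire
        return "none", 0.0
    side = "left" if lateral < 0.0 else "right"
    return side, magnitude


def wind_magnitude(t: float, c: float, wind_power: float = 15.0, k: float = 0.01) -> float:
    """wind_power * tanh(sin(2 k (t+C)) + sin(pi k (t+C))); the scaling is assumed."""
    phase = k * (t + c)
    return wind_power * math.tanh(math.sin(2.0 * phase) + math.sin(math.pi * phase))


print(main_throttle(0.5))       # 0.75
print(lateral_throttle(-0.75))  # ('left', 0.75)
print(wind_magnitude(t=120.0, c=-4000.0))
```

Because `tanh` is bounded by 1 in magnitude, the wind value in this sketch never exceeds `wind_power`.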