Docs automation (#16)

This commit is contained in:
Manuel Goulão
2022-09-15 09:49:24 +01:00
committed by GitHub
parent fdb7045453
commit 4d61477b7c
25 changed files with 43 additions and 2848 deletions


@@ -1,83 +0,0 @@
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Bipedal Walker
firstpage:
---
# Bipedal Walker
```{figure} ../../_static/videos/box2d/bipedal_walker.gif
:width: 200px
:name: bipedal_walker
```
This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.
| | |
|---|---|
| Action Space | Box(-1.0, 1.0, (4,), float32) |
| Observation Shape | (24,) |
| Observation High | [3.14 5. 5. 5. 3.14 5. 3.14 5. 5. 3.14 5. 3.14 5. 5. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ] |
| Observation Low | [-3.14 -5. -5. -5. -3.14 -5. -3.14 -5. -0. -3.14 -5. -3.14 -5. -0. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. ] |
| Import | `gymnasium.make("BipedalWalker-v3")` |
### Description
This is a simple 4-joint walker robot environment.
There are two versions:
- Normal, with slightly uneven terrain.
- Hardcore, with ladders, stumps, pitfalls.
To solve the normal version, you need to get 300 points in 1600 time steps.
To solve the hardcore version, you need 300 points in 2000 time steps.
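The solve criterion above can be sketched as a small check on a single episode. The helper `episode_solves` is hypothetical (not part of Gymnasium), and it assumes "300 points" means a total reward of at least 300.

```python
# Hypothetical helper (not a Gymnasium API): checks one BipedalWalker episode
# against the documented "solved" thresholds.
SOLVE_SCORE = 300.0
MAX_STEPS = {False: 1600, True: 2000}  # keyed by the `hardcore` flag


def episode_solves(total_reward: float, steps: int, hardcore: bool = False) -> bool:
    """Return True if the episode reached 300 points within the step budget."""
    return total_reward >= SOLVE_SCORE and steps <= MAX_STEPS[hardcore]


print(episode_solves(310.0, 1500))                 # True: normal, within 1600 steps
print(episode_solves(310.0, 1800))                 # False: normal, too slow
print(episode_solves(310.0, 1800, hardcore=True))  # True: hardcore allows 2000 steps
```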
A heuristic controller is provided for testing; it is also useful for generating
demonstrations to learn from. To run the heuristic:
```
python gymnasium/envs/box2d/bipedal_walker.py
```
### Action Space
Actions are motor speed values in the [-1, 1] range for each of the
4 joints at both hips and knees.
### Observation Space
The state consists of the hull angle, hull angular velocity, horizontal speed,
vertical speed, the position and angular speed of each joint, the contact of each
leg with the ground, and 10 lidar rangefinder measurements. There are no coordinates
in the state vector.
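The text does not give the index of each entry. The sketch below assumes a layout inferred from the Observation High/Low rows above: four hull values, then five values per leg (hip angle, hip speed, knee angle, knee speed, ground contact, the contact flag being the entry whose lower bound is 0), then the 10 lidar readings. `split_observation`, `WalkerObs`, and `LegState` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LegState:
    hip_angle: float
    hip_speed: float
    knee_angle: float
    knee_speed: float
    ground_contact: float  # assumed 1.0 when touching the ground, else 0.0


@dataclass
class WalkerObs:
    hull_angle: float
    hull_angular_velocity: float
    vel_x: float
    vel_y: float
    legs: List[LegState]  # two legs
    lidar: List[float]  # 10 rangefinder readings


def split_observation(obs: Sequence[float]) -> WalkerObs:
    """Split a 24-value observation using the assumed index layout."""
    values = [float(v) for v in obs]
    if len(values) != 24:
        raise ValueError(f"expected 24 values, got {len(values)}")
    # Indices 4-8 and 9-13: one block of five values per leg.
    legs = [LegState(*values[start:start + 5]) for start in (4, 9)]
    return WalkerObs(values[0], values[1], values[2], values[3], legs, values[14:24])
```

Using a dataclass per leg keeps the two legs symmetric, so code that reads `obs.legs[i].ground_contact` does not depend on remembering raw indices 8 and 13.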
### Rewards
Reward is given for moving forward, totaling 300+ points up to the far end.
If the robot falls, it gets -100. Applying motor torque costs a small
amount of points, so a more efficient agent will get a better score.
### Starting State
The walker starts standing at the left end of the terrain with the hull
horizontal, and both legs in the same position with a slight knee angle.
### Episode Termination
The episode will terminate if the hull gets in contact with the ground or
if the walker exceeds the right end of the terrain length.
### Arguments
To use the _hardcore_ environment, pass the
`hardcore=True` argument as shown below:
```python
import gymnasium
env = gymnasium.make("BipedalWalker-v3", hardcore=True)
```
### Version History
- v3: Returns closest lidar trace instead of furthest;
faster video recording
- v2: Count energy spent
- v1: Legs now report contact with ground; motors have higher torque and
speed; ground has higher friction; lidar rendered less nervously.
- v0: Initial version
<!-- ### References -->
### Credits
Created by Oleg Klimov


@@ -1,97 +0,0 @@
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Car Racing
---
# Car Racing
```{figure} ../../_static/videos/box2d/car_racing.gif
:width: 200px
:name: car_racing
```
This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.
| | |
|---|---|
| Action Space | Box([-1. 0. 0.], 1.0, (3,), float32) |
| Observation Shape | (96, 96, 3) |
| Observation High | 255 |
| Observation Low | 0 |
| Import | `gymnasium.make("CarRacing-v2")` |
### Description
The easiest control task to learn from pixels - a top-down
racing environment. The generated track is random every episode.
Some indicators are shown at the bottom of the window along with the
state RGB buffer. From left to right: true speed, four ABS sensors,
steering wheel position, and gyroscope.
To play yourself (it's rather fast for humans), type:
```
python gymnasium/envs/box2d/car_racing.py
```
Remember: it's a powerful rear-wheel drive car - don't press the accelerator
and turn at the same time.
### Action Space
If continuous:
There are 3 actions: steering (-1 is full left, +1 is full right), gas, and braking.
If discrete:
There are 5 actions: do nothing, steer left, steer right, gas, brake.
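Both action spaces can be sketched in plain Python. The bounds below are copied from the Action Space row (`Box([-1. 0. 0.], 1.0, (3,), float32)`), so steering lies in [-1, 1] while gas and brake lie in [0, 1]. `clip_action` and `DISCRETE_ACTIONS` are hypothetical names, not Gymnasium APIs.

```python
from typing import Sequence, Tuple

# Continuous bounds from the Action Space row: [steering, gas, brake].
LOW = (-1.0, 0.0, 0.0)
HIGH = (1.0, 1.0, 1.0)

# Discrete actions, indexed in the order listed above.
DISCRETE_ACTIONS = ("do nothing", "steer left", "steer right", "gas", "brake")


def clip_action(action: Sequence[float]) -> Tuple[float, float, float]:
    """Clamp a [steering, gas, brake] triple into the continuous action bounds."""
    if len(action) != 3:
        raise ValueError("expected [steering, gas, brake]")
    steer, gas, brake = (min(max(float(a), lo), hi) for a, lo, hi in zip(action, LOW, HIGH))
    return steer, gas, brake


print(clip_action([-2.0, 0.5, 3.0]))  # (-1.0, 0.5, 1.0)
print(DISCRETE_ACTIONS[3])            # gas
```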
### Observation Space
The state is a 96x96 RGB image (shape `(96, 96, 3)`, values 0 to 255).
### Rewards
The reward is -0.1 every frame and +1000/N for every track tile visited,
where N is the total number of tiles in the track. For example,
if you have finished in 732 frames, your reward is
1000 - 0.1*732 = 926.8 points.
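The same arithmetic as a function. `car_racing_return` is a hypothetical helper, and the track length of 300 tiles in the example is made up: when every tile is visited, the tile rewards sum to 1000 regardless of N, so the track length cancels out.

```python
def car_racing_return(total_tiles: int, tiles_visited: int, frames: int) -> float:
    """Undiscounted return from the per-frame and per-tile rewards described above.

    Ignores the -100 penalty for leaving the playfield.
    """
    return tiles_visited * (1000.0 / total_tiles) - 0.1 * frames


# Full lap in 732 frames on a hypothetical 300-tile track.
print(round(car_racing_return(total_tiles=300, tiles_visited=300, frames=732), 1))  # 926.8
```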
### Starting State
The car starts at rest in the center of the road.
### Episode Termination
The episode finishes when all of the tiles are visited. The car can also go
outside of the playfield - that is, far off the track, in which case it will
receive -100 reward and die.
### Arguments
`lap_complete_percent` dictates the percentage of tiles that must be visited by
the agent before a lap is considered complete.
Passing `domain_randomize=True` enables the domain randomized variant of the environment.
In this scenario, the background and track colours are different on every reset.
Passing `continuous=False` converts the environment to use discrete action space.
The discrete action space has 5 actions: [do nothing, left, right, gas, brake].
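A simplified reading of `lap_complete_percent`, assuming it is given as a fraction between 0 and 1 (for example 0.95) rather than as a number out of 100. The helper is hypothetical, the `>=` boundary is an assumption, and the real environment may apply further conditions before declaring a lap complete.

```python
def lap_complete(tiles_visited: int, total_tiles: int, lap_complete_percent: float) -> bool:
    """Hypothetical check: has the agent visited enough of the track's tiles?"""
    if not 0.0 <= lap_complete_percent <= 1.0:
        raise ValueError("lap_complete_percent is assumed to be a fraction in [0, 1]")
    # Using >= at the boundary is an assumption.
    return tiles_visited / total_tiles >= lap_complete_percent


print(lap_complete(290, 300, 0.95))  # True: about 97% of the tiles visited
print(lap_complete(280, 300, 0.95))  # False: about 93% of the tiles visited
```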
### Reset Arguments
Passing the option `options["randomize"] = True` will change the current colour of the environment on demand.
Correspondingly, passing the option `options["randomize"] = False` will not change the current colour of the environment.
`domain_randomize` must be `True` on init for this argument to work.
Example usage:
```py
import gymnasium

env = gymnasium.make("CarRacing-v2", domain_randomize=True)
# normal reset, this changes the colour scheme by default
env.reset()
# reset with colour scheme change
env.reset(options={"randomize": True})
# reset with no colour scheme change
env.reset(options={"randomize": False})
```
### Version History
- v1: Change track completion logic and add domain randomization (0.24.0)
- v0: Original version
### References
- Chris Campbell (2014), http://www.iforce2d.net/b2dtut/top-down-car.
### Credits
Created by Oleg Klimov


@@ -1,125 +0,0 @@
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Lunar Lander
lastpage:
---
# Lunar Lander
```{figure} ../../_static/videos/box2d/lunar_lander.gif
:width: 200px
:name: lunar_lander
```
This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.
| | |
|---|---|
| Action Space | Discrete(4) |
| Observation Shape | (8,) |
| Observation High | [1.5 1.5 5. 5. 3.14 5. 1. 1. ] |
| Observation Low | [-1.5 -1.5 -5. -5. -3.14 -5. -0. -0. ] |
| Import | `gymnasium.make("LunarLander-v2")` |
### Description
This environment is a classic rocket trajectory optimization problem.
According to Pontryagin's maximum principle, it is optimal to fire the
engine at full throttle or turn it off. This is the reason why this
environment has discrete actions: engine on or off.
There are two environment versions: discrete or continuous.
The landing pad is always at coordinates (0,0). The coordinates are the
first two numbers in the state vector.
Landing outside of the landing pad is possible. Fuel is infinite, so an agent
can learn to fly and then land on its first attempt.
To see a heuristic landing, run:
```
python gymnasium/envs/box2d/lunar_lander.py
```
<!-- To play yourself, run: -->
<!-- python examples/agents/keyboard_agent.py LunarLander-v2 -->
### Action Space
There are four discrete actions available: do nothing, fire left
orientation engine, fire main engine, fire right orientation engine.
### Observation Space
The state is an 8-dimensional vector: the coordinates of the lander in `x` & `y`, its linear
velocities in `x` & `y`, its angle, its angular velocity, and two booleans
that represent whether each leg is in contact with the ground or not.
### Rewards
Reward for moving from the top of the screen to the landing pad and coming
to rest is about 100-140 points.
If the lander moves away from the landing pad, it loses reward.
If the lander crashes, it receives an additional -100 points. If it comes
to rest, it receives an additional +100 points. Each leg with ground
contact is +10 points.
Firing the main engine is -0.3 points each frame. Firing the side engine
is -0.03 points each frame. Solved is 200 points.
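The per-frame engine costs add up linearly over an episode. The hypothetical helper below tallies them, assuming every firing frame costs the flat amount listed above; it ignores the other reward terms (shaping, crash or rest bonus, leg contact).

```python
MAIN_ENGINE_COST = 0.3   # points per frame of main-engine firing (from the text)
SIDE_ENGINE_COST = 0.03  # points per frame of side-engine firing (from the text)


def fuel_penalty(main_frames: int, side_frames: int) -> float:
    """Total (negative) reward from engine use, assuming a flat cost per firing frame."""
    if main_frames < 0 or side_frames < 0:
        raise ValueError("frame counts must be non-negative")
    return -(MAIN_ENGINE_COST * main_frames + SIDE_ENGINE_COST * side_frames)


# e.g. 100 frames of main engine and 50 frames of side engines
print(fuel_penalty(100, 50))  # -31.5
```

Because the main engine costs ten times as much per frame as a side engine, an agent that corrects its attitude with the side engines and saves the main engine for braking pays noticeably less.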
### Starting State
The lander starts at the top center of the viewport with a random initial
force applied to its center of mass.
### Episode Termination
The episode finishes if:
1) the lander crashes (the lander body gets in contact with the moon);
2) the lander gets outside of the viewport (the absolute value of the `x` coordinate reaches 1);
3) the lander is not awake. From the [Box2D docs](https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61),
a body which is not awake is a body which doesn't move and doesn't
collide with any other body:
> When Box2D determines that a body (or group of bodies) has come to rest,
> the body enters a sleep state which has very little CPU overhead. If a
> body is awake and collides with a sleeping body, then the sleeping body
> wakes up. Bodies will also wake up if a joint or contact attached to
> them is destroyed.
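The three conditions can be combined into a single check. In this sketch, `x` is the first entry of the state vector; the viewport test uses the absolute value so that leaving on either side of the pad at x = 0 counts, and `awake` stands in for Box2D's sleep state. The function name `lander_terminated` is hypothetical.

```python
def lander_terminated(crashed: bool, x: float, awake: bool) -> bool:
    """Combine the three documented termination conditions."""
    # The pad sits at x = 0, so |x| >= 1 means off-screen on either side.
    out_of_viewport = abs(x) >= 1.0
    return crashed or out_of_viewport or not awake


print(lander_terminated(crashed=False, x=0.2, awake=True))    # False: still flying
print(lander_terminated(crashed=False, x=-1.05, awake=True))  # True: left the viewport
print(lander_terminated(crashed=False, x=0.0, awake=False))   # True: came to rest
```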
### Arguments
To use the _continuous_ environment, pass `continuous=True`. The available
arguments and their default values are:
```python
import gymnasium

env = gymnasium.make(
    "LunarLander-v2",
    continuous=False,  # bool: use the continuous action space
    gravity=-10.0,  # float
    enable_wind=False,  # bool
    wind_power=15.0,  # float
    turbulence_power=1.5,  # float
)
```
If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the
action space will be `Box(-1, +1, (2,), dtype=np.float32)`.
The first coordinate of an action determines the throttle of the main engine, while the second
coordinate specifies the throttle of the lateral boosters.
Given an action `np.array([main, lateral])`, the main engine will be turned off completely if
`main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the
main engine doesn't work with less than 50% power).
Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left
booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely
from 50% to 100% between -1 and -0.5 (and 0.5 and 1, respectively).
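The mapping just described can be written as a small decoding function. It is a sketch of the documented behaviour, not Gymnasium's source: the name `decode_continuous_action` is hypothetical, and treating exactly ±0.5 as firing at 50% is an assumption, since the text leaves that boundary open.

```python
from typing import Tuple


def decode_continuous_action(main: float, lateral: float) -> Tuple[float, float, float]:
    """Return (main_throttle, left_throttle, right_throttle), each in [0, 1]."""
    # Clip into the Box(-1, +1) action bounds.
    main = min(max(main, -1.0), 1.0)
    lateral = min(max(lateral, -1.0), 1.0)

    # Main engine: off for main < 0, else affine map [0, 1] -> [0.5, 1.0].
    main_throttle = 0.0 if main < 0.0 else 0.5 + 0.5 * main

    # Lateral boosters: dead zone (-0.5, 0.5). Outside it the throttle is |lateral|,
    # which maps [0.5, 1] affinely onto [50%, 100%].
    left = right = 0.0
    if lateral <= -0.5:   # boundary at exactly -0.5 is an assumption
        left = -lateral
    elif lateral >= 0.5:  # boundary at exactly 0.5 is an assumption
        right = lateral
    return main_throttle, left, right


print(decode_continuous_action(0.5, 0.75))  # (0.75, 0.0, 0.75)
```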
`gravity` sets the gravitational constant; it must lie between -12 and 0.
If `enable_wind=True` is passed, there will be wind effects applied to the lander.
The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`.
`k` is set to 0.01.
`C` is sampled randomly between -9999 and 9999.
`wind_power` dictates the maximum magnitude of linear wind applied to the craft. The recommended value for `wind_power` is between 0.0 and 20.0.
`turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for `turbulence_power` is between 0.0 and 2.0.
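The wind formula and the parameter ranges above can be sketched in plain Python. The names `wind_signal` and `check_args` are hypothetical; the sketch assumes that the linear wind is `wind_power` times the formula and that the rotational term uses the same formula scaled by `turbulence_power` with its own random offset `C`. It also treats both ends of the gravity range as excluded, which is an assumption.

```python
import math
import random

K = 0.01  # frequency constant k from the text


def wind_signal(t: int, c: float, k: float = K) -> float:
    """tanh(sin(2 k (t+C)) + sin(pi k (t+C))), always in the open interval (-1, 1)."""
    phase = k * (t + c)
    return math.tanh(math.sin(2 * phase) + math.sin(math.pi * phase))


def check_args(gravity: float, wind_power: float, turbulence_power: float) -> None:
    """Reject out-of-range gravity; flag values outside the recommended ranges."""
    if not -12.0 < gravity < 0.0:  # both ends excluded: an assumption
        raise ValueError(f"gravity must be between -12 and 0, got {gravity}")
    if not 0.0 <= wind_power <= 20.0:
        print(f"warning: wind_power={wind_power} is outside the recommended 0.0-20.0")
    if not 0.0 <= turbulence_power <= 2.0:
        print(f"warning: turbulence_power={turbulence_power} is outside the recommended 0.0-2.0")


rng = random.Random(0)
# C is sampled between -9999 and 9999; a separate offset is assumed for turbulence.
c_wind, c_torque = rng.uniform(-9999, 9999), rng.uniform(-9999, 9999)
wind_power, turbulence_power = 15.0, 1.5
check_args(-10.0, wind_power, turbulence_power)
for t in range(3):
    linear = wind_power * wind_signal(t, c_wind)
    rotational = turbulence_power * wind_signal(t, c_torque)
    print(f"t={t}: linear={linear:+.3f} rotational={rotational:+.3f}")
```

Because `tanh` is bounded by 1 in magnitude, `wind_power` and `turbulence_power` act as upper bounds on the magnitudes, which matches their description as maximum magnitudes.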
### Version History
- v2: Count energy spent; in v0.24, added turbulence with the `wind_power` and `turbulence_power` parameters
- v1: Legs contact with ground added in state vector; contact with ground
gives +10 reward points, and -10 if contact is then lost; reward
renormalized to 200; harder initial random push.
- v0: Initial version
<!-- ### References -->
### Credits
Created by Oleg Klimov