mirror of
https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-08-30 01:50:19 +00:00
Docs automation (#16)
@@ -1,83 +0,0 @@
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Bipedal Walker
firstpage:
---

# Bipedal Walker

```{figure} ../../_static/videos/box2d/bipedal_walker.gif
:width: 200px
:name: bipedal_walker
```

This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.

| | |
|---|---|
| Action Space | Box(-1.0, 1.0, (4,), float32) |
| Observation Shape | (24,) |
| Observation High | [3.14 5. 5. 5. 3.14 5. 3.14 5. 5. 3.14 5. 3.14 5. 5. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. ] |
| Observation Low | [-3.14 -5. -5. -5. -3.14 -5. -3.14 -5. -0. -3.14 -5. -3.14 -5. -0. -1. -1. -1. -1. -1. -1. -1. -1. -1. -1. ] |
| Import | `gymnasium.make("BipedalWalker-v3")` |

### Description
This is a simple 4-joint walker robot environment.
There are two versions:
- Normal, with slightly uneven terrain.
- Hardcore, with ladders, stumps, pitfalls.

To solve the normal version, you need to get 300 points in 1600 time steps.
To solve the hardcore version, you need 300 points in 2000 time steps.

A heuristic is provided for testing. It's also useful to get demonstrations
to learn from. To run the heuristic:
```
python gymnasium/envs/box2d/bipedal_walker.py
```

### Action Space
Actions are motor speed values in the [-1, 1] range for each of the
4 joints at both hips and knees.

### Observation Space
The state consists of the hull angle, hull angular velocity, horizontal speed,
vertical speed, joint positions and joint angular speeds, leg contact
with the ground, and 10 lidar rangefinder measurements. There are no coordinates
in the state vector.
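
As a quick sanity check, the components listed above can be tallied against the `(24,)` observation shape. The sketch below is only an illustration: the grouping names are hypothetical, the split into 4 joints with two values each and 2 leg contacts is an assumption drawn from the action-space description, and the order of the groups is not claimed to match the actual index order of the vector.

```python
# Hypothetical grouping of the observation components described above.
# The per-group sizes are assumptions; only the total of 24 comes from the table.
OBSERVATION_GROUPS = [
    ("hull angle", 1),
    ("hull angular velocity", 1),
    ("horizontal speed", 1),
    ("vertical speed", 1),
    ("joint positions and angular speeds (4 joints x 2 values)", 8),
    ("leg contact with ground (2 legs)", 2),
    ("lidar rangefinder measurements", 10),
]

# Sum the group sizes and compare with the documented observation shape.
total_size = sum(size for _, size in OBSERVATION_GROUPS)
print(total_size)
```

If the tally did not come to 24, one of the assumed group sizes would have to be wrong.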

### Rewards
Reward is given for moving forward, totaling 300+ points up to the far end.
If the robot falls, it gets -100. Applying motor torque costs a small
amount of points. A more optimal agent will get a better score.

### Starting State
The walker starts standing at the left end of the terrain with the hull
horizontal, and both legs in the same position with a slight knee angle.

### Episode Termination
The episode will terminate if the hull gets in contact with the ground or
if the walker exceeds the right end of the terrain length.

### Arguments
To use the _hardcore_ environment, you need to specify the
`hardcore=True` argument like below:
```python
import gymnasium
env = gymnasium.make("BipedalWalker-v3", hardcore=True)
```

### Version History
- v3: returns closest lidar trace instead of furthest;
    faster video recording
- v2: Count energy spent
- v1: Legs now report contact with ground; motors have higher torque and
    speed; ground has higher friction; lidar rendered less nervously.
- v0: Initial version


<!-- ### References -->

### Credits
Created by Oleg Klimov
@@ -1,97 +0,0 @@
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Car Racing
---

# Car Racing

```{figure} ../../_static/videos/box2d/car_racing.gif
:width: 200px
:name: car_racing
```

This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.

| | |
|---|---|
| Action Space | Box([-1. 0. 0.], 1.0, (3,), float32) |
| Observation Shape | (96, 96, 3) |
| Observation High | 255 |
| Observation Low | 0 |
| Import | `gymnasium.make("CarRacing-v2")` |

### Description
The easiest control task to learn from pixels - a top-down
racing environment. The generated track is random every episode.

Some indicators are shown at the bottom of the window along with the
state RGB buffer. From left to right: true speed, four ABS sensors,
steering wheel position, and gyroscope.
To play yourself (it's rather fast for humans), type:
```
python gymnasium/envs/box2d/car_racing.py
```
Remember: it's a powerful rear-wheel drive car - don't press the accelerator
and turn at the same time.

### Action Space
If continuous:
There are 3 actions: steering (-1 is full left, +1 is full right), gas, and braking.
If discrete:
There are 5 actions: do nothing, steer left, steer right, gas, brake.

### Observation Space
The state is a 96x96 RGB image.

### Rewards
The reward is -0.1 every frame and +1000/N for every track tile visited,
where N is the total number of tiles in the track. For example,
if you have finished in 732 frames, your reward is
1000 - 0.1*732 = 926.8 points.
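
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch of the reward formula as described, not the environment's implementation: the function name `episode_return` and the example track length of 300 tiles are hypothetical, and the -100 penalty for leaving the playfield is not modeled.

```python
def episode_return(frames: int, tiles_visited: int, total_tiles: int) -> float:
    """Return -0.1 per frame plus 1000/N per visited tile (N = total_tiles)."""
    tile_reward = 1000.0 * tiles_visited / total_tiles
    frame_penalty = 0.1 * frames
    return tile_reward - frame_penalty


# Visiting every tile of a hypothetical 300-tile track in 732 frames
# gives roughly 926.8 points, matching the worked example.
print(round(episode_return(frames=732, tiles_visited=300, total_tiles=300), 1))
```

Visiting only part of the track scales the positive term proportionally, while the per-frame penalty keeps accumulating.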

### Starting State
The car starts at rest in the center of the road.

### Episode Termination
The episode finishes when all of the tiles are visited. The car can also go
outside of the playfield - that is, far off the track, in which case it will
receive -100 reward and die.

### Arguments
`lap_complete_percent` dictates the percentage of tiles that must be visited by
the agent before a lap is considered complete.

Passing `domain_randomize=True` enables the domain randomized variant of the environment.
In this scenario, the background and track colours are different on every reset.

Passing `continuous=False` converts the environment to use discrete action space.
The discrete action space has 5 actions: [do nothing, left, right, gas, brake].

### Reset Arguments
Passing the option `options["randomize"] = True` will change the current colour of the environment on demand.
Correspondingly, passing the option `options["randomize"] = False` will not change the current colour of the environment.
`domain_randomize` must be `True` on init for this argument to work.
Example usage:
```py
import gymnasium

env = gymnasium.make("CarRacing-v2", domain_randomize=True)

# normal reset, this changes the colour scheme by default
env.reset()

# reset with colour scheme change
env.reset(options={"randomize": True})

# reset with no colour scheme change
env.reset(options={"randomize": False})
```

### Version History
- v1: Change track completion logic and add domain randomization (0.24.0)
- v0: Original version

### References
- Chris Campbell (2014), http://www.iforce2d.net/b2dtut/top-down-car.

### Credits
Created by Oleg Klimov
@@ -1,125 +0,0 @@
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Lunar Lander
lastpage:
---

# Lunar Lander

```{figure} ../../_static/videos/box2d/lunar_lander.gif
:width: 200px
:name: lunar_lander
```

This environment is part of the <a href='..'>Box2D environments</a>. Please read that page first for general information.

| | |
|---|---|
| Action Space | Discrete(4) |
| Observation Shape | (8,) |
| Observation High | [1.5 1.5 5. 5. 3.14 5. 1. 1. ] |
| Observation Low | [-1.5 -1.5 -5. -5. -3.14 -5. -0. -0. ] |
| Import | `gymnasium.make("LunarLander-v2")` |

### Description
This environment is a classic rocket trajectory optimization problem.
According to Pontryagin's maximum principle, it is optimal to fire the
engine at full throttle or turn it off. This is the reason why this
environment has discrete actions: engine on or off.

There are two environment versions: discrete or continuous.
The landing pad is always at coordinates (0,0). The coordinates are the
first two numbers in the state vector.
Landing outside of the landing pad is possible. Fuel is infinite, so an agent
can learn to fly and then land on its first attempt.

To see a heuristic landing, run:
```
python gymnasium/envs/box2d/lunar_lander.py
```
<!-- To play yourself, run: -->
<!-- python examples/agents/keyboard_agent.py LunarLander-v2 -->

### Action Space
There are four discrete actions available: do nothing, fire left
orientation engine, fire main engine, fire right orientation engine.

### Observation Space
The state is an 8-dimensional vector: the coordinates of the lander in `x` & `y`, its linear
velocities in `x` & `y`, its angle, its angular velocity, and two booleans
that represent whether each leg is in contact with the ground or not.

### Rewards
Reward for moving from the top of the screen to the landing pad and coming
to rest is about 100-140 points.
If the lander moves away from the landing pad, it loses reward.
If the lander crashes, it receives an additional -100 points. If it comes
to rest, it receives an additional +100 points. Each leg with ground
contact is +10 points.
Firing the main engine is -0.3 points each frame. Firing the side engine
is -0.03 points each frame. Solved is 200 points.

### Starting State
The lander starts at the top center of the viewport with a random initial
force applied to its center of mass.

### Episode Termination
The episode finishes if:
1) the lander crashes (the lander body gets in contact with the moon);
2) the lander gets outside of the viewport (the absolute value of the `x` coordinate is greater than 1);
3) the lander is not awake. From the [Box2D docs](https://box2d.org/documentation/md__d_1__git_hub_box2d_docs_dynamics.html#autotoc_md61),
    a body which is not awake is a body which doesn't move and doesn't
    collide with any other body:
> When Box2D determines that a body (or group of bodies) has come to rest,
> the body enters a sleep state which has very little CPU overhead. If a
> body is awake and collides with a sleeping body, then the sleeping body
> wakes up. Bodies will also wake up if a joint or contact attached to
> them is destroyed.

### Arguments
To use the _continuous_ environment, you need to specify the
`continuous=True` argument like below. The other keyword arguments are shown with their default values:
```python
import gymnasium

env = gymnasium.make(
    "LunarLander-v2",
    continuous=True,  # default is False (discrete actions)
    gravity=-10.0,  # default
    enable_wind=False,  # default
    wind_power=15.0,  # default
    turbulence_power=1.5,  # default
)
```
If `continuous=True` is passed, continuous actions (corresponding to the throttle of the engines) will be used and the
action space will be `Box(-1, +1, (2,), dtype=np.float32)`.
The first coordinate of an action determines the throttle of the main engine, while the second
coordinate specifies the throttle of the lateral boosters.
Given an action `np.array([main, lateral])`, the main engine will be turned off completely if
`main < 0` and the throttle scales affinely from 50% to 100% for `0 <= main <= 1` (in particular, the
main engine doesn't work with less than 50% power).
Similarly, if `-0.5 < lateral < 0.5`, the lateral boosters will not fire at all. If `lateral < -0.5`, the left
booster will fire, and if `lateral > 0.5`, the right booster will fire. Again, the throttle scales affinely
from 50% to 100% as `lateral` goes from -0.5 to -1 (and from 0.5 to 1, respectively).
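
The mapping from a continuous action to engine throttle described above can be sketched as two small functions. This is an illustration of the documented rules, not the environment's own code: the names `main_throttle` and `lateral_throttle` are hypothetical, and treating `lateral` values of exactly -0.5 or 0.5 as "not firing" is an assumption, since the text leaves that boundary unspecified.

```python
from typing import Tuple


def main_throttle(main: float) -> float:
    """Main engine: off for main < 0, otherwise 50% to 100% affinely on [0, 1]."""
    if main < 0:
        return 0.0
    return 0.5 + 0.5 * min(main, 1.0)


def lateral_throttle(lateral: float) -> Tuple[str, float]:
    """Lateral boosters: return which booster fires and at what throttle."""
    magnitude = min(abs(lateral), 1.0)
    if magnitude <= 0.5:  # boundary handling is an assumption
        return ("off", 0.0)
    side = "left" if lateral < 0 else "right"
    # An affine map sending 0.5 to 50% and 1.0 to 100% is the magnitude itself.
    return (side, magnitude)


print(main_throttle(0.5), lateral_throttle(-0.8))
```

Under these rules neither engine can run below 50% power: any throttle value is either zero or at least one half.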

`gravity` dictates the gravitational constant; it is bounded to be within 0 and -12.

If `enable_wind=True` is passed, there will be wind effects applied to the lander.
The wind is generated using the function `tanh(sin(2 k (t+C)) + sin(pi k (t+C)))`.
`k` is set to 0.01.
`C` is sampled randomly between -9999 and 9999.
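
To get a feel for the wind function, it can be evaluated directly. The following is a minimal sketch under stated assumptions: the function name `wind_factor` is hypothetical, `C` is drawn with a uniform distribution (the text only says "sampled randomly"), and multiplying the result by `wind_power` is an assumed reading of "maximum magnitude", not a confirmed detail of the implementation.

```python
import math
import random

K = 0.01  # value of k stated above


def wind_factor(t: float, c: float, k: float = K) -> float:
    """Evaluate tanh(sin(2 k (t+C)) + sin(pi k (t+C))); the result lies in (-1, 1)."""
    phase = k * (t + c)
    return math.tanh(math.sin(2 * phase) + math.sin(math.pi * phase))


# Draw C once (uniform sampling is an assumption), then scale by wind_power.
c = random.uniform(-9999, 9999)
wind_power = 15.0
samples = [wind_power * wind_factor(t, c) for t in range(1000)]
print(min(samples), max(samples))
```

Because `tanh` is bounded, the scaled value never reaches `wind_power` in magnitude, and the two sine terms with incommensurate frequencies keep the signal from repeating exactly.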

`wind_power` dictates the maximum magnitude of linear wind applied to the craft. The recommended value for `wind_power` is between 0.0 and 20.0.
`turbulence_power` dictates the maximum magnitude of rotational wind applied to the craft. The recommended value for `turbulence_power` is between 0.0 and 2.0.

### Version History
- v2: Count energy spent; in v0.24, added turbulence with the `wind_power` and `turbulence_power` parameters
- v1: Leg contact with the ground added to the state vector; contact with the ground
    gives +10 reward points, and -10 if contact is then lost; reward
    renormalized to 200; harder initial random push.
- v0: Initial version

<!-- ### References -->

### Credits
Created by Oleg Klimov