Reformat some docstrings, remove unneeded image links (#2578)

* docs+credits

* docs: refactor box2d + comment version history

* fix mujoco line lengths

* fix more env line lengths

* black

* typos

* put docstrings in base environments rather than highest version

* fix richer reacher

* black

* correct black version

* continuous mountain car docstring to markdown

* remove unneeded images

* black

Co-authored-by: Andrea PIERRÉ <andrea_pierre@brown.edu>
trigaten
2022-01-27 15:36:50 -05:00
committed by GitHub
parent 91d278f2dd
commit b9e8b6c587
6 changed files with 50 additions and 51 deletions


@@ -11,7 +11,7 @@ from os import path
class PendulumEnv(gym.Env):
"""
-### Description
+## Description
The inverted pendulum swingup problem is a classic problem in the control literature. In this
version of the problem, the pendulum starts in a random position, and the goal is to swing it up so
@@ -26,7 +26,7 @@ class PendulumEnv(gym.Env):
- `theta`: angle in radians.
- `tau`: torque in `N * m`. Defined as positive _counter-clockwise_.
-### Action Space
+## Action Space
The action is the torque applied to the pendulum.
| Num | Action | Min | Max |
@@ -34,7 +34,7 @@ class PendulumEnv(gym.Env):
| 0 | Torque | -2.0 | 2.0 |
-### Observation Space
+## Observation Space
The observations correspond to the x-y coordinate of the pendulum's end, and its angular velocity.
| Num | Observation | Min | Max |
@@ -43,7 +43,7 @@ class PendulumEnv(gym.Env):
| 1 | y = sin(angle) | -1.0 | 1.0 |
| 2 | Angular Velocity | -8.0 | 8.0 |
-### Rewards
+## Rewards
The reward is defined as:
```
r = -(theta^2 + 0.1*theta_dt^2 + 0.001*torque^2)
@@ -53,13 +53,13 @@ class PendulumEnv(gym.Env):
0.001*2^2) = -16.2736044`, while the maximum reward is zero (pendulum is
upright with zero velocity and no torque being applied).
-### Starting State
+## Starting State
The starting state is a random angle in `[-pi, pi]` and a random angular velocity in `[-1,1]`.
-### Episode Termination
+## Episode Termination
An episode terminates after 200 steps. There's no other criteria for termination.
-### Arguments
+## Arguments
- `g`: acceleration of gravity measured in `(m/s^2)` used to calculate the pendulum dynamics. The default is
`g=10.0`.
@@ -67,7 +67,7 @@ class PendulumEnv(gym.Env):
gym.make('CartPole-v1', g=9.81)
```
-### Version History
+## Version History
* v1: Simplify the math equations, no difference in behavior.
* v0: Initial versions release (1.0.0)
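
The reward arithmetic quoted in the Rewards section of the docstring can be checked with a short sketch. This is not the environment's implementation; `pendulum_reward` is a hypothetical helper that only restates the formula shown in the diff. It assumes `theta` has already been normalized to `[-pi, pi]`, and it takes the bounds (angular velocity 8.0, torque 2.0) from the tables in the docstring.

```python
import math


def pendulum_reward(theta: float, theta_dt: float, torque: float) -> float:
    """Hypothetical helper restating the docstring formula.

    r = -(theta^2 + 0.1*theta_dt^2 + 0.001*torque^2)
    Assumes theta is already normalized to [-pi, pi].
    """
    return -(theta**2 + 0.1 * theta_dt**2 + 0.001 * torque**2)


# Worst case named in the docstring: pendulum hanging straight down (theta = pi),
# maximum angular velocity (8.0) and maximum torque (2.0).
min_reward = pendulum_reward(math.pi, 8.0, 2.0)

# Best case: upright, at rest, with no torque applied.
max_reward = pendulum_reward(0.0, 0.0, 0.0)

print(min_reward, max_reward)
```

The minimum agrees with the `-16.2736044` figure in the docstring to the precision shown there, and the maximum compares equal to zero.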