Adding documentation for CartPole-v1 in docs/classic_control (#2509)

* Adding documentation for CartPole-v1 in docs/classic_control

* typo

Co-authored-by: J K Terry <justinkterry@gmail.com>
This commit is contained in:
Brendan King
2021-12-19 23:06:24 -08:00
committed by GitHub
parent a30568a06e
commit 9c0808eb9a
3 changed files with 69 additions and 4 deletions


@@ -0,0 +1,65 @@
CartPole-v1
---
|Title|Action Type|Action Shape|Action Values|Observation Type| Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------|-----------| ----------- | -----------| ----------- | -----------|
|CartPole-v1|Discrete|(1,)|(0,1)| Box |(4,)|[(-4.8,4.8),(-inf,inf), (~ -0.418, ~ 0.418), (-inf, inf)]| |`from gym.envs.classic_control import cartpole`|
---
### Description
This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in ["Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems"](https://ieeexplore.ieee.org/document/6313077). A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pole starts
upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.
### Action Space
The agent takes a 1-element vector as its action.
The action space is `(action)` with `action` in `{0, 1}`, where `action` selects the direction in which the cart is pushed with a fixed amount of force:
| Num | Action |
|-----|------------------------|
| 0 | Push cart to the left |
| 1 | Push cart to the right |
Note: The amount by which the velocity is reduced or increased is not fixed, as it depends on the angle at which the pole is pointing.
This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it.
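
The mapping from the discrete action to a push direction can be sketched as follows. This is a minimal illustration rather than the environment's source: `FORCE_MAG` and `action_to_force` are hypothetical names, and the magnitude of `10.0` is an assumed value chosen only for the example.

```python
# Hypothetical constant: the fixed force magnitude (assumed value, for illustration only).
FORCE_MAG = 10.0


def action_to_force(action: int) -> float:
    """Return the signed force for a discrete action (0 = push left, 1 = push right)."""
    if action not in (0, 1):
        raise ValueError(f"invalid action: {action!r}")
    return FORCE_MAG if action == 1 else -FORCE_MAG


print(action_to_force(0), action_to_force(1))
```

The force is fixed, but the resulting change in cart velocity is not, since it also depends on the pole's current angle.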
### Observation Space
The observation is a `ndarray` with shape `(4,)` where the elements correspond to the following:
| Num | Observation | Min | Max |
|-----|-----------------------|----------------------|--------------------|
| 0 | Cart Position | -4.8* | 4.8* |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | ~ -0.418 rad (-24°)** | ~ 0.418 rad (24°)** |
| 3 | Pole Angular Velocity | -Inf | Inf |
**Note:** The table above gives the range of possible observation values for each element, but in two cases this range exceeds the
range of values that can occur in an un-terminated episode:
- `*`: the cart x-position can be observed between `(-4.8, 4.8)`, but an episode terminates if the cart leaves the
`(-2.4, 2.4)` range.
- `**`: Similarly, the pole angle can be observed between `(-.418, .418)` radians (**±24°**), but an episode
terminates if the pole angle is outside the `(-.2095, .2095)` range (**±12°**).
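
The relationship between the termination thresholds and the observable bounds can be checked with a few lines of arithmetic. This is a sketch under one assumption: that each observable bound is twice the corresponding termination threshold, which is consistent with the numbers in the table. The constant names are illustrative.

```python
import math

# Termination thresholds as stated in the text.
X_THRESHOLD = 2.4                    # cart position limit
THETA_THRESHOLD = math.radians(12)   # pole angle limit, 12 degrees in radians

# Assumption: the observable range is twice the termination threshold,
# which reproduces the 4.8 and ~0.418 rad bounds from the table.
obs_high = [X_THRESHOLD * 2, math.inf, THETA_THRESHOLD * 2, math.inf]
obs_low = [-v for v in obs_high]

print(obs_low)
print(obs_high)
```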
### Rewards
The reward is 1 for every step taken, including the termination step. The reward threshold is 475 for v1.
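
Because every step yields a reward of 1, an episode's total reward equals its length. The sketch below illustrates this and one way to compare results against the threshold. The helper names are hypothetical, and comparing the mean return of a batch of episodes to the threshold is an assumption, since the text does not specify how the threshold is applied.

```python
from statistics import mean

REWARD_THRESHOLD = 475.0  # threshold for v1, as stated in the text


def episode_return(num_steps: int) -> float:
    """Total reward for an episode: +1 per step, termination step included."""
    return float(num_steps)


def meets_threshold(returns, threshold=REWARD_THRESHOLD) -> bool:
    """Assumption: compare the mean return of the given episodes to the threshold."""
    return mean(returns) >= threshold


print(episode_return(500), meets_threshold([500, 480, 460]))
```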
### Starting State
Each element of the observation is assigned a uniformly random value in `(-0.05, 0.05)`.
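
A minimal sketch of this initialization, using only the standard library. The function name `sample_initial_state` is hypothetical, and the environment itself may use a different random number generator.

```python
import random


def sample_initial_state(rng: random.Random) -> list:
    """Draw the 4 state elements independently and uniformly from [-0.05, 0.05]."""
    return [rng.uniform(-0.05, 0.05) for _ in range(4)]


# A seeded generator makes the draw reproducible.
state = sample_initial_state(random.Random(0))
print(state)
```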
### Episode Termination
The episode terminates if one of the following occurs:
1. Pole angle exceeds ±12°
2. Cart position exceeds ±2.4 (center of the cart reaches the edge of the display)
3. Episode length is greater than 500 (200 for v0)
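
The three conditions above can be sketched as a single check. This is an illustration, not the environment's implementation: `is_done` is a hypothetical helper, and treating the step limit as reached once `max_steps` steps have been taken is an assumption.

```python
import math

X_THRESHOLD = 2.4                    # cart position limit
THETA_THRESHOLD = math.radians(12)   # pole angle limit (12 degrees)
MAX_EPISODE_STEPS = 500              # 200 for v0


def is_done(x: float, theta: float, steps_taken: int,
            max_steps: int = MAX_EPISODE_STEPS) -> bool:
    """Return True if any of the three termination conditions holds."""
    pole_fell = abs(theta) > THETA_THRESHOLD
    cart_out = abs(x) > X_THRESHOLD
    # Assumption: the episode ends once max_steps steps have been taken.
    time_up = steps_taken >= max_steps
    return pole_fell or cart_out or time_up


print(is_done(0.0, 0.0, 10), is_done(2.5, 0.0, 10))
```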
### Arguments
No additional arguments are currently supported.
```
gym.make('CartPole-v1')
```
### Version History
* v1: Maximum episode length increased from 200 to 500 steps, reward threshold increased from 195 to 475.
* v0: Initial version release (1.0.0)


@@ -2,7 +2,7 @@ Blackjack
---
|Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
-|Blackjack|Discrete|(1,)|(0,1)|(3,)|[(0,31),(0,10),(0,1)]| |from gym.envs.toy_text import blackjack|
+|Blackjack|Discrete|(1,)|(0,1)|(3,)|[(0,31),(0,10),(0,1)]| |`from gym.envs.toy_text import blackjack`|
---
Blackjack is a card game where the goal is to obtain cards that sum to as near as possible to 21 without going over. They're playing against a fixed dealer.
@@ -50,7 +50,7 @@ Reward schedule:
### Arguments
```
-gym.make('Blackjack-v0', natural=False)
+gym.make('Blackjack-v1', natural=False)
```
<a id="nat">`natural`</a>: Whether to give an additional reward for starting with a natural blackjack, i.e. starting with an ace and ten (sum is 21).


@@ -28,9 +28,9 @@ class CartPoleEnv(gym.Env):
Observation:
Type: Box(4)
Num Observation Min Max
-0 Cart Position -4.8 4.8
+0 Cart Position -2.4 2.4
1 Cart Velocity -Inf Inf
-2 Pole Angle -0.418 rad (-24 deg) 0.418 rad (24 deg)
+2 Pole Angle -0.209 rad (-12 deg) 0.209 rad (12 deg)
3 Pole Angular Velocity -Inf Inf
Actions: