Adding documentation for CartPole-v1 in docs/classic_control (#2509)
* Adding documentation for CartPole-v1 in docs/classic_control
* typo

Co-authored-by: J K Terry <justinkterry@gmail.com>
docs/classic_control/cartpole.md (new file, 65 lines)

@@ -0,0 +1,65 @@
CartPole-v1
---

|Title|Action Type|Action Shape|Action Values|Observation Type|Observation Shape|Observation Values|Average Total Reward|Import|
| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
|CartPole-v1|Discrete|(1,)|(0,1)| Box |(4,)|[(-4.8,4.8),(-inf,inf), (~ -0.2095, ~ 0.2095), (-inf, inf)]| |`from gym.envs.classic_control import cartpole`|

---

### Description

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in ["Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems"](https://ieeexplore.ieee.org/document/6313077). A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.

### Action Space

The agent takes a 1-element vector for actions.
The action space is `(action)` in `{0, 1}`, where `action` is used to push the cart with a fixed amount of force:

| Num | Action                 |
|-----|------------------------|
| 0   | Push cart to the left  |
| 1   | Push cart to the right |

Note: The amount the velocity is reduced or increased is not fixed; it depends on the angle the pole is pointing,
because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it.

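As a rough illustration (a minimal sketch only, assuming the classic `gym` step API that returns `(observation, reward, done, info)`), the two actions can be inspected and applied like this:

```
import gym

env = gym.make('CartPole-v1')
env.reset()

# The action space is Discrete(2): 0 pushes the cart to the left, 1 to the right.
print(env.action_space)

push_right = 1                     # apply the fixed force toward the right
observation, reward, done, info = env.step(push_right)

env.close()
```
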
### Observation Space

The observation is an `ndarray` with shape `(4,)` where the elements correspond to the following:

| Num | Observation           | Min                   | Max                 |
|-----|-----------------------|-----------------------|---------------------|
| 0   | Cart Position         | -4.8*                 | 4.8*                |
| 1   | Cart Velocity         | -Inf                  | Inf                 |
| 2   | Pole Angle            | ~ -0.418 rad (-24°)** | ~ 0.418 rad (24°)** |
| 3   | Pole Angular Velocity | -Inf                  | Inf                 |

**Note:** The above denotes the ranges of possible observations for each element, but in two cases this range exceeds the
range of possible values in an un-terminated episode:

- `*`: the cart x-position can be observed between `(-4.8, 4.8)`, but an episode terminates if the cart leaves the
`(-2.4, 2.4)` range.
- `**`: Similarly, the pole angle can be observed between `(-.418, .418)` radians (or precisely **±24°**), but an episode is
terminated if the pole angle is outside the `(-.2095, .2095)` range (or precisely **±12°**).

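As a sketch (again assuming the classic `gym` API in which `reset()` returns only the observation), the four elements can be unpacked by index:

```
import gym

env = gym.make('CartPole-v1')
observation = env.reset()

# Unpack the shape-(4,) ndarray described in the table above.
cart_position, cart_velocity, pole_angle, pole_angular_velocity = observation

# The Box bounds mirror the Min/Max columns; the unbounded velocity entries are
# stored as a very large finite float rather than a literal Inf.
print(env.observation_space.low)
print(env.observation_space.high)

env.close()
```
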
### Rewards

A reward of +1 is given for every step taken, including the termination step. The reward threshold (the average return at which the environment is considered solved) is 475 for v1.

### Starting State

All observations are assigned a uniformly random value in `(-0.05, 0.05)`.

### Episode Termination

The episode terminates if one of the following occurs:

1. Pole Angle is more than ±12°
2. Cart Position is more than ±2.4 (center of the cart reaches the edge of the display)
3. Episode length is greater than 500 (200 for v0)

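Putting the pieces above together, here is a minimal random-policy rollout (a sketch assuming the classic `gym` API, where `done` becomes `True` once any of the conditions above is met, including the 500-step limit enforced by the registered `TimeLimit` wrapper):

```
import gym

env = gym.make('CartPole-v1')
observation = env.reset()        # each element starts uniformly in (-0.05, 0.05)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()               # random left/right push
    observation, reward, done, info = env.step(action)
    total_reward += reward                           # +1 per step, including the last

cart_position, _, pole_angle, _ = observation
print(f"episode return: {total_reward}")
print(f"final cart position: {cart_position:.3f}, final pole angle: {pole_angle:.3f} rad")

env.close()
```
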
### Arguments

No additional arguments are currently supported.

```
gym.make('CartPole-v1')
```

### Version History

* v1: Maximum episode length increased from 200 to 500 steps, reward threshold increased from 195 to 475.
* v0: Initial version release (1.0.0)

@@ -2,7 +2,7 @@ Blackjack
 ---
 |Title|Action Type|Action Shape|Action Values|Observation Shape|Observation Values|Average Total Reward|Import|
 | ----------- | -----------| ----------- | -----------| ----------- | -----------| ----------- | -----------|
-|Blackjack|Discrete|(1,)|(0,1)|(3,)|[(0,31),(0,10),(0,1)]| |from gym.envs.toy_text import blackjack|
+|Blackjack|Discrete|(1,)|(0,1)|(3,)|[(0,31),(0,10),(0,1)]| |`from gym.envs.toy_text import blackjack`|
 ---

 Blackjack is a card game where the goal is to obtain cards that sum to as near as possible to 21 without going over. They're playing against a fixed dealer.
@@ -50,7 +50,7 @@ Reward schedule:
 ### Arguments

 ```
-gym.make('Blackjack-v0', natural=False)
+gym.make('Blackjack-v1', natural=False)
 ```

 <a id="nat">`natural`</a>: Whether to give an additional reward for starting with a natural blackjack, i.e. starting with an ace and ten (sum is 21).

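For example (an illustrative sketch based only on the signature shown above), the bonus can be enabled when creating the environment:

```
import gym

# natural=True adds the extra reward for starting with an ace and a ten-valued card.
env = gym.make('Blackjack-v1', natural=True)
```
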
@@ -28,9 +28,9 @@ class CartPoleEnv(gym.Env):
 Observation:
     Type: Box(4)
     Num     Observation               Min                     Max
-    0       Cart Position             -4.8                    4.8
+    0       Cart Position             -2.4                    2.4
     1       Cart Velocity             -Inf                    Inf
-    2       Pole Angle                -0.418 rad (-24 deg)    0.418 rad (24 deg)
+    2       Pole Angle                -0.209 rad (-12 deg)    0.209 rad (12 deg)
     3       Pole Angular Velocity     -Inf                    Inf

 Actions: