mirror of
https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-09-02 02:32:50 +00:00
87 lines
3.3 KiB
Markdown
---
AUTOGENERATED: DO NOT EDIT FILE DIRECTLY
title: Cart Pole
---

# Cart Pole

```{figure} ../../_static/videos/classic_control/cart_pole.gif
:width: 200px
:name: cart_pole
```

This environment is part of the <a href='..'>Classic Control environments</a>. Please read that page first for general information.

|   |   |
|---|---|
| Action Space | Discrete(2) |
| Observation Shape | (4,) |
| Observation High | [4.8 inf 0.42 inf] |
| Observation Low | [-4.8 -inf -0.42 -inf] |
| Import | `gymnasium.make("CartPole-v1")` |

### Description

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in
["Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems"](https://ieeexplore.ieee.org/document/6313077).
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track.
The pendulum is placed upright on the cart and the goal is to balance the pole by applying forces
in the left and right direction on the cart.

### Action Space

The action is an `ndarray` with shape `(1,)` which can take values `{0, 1}` indicating the direction
of the fixed force the cart is pushed with.

| Num | Action                 |
|-----|------------------------|
| 0   | Push cart to the left  |
| 1   | Push cart to the right |

**Note**: The change in velocity produced by the applied force is not fixed; it depends on the angle
at which the pole is pointing. The pole's center of gravity changes the amount of energy needed to move the cart underneath it.

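The note above can be illustrated with a minimal sketch of the frictionless cart-pole equations of motion. The constants (masses, pole half-length, force magnitude) are assumptions chosen for illustration, since this page does not state them, and the function name `accelerations` is hypothetical rather than part of the environment's API.

```python
import math

# Assumed physical constants; none of these values are stated on this page.
GRAVITY = 9.8           # m/s^2
CART_MASS = 1.0         # kg
POLE_MASS = 0.1         # kg
POLE_HALF_LENGTH = 0.5  # m, distance from the joint to the pole's center of mass
FORCE_MAG = 10.0        # N, magnitude of the fixed push


def accelerations(theta, theta_dot, action):
    """Return (cart_acc, pole_angular_acc) for action 0 (left) or 1 (right)."""
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    # Intermediate term shared by both accelerations.
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    cart_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    return cart_acc, theta_acc


# The same push to the right accelerates the cart differently
# depending on the pole angle.
upright_acc, _ = accelerations(0.0, 0.0, 1)
tilted_acc, _ = accelerations(0.2, 0.0, 1)
print(upright_acc, tilted_acc)
```

Under these assumed constants, the same action produces a different cart acceleration when the pole is tilted than when it is upright, which is the effect the note describes.
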
### Observation Space

The observation is an `ndarray` with shape `(4,)` with the values corresponding to the following positions and velocities:

| Num | Observation           | Min                 | Max               |
|-----|-----------------------|---------------------|-------------------|
| 0   | Cart Position         | -4.8                | 4.8               |
| 1   | Cart Velocity         | -Inf                | Inf               |
| 2   | Pole Angle            | ~ -0.418 rad (-24°) | ~ 0.418 rad (24°) |
| 3   | Pole Angular Velocity | -Inf                | Inf               |

**Note:** While the ranges above denote the possible values for each element of the observation space,
they do not reflect the allowed values of the state space in an unterminated episode. In particular:
- The cart x-position (index 0) can take values between `(-4.8, 4.8)`, but the episode terminates
  if the cart leaves the `(-2.4, 2.4)` range.
- The pole angle can be observed between `(-.418, .418)` radians (or **±24°**), but the episode terminates
  if the pole angle is not in the range `(-.2095, .2095)` (or **±12°**).

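The difference between the observable range and the termination range can be sketched as a small check. This is an illustrative helper, not the environment's own code: the name `is_terminated` is hypothetical, and treating the limits as strict inequalities is an assumption.

```python
import math

X_THRESHOLD = 2.4                         # cart position limit from the note above
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # 12 degrees in radians, about 0.2095


def is_terminated(observation):
    """Return True if the cart position or pole angle is outside its limit."""
    x, _x_dot, theta, _theta_dot = observation
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD


# The observation bounds in the table are twice the termination limits.
obs_high_x = 2 * X_THRESHOLD
obs_high_theta = 2 * THETA_THRESHOLD

print(is_terminated((2.0, 0.0, 0.1, 0.0)))  # inside both limits
print(is_terminated((2.5, 0.0, 0.0, 0.0)))  # cart outside the 2.4 limit
print(obs_high_x, round(obs_high_theta, 3))
```

A state such as `x = 3.0` is therefore a valid observation, but it can only appear on the step that ends the episode.
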
### Rewards

Since the goal is to keep the pole upright for as long as possible, a reward of `+1` is given for every step taken,
including the termination step. The reward threshold is 475 for v1.

### Starting State

All observations are assigned a uniformly random value in `(-0.05, 0.05)`.

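As a rough illustration of this initialization, the sketch below draws four independent uniform values with Python's standard library. The environment uses its own seeded random number generator, so the helper name `sample_start_state` and the use of `random.Random` are assumptions made only for this example.

```python
import random


def sample_start_state(rng):
    """Draw cart position, cart velocity, pole angle and pole angular velocity."""
    return [rng.uniform(-0.05, 0.05) for _ in range(4)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
state = sample_start_state(rng)
print(state)
```
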
### Episode End

The episode ends if any one of the following occurs:

1. Termination: Pole Angle is outside the ±12° range
2. Termination: Cart Position is outside the ±2.4 range (center of the cart reaches the edge of the display)
3. Truncation: Episode length is greater than 500 (200 for v0)

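The interaction between the `+1` per-step reward and the two ways an episode can end can be sketched as follows. The function `run_episode` is hypothetical and takes a precomputed sequence of termination flags instead of simulating the physics; it also assumes the step limit takes effect on step 500, so the largest possible return for v1 is 500.

```python
MAX_EPISODE_STEPS = 500  # v1 limit from the list above (200 for v0)
REWARD_THRESHOLD = 475   # v1 reward threshold


def run_episode(terminated_flags):
    """Accumulate +1 per step; return (episode_return, terminated, truncated)."""
    episode_return = 0.0
    steps = 0
    for terminated in terminated_flags:
        steps += 1
        episode_return += 1.0  # the terminating step is rewarded too
        if terminated:
            return episode_return, True, False
        if steps >= MAX_EPISODE_STEPS:
            return episode_return, False, True
    return episode_return, False, False


# The pole falls on the tenth step: the episode terminates with a return of 10.
short = run_episode([False] * 9 + [True])
# The pole never falls: the episode is truncated at the step limit.
full = run_episode([False] * 1000)
print(short, full, full[0] >= REWARD_THRESHOLD)
```
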
### Arguments

```python
import gymnasium
gymnasium.make('CartPole-v1')
```

No additional arguments are currently supported.