# Cliff Walking
This environment is part of the Toy Text environments. Please read that page first for general information.
| | |
|---|---|
| Action Space | Discrete(4) |
| Observation Space | Discrete(48) |
| Import | `gymnasium.make("CliffWalking-v0")` |
This is a simple implementation of the Gridworld Cliff reinforcement learning task.
Adapted from Example 6.6 (page 106) of *Reinforcement Learning: An Introduction* by Sutton and Barto, with inspiration from [dennybritz/reinforcement-learning](https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py).
## Description
The board is a 4x12 matrix, with (using NumPy matrix indexing):
- [3, 0] as the start at bottom-left
- [3, 11] as the goal at bottom-right
- [3, 1..10] as the cliff at bottom-center
If the agent steps on the cliff, it returns to the start. An episode terminates when the agent reaches the goal.
## Actions
There are 4 discrete deterministic actions:
- 0: move up
- 1: move right
- 2: move down
- 3: move left
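For concreteness, here is a minimal sketch (assuming `gymnasium` is installed) that uses these action indices to walk the shortest path along the cliff edge; the expected return of -13 follows from the layout and per-step rewards described on this page:

```python
import gymnasium as gym

UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3  # action indices as listed above

env = gym.make("CliffWalking-v0")
obs, info = env.reset()

# Shortest route: step up off the start, move 11 cells right along the
# row above the cliff, then step down onto the goal (13 moves in total).
total_reward = 0
for action in [UP] + [RIGHT] * 11 + [DOWN]:
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(total_reward, terminated)  # expected: -13 True
```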
## Observations

There are 3x12 + 1 = 37 possible states: the agent can never be on the cliff, and it never observes the goal (reaching it ends the episode), which leaves every position in the first 3 rows plus the bottom-left cell. The observation is simply the current position encoded as a flattened index.
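To illustrate the encoding (a sketch; `to_state` is a hypothetical helper, not part of the environment's API): the flattened index of position `[row, col]` on the 4x12 grid is `row * 12 + col`, i.e. row-major flattening:

```python
import numpy as np

nrows, ncols = 4, 12

def to_state(row, col):
    # Hypothetical helper: row-major flattening of the 4x12 grid,
    # equivalent to np.ravel_multi_index((row, col), (nrows, ncols)).
    return row * ncols + col

print(to_state(3, 0))   # 36 -- the start state
print(to_state(3, 11))  # 47 -- the goal state (never actually observed)
assert to_state(2, 5) == np.ravel_multi_index((2, 5), (nrows, ncols))
```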
## Reward
Each time step incurs -1 reward, and stepping into the cliff incurs -100 reward.
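The cliff penalty is easy to observe directly (again a minimal sketch assuming `gymnasium` is installed): moving right from the start steps onto the cliff, yielding -100 and sending the agent back to the start without ending the episode:

```python
import gymnasium as gym

env = gym.make("CliffWalking-v0")
obs, info = env.reset()
print(obs)  # 36 -- the start state [3, 0]

# Action 1 (move right) from the start lands on the cliff:
obs, reward, terminated, truncated, info = env.step(1)
print(obs, reward, terminated)  # expected: 36 -100 False
```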
## Arguments

```python
gymnasium.make('CliffWalking-v0')
```
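Beyond the environment id, `gymnasium.make` accepts its standard keyword arguments. For example, assuming the text renderer behaves as in the other Toy Text environments, `render_mode="ansi"` makes `render()` return the grid as a string:

```python
import gymnasium as gym

# render_mode is a standard gymnasium.make keyword argument.
env = gym.make("CliffWalking-v0", render_mode="ansi")
env.reset()
print(env.render())  # text rendering of the 4x12 grid
```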
## Version History
- v0: Initial version release