Mirror of https://github.com/Farama-Foundation/Gymnasium.git (synced 2025-08-19 13:32:03 +00:00)
Reformat some docstrings, remove unneeded image links (#2578)
* docs+credits
* docs: refactor box2d + comment version history
* fix mujoco line lengths
* fix more env line lengths
* black
* typos
* put docstrings in base environments rather than highest version
* fix richer reacher
* black
* correct black version
* continuous mountain car docstring to markdown
* remove unneeded images
* black

Co-authored-by: Andrea PIERRÉ <andrea_pierre@brown.edu>
@@ -108,8 +108,6 @@ class BipedalWalker(gym.Env, EzPickle):
     python gym/envs/box2d/bipedal_walker.py
     ```
 
-    
-
     ### Action Space
     Actions are motor speed values in the [-1, 1] range for each of the
     4 joints at both hips and knees.
@@ -87,7 +87,7 @@ class FrictionDetector(contactListener):
 
 class CarRacing(gym.Env, EzPickle):
     """
-    ### Description
+    ## Description
     Easiest continuous control task to learn from pixels, a top-down
     racing environment. Discreet control is reasonable in this environment as
     well, on/off discretisation is fine.
@@ -105,39 +105,37 @@ class CarRacing(gym.Env, EzPickle):
     Remember it's a powerful rear-wheel drive car - don't press the accelerator
     and turn at the same time.
 
-    
+    ## Action Space
 
-    ### Action Space
     There are 3 actions: steering (-1 is full left, +1 is full right), gas,
     and breaking.
 
-    ### Observation Space
+    ## Observation Space
     State consists of 96x96 pixels.
 
-    ### Rewards
+    ## Rewards
     The reward is -0.1 every frame and +1000/N for every track tile visited,
     where N is the total number of tiles visited in the track. For example,
     if you have finished in 732 frames, your reward is
     1000 - 0.1*732 = 926.8 points.
 
-    ### Starting State
+    ## Starting State
     The car starts stopped at the center of the road.
 
-    ### Episode Termination
+    ## Episode Termination
     The episode finishes when all the tiles are visited. The car also can go
     outside of the playfield - that is far off the track, then it will
     get -100 and die.
 
-    ### Arguments
+    ## Arguments
     There are no arguments supported in constructing the environment.
 
-    ### Version History
+    ## Version History
     - v0: current version
 
-    ### References
+    ## References
     - Chris Campbell (2014), http://www.iforce2d.net/b2dtut/top-down-car.
 
-    ### Credits
+    ## Credits
     Created by Oleg Klimov
     """
 
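The reward arithmetic in the hunk above is easy to verify: +1000/N per tile over all N tiles sums to +1000, and 732 frames at -0.1 each gives 1000 - 73.2 = 926.8 points. A minimal random-rollout sketch, assuming the `CarRacing-v0` id and classic Gym's 4-tuple `step` API of this era:

```python
import gym

env = gym.make("CarRacing-v0")
obs = env.reset()  # 96x96 pixel observation

total_reward, done = 0.0, False
while not done:
    # action = [steering (-1..+1), gas, brake]
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward  # -0.1 per frame, +1000/N per new tile
print(total_reward)
```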
@@ -83,18 +83,16 @@ class LunarLander(gym.Env, EzPickle):
     <!-- To play yourself, run: -->
     <!-- python examples/agents/keyboard_agent.py LunarLander-v2 -->
 
-    
+    ## Action Space
 
-    ### Action Space
     There are four discrete actions available: do nothing, fire left
     orientation engine, fire main engine, fire right orientation engine.
 
-    ### Observation Space
+    ## Observation Space
     There are 8 states: the coordinates of the lander in `x` & `y`, its linear
     velocities in `x` & `y`, its angle, its angular velocity, and two boleans
     showing if each leg is in contact with the ground or not.
 
-    ### Rewards
+    ## Rewards
     Reward for moving from the top of the screen to the landing pad and zero
     speed is about 100..140 points.
     If the lander moves away from the landing pad it loses reward.
@@ -104,11 +102,11 @@ class LunarLander(gym.Env, EzPickle):
     Firing the main engine is -0.3 points each frame. Firing the side engine
     is -0.03 points each frame. Solved is 200 points.
 
-    ### Starting State
+    ## Starting State
     The lander starts at the top center of the viewport with a random initial
     force applied to its center of mass.
 
-    ### Episode Termination
+    ## Episode Termination
     The episode finishes if:
     1) the lander crashes (the lander body gets in contact with the moon);
     2) the lander gets outside of the viewport (`x` coordinate is greater than 1);
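Putting the two LunarLander hunks above together, the 8-dimensional observation and the shaped reward can be exercised directly; a sketch, assuming the `LunarLander-v2` id and the state ordering given in the docstring:

```python
import gym

env = gym.make("LunarLander-v2")
obs = env.reset()

# Ordering assumed from the docstring: position, linear velocity,
# angle, angular velocity, then the two leg-contact booleans.
x, y, vx, vy, angle, angular_vel, left_leg, right_leg = obs

# Discrete actions: 0 = noop, 1 = fire left, 2 = fire main, 3 = fire right.
obs, reward, done, info = env.step(2)  # main engine costs 0.3 points/frame
```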
@@ -121,7 +119,7 @@ class LunarLander(gym.Env, EzPickle):
     > wakes up. Bodies will also wake up if a joint or contact attached to
     > them is destroyed.
 
-    ### Arguments
+    ## Arguments
     To use to the _continuous_ environment, you need to specify the
     `continuous"=True` argument like below:
     ```python
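The stray quote in `continuous"=True` above is preserved as it appears in the source; the call the docstring is describing is presumably the plain keyword argument:

```python
import gym

# The continuous variant replaces the four discrete actions with a
# real-valued throttle action (exact layout not shown in this hunk).
env = gym.make("LunarLander-v2", continuous=True)
```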
@@ -135,7 +133,7 @@ class LunarLander(gym.Env, EzPickle):
 
     <!-- ### References -->
 
-    ### Credits
+    ## Credits
     Created by Oleg Klimov
     """
 
@@ -24,19 +24,16 @@ __author__ = "Christoph Dann <cdann@cdann.de>"
 
 class AcrobotEnv(core.Env):
     """
-    ### Description
+    ## Description
     The Acrobot system includes two joints and two links, where the joint between the two links is actuated. Initially, the
     links are hanging downwards, and the goal is to swing the end of the lower link up to a given height by applying changes
     to torque on the actuated joint (middle).
 
-    
-
-    **Image**: two blue pendulum links connected by two green joints. The joint in between the two pendulum links is acted
+    **Gif**: two blue pendulum links connected by two green joints. The joint in between the two pendulum links is acted
     upon by the agent via changes in torque. The goal is to swing the end of the outer-link to reach the target height
     (black horizontal line above system).
 
-    ### Action Space
+    ## Action Space
 
     The action is either applying +1, 0 or -1 torque on the joint between the two pendulum links.
 
@@ -46,7 +43,7 @@ class AcrobotEnv(core.Env):
     | 1   | apply 0 torque to the joint |
     | 2   | apply 1 torque to the joint |
 
-    ### Observation Space
+    ## Observation Space
 
     The observation space gives information about the two rotational joint angles `theta1` and `theta2`, as well as their
     angular velocities:
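The discrete-to-torque mapping spelled out across the two hunks above is simply `torque = action - 1`; a sketch, assuming the `Acrobot-v1` id:

```python
import gym

env = gym.make("Acrobot-v1")
obs = env.reset()

torques = [-1.0, 0.0, +1.0]  # actions 0, 1, 2 per the table above
action = 2                   # apply +1 torque to the actuated joint
obs, reward, done, info = env.step(action)  # reward is -1 until the goal
```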
@@ -70,24 +67,24 @@ class AcrobotEnv(core.Env):
     or `[cos(theta1) sin(theta1) cos(theta2) sin(theta2) thetaDot1 thetaDot2]`. As an example, a state of
     `[1, 0, 1, 0, ..., ...]` indicates that both links are pointing downwards.
 
-    ### Rewards
+    ## Rewards
 
     All steps that do not reach the goal (termination criteria) incur a reward of -1. Achieving the target height and
     terminating incurs a reward of 0. The reward threshold is -100.
 
-    ### Starting State
+    ## Starting State
 
     At start, each parameter in the underlying state (`theta1`, `theta2`, and the two angular velocities) is initialized
     uniformly at random between -0.1 and 0.1. This means both links are pointing roughly downwards.
 
-    ### Episode Termination
+    ## Episode Termination
     The episode terminates of one of the following occurs:
 
     1. The target height is achieved. As constructed, this occurs when
     `-cos(theta1) - cos(theta2 + theta1) > 1.0`
     2. Episode length is greater than 500 (200 for v0)
 
-    ### Arguments
+    ## Arguments
 
     There are no arguments supported in constructing the environment. As an example:
 
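The termination inequality in the hunk above can be checked directly against the state convention it quotes; a small worked example (the helper name is illustrative, not from the source):

```python
from math import cos

def reached_goal(theta1: float, theta2: float) -> bool:
    """Goal test from the docstring: the free end is above the target
    line when -cos(theta1) - cos(theta2 + theta1) > 1.0."""
    return -cos(theta1) - cos(theta2 + theta1) > 1.0

# Both links hanging straight down is far from the goal:
assert not reached_goal(0.0, 0.0)  # -cos(0) - cos(0) = -2.0
```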
@@ -118,14 +115,14 @@ class AcrobotEnv(core.Env):
     ```
 
 
-    ### Version History
+    ## Version History
 
     - v1: Maximum number of steps increased from 200 to 500. The observation space for v0 provided direct readings of
     `theta1` and `theta2` in radians, having a range of `[-pi, pi]`. The v1 observation space as described here provides the
     sin and cosin of each angle instead.
     - v0: Initial versions release (1.0.0) (removed from gym for v1)
 
-    ### References
+    ## References
     - Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. In D. Touretzky, M. C. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8). MIT Press. https://proceedings.neurips.cc/paper/1995/file/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf
     - Sutton, R. S., Barto, A. G. (2018 ). Reinforcement Learning: An Introduction. The MIT Press.
     """
@@ -38,26 +38,34 @@ class Continuous_MountainCarEnv(gym.Env):
     }
     ```
 
-    Observation space is a 2-dim vector, where the 1st element represents the "car position" and the 2nd element represents the "car velocity".
+    ## Observation Space
 
-    Action: The actual driving force is calculated by multiplying the power coef by power (0.0015)
+    The observation space is a 2-dim vector, where the 1st element represents the "car position" and the 2nd element represents the "car velocity".
 
-    Reward: Reward of 100 is awarded if the agent reached the flag (position = 0.45)
+    ## Action
+
+    The actual driving force is calculated by multiplying the power coef by power (0.0015)
+
+    ## Reward
+
+    Reward of 100 is awarded if the agent reached the flag (position = 0.45)
     on top of the mountain. Reward is decrease based on amount of energy consumed each step.
 
-    Starting State: The position of the car is assigned a uniform random value in [-0.6 , -0.4]. The starting velocity of the car is always assigned to 0.
+    ## Starting State
 
-    Episode Termination: The car position is more than 0.45. Episode length is greater than 200
+    The position of the car is assigned a uniform random value in [-0.6 , -0.4]. The starting velocity of the car is always assigned to 0.
+
+    ## Episode Termination
+
+    The car position is more than 0.45. Episode length is greater than 200
 
-    ### Arguments
+    ## Arguments
 
     ```
     gym.make('MountainCarContinuous-v0')
     ```
 
-    ### Version History
+    ## Version History
 
     * v0: Initial versions release (1.0.0)
     """
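The "power coef" sentence in the hunk above corresponds to a one-line force computation; a sketch of that arithmetic, where `power = 0.0015` is from the docstring and the clipping of the action to [-1, 1] is an assumption about the action bounds:

```python
POWER = 0.0015  # "power" constant quoted in the docstring

def driving_force(action: float) -> float:
    # The action is the "power coef"; assumed bounded to [-1, 1].
    coef = min(max(action, -1.0), 1.0)
    return coef * POWER

print(driving_force(1.0))  # full throttle -> 0.0015
```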
@@ -11,7 +11,7 @@ from os import path
 
 class PendulumEnv(gym.Env):
     """
-    ### Description
+    ## Description
 
     The inverted pendulum swingup problem is a classic problem in the control literature. In this
     version of the problem, the pendulum starts in a random position, and the goal is to swing it up so
@@ -26,7 +26,7 @@ class PendulumEnv(gym.Env):
     - `theta`: angle in radians.
     - `tau`: torque in `N * m`. Defined as positive _counter-clockwise_.
 
-    ### Action Space
+    ## Action Space
     The action is the torque applied to the pendulum.
 
     | Num | Action | Min  | Max |
@@ -34,7 +34,7 @@ class PendulumEnv(gym.Env):
     | 0   | Torque | -2.0 | 2.0 |
 
 
-    ### Observation Space
+    ## Observation Space
     The observations correspond to the x-y coordinate of the pendulum's end, and its angular velocity.
 
     | Num | Observation      | Min  | Max |
@@ -43,7 +43,7 @@ class PendulumEnv(gym.Env):
     | 1   | y = sin(angle)   | -1.0 | 1.0 |
     | 2   | Angular Velocity | -8.0 | 8.0 |
 
-    ### Rewards
+    ## Rewards
     The reward is defined as:
     ```
     r = -(theta^2 + 0.1*theta_dt^2 + 0.001*torque^2)
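The minimum-reward figure quoted in the next hunk, `-(pi^2 + 0.1*8^2 + 0.001*2^2) = -16.2736044`, follows directly from this formula; a worked check:

```python
from math import pi

def pendulum_reward(theta: float, theta_dt: float, torque: float) -> float:
    # r = -(theta^2 + 0.1*theta_dt^2 + 0.001*torque^2), per the docstring
    return -(theta ** 2 + 0.1 * theta_dt ** 2 + 0.001 * torque ** 2)

print(pendulum_reward(pi, 8.0, 2.0))   # -16.2736044... (worst case)
print(pendulum_reward(0.0, 0.0, 0.0))  # 0.0 (upright, still, no torque)
```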
@@ -53,13 +53,13 @@ class PendulumEnv(gym.Env):
     0.001*2^2) = -16.2736044`, while the maximum reward is zero (pendulum is
     upright with zero velocity and no torque being applied).
 
-    ### Starting State
+    ## Starting State
     The starting state is a random angle in `[-pi, pi]` and a random angular velocity in `[-1,1]`.
 
-    ### Episode Termination
+    ## Episode Termination
     An episode terminates after 200 steps. There's no other criteria for termination.
 
-    ### Arguments
+    ## Arguments
     - `g`: acceleration of gravity measured in `(m/s^2)` used to calculate the pendulum dynamics. The default is
     `g=10.0`.
 
@@ -67,7 +67,7 @@ class PendulumEnv(gym.Env):
     gym.make('CartPole-v1', g=9.81)
     ```
 
-    ### Version History
+    ## Version History
 
     * v1: Simplify the math equations, no difference in behavior.
     * v0: Initial versions release (1.0.0)
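The `gym.make('CartPole-v1', g=9.81)` line in the hunk above is quoted as-is from the source; given that the `g` argument and the v1 entry belong to this environment, the intended call is presumably:

```python
import gym

# g defaults to 10.0; 9.81 gives standard Earth gravity.
env = gym.make("Pendulum-v1", g=9.81)
```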