Mirror of https://github.com/Farama-Foundation/Gymnasium.git (synced 2025-08-18 04:49:12 +00:00)
@@ -12,10 +12,10 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 """
 ### Description

-This environment is based on the environment iintroduced by Schulman,
+This environment is based on the environment introduced by Schulman,
 Moritz, Levine, Jordan and Abbeel in ["High-Dimensional Continuous Control
 Using Generalized Advantage Estimation"](https://arxiv.org/abs/1506.02438).
-The ant is a 3D roboot consisting of one torso (free rotational body) with
+The ant is a 3D robot consisting of one torso (free rotational body) with
 four legs attached to it with each leg having two links. The goal is to
 coordinate the four legs to move in the forward (right) direction by applying
 torques on the eight hinges connecting the two links of each leg and the torso
@@ -41,14 +41,14 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):

 ### Observation Space

-The state space consists of positional values of different body parts of the hopper,
+The state space consists of positional values of different body parts of the ant,
 followed by the velocities of those individual parts (their derivatives) with all
 the positions ordered before all the velocities.

 The observation is a `ndarray` with shape `(111,)` where the elements correspond to the following:

 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
-|-----|---------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
+|-----|-------------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
 | 0 | x-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
 | 1 | y-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
 | 2 | z-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
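As a quick sanity check of the `(111,)` observation shape documented in this hunk, a minimal sketch under the classic (pre-Gymnasium) Gym API, where `reset` returns only the observation:

```
import gym

env = gym.make('Ant-v2')
obs = env.reset()  # classic Gym API: reset() returns the observation alone
print(obs.shape)   # expected (111,), per the table above
```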
@@ -137,7 +137,7 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):

 No additional arguments are currently supported (in v2 and lower), but modifications
 can be made to the XML file in the assets folder (or by changing the path to a modified
-XML file in another folder)..
+XML file in another folder).

 ```
 env = gym.make('Ant-v2')
@@ -146,7 +146,7 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

 ```
-env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ....)
+env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ...)
 ```

 ### Version History
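Expanding the docstring's own `gym.make` example into a runnable sketch, using only kwargs the hunk itself names (`ctrl_cost_weight`, `reset_noise_scale`) and the classic 4-tuple step API:

```
import gym

env = gym.make(
    'Ant-v3',
    ctrl_cost_weight=0.1,    # weight of the quadratic torque penalty
    reset_noise_scale=0.1,   # scale of the noise added to the initial state
)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```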
@@ -101,7 +101,7 @@ class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 (default is 5), where the *dt* for one frame is 0.01 - making the
 default *dt = 5*0.01 = 0.05*. This reward would be positive if the cheetah
 runs forward (right) desired.
-- *reward_control*: A negative reward for penalising the swimmer if it takes
+- *reward_control*: A negative reward for penalising the cheetah if it takes
 actions that are too large. It is measured as *-coefficient x
 sum(action<sup>2</sup>)* where *coefficient* is a parameter set for the
 control and has a default value of 0.1
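The control penalty described in this hunk reduces to a one-liner; a sketch assuming actions are numpy arrays (not the environment's actual code):

```
import numpy as np

def control_cost(action, coefficient=0.1):
    # -coefficient * sum(action^2), per the docstring above
    return -coefficient * np.square(action).sum()

print(control_cost(np.array([0.5, -0.5])))  # -0.05
print(control_cost(np.array([1.0, -1.0])))  # -0.2
```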
@@ -18,8 +18,7 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Description

 This environment is based on the work done by Erez, Tassa, and Todorov in
-["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"]
-(http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
+["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"](http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
 increase the number of independent state and control variables as compared to
 the classic control environments. The hopper is a two-dimensional
 one-legged figure that consist of four main body parts - the torso at the
@@ -71,19 +70,20 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 on that value. This value is hidden from the algorithm, which in turn has
 to develop an abstract understanding of it from the observed rewards.
 Therefore, observation space has shape `(11,)` instead of `(12,)` and looks like:
+
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
 |-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
 | 0 | z-coordinate of the top (height of hopper) | -Inf | Inf | rootz | slide | position (m) |
 | 1 | angle of the top | -Inf | Inf | rooty | hinge | angle (rad) |
 | 2 | angle of the thigh joint | -Inf | Inf | thigh_joint | hinge | angle (rad) |
 | 3 | angle of the leg joint | -Inf | Inf | leg_joint | hinge | angle (rad) |
 | 4 | angle of the foot joint | -Inf | Inf | foot_joint | hinge | angle (rad) |
 | 5 | velocity of the x-coordinate of the top | -Inf | Inf | rootx | slide | velocity (m/s) |
 | 6 | velocity of the z-coordinate (height) of the top | -Inf | Inf | rootz | slide | velocity (m/s) |
 | 7 | angular velocity of the angle of the top | -Inf | Inf | rooty | hinge | angular velocity (rad/s) |
 | 8 | angular velocity of the thigh hinge | -Inf | Inf | thigh_joint | hinge | angular velocity (rad/s) |
 | 9 | angular velocity of the leg hinge | -Inf | Inf | leg_joint | hinge | angular velocity (rad/s) |
 | 10 | angular velocity of the foot hinge | -Inf | Inf | foot_joint | hinge | angular velocity (rad/s) |

 ### Rewards
 The reward consists of three parts:
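A sketch of indexing into the `(11,)` Hopper observation tabulated above; the variable names are illustrative, and the classic Gym reset API is assumed:

```
import gym

env = gym.make('Hopper-v2')
obs = env.reset()        # ndarray of shape (11,)
height = obs[0]          # rootz: height of the hopper
top_angle = obs[1]       # rooty: angle of the top
joint_angles = obs[2:5]  # thigh, leg and foot hinge angles
velocities = obs[5:11]   # translational and angular velocities
```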
@@ -103,8 +103,8 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):

 ### Starting State
 All observations start in state
-(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform nois
-e in the range of [-0.005, 0.005] added to the values for stochasticity.
+(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise
+in the range of [-0.005, 0.005] added to the values for stochasticity.

 ### Episode Termination
 The episode terminates when any of the following happens:
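The starting-state description in this hunk amounts to the following numpy sketch (an illustration of the stated distribution, not the environment's reset code):

```
import numpy as np

# Nominal start state from the docstring: standing pose with height 1.25.
init_state = np.array([0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
# Uniform noise in [-0.005, 0.005] added element-wise for stochasticity.
start_state = init_state + np.random.uniform(-0.005, 0.005, size=init_state.shape)
```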
@@ -144,7 +144,7 @@ class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):

 ### Rewards
 The reward consists of three parts:
-- *alive_bonus*: Every timestep that the ant is alive, it gets a reward of 5.
+- *alive_bonus*: Every timestep that the humanoid is alive, it gets a reward of 5.
 - *lin_vel_cost*: A reward of walking forward which is measured as *1.25 *
 (average center of mass before action - average center of mass after action)/dt*.
 *dt* is the time between actions and is dependent on the frame_skip parameter
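A hypothetical helper sketching the two reward terms this hunk names. Note the docstring's "before minus after" ordering would make forward motion negative, so the sketch takes after minus before, consistent with the stated goal of rewarding walking forward:

```
def humanoid_reward_terms(com_x_before, com_x_after, dt,
                          weight=1.25, alive_bonus=5.0):
    # Forward-progress term: positive when the centre of mass moves forward.
    lin_vel_reward = weight * (com_x_after - com_x_before) / dt
    return alive_bonus + lin_vel_reward
```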
@@ -8,12 +8,12 @@ class HumanoidStandupEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Description

 This environment is based on the environment introduced by Tassa, Erez and Todorov
-in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"]
-(https://ieeexplore.ieee.org/document/6386025). The 3D bipedal robot is designed to simulate
-a human. It has a torso (abdomen) with a pair of legs and arms. The legs each consist of two
-links, and so the arms (representing the knees and elbows respectively). The environment starts
-with the humanoid layiing on the ground, and then the goal of the environment is to make the
-humanoid standup and then keep it standing by applying torques on the various hinges.
+in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"](https://ieeexplore.ieee.org/document/6386025).
+The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a
+pair of legs and arms. The legs each consist of two links, and so the arms (representing the
+knees and elbows respectively). The environment starts with the humanoid layiing on the ground,
+and then the goal of the environment is to make the humanoid standup and then keep it standing
+by applying torques on the various hinges.

 ### Action Space
 The agent take a 17-element vector for actions.
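As a quick check of the 17-element action vector mentioned above, a minimal sketch under the classic Gym API:

```
import gym

env = gym.make('HumanoidStandup-v2')
print(env.action_space.shape)       # (17,): one torque per actuated hinge
action = env.action_space.sample()  # random 17-element action vector
```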
@@ -29,7 +29,7 @@ class InvertedDoublePendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):

 ### Observation Space

-The state space consists of positional values of different body parts of the hopper,
+The state space consists of positional values of different body parts of the pendulum system,
 followed by the velocities of those individual parts (their derivatives) with all the
 positions ordered before all the velocities.

@@ -30,7 +30,7 @@ class InvertedPendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Observation Space

 The state space consists of positional values of different body parts of
-the hopper, followed by the velocities of those individual parts (their derivatives)
+the pendulum system, followed by the velocities of those individual parts (their derivatives)
 with all the positions ordered before all the velocities.

 The observation is a `ndarray` with shape `(4,)` where the elements correspond to the following:
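For the `(4,)` observation described here, a short sketch; the element names are assumptions (the defining table is not part of this hunk), but positions come before velocities as stated:

```
import gym

env = gym.make('InvertedPendulum-v2')
obs = env.reset()  # ndarray of shape (4,)
cart_pos, pole_angle, cart_vel, pole_ang_vel = obs  # positions first, then velocities
```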
@@ -46,6 +46,7 @@ class ReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 reacher the state is created by combining only certain elements of the
 position and velocity, and performing some function transformations on them.
 If one is to read the `.xml` for reacher then they will find 4 joints:
+
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
 |-----|-----------------------------|----------|----------|----------------------------------|-------|--------------------|
 | 0 | angle of the first arm | -Inf | Inf | joint0 | hinge | angle (rad) |
@@ -72,6 +72,7 @@ class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 In practice (and Gym implementation), the first two positional elements are
 omitted from the state space since the reward function is calculated based
 on those values. Therefore, observation space has shape `(8,)` and looks like:
+
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
 |-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
 | 0 | angle of the front tip | -Inf | Inf | rot | hinge | angle (rad) |
@@ -89,7 +90,7 @@ class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 as *(x-coordinate before action - x-coordinate after action)/dt*. *dt* is
 the time between actions and is dependeent on the frame_skip parameter
 (default is 4), where the *dt* for one frame is 0.01 - making the
-default *dt = 4*0.01 = 0.04*. This reward would be positive if the swimmer
+default *dt = 4 * 0.01 = 0.04*. This reward would be positive if the swimmer
 swims right as desired.
 - *reward_control*: A negative reward for penalising the swimmer if it takes
 actions that are too large. It is measured as *-coefficient x
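Putting this hunk's arithmetic together, a hedged numpy sketch of both reward terms. The control coefficient is left as a required parameter because its default is not shown here, and the forward term takes after minus before so that rightward motion is positive, as the text intends:

```
import numpy as np

def swimmer_reward(x_before, x_after, action, coefficient,
                   frame_skip=4, frame_time=0.01):
    dt = frame_skip * frame_time            # default dt = 4 * 0.01 = 0.04
    reward_fwd = (x_after - x_before) / dt  # positive when swimming right
    reward_ctrl = -coefficient * np.square(action).sum()
    return reward_fwd + reward_ctrl
```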
@@ -76,6 +76,7 @@ class Walker2dEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 hidden from the algorithm, which in turn has to develop an abstract understanding of it
 from the observed rewards. Therefore, observation space has shape `(17,)`
 instead of `(18,)` and looks like:
+
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
 |-----|--------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
 | 0 | z-coordinate of the top (height of hopper) | -Inf | Inf | rootz (torso) | slide | position (m) |
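And a matching sanity check for the `(17,)` Walker2d observation (classic Gym API assumed):

```
import gym

env = gym.make('Walker2d-v2')
obs = env.reset()
assert obs.shape == (17,)  # x-coordinate excluded, as described above
```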