* Minor Mujoco Doc Typos

* Fixed precommit black
Rushiv Arora
2022-02-07 22:53:47 -05:00
committed by GitHub
parent 80d28e5a9c
commit 2b6ec51580
10 changed files with 36 additions and 33 deletions


@@ -12,10 +12,10 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 """
 ### Description
-This environment is based on the environment iintroduced by Schulman,
+This environment is based on the environment introduced by Schulman,
 Moritz, Levine, Jordan and Abbeel in ["High-Dimensional Continuous Control
 Using Generalized Advantage Estimation"](https://arxiv.org/abs/1506.02438).
-The ant is a 3D roboot consisting of one torso (free rotational body) with
+The ant is a 3D robot consisting of one torso (free rotational body) with
 four legs attached to it with each leg having two links. The goal is to
 coordinate the four legs to move in the forward (right) direction by applying
 torques on the eight hinges connecting the two links of each leg and the torso
@@ -41,14 +41,14 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Observation Space
-The state space consists of positional values of different body parts of the hopper,
+The state space consists of positional values of different body parts of the ant,
 followed by the velocities of those individual parts (their derivatives) with all
 the positions ordered before all the velocities.
 The observation is a `ndarray` with shape `(111,)` where the elements correspond to the following:
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
-|-----|---------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
+|-----|-------------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
 | 0 | x-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
 | 1 | y-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
 | 2 | z-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
@@ -137,7 +137,7 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 No additional arguments are currently supported (in v2 and lower), but modifications
 can be made to the XML file in the assets folder (or by changing the path to a modified
-XML file in another folder)..
+XML file in another folder).
 ```
 env = gym.make('Ant-v2')
@@ -146,7 +146,7 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.
 ```
-env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ....)
+env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ...)
 ```
 ### Version History


@@ -101,7 +101,7 @@ class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 (default is 5), where the *dt* for one frame is 0.01 - making the
 default *dt = 5*0.01 = 0.05*. This reward would be positive if the cheetah
 runs forward (right) desired.
-- *reward_control*: A negative reward for penalising the swimmer if it takes
+- *reward_control*: A negative reward for penalising the cheetah if it takes
 actions that are too large. It is measured as *-coefficient x
 sum(action<sup>2</sup>)* where *coefficient* is a parameter set for the
 control and has a default value of 0.1
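The control penalty in the docstring above reduces to a one-line formula. A minimal sketch in plain Python (`control_cost` is a hypothetical helper for illustration, not Gym's actual implementation; the default coefficient of 0.1 comes from the docstring):

```python
def control_cost(action, coefficient=0.1):
    """Negative reward penalising large actions: -coefficient * sum(action^2)."""
    return -coefficient * sum(a * a for a in action)

# HalfCheetah has a 6-element action vector (one torque per actuated joint):
cost = control_cost([0.5, -0.5, 1.0, 0.0, 0.25, -0.25])
print(cost)  # -0.1 * 1.625 = -0.1625
```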


@@ -18,8 +18,7 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Description
 This environment is based on the work done by Erez, Tassa, and Todorov in
-["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"]
-(http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
+["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"](http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
 increase the number of independent state and control variables as compared to
 the classic control environments. The hopper is a two-dimensional
 one-legged figure that consist of four main body parts - the torso at the
@@ -71,6 +70,7 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 on that value. This value is hidden from the algorithm, which in turn has
 to develop an abstract understanding of it from the observed rewards.
 Therefore, observation space has shape `(11,)` instead of `(12,)` and looks like:
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
 |-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
 | 0 | z-coordinate of the top (height of hopper) | -Inf | Inf | rootz | slide | position (m) |
@@ -103,8 +103,8 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Starting State
 All observations start in state
-(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform nois
-e in the range of [-0.005, 0.005] added to the values for stochasticity.
+(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise
+in the range of [-0.005, 0.005] added to the values for stochasticity.
 ### Episode Termination
 The episode terminates when any of the following happens:
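The starting-state rule in the hunk above can be sketched as follows (`sample_initial_state` is a hypothetical helper for illustration, not Gym's actual `reset()`; the default state and noise range are taken from the docstring):

```python
import random

# Default hopper observation: 11 values, second element is the 1.25 starting height.
DEFAULT_STATE = (0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

def sample_initial_state(noise=0.005):
    """Add uniform noise in [-noise, noise] to every element for stochasticity."""
    return [x + random.uniform(-noise, noise) for x in DEFAULT_STATE]

state = sample_initial_state()
assert len(state) == 11
assert all(abs(s - d) <= 0.005 for s, d in zip(state, DEFAULT_STATE))
```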


@@ -144,7 +144,7 @@ class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Rewards
 The reward consists of three parts:
-- *alive_bonus*: Every timestep that the ant is alive, it gets a reward of 5.
+- *alive_bonus*: Every timestep that the humanoid is alive, it gets a reward of 5.
 - *lin_vel_cost*: A reward of walking forward which is measured as *1.25 *
 (average center of mass before action - average center of mass after action)/dt*.
 *dt* is the time between actions and is dependent on the frame_skip parameter


@@ -8,12 +8,12 @@ class HumanoidStandupEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Description
 This environment is based on the environment introduced by Tassa, Erez and Todorov
-in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"]
-(https://ieeexplore.ieee.org/document/6386025). The 3D bipedal robot is designed to simulate
-a human. It has a torso (abdomen) with a pair of legs and arms. The legs each consist of two
-links, and so the arms (representing the knees and elbows respectively). The environment starts
-with the humanoid layiing on the ground, and then the goal of the environment is to make the
-humanoid standup and then keep it standing by applying torques on the various hinges.
+in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"](https://ieeexplore.ieee.org/document/6386025).
+The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a
+pair of legs and arms. The legs each consist of two links, and so the arms (representing the
+knees and elbows respectively). The environment starts with the humanoid layiing on the ground,
+and then the goal of the environment is to make the humanoid standup and then keep it standing
+by applying torques on the various hinges.
 ### Action Space
 The agent take a 17-element vector for actions.


@@ -29,7 +29,7 @@ class InvertedDoublePendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Observation Space
-The state space consists of positional values of different body parts of the hopper,
+The state space consists of positional values of different body parts of the pendulum system,
 followed by the velocities of those individual parts (their derivatives) with all the
 positions ordered before all the velocities.


@@ -30,7 +30,7 @@ class InvertedPendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 ### Observation Space
 The state space consists of positional values of different body parts of
-the hopper, followed by the velocities of those individual parts (their derivatives)
+the pendulum system, followed by the velocities of those individual parts (their derivatives)
 with all the positions ordered before all the velocities.
 The observation is a `ndarray` with shape `(4,)` where the elements correspond to the following:


@@ -46,6 +46,7 @@ class ReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 reacher the state is created by combining only certain elements of the
 position and velocity, and performing some function transformations on them.
 If one is to read the `.xml` for reacher then they will find 4 joints:
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
 |-----|-----------------------------|----------|----------|----------------------------------|-------|--------------------|
 | 0 | angle of the first arm | -Inf | Inf | joint0 | hinge | angle (rad |


@@ -72,6 +72,7 @@ class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 In practice (and Gym implementation), the first two positional elements are
 omitted from the state space since the reward function is calculated based
 on those values. Therefore, observation space has shape `(8,)` and looks like:
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
 |-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
 | 0 | angle of the front tip | -Inf | Inf | rot | hinge | angle (rad) |


@@ -76,6 +76,7 @@ class Walker2dEnv(mujoco_env.MujocoEnv, utils.EzPickle):
 hidden from the algorithm, which in turn has to develop an abstract understanding of it
 from the observed rewards. Therefore, observation space has shape `(17,)`
 instead of `(18,)` and looks like:
 | Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
 |-----|--------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
 | 0 | z-coordinate of the top (height of hopper) | -Inf | Inf | rootz (torso) | slide | position (m) |