* Minor Mujoco Doc Typos

* Fixed precommit black
Rushiv Arora
2022-02-07 22:53:47 -05:00
committed by GitHub
parent 80d28e5a9c
commit 2b6ec51580
10 changed files with 36 additions and 33 deletions


@@ -12,10 +12,10 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
"""
### Description
-This environment is based on the environment iintroduced by Schulman,
+This environment is based on the environment introduced by Schulman,
Moritz, Levine, Jordan and Abbeel in ["High-Dimensional Continuous Control
Using Generalized Advantage Estimation"](https://arxiv.org/abs/1506.02438).
-The ant is a 3D roboot consisting of one torso (free rotational body) with
+The ant is a 3D robot consisting of one torso (free rotational body) with
four legs attached to it with each leg having two links. The goal is to
coordinate the four legs to move in the forward (right) direction by applying
torques on the eight hinges connecting the two links of each leg and the torso
@@ -41,14 +41,14 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Observation Space
-The state space consists of positional values of different body parts of the hopper,
+The state space consists of positional values of different body parts of the ant,
followed by the velocities of those individual parts (their derivatives) with all
the positions ordered before all the velocities.
The observation is a `ndarray` with shape `(111,)` where the elements correspond to the following:
| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
-|-----|---------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
+|-----|-------------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
| 0 | x-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
| 1 | y-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
| 2 | z-coordinate of the torso (centre) | -Inf | Inf | torso | free | position (m) |
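As a quick sanity check of the shape stated above, a minimal sketch (assuming a working MuJoCo installation):
```
import gym

# Confirm the (111,)-element observation described in the table above.
env = gym.make('Ant-v2')
obs = env.reset()
print(env.observation_space.shape)  # -> (111,)
print(obs.shape)                    # -> (111,)
```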
@@ -137,7 +137,7 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
No additional arguments are currently supported (in v2 and lower), but modifications
can be made to the XML file in the assets folder (or by changing the path to a modified
-XML file in another folder)..
+XML file in another folder).
```
env = gym.make('Ant-v2')
```
@@ -146,7 +146,7 @@ class AntEnv(mujoco_env.MujocoEnv, utils.EzPickle):
v3 and beyond take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.
```
-env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ....)
+env = gym.make('Ant-v3', ctrl_cost_weight=0.1, ...)
```
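To make the two creation styles above concrete, a minimal sketch; the kwarg values are arbitrary illustrations, not recommended settings:
```
import gym

# v2 and lower: no extra arguments.
env_v2 = gym.make('Ant-v2')

# v3 and beyond: behaviour is tunable through gym.make kwargs.
env_v3 = gym.make(
    'Ant-v3',
    ctrl_cost_weight=0.1,    # weight of the control penalty
    reset_noise_scale=0.1,   # scale of the uniform reset noise
)

obs = env_v3.reset()
obs, reward, done, info = env_v3.step(env_v3.action_space.sample())
env_v3.close()
```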
### Version History


@@ -101,7 +101,7 @@ class HalfCheetahEnv(mujoco_env.MujocoEnv, utils.EzPickle):
(default is 5), where the *dt* for one frame is 0.01 - making the
default *dt = 5*0.01 = 0.05*. This reward would be positive if the cheetah
runs forward (right) as desired.
-- *reward_control*: A negative reward for penalising the swimmer if it takes
+- *reward_control*: A negative reward for penalising the cheetah if it takes
actions that are too large. It is measured as *-coefficient x
sum(action<sup>2</sup>)* where *coefficient* is a parameter set for the
control and has a default value of 0.1.
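Spelled out in code, the penalty above is a weighted sum of squared action components; a minimal sketch (the `control_cost` helper name is hypothetical):
```
import numpy as np

def control_cost(action, coefficient=0.1):
    # reward_control = -coefficient * sum(action**2)
    return -coefficient * np.sum(np.square(action))

print(control_cost(np.array([0.5, -0.5])))  # -> -0.05
```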


@@ -18,8 +18,7 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Description
This environment is based on the work done by Erez, Tassa, and Todorov in
["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"]
(http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
["Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks"](http://www.roboticsproceedings.org/rss07/p10.pdf). The environment aims to
increase the number of independent state and control variables as compared to
the classic control environments. The hopper is a two-dimensional
one-legged figure that consists of four main body parts - the torso at the
@@ -71,6 +70,7 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
on that value. This value is hidden from the algorithm, which in turn has
to develop an abstract understanding of it from the observed rewards.
Therefore, observation space has shape `(11,)` instead of `(12,)` and looks like:
+
| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
|-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
| 0 | z-coordinate of the top (height of hopper) | -Inf | Inf | rootz | slide | position (m) |
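In v3, this exclusion can be toggled through the `exclude_current_positions_from_observation` kwarg of `gym.make`; a minimal sketch (assuming a working MuJoCo installation):
```
import gym

# Default: the x-coordinate is hidden, so observations have shape (11,).
env = gym.make('Hopper-v3')
print(env.observation_space.shape)  # -> (11,)

# Including the x-coordinate restores the full 12-element state.
env_full = gym.make('Hopper-v3', exclude_current_positions_from_observation=False)
print(env_full.observation_space.shape)  # -> (12,)
```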
@@ -103,8 +103,8 @@ class HopperEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Starting State
All observations start in state
-(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform nois
-e in the range of [-0.005, 0.005] added to the values for stochasticity.
+(0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) with a uniform noise
+in the range of [-0.005, 0.005] added to the values for stochasticity.
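A minimal standalone sketch of that stochastic start, in plain NumPy outside the simulator:
```
import numpy as np

# Nominal start state from above, plus uniform noise in [-0.005, 0.005].
nominal = np.array([0.0, 1.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
noise = np.random.uniform(low=-0.005, high=0.005, size=nominal.shape)
start_state = nominal + noise
```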
### Episode Termination
The episode terminates when any of the following happens:


@@ -144,7 +144,7 @@ class HumanoidEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Rewards
The reward consists of three parts:
-- *alive_bonus*: Every timestep that the ant is alive, it gets a reward of 5.
+- *alive_bonus*: Every timestep that the humanoid is alive, it gets a reward of 5.
- *lin_vel_cost*: A reward for walking forward, measured as *1.25 *
(average center of mass after action - average center of mass before action)/dt*.
*dt* is the time between actions and is dependent on the frame_skip parameter
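In code form, a minimal sketch of that term (names are hypothetical; the center-of-mass values would come from the simulator state before and after the action):
```
def lin_vel_reward(com_after, com_before, dt):
    # 1.25 * average center-of-mass displacement per unit time
    return 1.25 * (com_after - com_before) / dt

print(lin_vel_reward(1.02, 1.00, 0.05))  # -> 0.5
```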


@@ -8,12 +8,12 @@ class HumanoidStandupEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Description
This environment is based on the environment introduced by Tassa, Erez and Todorov
in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"]
(https://ieeexplore.ieee.org/document/6386025). The 3D bipedal robot is designed to simulate
a human. It has a torso (abdomen) with a pair of legs and arms. The legs each consist of two
links, and so the arms (representing the knees and elbows respectively). The environment starts
with the humanoid layiing on the ground, and then the goal of the environment is to make the
humanoid standup and then keep it standing by applying torques on the various hinges.
in ["Synthesis and stabilization of complex behaviors through online trajectory optimization"](https://ieeexplore.ieee.org/document/6386025).
The 3D bipedal robot is designed to simulate a human. It has a torso (abdomen) with a
pair of legs and arms. The legs each consist of two links, and so the arms (representing the
knees and elbows respectively). The environment starts with the humanoid layiing on the ground,
and then the goal of the environment is to make the humanoid standup and then keep it standing
by applying torques on the various hinges.
### Action Space
The agent takes a 17-element vector for actions.


@@ -29,7 +29,7 @@ class InvertedDoublePendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Observation Space
-The state space consists of positional values of different body parts of the hopper,
+The state space consists of positional values of different body parts of the pendulum system,
followed by the velocities of those individual parts (their derivatives) with all the
positions ordered before all the velocities.


@@ -30,7 +30,7 @@ class InvertedPendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
### Observation Space
The state space consists of positional values of different body parts of
-the hopper, followed by the velocities of those individual parts (their derivatives)
+the pendulum system, followed by the velocities of those individual parts (their derivatives)
with all the positions ordered before all the velocities.
The observation is a `ndarray` with shape `(4,)` where the elements correspond to the following:


@@ -46,6 +46,7 @@ class ReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
reacher the state is created by combining only certain elements of the
position and velocity, and performing some function transformations on them.
If one reads the `.xml` for reacher, they will find 4 joints:
+
| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
|-----|-----------------------------|----------|----------|----------------------------------|-------|--------------------|
| 0 | angle of the first arm | -Inf | Inf | joint0 | hinge | angle (rad) |


@@ -72,6 +72,7 @@ class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
In practice (and Gym implementation), the first two positional elements are
omitted from the state space since the reward function is calculated based
on those values. Therefore, observation space has shape `(8,)` and looks like:
+
| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
|-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
| 0 | angle of the front tip | -Inf | Inf | rot | hinge | angle (rad) |


@@ -76,6 +76,7 @@ class Walker2dEnv(mujoco_env.MujocoEnv, utils.EzPickle):
hidden from the algorithm, which in turn has to develop an abstract understanding of it
from the observed rewards. Therefore, observation space has shape `(17,)`
instead of `(18,)` and looks like:
+
| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint | Unit |
|-----|--------------------------------------------------------|----------------|-----------------|----------------------------------------|-------|------|
| 0 | z-coordinate of the top (height of walker) | -Inf | Inf | rootz (torso) | slide | position (m) |