import numpy as np

from gym import utils
from gym.envs.mujoco import mujoco_env


class InvertedPendulumEnv(mujoco_env.MujocoEnv, utils.EzPickle):
"""
### Description

This environment is the cartpole environment based on the work done by
Barto, Sutton, and Anderson in ["Neuronlike adaptive elements that can
solve difficult learning control problems"](https://ieeexplore.ieee.org/document/6313077),
just like in the classic environments but now powered by the Mujoco physics simulator -
allowing for more complex experiments (such as varying the effects of gravity).
The environment involves a cart that can be moved linearly, with a pole fixed to it
at one end and the other end free. The cart can be pushed left or right, and the
goal is to balance the pole on top of the cart by applying forces to the cart.

### Action Space
The agent takes a 1-element vector as its action.

The action space is a continuous `(action)` in `[-3, 3]`, where `action` represents
the numerical force applied to the cart (with magnitude representing the amount of
force and sign representing the direction).

| Num | Action                    | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit      |
|-----|---------------------------|-------------|-------------|----------------------------------|-------|-----------|
| 0   | Force applied on the cart | -3          | 3           | slider                           | slide | Force (N) |
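
As a minimal sketch (assuming the `InvertedPendulum-v2` registration shown under
Arguments below, and the classic 4-tuple step API used by this class), a random
action can be sampled and applied like this:

```
import gym

env = gym.make("InvertedPendulum-v2")
env.reset()
action = env.action_space.sample()  # 1-element ndarray in [-3, 3]
observation, reward, done, info = env.step(action)
```
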
### Observation Space

The state space consists of positional values of different body parts of
the pendulum system, followed by the velocities of those individual parts (their derivatives),
with all the positions listed before all the velocities.

The observation is an `ndarray` with shape `(4,)` whose elements correspond to the following:

| Num | Observation                                   | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |
|-----|-----------------------------------------------|------|-----|----------------------------------|-------|--------------------------|
| 0   | position of the cart along the linear surface | -Inf | Inf | slider                           | slide | position (m)             |
| 1   | vertical angle of the pole on the cart        | -Inf | Inf | hinge                            | hinge | angle (rad)              |
| 2   | linear velocity of the cart                   | -Inf | Inf | slider                           | slide | velocity (m/s)           |
| 3   | angular velocity of the pole on the cart      | -Inf | Inf | hinge                            | hinge | angular velocity (rad/s) |
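
As a quick check of this layout (a sketch assuming the same `InvertedPendulum-v2`
registration as above), the observation returned by `reset()` can be unpacked as follows:

```
import gym

env = gym.make("InvertedPendulum-v2")
obs = env.reset()  # ndarray of shape (4,)
cart_position, pole_angle, cart_velocity, pole_angular_velocity = obs
```
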
### Rewards

The goal is to make the inverted pendulum stand upright (within a certain angle limit)
for as long as possible - as such, a reward of +1 is awarded for each timestep that
the pole remains upright.

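A sketch of how this reward accumulates into an episode return under a random policy
(the variable names here are only illustrative):

```
import gym

env = gym.make("InvertedPendulum-v2")
env.reset()
episode_return, done = 0.0, False
while not done:
    _, reward, done, _ = env.step(env.action_space.sample())
    episode_return += reward  # +1 for every timestep the pole stayed upright
```
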
### Starting State
All observations start in state
(0.0, 0.0, 0.0, 0.0) with uniform noise in the range
of [-0.01, 0.01] added to the values for stochasticity.

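This matches the `reset_model` implementation further down; conceptually the reset draws
noise like the following sketch (NumPy's default generator is used here purely for
illustration - the environment uses its own `self.np_random`):

```
import numpy as np

rng = np.random.default_rng()
initial_state = np.zeros(4)
start_state = initial_state + rng.uniform(low=-0.01, high=0.01, size=4)
```
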
### Episode Termination
The episode terminates when any of the following happens:

1. The episode duration reaches 1000 timesteps.
2. Any of the state space values is no longer finite.
3. The absolute value of the vertical angle between the pole and the cart is greater than 0.2 radians.

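Conditions 2 and 3 are checked inside `step` (see the implementation below), while the
1000-timestep limit comes from the `TimeLimit` wrapper that `gym.make` applies. Applying
the limit by hand would look roughly like this (assuming `max_episode_steps=1000` for v2):

```
from gym.wrappers import TimeLimit
from gym.envs.mujoco.inverted_pendulum import InvertedPendulumEnv

env = TimeLimit(InvertedPendulumEnv(), max_episode_steps=1000)
```
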
### Arguments

No additional arguments are currently supported (in v2 and lower),
but modifications can be made to the XML file in the assets folder
(or by changing the path to a modified XML file in another folder -
see the sketch after the code block below).

```
env = gym.make('InvertedPendulum-v2')
```
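
If a modified model is needed, one option (a sketch, not an official argument of this
version; the XML path below is a placeholder) is to subclass the environment and pass an
absolute path, which `MujocoEnv` uses as-is instead of resolving it inside gym's assets folder:

```
from gym import utils
from gym.envs.mujoco import mujoco_env
from gym.envs.mujoco.inverted_pendulum import InvertedPendulumEnv


class CustomInvertedPendulumEnv(InvertedPendulumEnv):
    def __init__(self, xml_path="/path/to/modified_inverted_pendulum.xml"):
        utils.EzPickle.__init__(self)
        # Absolute paths bypass the default lookup in gym's assets folder.
        mujoco_env.MujocoEnv.__init__(self, xml_path, 2)
```
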
There is no v3 for InvertedPendulum, unlike the robot environments where a
v3 and beyond take `gym.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc.

### Version History

* v3: support for `gym.make` kwargs such as `xml_file`, `ctrl_cost_weight`, `reset_noise_scale`, etc. RGB rendering comes from the tracking camera (so the agent does not run away from the screen)
* v2: All continuous control environments now use mujoco_py >= 1.50
* v1: max_time_steps raised to 1000 for robot-based tasks (including inverted pendulum)
* v0: Initial versions release (1.0.0)
"""

def __init__(self):
utils.EzPickle.__init__(self)
# frame_skip=2: each call to step() advances the simulation by two MuJoCo timesteps.
mujoco_env.MujocoEnv.__init__(self, "inverted_pendulum.xml", 2)

def step(self, a):
reward = 1.0
self.do_simulation(a, self.frame_skip)
ob = self._get_obs()
# Keep going while all observations are finite and the pole angle stays within 0.2 rad of vertical.
notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= 0.2)
done = not notdone
return ob, reward, done, {}

def reset_model(self):
# Perturb the initial positions and velocities with uniform noise in [-0.01, 0.01].
qpos = self.init_qpos + self.np_random.uniform(
    size=self.model.nq, low=-0.01, high=0.01
)
qvel = self.init_qvel + self.np_random.uniform(
    size=self.model.nv, low=-0.01, high=0.01
)
self.set_state(qpos, qvel)
return self._get_obs()

def _get_obs(self):
# Concatenate joint positions and velocities: [cart position, pole angle, cart velocity, pole angular velocity].
return np.concatenate([self.sim.data.qpos, self.sim.data.qvel]).ravel()

def viewer_setup(self):
# Track the cart body with the camera and zoom out to the full extent of the model.
v = self.viewer
v.cam.trackbodyid = 0
v.cam.distance = self.model.stat.extent