Gymnasium/gym/envs/mujoco/swimmer_v4.py

__credits__ = ["Rushiv Arora"]

import numpy as np

from gym import utils
from gym.envs.mujoco import MujocoEnv
from gym.spaces import Box

DEFAULT_CAMERA_CONFIG = {}


class SwimmerEnv(MujocoEnv, utils.EzPickle):
    """
    ### Description

    This environment corresponds to the Swimmer environment described in Rémi Coulom's PhD thesis
    ["Reinforcement Learning Using Neural Networks, with Applications to Motor Control"](https://tel.archives-ouvertes.fr/tel-00003985/document).
    The environment aims to increase the number of independent state and control
    variables as compared to the classic control environments. The swimmers
    consist of three or more segments ('***links***') and one fewer articulation
    joints ('***rotors***'), with each rotor joint connecting exactly two links to
    form a linear chain. The swimmer is suspended in a two-dimensional pool and
    always starts in the same position (subject to some deviation drawn from a
    uniform distribution), and the goal is to move as fast as possible towards
    the right by applying torque on the rotors and using the fluid's friction.

    ### Notes

    The problem parameters are:
    * *n*: number of body parts
    * *m<sub>i</sub>*: mass of part *i* (*i* ∈ {1...n})
    * *l<sub>i</sub>*: length of part *i* (*i* ∈ {1...n})
    * *k*: viscous-friction coefficient

    The default environment has *n* = 3, *l<sub>i</sub>* = 0.1,
    and *k* = 0.1. It is possible to pass a custom MuJoCo XML file during construction to increase the
    number of links, or to tweak any of the parameters.

    ### Action Space
    The action space is a `Box(-1, 1, (2,), float32)`. An action represents the torques applied between *links*.

    | Num | Action                             | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit         |
    |-----|------------------------------------|-------------|-------------|----------------------------------|-------|--------------|
    | 0   | Torque applied on the first rotor  | -1          | 1           | rot2                             | hinge | torque (N m) |
    | 1   | Torque applied on the second rotor | -1          | 1           | rot3                             | hinge | torque (N m) |

    ### Observation Space

    By default, observations consist of:
    * θ<sub>i</sub>: angle of part *i* with respect to the *x* axis
    * θ<sub>i</sub>': its derivative with respect to time (angular velocity)

    In the default case, observations do not include the x- and y-coordinates of the front tip. These may
    be included by passing `exclude_current_positions_from_observation=False` during construction.
    Then, the observation space will have 10 dimensions where the first two dimensions
    represent the x- and y-coordinates of the front tip.
    Regardless of whether `exclude_current_positions_from_observation` was set to true or false, the x- and y-coordinates
    will be returned in `info` with keys `"x_position"` and `"y_position"`, respectively.

    By default, the observation is an `ndarray` with shape `(8,)` where the elements correspond to the following:

    | Num | Observation                          | Min  | Max | Name (in corresponding XML file) | Joint | Unit                     |
    | --- | ------------------------------------ | ---- | --- | -------------------------------- | ----- | ------------------------ |
    | 0   | angle of the front tip               | -Inf | Inf | rot                              | hinge | angle (rad)              |
    | 1   | angle of the first rotor             | -Inf | Inf | rot2                             | hinge | angle (rad)              |
    | 2   | angle of the second rotor            | -Inf | Inf | rot3                             | hinge | angle (rad)              |
    | 3   | velocity of the tip along the x-axis | -Inf | Inf | slider1                          | slide | velocity (m/s)           |
    | 4   | velocity of the tip along the y-axis | -Inf | Inf | slider2                          | slide | velocity (m/s)           |
    | 5   | angular velocity of front tip        | -Inf | Inf | rot                              | hinge | angular velocity (rad/s) |
    | 6   | angular velocity of first rotor      | -Inf | Inf | rot2                             | hinge | angular velocity (rad/s) |
    | 7   | angular velocity of second rotor     | -Inf | Inf | rot3                             | hinge | angular velocity (rad/s) |
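
    As a minimal sketch of how this layout arises, the snippet below mimics the slicing done in `_get_obs` using plain
    Python lists. It assumes the default model has five position coordinates (x, y and three hinge angles) and five
    matching velocities; the helper name `build_observation` and all numeric values are hypothetical.

    ```python
    # Hypothetical generalized coordinates: [x, y, rot, rot2, rot3]
    qpos = [0.5, -0.2, 0.01, 0.3, -0.3]
    # Hypothetical generalized velocities: [vx, vy, w_rot, w_rot2, w_rot3]
    qvel = [1.0, 0.1, 0.0, 0.2, -0.2]


    def build_observation(qpos, qvel, exclude_current_positions=True):
        # Drop the x- and y-coordinates of the front tip when requested
        position = qpos[2:] if exclude_current_positions else qpos
        # Positions come first, followed by all velocities
        return list(position) + list(qvel)


    default_obs = build_observation(qpos, qvel)
    full_obs = build_observation(qpos, qvel, exclude_current_positions=False)
    print(len(default_obs), len(full_obs))  # 8 10
    ```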

    ### Rewards
    The reward consists of two parts:
    - *forward_reward*: A reward for moving forward, measured
    as *`forward_reward_weight` * (x-coordinate after action - x-coordinate before action)/dt*. *dt* is
    the time between actions and depends on the `frame_skip` parameter
    (default is 4), where the frametime is 0.01, making the
    default *dt = 4 * 0.01 = 0.04*. This reward is positive if the swimmer
    swims right as desired.
    - *ctrl_cost*: A cost for penalising the swimmer if it takes
    actions that are too large. It is measured as *`ctrl_cost_weight` *
    sum(action<sup>2</sup>)* where *`ctrl_cost_weight`* is a parameter set for the
    control and has a default value of 1e-4.

    The total reward returned is ***reward*** *=* *forward_reward - ctrl_cost*, and `info` will also contain the individual reward terms under the keys `"reward_fwd"` and `"reward_ctrl"`.
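
    The arithmetic can be illustrated with a small standalone sketch that mirrors the computation in `step` and
    `control_cost`, using the default weights and the default *dt* of 0.04. The function name `swimmer_reward` and the
    sample positions and action are hypothetical, chosen only for illustration.

    ```python
    def swimmer_reward(x_before, x_after, action, dt=0.04,
                       forward_reward_weight=1.0, ctrl_cost_weight=1e-4):
        # Finite-difference estimate of the velocity along the x-axis
        x_velocity = (x_after - x_before) / dt
        forward_reward = forward_reward_weight * x_velocity
        # Quadratic penalty on the applied torques
        ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
        return forward_reward - ctrl_cost, forward_reward, ctrl_cost


    # The tip moves 0.02 to the right while torques (1.0, -0.5) are applied
    reward, forward_reward, ctrl_cost = swimmer_reward(0.0, 0.02, [1.0, -0.5])
    print(reward, forward_reward, ctrl_cost)
    ```

    With the default weights the control cost is several orders of magnitude smaller than the forward reward, so the
    total is dominated by the velocity term.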

    ### Starting State
    All observations start in state (0,0,0,0,0,0,0,0), and uniform noise in the range [-`reset_noise_scale`, `reset_noise_scale`] is added to the initial state for stochasticity.
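
    A rough sketch of this perturbation is shown below. It follows the structure of `reset_model`, but swaps in the
    standard-library `random` module for the environment's own `np_random` generator so that it runs without any
    dependencies. The helper name `sample_initial_state` and the all-zero initial vectors of length five are
    assumptions made for illustration.

    ```python
    import random


    def sample_initial_state(init_qpos, init_qvel, reset_noise_scale=0.1, seed=None):
        rng = random.Random(seed)
        low, high = -reset_noise_scale, reset_noise_scale
        # Perturb every position and velocity coordinate independently
        qpos = [q + rng.uniform(low, high) for q in init_qpos]
        qvel = [v + rng.uniform(low, high) for v in init_qvel]
        return qpos, qvel


    qpos, qvel = sample_initial_state([0.0] * 5, [0.0] * 5, seed=0)
    print(qpos, qvel)
    ```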

    ### Episode End
    The episode truncates when the episode length is greater than 1000.

    ### Arguments

    No additional arguments are currently supported in v2 and lower.

    ```
    gym.make('Swimmer-v2')
    ```

    v3 and v4 take gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc.

    ```
    env = gym.make('Swimmer-v4', ctrl_cost_weight=0.1, ....)
    ```

    | Parameter                                    | Type      | Default         | Description                                                                                                                                                               |
    | -------------------------------------------- | --------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | `xml_file`                                   | **str**   | `"swimmer.xml"` | Path to a MuJoCo model                                                                                                                                                    |
    | `forward_reward_weight`                      | **float** | `1.0`           | Weight for _forward_reward_ term (see section on reward)                                                                                                                  |
    | `ctrl_cost_weight`                           | **float** | `1e-4`          | Weight for _ctrl_cost_ term (see section on reward)                                                                                                                       |
    | `reset_noise_scale`                          | **float** | `0.1`           | Scale of random perturbations of initial position and velocity (see section on Starting State)                                                                            |
    | `exclude_current_positions_from_observation` | **bool**  | `True`          | Whether or not to omit the x- and y-coordinates from observations. Excluding the position can serve as an inductive bias to induce position-agnostic behavior in policies |


    ### Version History

    * v4: all mujoco environments now use the mujoco bindings in mujoco>=2.1.3
    * v3: support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc. rgb rendering comes from tracking camera (so agent does not run away from screen)
    * v2: All continuous control environments now use mujoco_py >= 1.50
    * v1: max_time_steps raised to 1000 for robot based tasks. Added reward_threshold to environments.
    * v0: Initial versions release (1.0.0)
    """

    metadata = {
        "render_modes": [
            "human",
            "rgb_array",
            "depth_array",
            "single_rgb_array",
            "single_depth_array",
        ],
        "render_fps": 25,
    }

    def __init__(
        self,
        forward_reward_weight=1.0,
        ctrl_cost_weight=1e-4,
        reset_noise_scale=0.1,
        exclude_current_positions_from_observation=True,
        **kwargs
    ):
        utils.EzPickle.__init__(
            self,
            forward_reward_weight,
            ctrl_cost_weight,
            reset_noise_scale,
            exclude_current_positions_from_observation,
            **kwargs
        )

        self._forward_reward_weight = forward_reward_weight
        self._ctrl_cost_weight = ctrl_cost_weight

        self._reset_noise_scale = reset_noise_scale

        self._exclude_current_positions_from_observation = (
            exclude_current_positions_from_observation
        )
        if exclude_current_positions_from_observation:
            observation_space = Box(
                low=-np.inf, high=np.inf, shape=(8,), dtype=np.float64
            )
        else:
            observation_space = Box(
                low=-np.inf, high=np.inf, shape=(10,), dtype=np.float64
            )
        MujocoEnv.__init__(
            self, "swimmer.xml", 4, observation_space=observation_space, **kwargs
        )

    def control_cost(self, action):
        control_cost = self._ctrl_cost_weight * np.sum(np.square(action))
        return control_cost

    def step(self, action):
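        # Record the front tip's (x, y) position before and after simulating
        # `frame_skip` physics steps, then estimate velocity by finite differences.
        xy_position_before = self.data.qpos[0:2].copy()
        self.do_simulation(action, self.frame_skip)
        xy_position_after = self.data.qpos[0:2].copy()

        xy_velocity = (xy_position_after - xy_position_before) / self.dt
        x_velocity, y_velocity = xy_velocity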

        forward_reward = self._forward_reward_weight * x_velocity

        ctrl_cost = self.control_cost(action)

        observation = self._get_obs()
        reward = forward_reward - ctrl_cost
        info = {
            "reward_fwd": forward_reward,
            "reward_ctrl": -ctrl_cost,
            "x_position": xy_position_after[0],
            "y_position": xy_position_after[1],
            "distance_from_origin": np.linalg.norm(xy_position_after, ord=2),
            "x_velocity": x_velocity,
            "y_velocity": y_velocity,
            "forward_reward": forward_reward,
        }

        self.renderer.render_step()
        return observation, reward, False, False, info

    def _get_obs(self):
        position = self.data.qpos.flat.copy()
        velocity = self.data.qvel.flat.copy()

        if self._exclude_current_positions_from_observation:
            # Drop the x- and y-coordinates of the front tip
            position = position[2:]

        observation = np.concatenate([position, velocity]).ravel()
        return observation

    def reset_model(self):
        noise_low = -self._reset_noise_scale
        noise_high = self._reset_noise_scale

        qpos = self.init_qpos + self.np_random.uniform(
            low=noise_low, high=noise_high, size=self.model.nq
        )
        qvel = self.init_qvel + self.np_random.uniform(
            low=noise_low, high=noise_high, size=self.model.nv
        )

        self.set_state(qpos, qvel)

        observation = self._get_obs()
        return observation

    def viewer_setup(self):
        assert self.viewer is not None
        for key, value in DEFAULT_CAMERA_CONFIG.items():
            if isinstance(value, np.ndarray):
                getattr(self.viewer.cam, key)[:] = value
            else:
                setattr(self.viewer.cam, key, value)