# Gymnasium/gym/envs/classic_control/pendulum.py
import gym
from gym import spaces
from gym.utils import seeding
import numpy as np
from os import path


class PendulumEnv(gym.Env):
    """Pendulum swing-up task with a single continuous torque action."""

    metadata = {"render.modes": ["human", "rgb_array"], "video.frames_per_second": 30}

    def __init__(self, g=10.0):
        self.max_speed = 8  # bound on angular velocity
        self.max_torque = 2.0  # bound on applied torque
        self.dt = 0.05  # integration time step
        self.g = g  # gravitational acceleration
        self.m = 1.0  # pendulum mass
        self.l = 1.0  # pendulum length
        self.viewer = None

        # Observations are (cos(theta), sin(theta), thetadot).
        high = np.array([1.0, 1.0, self.max_speed], dtype=np.float32)
        self.action_space = spaces.Box(
            low=-self.max_torque, high=self.max_torque, shape=(1,), dtype=np.float32
        )
        self.observation_space = spaces.Box(low=-high, high=high, dtype=np.float32)

        self.seed()

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, u):
        th, thdot = self.state  # th := theta

        g = self.g
        m = self.m
        l = self.l
        dt = self.dt

        u = np.clip(u, -self.max_torque, self.max_torque)[0]
        self.last_u = u  # for rendering
        # Quadratic cost on angle, angular velocity, and torque.
        costs = angle_normalize(th) ** 2 + 0.1 * thdot ** 2 + 0.001 * (u ** 2)

        # Euler update: velocity first, then the angle from the new velocity.
        newthdot = thdot + (-3 * g / (2 * l) * np.sin(th + np.pi) + 3.0 / (m * l ** 2) * u) * dt
        newth = th + newthdot * dt
        newthdot = np.clip(newthdot, -self.max_speed, self.max_speed)

        self.state = np.array([newth, newthdot])
        return self._get_obs(), -costs, False, {}

    def reset(self):
        high = np.array([np.pi, 1])
        self.state = self.np_random.uniform(low=-high, high=high)
        self.last_u = None
        return self._get_obs()

    def _get_obs(self):
        theta, thetadot = self.state
        # Cast to float32 so observations match the dtype of observation_space.
        return np.array([np.cos(theta), np.sin(theta), thetadot], dtype=np.float32)

    def render(self, mode="human"):
        if self.viewer is None:
            # Build the scene once; later calls only update the transforms.
            from gym.envs.classic_control import rendering

            self.viewer = rendering.Viewer(500, 500)
            self.viewer.set_bounds(-2.2, 2.2, -2.2, 2.2)
            rod = rendering.make_capsule(1, 0.2)
            rod.set_color(0.8, 0.3, 0.3)
            self.pole_transform = rendering.Transform()
            rod.add_attr(self.pole_transform)
            self.viewer.add_geom(rod)
            axle = rendering.make_circle(0.05)
            axle.set_color(0, 0, 0)
            self.viewer.add_geom(axle)
            fname = path.join(path.dirname(__file__), "assets/clockwise.png")
            self.img = rendering.Image(fname, 1.0, 1.0)
            self.imgtrans = rendering.Transform()
            self.img.add_attr(self.imgtrans)

        self.viewer.add_onetime(self.img)
        self.pole_transform.set_rotation(self.state[0] + np.pi / 2)
        # Compare against None so that a torque of exactly 0 still updates the arrow.
        if self.last_u is not None:
            self.imgtrans.scale = (-self.last_u / 2, np.abs(self.last_u) / 2)

        return self.viewer.render(return_rgb_array=mode == "rgb_array")

    def close(self):
        if self.viewer:
            self.viewer.close()
            self.viewer = None


def angle_normalize(x):
    # Wrap an angle in radians into the interval [-pi, pi).
    return ((x + np.pi) % (2 * np.pi)) - np.pi
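

# The update in PendulumEnv.step can be tried without gym or a viewer. The
# sketch below is a minimal, stdlib-only restatement of that update and is not
# part of the original file. Assumptions: `wrap_angle` and `pendulum_step` are
# hypothetical helper names, the keyword defaults copy the constants set in
# __init__, and `math` stands in for NumPy. As in step, the angle is advanced
# with the unclipped new velocity and the velocity is clipped afterwards.

```python
import math


def wrap_angle(x):
    """Wrap an angle in radians into [-pi, pi), like angle_normalize."""
    return ((x + math.pi) % (2 * math.pi)) - math.pi


def pendulum_step(th, thdot, u, g=10.0, m=1.0, l=1.0, dt=0.05,
                  max_speed=8.0, max_torque=2.0):
    """Advance (th, thdot) by one time step; return (new_th, new_thdot, reward).

    Hypothetical helper that restates the arithmetic of PendulumEnv.step.
    """
    # Clip the requested torque to the allowed range.
    u = min(max(u, -max_torque), max_torque)

    # The cost is computed from the state before the update.
    cost = wrap_angle(th) ** 2 + 0.1 * thdot ** 2 + 0.001 * u ** 2

    # Velocity update first, then the angle from the new (unclipped) velocity.
    new_thdot = thdot + (
        -3 * g / (2 * l) * math.sin(th + math.pi) + 3.0 / (m * l ** 2) * u
    ) * dt
    new_th = th + new_thdot * dt

    # Clip the velocity only after the angle has been advanced.
    new_thdot = min(max(new_thdot, -max_speed), max_speed)
    return new_th, new_thdot, -cost


if __name__ == "__main__":
    # Roll the pendulum forward from th = pi/2 with zero torque.
    th, thdot = math.pi / 2, 0.0
    for _ in range(3):
        th, thdot, reward = pendulum_step(th, thdot, u=0.0)
        print(f"th={th:.4f} thdot={thdot:.4f} reward={reward:.4f}")
```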