# Gymnasium/gym/envs/mujoco/swimmer.py

import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env


class SwimmerEnv(mujoco_env.MujocoEnv, utils.EzPickle):
    def __init__(self):
        # Load the swimmer MuJoCo model; frame_skip=4 advances the simulation
        # four MuJoCo timesteps per environment step.
        mujoco_env.MujocoEnv.__init__(self, "swimmer.xml", 4)
        utils.EzPickle.__init__(self)

    def step(self, a):
        # Reward is forward velocity along x minus a small quadratic control cost.
        ctrl_cost_coeff = 0.0001
        xposbefore = self.sim.data.qpos[0]
        self.do_simulation(a, self.frame_skip)
        xposafter = self.sim.data.qpos[0]
        reward_fwd = (xposafter - xposbefore) / self.dt
        reward_ctrl = -ctrl_cost_coeff * np.square(a).sum()
        reward = reward_fwd + reward_ctrl
        ob = self._get_obs()
        return ob, reward, False, dict(reward_fwd=reward_fwd, reward_ctrl=reward_ctrl)

    def _get_obs(self):
        qpos = self.sim.data.qpos
        qvel = self.sim.data.qvel
        # Drop the first two position coordinates (the swimmer's x/y location) so
        # the observation is invariant to translation in the plane.
        return np.concatenate([qpos.flat[2:], qvel.flat])

    def reset_model(self):
        # Start from the initial pose with small uniform noise added to every
        # generalized position and velocity.
        self.set_state(
            self.init_qpos
            + self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq),
            self.init_qvel
            + self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nv),
        )
        return self._get_obs()
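

# Usage sketch (not part of the original file): a minimal example of driving this
# environment through the pre-0.26 gym step/reset API, assuming mujoco-py is
# installed and this class is registered under the id "Swimmer-v2".
if __name__ == "__main__":
    import gym

    env = gym.make("Swimmer-v2")
    ob = env.reset()
    for _ in range(100):
        action = env.action_space.sample()  # random action from the Box action space
        ob, reward, done, info = env.step(action)
        if done:  # set by the TimeLimit wrapper; this env never sets done itself
            ob = env.reset()
    env.close()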