"""classic Acrobot task"""

from typing import Optional

import numpy as np
from numpy import cos, pi, sin

import gymnasium as gym
from gymnasium import Env, spaces
from gymnasium.envs.classic_control import utils
from gymnasium.error import DependencyNotInstalled

__copyright__ = "Copyright 2013, RLPy http://acl.mit.edu/RLPy"
__credits__ = [
    "Alborz Geramifard",
    "Robert H. Klein",
    "Christoph Dann",
    "William Dabney",
    "Jonathan P. How",
]
__license__ = "BSD 3-Clause"
__author__ = "Christoph Dann <cdann@cdann.de>"

# SOURCE:
# https://github.com/rlpy/rlpy/blob/master/rlpy/Domains/Acrobot.py

class AcrobotEnv(Env):
    """
    ## Description

    The Acrobot environment is based on Sutton's work in
    ["Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding"](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html)
    and [Sutton and Barto's book](http://www.incompleteideas.net/book/the-book-2nd.html).
    The system consists of two links connected linearly to form a chain, with one end of
    the chain fixed. The joint between the two links is actuated. The goal is to apply
    torques on the actuated joint to swing the free end of the linear chain above a
    given height while starting from the initial state of hanging downwards.

    As seen in the **Gif**: two blue links connected by two green joints. The joint in
    between the two links is actuated. The goal is to swing the free end of the outer
    link to reach the target height (the black horizontal line above the system) by
    applying torque on the actuator.

    ## Action Space

    The action is discrete, deterministic, and represents the torque applied on the
    actuated joint between the two links.

    | Num | Action                                | Unit         |
    |-----|---------------------------------------|--------------|
    | 0   | apply -1 torque to the actuated joint | torque (N m) |
    | 1   | apply 0 torque to the actuated joint  | torque (N m) |
    | 2   | apply 1 torque to the actuated joint  | torque (N m) |
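
    The action index simply selects a torque from the environment's `AVAIL_TORQUE`
    list; a minimal sketch of the lookup performed inside `step` (the local variable
    names are illustrative):

    ```python
    AVAIL_TORQUE = [-1.0, 0.0, +1.0]  # torques for actions 0, 1, 2

    action = 2
    torque = AVAIL_TORQUE[action]  # +1.0 N m applied to the actuated joint
    ```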

    ## Observation Space

    The observation is an `ndarray` with shape `(6,)` that provides information about
    the two rotational joint angles as well as their angular velocities:

    | Num | Observation                  | Min                 | Max               |
    |-----|------------------------------|---------------------|-------------------|
    | 0   | Cosine of `theta1`           | -1                  | 1                 |
    | 1   | Sine of `theta1`             | -1                  | 1                 |
    | 2   | Cosine of `theta2`           | -1                  | 1                 |
    | 3   | Sine of `theta2`             | -1                  | 1                 |
    | 4   | Angular velocity of `theta1` | ~ -12.567 (-4 * pi) | ~ 12.567 (4 * pi) |
    | 5   | Angular velocity of `theta2` | ~ -28.274 (-9 * pi) | ~ 28.274 (9 * pi) |

    where

    - `theta1` is the angle of the first joint, where an angle of 0 indicates that the
      first link is pointing directly downwards.
    - `theta2` is ***relative to the angle of the first link.***
      An angle of 0 corresponds to having the same angle between the two links.

    The angular velocities of `theta1` and `theta2` are bounded at ±4π and ±9π rad/s,
    respectively. A state of `[1, 0, 1, 0, ..., ...]` indicates that both links are
    pointing downwards.
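
    Since only the sine and cosine of each angle are exposed, the underlying angles
    can be recovered with `np.arctan2`; a minimal sketch, assuming `env` was created
    as shown in the Arguments section below:

    ```python
    import numpy as np

    obs, _ = env.reset()
    theta1 = np.arctan2(obs[1], obs[0])  # angle of the first joint
    theta2 = np.arctan2(obs[3], obs[2])  # angle of the second joint, relative to link 1
    ```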

    ## Rewards

    The goal is to have the free end reach a designated target height in as few steps
    as possible, and as such all steps that do not reach the goal incur a reward of -1.
    Achieving the target height results in termination with a reward of 0. The reward
    threshold is -100.

    ## Starting State

    Each parameter in the underlying state (`theta1`, `theta2`, and the two angular
    velocities) is initialized uniformly between -0.1 and 0.1. This means both links
    are pointing downwards with some initial stochasticity.

    ## Episode End

    The episode ends if one of the following occurs:

    1. Termination: The free end reaches the target height, which is constructed as
       `-cos(theta1) - cos(theta2 + theta1) > 1.0` (written out in the sketch below)
    2. Truncation: Episode length is greater than 500 (200 for v0)
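
    The termination condition compares the height of the free end against the target;
    a minimal sketch of the check, written out from the formula above (the helper name
    is illustrative):

    ```python
    from numpy import cos

    def reached_target(theta1, theta2):
        # height of the free end above the pivot, with unit link lengths
        return bool(-cos(theta1) - cos(theta2 + theta1) > 1.0)
    ```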

    ## Arguments

    No additional arguments are currently supported during construction.

    ```python
    import gymnasium as gym
    env = gym.make('Acrobot-v1')
    ```

    On reset, the `options` parameter allows the user to change the bounds used to
    determine the new random state.
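
    For example, the reset bounds can be widened through `options`; a minimal sketch,
    assuming the `"low"`/`"high"` keys accepted by `utils.maybe_parse_reset_bounds`
    (see `reset` below):

    ```python
    observation, info = env.reset(options={"low": -0.2, "high": 0.2})
    ```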

    By default, the dynamics of the acrobot follow those described in Sutton and
    Barto's book
    [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/11/node4.html).
    However, a `book_or_nips` parameter can be modified to change the pendulum dynamics
    to those described in the original
    [NeurIPS paper](https://papers.nips.cc/paper/1995/hash/8f1d43620bc6bb580df6e80b0dc05c48-Abstract.html).

    ```python
    # To change the dynamics as described above
    env.unwrapped.book_or_nips = 'nips'
    ```

    See the following note for details:

    > The dynamics equations were missing some terms in the NIPS paper which
    > are present in the book. R. Sutton confirmed in personal correspondence
    > that the experimental results shown in the paper and the book were
    > generated with the equations shown in the book.
    > However, there is the option to run the domain with the paper equations
    > by setting `book_or_nips = 'nips'`

    ## Version History

    - v1: Maximum number of steps increased from 200 to 500. The observation space for
      v0 provided direct readings of `theta1` and `theta2` in radians, with a range of
      `[-pi, pi]`. The v1 observation space as described here provides the sine and
      cosine of each angle instead.
    - v0: Initial versions release (1.0.0) (removed from gymnasium for v1)

    ## References

    - Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful
      Examples Using Sparse Coarse Coding. In D. Touretzky, M. C. Mozer, & M. Hasselmo
      (Eds.), Advances in Neural Information Processing Systems (Vol. 8). MIT Press.
      https://proceedings.neurips.cc/paper/1995/file/8f1d43620bc6bb580df6e80b0dc05c48-Paper.pdf
    - Sutton, R. S., Barto, A. G. (2018). Reinforcement Learning: An Introduction.
      The MIT Press.
    """

    metadata = {
        "render_modes": ["human", "rgb_array"],
        "render_fps": 15,
    }

    dt = 0.2

    LINK_LENGTH_1 = 1.0  # [m]
    LINK_LENGTH_2 = 1.0  # [m]
    LINK_MASS_1 = 1.0  #: [kg] mass of link 1
    LINK_MASS_2 = 1.0  #: [kg] mass of link 2
    LINK_COM_POS_1 = 0.5  #: [m] position of the center of mass of link 1
    LINK_COM_POS_2 = 0.5  #: [m] position of the center of mass of link 2
    LINK_MOI = 1.0  #: moments of inertia for both links

    MAX_VEL_1 = 4 * pi
    MAX_VEL_2 = 9 * pi

    AVAIL_TORQUE = [-1.0, 0.0, +1.0]

    torque_noise_max = 0.0

    SCREEN_DIM = 500

    #: use dynamics equations from the nips paper or the book
    book_or_nips = "book"
    action_arrow = None
    domain_fig = None
    actions_num = 3

    def __init__(self, render_mode: Optional[str] = None):
        self.render_mode = render_mode
        self.screen = None
        self.clock = None
        self.isopen = True
        high = np.array(
            [1.0, 1.0, 1.0, 1.0, self.MAX_VEL_1, self.MAX_VEL_2], dtype=np.float32
        )
        low = -high
        self.observation_space = spaces.Box(low=low, high=high, dtype=np.float32)
        self.action_space = spaces.Discrete(3)
        self.state = None

    def reset(self, *, seed: Optional[int] = None, options: Optional[dict] = None):
        super().reset(seed=seed)
        # Note that if you use custom reset bounds, it may lead to out-of-bound
        # state/observations. -0.1 and 0.1 are the default low and high bounds.
        low, high = utils.maybe_parse_reset_bounds(options, -0.1, 0.1)
        self.state = self.np_random.uniform(low=low, high=high, size=(4,)).astype(
            np.float32
        )

        if self.render_mode == "human":
            self.render()
        return self._get_ob(), {}

    def step(self, a):
        s = self.state
        assert s is not None, "Call reset before using AcrobotEnv object."
        torque = self.AVAIL_TORQUE[a]

        # Add noise to the force action
        if self.torque_noise_max > 0:
            torque += self.np_random.uniform(
                -self.torque_noise_max, self.torque_noise_max
            )

        # Now, augment the state with our force action so it can be passed to
        # _dsdt
        s_augmented = np.append(s, torque)

        ns = rk4(self._dsdt, s_augmented, [0, self.dt])

        ns[0] = wrap(ns[0], -pi, pi)
        ns[1] = wrap(ns[1], -pi, pi)
        ns[2] = bound(ns[2], -self.MAX_VEL_1, self.MAX_VEL_1)
        ns[3] = bound(ns[3], -self.MAX_VEL_2, self.MAX_VEL_2)
        self.state = ns
        terminated = self._terminal()
        reward = -1.0 if not terminated else 0.0

        if self.render_mode == "human":
            self.render()
        return (self._get_ob(), reward, terminated, False, {})

    def _get_ob(self):
        s = self.state
        assert s is not None, "Call reset before using AcrobotEnv object."
        return np.array(
            [cos(s[0]), sin(s[0]), cos(s[1]), sin(s[1]), s[2], s[3]], dtype=np.float32
        )

    def _terminal(self):
        s = self.state
        assert s is not None, "Call reset before using AcrobotEnv object."
        return bool(-cos(s[0]) - cos(s[1] + s[0]) > 1.0)

    def _dsdt(self, s_augmented):
        m1 = self.LINK_MASS_1
        m2 = self.LINK_MASS_2
        l1 = self.LINK_LENGTH_1
        lc1 = self.LINK_COM_POS_1
        lc2 = self.LINK_COM_POS_2
        I1 = self.LINK_MOI
        I2 = self.LINK_MOI
        g = 9.8
        a = s_augmented[-1]
        s = s_augmented[:-1]
        theta1 = s[0]
        theta2 = s[1]
        dtheta1 = s[2]
        dtheta2 = s[3]
        # d1, d2 are inertia terms and phi1, phi2 the Coriolis/gravity terms of
        # the acrobot equations of motion (see Sutton & Barto).
        d1 = (
            m1 * lc1**2
            + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * cos(theta2))
            + I1
            + I2
        )
        d2 = m2 * (lc2**2 + l1 * lc2 * cos(theta2)) + I2
        phi2 = m2 * lc2 * g * cos(theta1 + theta2 - pi / 2.0)
        phi1 = (
            -m2 * l1 * lc2 * dtheta2**2 * sin(theta2)
            - 2 * m2 * l1 * lc2 * dtheta2 * dtheta1 * sin(theta2)
            + (m1 * lc1 + m2 * l1) * g * cos(theta1 - pi / 2)
            + phi2
        )
        if self.book_or_nips == "nips":
            # the following line is consistent with the description in the
            # paper
            ddtheta2 = (a + d2 / d1 * phi1 - phi2) / (m2 * lc2**2 + I2 - d2**2 / d1)
        else:
            # the following line is consistent with the java implementation and the
            # book
            ddtheta2 = (
                a + d2 / d1 * phi1 - m2 * l1 * lc2 * dtheta1**2 * sin(theta2) - phi2
            ) / (m2 * lc2**2 + I2 - d2**2 / d1)
        ddtheta1 = -(d2 * ddtheta2 + phi1) / d1
        return dtheta1, dtheta2, ddtheta1, ddtheta2, 0.0

    def render(self):
        if self.render_mode is None:
            assert self.spec is not None
            gym.logger.warn(
                "You are calling render method without specifying any render mode. "
                "You can specify the render_mode at initialization, "
                f'e.g. gym.make("{self.spec.id}", render_mode="rgb_array")'
            )
            return

        try:
            import pygame
            from pygame import gfxdraw
        except ImportError as e:
            raise DependencyNotInstalled(
                "pygame is not installed, run `pip install gymnasium[classic_control]`"
            ) from e

        if self.screen is None:
            pygame.init()
            if self.render_mode == "human":
                pygame.display.init()
                self.screen = pygame.display.set_mode(
                    (self.SCREEN_DIM, self.SCREEN_DIM)
                )
            else:  # mode in "rgb_array"
                self.screen = pygame.Surface((self.SCREEN_DIM, self.SCREEN_DIM))
        if self.clock is None:
            self.clock = pygame.time.Clock()

        surf = pygame.Surface((self.SCREEN_DIM, self.SCREEN_DIM))
        surf.fill((255, 255, 255))
        s = self.state

        bound = self.LINK_LENGTH_1 + self.LINK_LENGTH_2 + 0.2  # 2.2 for default
        scale = self.SCREEN_DIM / (bound * 2)
        offset = self.SCREEN_DIM / 2

        if s is None:
            return None

        p1 = [
            -self.LINK_LENGTH_1 * cos(s[0]) * scale,
            self.LINK_LENGTH_1 * sin(s[0]) * scale,
        ]

        p2 = [
            p1[0] - self.LINK_LENGTH_2 * cos(s[0] + s[1]) * scale,
            p1[1] + self.LINK_LENGTH_2 * sin(s[0] + s[1]) * scale,
        ]

        xys = np.array([[0, 0], p1, p2])[:, ::-1]
        thetas = [s[0] - pi / 2, s[0] + s[1] - pi / 2]
        link_lengths = [self.LINK_LENGTH_1 * scale, self.LINK_LENGTH_2 * scale]

        pygame.draw.line(
            surf,
            start_pos=(-2.2 * scale + offset, 1 * scale + offset),
            end_pos=(2.2 * scale + offset, 1 * scale + offset),
            color=(0, 0, 0),
        )

        for ((x, y), th, llen) in zip(xys, thetas, link_lengths):
            x = x + offset
            y = y + offset
            l, r, t, b = 0, llen, 0.1 * scale, -0.1 * scale
            coords = [(l, b), (l, t), (r, t), (r, b)]
            transformed_coords = []
            for coord in coords:
                coord = pygame.math.Vector2(coord).rotate_rad(th)
                coord = (coord[0] + x, coord[1] + y)
                transformed_coords.append(coord)
            gfxdraw.aapolygon(surf, transformed_coords, (0, 204, 204))
            gfxdraw.filled_polygon(surf, transformed_coords, (0, 204, 204))

            gfxdraw.aacircle(surf, int(x), int(y), int(0.1 * scale), (204, 204, 0))
            gfxdraw.filled_circle(surf, int(x), int(y), int(0.1 * scale), (204, 204, 0))

        surf = pygame.transform.flip(surf, False, True)
        self.screen.blit(surf, (0, 0))

        if self.render_mode == "human":
            pygame.event.pump()
            self.clock.tick(self.metadata["render_fps"])
            pygame.display.flip()

        elif self.render_mode == "rgb_array":
            return np.transpose(
                np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)
            )

    def close(self):
        if self.screen is not None:
            import pygame

            pygame.display.quit()
            pygame.quit()
            self.isopen = False


def wrap(x, m, M):
    """Wraps ``x`` so that ``m <= x <= M``; but unlike ``bound()``, which
    truncates, ``wrap()`` wraps ``x`` around the coordinate system defined by
    ``m`` and ``M``. For example, m = -180, M = 180 (degrees), x = 360 --> returns 0.

    Args:
        x: a scalar
        m: minimum possible value in range
        M: maximum possible value in range

    Returns:
        x: a scalar, wrapped
    """
    diff = M - m
    while x > M:
        x = x - diff
    while x < m:
        x = x + diff
    return x


def bound(x, m, M=None):
    """Either have m as a scalar, so bound(x, m, M) returns m <= x <= M, *OR*
    have m as a length-2 vector, so bound(x, m, <IGNORED>) returns m[0] <= x <= m[1].

    Args:
        x: scalar
        m: the lower bound, or a length-2 vector (lower, upper)
        M: the upper bound

    Returns:
        x: scalar, bound between min (m) and Max (M)
    """
    if M is None:
        M = m[1]
        m = m[0]
    # bound x between min (m) and Max (M)
    return min(max(x, m), M)


def rk4(derivs, y0, t):
    """
    Integrate 1-D or N-D system of ODEs using 4th-order Runge-Kutta.

    Example for a 2D system:

        >>> def derivs(x):
        ...     d1 = x[0] + 2 * x[1]
        ...     d2 = -3 * x[0] + 4 * x[1]
        ...     return d1, d2

        >>> dt = 0.0005
        >>> t = np.arange(0.0, 2.0, dt)
        >>> y0 = (1, 2)
        >>> yout = rk4(derivs, y0, t)

    Args:
        derivs: the derivative of the system, with the signature ``dy = derivs(yi)``
        y0: initial state vector
        t: sample times

    Returns:
        yout: Runge-Kutta approximation of the ODE
    """
    try:
        Ny = len(y0)
    except TypeError:
        yout = np.zeros((len(t),), np.float64)
    else:
        yout = np.zeros((len(t), Ny), np.float64)

    yout[0] = y0

    for i in np.arange(len(t) - 1):
        this = t[i]
        dt = t[i + 1] - this
        dt2 = dt / 2.0
        y0 = yout[i]

        k1 = np.asarray(derivs(y0))
        k2 = np.asarray(derivs(y0 + dt2 * k1))
        k3 = np.asarray(derivs(y0 + dt2 * k2))
        k4 = np.asarray(derivs(y0 + dt * k3))
        yout[i + 1] = y0 + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    # We only care about the final timestep, and we cleave off the appended
    # action value, which will be zero.
    return yout[-1][:4]
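

# A minimal usage sketch: roll out a few random steps in the environment,
# assuming the "Acrobot-v1" id is registered by gymnasium's classic-control
# registry as described in the class docstring above.
if __name__ == "__main__":
    env = gym.make("Acrobot-v1")
    observation, info = env.reset(seed=0)
    for _ in range(10):
        action = env.action_space.sample()  # random action: 0, 1, or 2
        observation, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            observation, info = env.reset()
    env.close()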