"""
Classic cart-pole system implemented by Rich Sutton et al.
Copied from http://incompleteideas.net/sutton/book/code/pole.c
permalink: https://perma.cc/C9ZM-652R
"""
import math
import gym
from gym import spaces, logger
from gym.utils import seeding
import numpy as np


class CartPoleEnv(gym.Env):
    """
    Description:
        A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart's velocity.

    Source:
        This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson

    Observation:
        Type: Box(4)
        Num   Observation             Min     Max
        0     Cart Position           -4.8    4.8
        1     Cart Velocity           -Inf    Inf
        2     Pole Angle              -24°    24°
        3     Pole Velocity At Tip    -Inf    Inf

    Actions:
        Type: Discrete(2)
        Num   Action
        0     Push cart to the left
        1     Push cart to the right

        Note: The amount the velocity is reduced or increased is not fixed as it depends on the angle the pole is pointing. This is because the center of gravity of the pole increases the amount of energy needed to move the cart underneath it

    Reward:
        Reward is 1 for every step taken, including the termination step

    Starting State:
        All observations are assigned a uniform random value between ±0.05

    Episode Termination:
        Pole Angle is more than ±12°
        Cart Position is more than ±2.4 (center of the cart reaches the edge of the display)
        Episode length is greater than 200

    Solved Requirements:
        Considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials.
    """

    metadata = {
        'render.modes': ['human', 'rgb_array'],
        'video.frames_per_second': 50
    }

    def __init__(self):
        self.gravity = 9.8
        self.masscart = 1.0
        self.masspole = 0.1
        self.total_mass = (self.masspole + self.masscart)
        self.length = 0.5  # actually half the pole's length
        self.polemass_length = (self.masspole * self.length)
        self.force_mag = 10.0
        self.tau = 0.02  # seconds between state updates
        self.kinematics_integrator = 'euler'

        # Angle at which to fail the episode
        self.theta_threshold_radians = 12 * 2 * math.pi / 360
        self.x_threshold = 2.4

        # Angle limit set to 2 * theta_threshold_radians so failing observation is still within bounds
        high = np.array([
            self.x_threshold * 2,
            np.finfo(np.float32).max,
            self.theta_threshold_radians * 2,
            np.finfo(np.float32).max])

        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(-high, high, dtype=np.float32)

        self.seed()
        self.viewer = None
        self.state = None

        self.steps_beyond_done = None

    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]

    def step(self, action):
        assert self.action_space.contains(action), "%r (%s) invalid" % (action, type(action))
        state = self.state
        x, x_dot, theta, theta_dot = state
        force = self.force_mag if action == 1 else -self.force_mag
        costheta = math.cos(theta)
        sintheta = math.sin(theta)
        temp = (force + self.polemass_length * theta_dot * theta_dot * sintheta) / self.total_mass
        thetaacc = (self.gravity * sintheta - costheta * temp) / (self.length * (4.0 / 3.0 - self.masspole * costheta * costheta / self.total_mass))
        xacc = temp - self.polemass_length * thetaacc * costheta / self.total_mass
        if self.kinematics_integrator == 'euler':
            x = x + self.tau * x_dot
            x_dot = x_dot + self.tau * xacc
            theta = theta + self.tau * theta_dot
            theta_dot = theta_dot + self.tau * thetaacc
        else:  # semi-implicit euler
            x_dot = x_dot + self.tau * xacc
            x = x + self.tau * x_dot
            theta_dot = theta_dot + self.tau * thetaacc
            theta = theta + self.tau * theta_dot
        self.state = (x, x_dot, theta, theta_dot)
        done = x < -self.x_threshold \
            or x > self.x_threshold \
            or theta < -self.theta_threshold_radians \
            or theta > self.theta_threshold_radians
        done = bool(done)

        if not done:
            reward = 1.0
        elif self.steps_beyond_done is None:
            # Pole just fell!
            self.steps_beyond_done = 0
            reward = 1.0
        else:
            if self.steps_beyond_done == 0:
                logger.warn("You are calling 'step()' even though this environment has already returned done = True. You should always call 'reset()' once you receive 'done = True' -- any further steps are undefined behavior.")
            self.steps_beyond_done += 1
            reward = 0.0

        return np.array(self.state), reward, done, {}

    def reset(self):
        self.state = self.np_random.uniform(low=-0.05, high=0.05, size=(4,))
        self.steps_beyond_done = None
        return np.array(self.state)

    def render(self, mode='human'):
        screen_width = 600
        screen_height = 400

        world_width = self.x_threshold * 2
        scale = screen_width / world_width
        carty = 100  # TOP OF CART
        polewidth = 10.0
        polelen = scale * 1.0
        cartwidth = 50.0
        cartheight = 30.0

        if self.viewer is None:
            from gym.envs.classic_control import rendering
            self.viewer = rendering.Viewer(screen_width, screen_height)
            l, r, t, b = -cartwidth / 2, cartwidth / 2, cartheight / 2, -cartheight / 2
            axleoffset = cartheight / 4.0
            cart = rendering.FilledPolygon([(l, b), (l, t), (r, t), (r, b)])
            self.carttrans = rendering.Transform()
            cart.add_attr(self.carttrans)
            self.viewer.add_geom(cart)
            l, r, t, b = -polewidth / 2, polewidth / 2, polelen - polewidth / 2, -polewidth / 2
            pole = rendering.FilledPolygon([(l, b), (l, t), (r, t), (r, b)])
            pole.set_color(.8, .6, .4)
            self.poletrans = rendering.Transform(translation=(0, axleoffset))
            pole.add_attr(self.poletrans)
            pole.add_attr(self.carttrans)
            self.viewer.add_geom(pole)
            self.axle = rendering.make_circle(polewidth / 2)
            self.axle.add_attr(self.poletrans)
            self.axle.add_attr(self.carttrans)
            self.axle.set_color(.5, .5, .8)
            self.viewer.add_geom(self.axle)
            self.track = rendering.Line((0, carty), (screen_width, carty))
            self.track.set_color(0, 0, 0)
            self.viewer.add_geom(self.track)

        if self.state is None:
            return None

        x = self.state
        cartx = x[0] * scale + screen_width / 2.0  # MIDDLE OF CART
        self.carttrans.set_translation(cartx, carty)
        self.poletrans.set_rotation(-x[2])

        return self.viewer.render(return_rgb_array=mode == 'rgb_array')

    def close(self):
        if self.viewer:
            self.viewer.close()
            self.viewer = None
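

# Illustrative sketch, not part of the original environment. It restates the
# dynamics and the two integrators used in CartPoleEnv.step() as standalone
# functions so the arithmetic can be checked without gym. The function names
# (cartpole_accelerations, integrate_state) and their keyword defaults are
# assumptions chosen to mirror the constants set in CartPoleEnv.__init__.
import math  # repeated here so the sketch runs on its own


def cartpole_accelerations(theta, theta_dot, force, gravity=9.8,
                           masscart=1.0, masspole=0.1, length=0.5):
    """Return (xacc, thetaacc) using the same formulas as CartPoleEnv.step()."""
    total_mass = masspole + masscart
    polemass_length = masspole * length  # length is half the pole's length
    costheta = math.cos(theta)
    sintheta = math.sin(theta)
    temp = (force + polemass_length * theta_dot * theta_dot * sintheta) / total_mass
    thetaacc = (gravity * sintheta - costheta * temp) / (
        length * (4.0 / 3.0 - masspole * costheta * costheta / total_mass))
    xacc = temp - polemass_length * thetaacc * costheta / total_mass
    return xacc, thetaacc


def integrate_state(state, xacc, thetaacc, tau=0.02, semi_implicit=False):
    """Advance (x, x_dot, theta, theta_dot) by one time step of length tau."""
    x, x_dot, theta, theta_dot = state
    if semi_implicit:
        # Semi-implicit Euler: update velocities first, then use the new
        # velocities to update positions.
        x_dot = x_dot + tau * xacc
        x = x + tau * x_dot
        theta_dot = theta_dot + tau * thetaacc
        theta = theta + tau * theta_dot
    else:
        # Explicit Euler: positions advance with the old velocities.
        x = x + tau * x_dot
        x_dot = x_dot + tau * xacc
        theta = theta + tau * theta_dot
        theta_dot = theta_dot + tau * thetaacc
    return (x, x_dot, theta, theta_dot)


# Usage note: starting from rest with an upright pole, a push to the right
# (force = +10.0) accelerates the cart to the right and tips the pole the
# opposite way, e.g.
#     xacc, thetaacc = cartpole_accelerations(0.0, 0.0, 10.0)
#     state = integrate_state((0.0, 0.0, 0.0, 0.0), xacc, thetaacc)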