API
Initializing Environments
Initializing an environment is very easy in Gym and can be done via:
import gym
env = gym.make('CartPole-v0')
Interacting with the Environment
This example will run an instance of the `CartPole-v0` environment for 1000 timesteps, rendering the environment at each step. You should see a window pop up rendering the classic cart-pole problem:
import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()  # by default `mode="human"` (GUI); you can pass `mode="rgb_array"` to retrieve an image instead
    env.step(env.action_space.sample())  # take a random action
env.close()
The commonly used methods are:
- `reset()`: resets the environment to its initial state and returns the observation corresponding to that initial state.
- `step(action)`: takes an action as input and applies it in the environment. This method returns a set of four values, described below.
- `render()`: renders the environment.
The four values returned by `step(action)` are:
- `observation` (object): an environment-specific object representing your observation of the environment after the step is taken. It is often aliased as the next state after the action has been taken.
- `reward` (float): the immediate reward achieved by the previous action. The actual value and range vary between environments, but the final goal is always to increase your total reward.
- `done` (boolean): whether it is time to `reset` the environment again. Most (but not all) tasks are divided up into well-defined episodes, and `done` being `True` indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
- `info` (dict): provides general information helpful for debugging, or additional information depending on the environment, such as the raw probabilities behind the environment's last state change.
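Putting these together, here is a minimal sketch (an illustrative example, not from the original text) of an episode loop that unpacks the four values returned by `step`:
import gym

env = gym.make('CartPole-v0')
observation = env.reset()
total_reward = 0.0
while True:
    action = env.action_space.sample()                   # pick a random action
    observation, reward, done, info = env.step(action)   # unpack the four return values
    total_reward += reward
    if done:                                             # the episode has terminated
        print("Episode finished with total reward:", total_reward)
        break
env.close()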
Additional Environment API
- `action_space`: this attribute gives the format of valid actions. It is of datatype `Space` provided by Gym. (For example, if the action space is of type `Discrete` and given as `Discrete(2)`, there are two valid discrete actions, 0 and 1.)
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
- `observation_space`: this attribute gives the format of valid observations. It is of datatype `Space` provided by Gym. (For example, if the observation space is of type `Box` with shape `(4,)`, a valid observation is an array of 4 numbers.) We can check the box bounds as well with its attributes:
print(env.observation_space.high)
#> array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38], dtype=float32)
print(env.observation_space.low)
#> array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38], dtype=float32)
There are multiple types of `Space` inherently available in Gym:
- `Box` describes an n-dimensional continuous space. It is a bounded space where we can define the upper and lower limits that describe the valid values our observations can take.
- `Discrete` describes a discrete space where {0, 1, ..., n-1} are the possible values our observation/action can take.
- `Dict` represents a dictionary of simple spaces.
- `Tuple` represents a tuple of simple spaces.
- `MultiBinary` creates an n-shape binary space. The argument n can be a number or a `list` of numbers.
- `MultiDiscrete` consists of a series of `Discrete` action spaces with a different number of actions in each element.
import numpy as np
from gym.spaces import Box, Discrete, Dict, Tuple, MultiBinary, MultiDiscrete

observation_space = Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32)
print(observation_space.sample())
#> [ 1.6952509 -0.4399011 -0.7981693]

observation_space = Discrete(4)
print(observation_space.sample())
#> 1

observation_space = Dict({"position": Discrete(2), "velocity": Discrete(3)})
print(observation_space.sample())
#> OrderedDict([('position', 0), ('velocity', 1)])

observation_space = Tuple((Discrete(2), Discrete(3)))
print(observation_space.sample())
#> (1, 2)

observation_space = MultiBinary(5)
print(observation_space.sample())
#> [1 1 1 0 1]

observation_space = MultiDiscrete([5, 2, 2])
print(observation_space.sample())
#> [3 0 0]
- `reward_range`: returns a tuple corresponding to the minimum and maximum possible rewards. The default range is `[-inf, +inf]`; you can set it if you want a narrower range.
- `close()`: override close in your subclass to perform any necessary cleanup.
- `seed()`: sets the seed for this env's random number generator.
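As a quick illustration (a sketch, not part of the original text; exact values depend on the environment and Gym version), these members can be used as follows:
import gym

env = gym.make('CartPole-v0')
print(env.reward_range)   # defaults to (-inf, inf) unless the environment narrows it
env.seed(42)              # seed the env's random number generator for reproducibility
env.reset()
env.close()               # perform any necessary cleanup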
Unwrapping an environment
If you have a wrapped environment, and you want to get the unwrapped environment underneath all the layers of wrappers (so that you can manually call a function or change some underlying aspect of the environment), you can use the .unwrapped
attribute. If the environment is already a base environment, the .unwrapped
attribute will just return itself.
base_env = env.unwrapped
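For instance (a short sketch; the exact wrapper and printed repr depend on the environment and Gym version), `gym.make` typically returns an environment wrapped in a `TimeLimit` wrapper, which `.unwrapped` strips away:
import gym

env = gym.make('CartPole-v0')
print(env)                 # the registered env usually comes wrapped, e.g. in TimeLimit
#> <TimeLimit<CartPoleEnv<CartPole-v0>>>
base_env = env.unwrapped   # the underlying base environment, all wrapper layers removed
print(base_env)
#> <CartPoleEnv<CartPole-v0>>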
Vectorized Environment
Vectorized Environments are a way of stacking multiple independent environments, so that instead of training on one environment, our agent can train on multiple environments at a time. Each observation
returned from a vectorized environment is a batch of observations for each sub-environment, and step
is also expected to receive a batch of actions for each sub-environment.
NOTE: All sub-environments should share identical observation and action spaces. A vector of multiple different environments is not supported.
The Gym vector API consists of two types of vectorized environments:
- `AsyncVectorEnv` runs multiple environments in parallel. It uses `multiprocessing` processes and pipes for communication.
- `SyncVectorEnv` runs multiple environments serially.
import gym
env = gym.vector.make('CartPole-v1', 3, asynchronous=True)  # creates an asynchronous vectorized env
env.reset()
#> array([[-0.04456399, 0.04653909, 0.01326909, -0.02099827],
#> [ 0.03073904, 0.00145001, -0.03088818, -0.03131252],
#> [ 0.03468829, 0.01500225, 0.01230312, 0.01825218]],
#> dtype=float32)
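Stepping the same vectorized environment takes a batch of actions and returns batched values (a sketch, not from the original text; the sampled actions and rewards shown are illustrative):
actions = env.action_space.sample()                      # a batch of actions, one per sub-environment
observations, rewards, dones, infos = env.step(actions)  # each return value is batched as well
rewards
#> array([1., 1., 1.])
env.close()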