# API
## Initializing Environments
Initializing environments is very easy in Gym and can be done via:
```python
import gym
env = gym.make('CartPole-v0')
```
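Gym ships with many registered environments; if you are unsure which IDs are available in your installation, one way to list them (using the registry of classic Gym releases, so the exact API may differ in your version) is:
```python
import gym
from gym import envs

# Print the IDs of all registered environments
print(sorted(spec.id for spec in envs.registry.all()))
```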
## Interacting with the Environment
This example will run an instance of the `CartPole-v0` environment for 1000 timesteps, rendering the environment at each step. You should see a window pop up rendering the classic [cart-pole](https://www.youtube.com/watch?v=J7E6_my3CHk&ab_channel=TylerStreeter) problem:
```python
import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()  # by default `mode="human"` (GUI); pass `mode="rgb_array"` to retrieve an image instead
    env.step(env.action_space.sample())  # take a random action
env.close()
```
The output should look something like this:
![cartpole-no-reset](https://user-images.githubusercontent.com/28860173/129241283-70069f7c-453d-4670-a226-854203bd8a1b.gif)
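If you pass `mode="rgb_array"` instead, `render` returns the current frame as a NumPy array rather than opening a window, which is handy for recording videos or running headless; a minimal sketch:
```python
import gym

env = gym.make('CartPole-v0')
env.reset()
frame = env.render(mode="rgb_array")  # NumPy array of shape (height, width, 3)
print(frame.shape)
env.close()
```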
The commonly used methods are:
- `reset()` resets the environment to its initial state and returns the observation corresponding to that initial state.
- `step(action)` takes an action as input and applies it in the environment. This method returns a tuple of four values (a full interaction loop using them is sketched after this list):
  - `observation` (**object**): an environment-specific object representing your observation of the environment after the step is taken. It is often treated as the next state after the action has been taken.
  - `reward` (**float**): the immediate reward achieved by the previous action. The actual value and range vary between environments, but the final goal is always to increase your total reward.
  - `done` (**boolean**): whether it is time to `reset` the environment again. Most (but not all) tasks are divided up into well-defined episodes, and `done` being `True` indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
  - `info` (**dict**): general information helpful for debugging, or additional environment-specific information such as the raw probabilities behind the environment's last state change.
- `render()` renders the environment.
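Putting these together, a minimal sketch of an interaction loop (with a random agent standing in for a real policy) that resets the environment whenever an episode ends looks like this:
```python
import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # a real agent would choose an action based on the observation
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()  # start a new episode once the current one terminates
env.close()
```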
## Additional Environment API
- `action_space`: this attribute gives the format of valid actions. It is of datatype `Space`, provided by Gym. (For example, if the action space is of type `Discrete` and has the value `Discrete(2)`, there are two valid discrete actions: 0 and 1.)
```python
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
```
- `observation_space`: this attribute gives the format of valid observations. It is of datatype `Space`, provided by Gym. (For example, if the observation space is of type `Box` with shape `(4,)`, a valid observation is an array of 4 numbers.) We can also check the box bounds with the `high` and `low` attributes:
```python
print(env.observation_space.high)
#> array([4.8000002e+00, 3.4028235e+38, 4.1887903e-01, 3.4028235e+38], dtype=float32)
print(env.observation_space.low)
#> array([-4.8000002e+00, -3.4028235e+38, -4.1887903e-01, -3.4028235e+38], dtype=float32)
```
- There are multiple `Space` types available in Gym:
- `Box` describes an n-dimensional continuous space. It is a bounded space where we can define the upper and lower limits that describe the valid values our observations can take.
- `Discrete` describes a discrete space where {0, 1, ..., n-1} are the possible values our observation or action can take.
- `Dict` represents a dictionary of simple spaces.
- `Tuple` represents a tuple of simple spaces.
- `MultiBinary` creates an n-dimensional binary space. The argument n can be a number or a `list` of numbers.
- `MultiDiscrete` consists of a series of `Discrete` action spaces with a different number of actions in each element.
```python
import numpy as np
from gym.spaces import Box, Dict, Discrete, MultiBinary, MultiDiscrete, Tuple

observation_space = Box(low=-1.0, high=2.0, shape=(3,), dtype=np.float32)
print(observation_space.sample())
#> [ 1.6952509 -0.4399011 -0.7981693]
observation_space = Discrete(4)
print(observation_space.sample())
#> 1
observation_space = Dict({"position": Discrete(2), "velocity": Discrete(3)})
print(observation_space.sample())
#> OrderedDict([('position', 0), ('velocity', 1)])
observation_space = Tuple((Discrete(2), Discrete(3)))
print(observation_space.sample())
#> (1, 2)
observation_space = MultiBinary(5)
print(observation_space.sample())
#> [1 1 1 0 1]
observation_space = MultiDiscrete([ 5, 2, 2 ])
print(observation_space.sample())
#> [3 0 0]
```
- `reward_range`: a tuple corresponding to the minimum and maximum possible rewards. The default range is `[-inf, +inf]`; you can set it if you want a narrower range.
- `close()`: override `close` in your subclass to perform any necessary cleanup.
- `seed()`: sets the seed for this environment's random number generator.
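For example, seeding the environment (and, if you want reproducible sampled actions, its spaces) before the first `reset` can make runs reproducible; a minimal sketch:
```python
import gym

env = gym.make('CartPole-v0')
env.seed(42)               # seed the environment's random number generator
env.action_space.seed(42)  # seed the action space so sampled actions are reproducible too
observation = env.reset()
```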
### Unwrapping an environment
If you have a wrapped environment, and you want to get the unwrapped environment underneath all the layers of wrappers (so that you can manually call a function or change some underlying aspect of the environment), you can use the `.unwrapped` attribute. If the environment is already a base environment, the `.unwrapped` attribute will just return itself.
```python
base_env = env.unwrapped
```
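For instance, `gym.make` usually returns the environment wrapped in a `TimeLimit` wrapper, so printing the wrapped and unwrapped objects shows the difference (illustrative output):
```python
import gym

env = gym.make('CartPole-v0')
print(env)            # e.g. <TimeLimit<CartPoleEnv<CartPole-v0>>>
print(env.unwrapped)  # e.g. <CartPoleEnv<CartPole-v0>>
```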
### Vectorized Environment
Vectorized environments are a way of stacking multiple independent environments, so that instead of training on one environment, our agent can train on multiple environments at a time. Each `observation` returned from a vectorized environment is a batch of observations, one per sub-environment, and `step` likewise expects a batch of actions, one per sub-environment.
**NOTE:** All sub-environments must have identical observation and action spaces; a vector of different environments is not supported.
The Gym vector API provides two types of vectorized environments:
- `AsyncVectorEnv` runs multiple environments in parallel. It uses `multiprocessing` processes and pipes for communication.
- `SyncVectorEnv` runs multiple environments serially.
```python
import gym
env = gym.vector.make('CartPole-v1', 3, asynchronous=True)  # creates an asynchronous vector of 3 environments
env.reset()
#> array([[-0.04456399, 0.04653909, 0.01326909, -0.02099827],
#> [ 0.03073904, 0.00145001, -0.03088818, -0.03131252],
#> [ 0.03468829, 0.01500225, 0.01230312, 0.01825218]],
#> dtype=float32)
```
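Stepping a vectorized environment works just like stepping a single environment, except that the actions, observations, rewards, and dones are all batched (one entry per sub-environment); a minimal sketch, assuming the same 3-copy `CartPole-v1` vector environment as above:
```python
import gym

env = gym.vector.make('CartPole-v1', 3, asynchronous=True)
observations = env.reset()
for _ in range(100):
    actions = env.action_space.sample()                      # one action per sub-environment
    observations, rewards, dones, infos = env.step(actions)  # all returned values are batched
env.close()
```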