Update Docs with New Step API (#23)

Author: Arjun KG
Co-authored-by: Mark Towers <mark.m.towers@gmail.com>
Date: 2022-09-27 21:50:22 +05:30
Committed by: GitHub
Parent: 95da6c5714
Commit: 48b966233c

8 changed files with 125 additions and 42 deletions

@@ -72,7 +72,7 @@ title: Vector
 >>> envs = gym.vector.make("CartPole-v1", num_envs=3)
 >>> envs.reset()
 >>> actions = np.array([1, 0, 1])
->>> observations, rewards, terminated, truncated, infos = envs.step(actions)
+>>> observations, rewards, termination, truncation, infos = envs.step(actions)
 >>> observations
 array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
@@ -81,7 +81,9 @@ array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
 dtype=float32)
 >>> rewards
 array([1., 1., 1.])
->>> terminated
+>>> termination
 array([False, False, False])
+>>> truncation
+array([False, False, False])
 >>> infos
 {}
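Downstream code that previously branched on a single `done` flag now needs to combine the two signals. Below is a minimal rollout sketch against the new five-tuple vector API, assuming the same `CartPole-v1` setup as above and a `reset()` that returns `(observations, infos)` per the 0.26-style API; the loop body and the `dones` name are illustrative, not part of this commit:

```python
import numpy as np
import gym

# Same setup as the docs example above.
envs = gym.vector.make("CartPole-v1", num_envs=3)

# Assumption: reset() follows the new API and returns (observations, infos).
observations, infos = envs.reset()

for _ in range(100):
    # The vector env exposes a batched action space, so sample() yields
    # one action per sub-environment.
    actions = envs.action_space.sample()
    observations, rewards, termination, truncation, infos = envs.step(actions)
    # A sub-environment's episode is over when either flag is set.
    dones = np.logical_or(termination, truncation)

envs.close()
```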

@@ -132,28 +132,28 @@ class ClipReward(gym.RewardWrapper):
 Some users may want a wrapper which will automatically reset its wrapped environment when that environment reaches the done state. An advantage of this wrapper is that it will never produce undefined behavior as standard gymnasium environments do when stepping beyond the done state.
-When calling step causes `self.env.step()` to return `done=True`,
+When calling step causes `self.env.step()` to return `(terminated or truncated)=True`,
 `self.env.reset()` is called,
 and the return format of `self.step()` is as follows:
 ```python
-new_obs, terminal_reward, terminated, truncated, info
+new_obs, final_reward, final_terminated, final_truncated, info
 ```
 `new_obs` is the first observation after calling `self.env.reset()`,
-`terminal_reward` is the reward after calling `self.env.step()`,
+`final_reward` is the reward after calling `self.env.step()`,
 prior to calling `self.env.reset()`.
-`terminated or truncated` is always `True`.
+The expression `(final_terminated or final_truncated)` is always `True`.
 `info` is a dict containing all the keys from the info dict returned by
-the call to `self.env.reset()`, with additional keys `terminal_observation`
+the call to `self.env.reset()`, with additional keys `final_observation`
 containing the observation returned by the last call to `self.env.step()`
-and `terminal_info` containing the info dict returned by the last call
+and `final_info` containing the info dict returned by the last call
 to `self.env.step()`.
-If `done` is not true when `self.env.step()` is called, `self.step()` returns
+If `(terminated or truncated)` is not true when `self.env.step()` is called, `self.step()` returns
 ```python
 obs, reward, terminated, truncated, info
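The behavior described above maps onto a small amount of wrapper code. The following is a hedged sketch reconstructed from this description, not the library's implementation; the `AutoReset` class name is illustrative, and it assumes the 0.26-style `reset()` that returns `(obs, info)`:

```python
import gym


class AutoReset(gym.Wrapper):
    """Illustrative sketch of the auto-reset behavior described above."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # The episode just ended: reset, and stash the final step's
            # observation and info dict under the documented keys.
            new_obs, new_info = self.env.reset()
            new_info["final_observation"] = obs
            new_info["final_info"] = info
            obs, info = new_obs, new_info
        # On episode end this is (new_obs, final_reward, final_terminated,
        # final_truncated, info); otherwise it is the unmodified step tuple.
        return obs, reward, terminated, truncated, info
```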
@@ -180,7 +180,7 @@ that when `self.env.step()` returns `done`, a
 new observation from after calling `self.env.reset()` is returned
 by `self.step()` alongside the terminal reward and done state from the
 previous episode. If you need the terminal state from the previous
-episode, you need to retrieve it via the `terminal_observation` key
+episode, you need to retrieve it via the `final_observation` key
 in the info dict. Make sure you know what you're doing if you
 use this wrapper!
 ```
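In practice, retrieving the previous episode's terminal state through the info dict looks like the following; a minimal usage sketch assuming `gym.wrappers.AutoResetWrapper` and `CartPole-v1`, with the loop and variable names being illustrative:

```python
import gym

env = gym.wrappers.AutoResetWrapper(gym.make("CartPole-v1"))
obs, info = env.reset()

while True:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        # `obs` is already the first observation of the next episode.
        # The finished episode's last observation and info live in `info`.
        final_obs = info["final_observation"]
        final_info = info["final_info"]
        break

env.close()
```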