Update Docs with New Step API (#23)
Co-authored-by: Mark Towers <mark.m.towers@gmail.com>
@@ -72,7 +72,7 @@ title: Vector
>>> envs = gym.vector.make("CartPole-v1", num_envs=3)
>>> envs.reset()
>>> actions = np.array([1, 0, 1])
->>> observations, rewards, terminated, truncated, infos = envs.step(actions)
+>>> observations, rewards, termination, truncation, infos = envs.step(actions)

>>> observations
array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
@@ -81,7 +81,9 @@ array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
dtype=float32)
>>> rewards
array([1., 1., 1.])
->>> terminated
+>>> termination
array([False, False, False])
+>>> truncation
+array([False, False, False])
>>> infos
{}
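On the caller's side, the change above amounts to unpacking five values from `envs.step()` instead of four and combining the two boolean arrays when an episode end matters. Below is a minimal rollout sketch under that assumption, reusing the `CartPole-v1` vector setup from the example (it assumes the `gymnasium` package is imported as `gym`, as in these docs, and the variable names are illustrative):

```python
import numpy as np
import gymnasium as gym

# Same vector setup as in the documentation example above.
envs = gym.vector.make("CartPole-v1", num_envs=3)
observations, infos = envs.reset(seed=42)

for _ in range(100):
    # Sample one action per sub-environment.
    actions = envs.action_space.sample()
    observations, rewards, termination, truncation, infos = envs.step(actions)
    # An episode ends when either flag is True; the vector env resets those
    # sub-environments automatically on the next step.
    episode_over = np.logical_or(termination, truncation)

envs.close()
```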
@@ -132,28 +132,28 @@ class ClipReward(gym.RewardWrapper):

Some users may want a wrapper which will automatically reset its wrapped environment when its wrapped environment reaches the done state. An advantage of this environment is that it will never produce undefined behavior as standard gymnasium environments do when stepping beyond the done state.

-When calling step causes `self.env.step()` to return `done=True`,
+When calling step causes `self.env.step()` to return `(terminated or truncated)=True`,
`self.env.reset()` is called,
and the return format of `self.step()` is as follows:

```python
-new_obs, terminal_reward, terminated, truncated, info
+new_obs, final_reward, final_terminated, final_truncated, info
```

`new_obs` is the first observation after calling `self.env.reset()`,

-`terminal_reward` is the reward after calling `self.env.step()`,
+`final_reward` is the reward after calling `self.env.step()`,
prior to calling `self.env.reset()`

-`terminated or truncated` is always `True`
+The expression `(final_terminated or final_truncated)` is always `True`

`info` is a dict containing all the keys from the info dict returned by
-the call to `self.env.reset()`, with additional keys `terminal_observation`
+the call to `self.env.reset()`, with additional keys `final_observation`
containing the observation returned by the last call to `self.env.step()`
-and `terminal_info` containing the info dict returned by the last call
+and `final_info` containing the info dict returned by the last call
to `self.env.step()`.

-If `done` is not true when `self.env.step()` is called, `self.step()` returns
+If `(terminated or truncated)` is not true when `self.env.step()` is called, `self.step()` returns

```python
obs, reward, terminated, truncated, info
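To make the contract in the hunk above concrete, here is a minimal sketch of a wrapper with the described behaviour. The class name is hypothetical and this is not the library's actual implementation; it only mirrors the documented return format and info keys.

```python
import gymnasium as gym


class AutoResetSketch(gym.Wrapper):
    """Hypothetical wrapper mirroring the documented auto-reset contract."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # Keep the terminal step's results before starting a new episode.
            final_obs, final_info = obs, info
            obs, info = self.env.reset()
            info["final_observation"] = final_obs
            info["final_info"] = final_info
        # When the episode has ended, `obs` is the first observation of the
        # new episode, while reward/terminated/truncated describe the old one.
        return obs, reward, terminated, truncated, info
```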
@@ -180,7 +180,7 @@ that the when `self.env.step()` returns `done`, a
new observation from after calling `self.env.reset()` is returned
by `self.step()` alongside the terminal reward and done state from the
previous episode . If you need the terminal state from the previous
-episode, you need to retrieve it via the the `terminal_observation` key
+episode, you need to retrieve it via the the `final_observation` key
in the info dict. Make sure you know what you're doing if you
use this wrapper!
```
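A short usage sketch of retrieving the previous episode's terminal state through the info dict, as the warning above suggests. The wrapper import path is an assumption and may differ between versions; everything else follows the contract documented in this commit.

```python
import gymnasium as gym
from gymnasium.wrappers import AutoResetWrapper  # assumed import path

env = AutoResetWrapper(gym.make("CartPole-v1"))
obs, info = env.reset(seed=0)
for _ in range(500):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        # `obs` already belongs to the new episode; the old episode's last
        # observation and info dict are only available through these keys.
        last_obs = info["final_observation"]
        last_info = info["final_info"]
env.close()
```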