Update Docs with New Step API (#23)

Author: Arjun KG
Co-authored-by: Mark Towers <mark.m.towers@gmail.com>
Date: 2022-09-27 21:50:22 +05:30
Committed by: GitHub
Parent: 95da6c5714
Commit: 48b966233c

8 changed files with 125 additions and 42 deletions

@@ -72,7 +72,7 @@ title: Vector
 >>> envs = gym.vector.make("CartPole-v1", num_envs=3)
 >>> envs.reset()
 >>> actions = np.array([1, 0, 1])
->>> observations, rewards, terminated, truncated, infos = envs.step(actions)
+>>> observations, rewards, termination, truncation, infos = envs.step(actions)
 >>> observations
 array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
@@ -81,7 +81,9 @@ array([[ 0.00122802, 0.16228443, 0.02521779, -0.23700266],
 dtype=float32)
 >>> rewards
 array([1., 1., 1.])
->>> terminated
+>>> termination
 array([False, False, False])
+>>> truncation
+array([False, False, False])
 >>> infos
 {}
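Downstream code that previously branched on a single `done` flag now needs to combine the two signals. Below is a minimal rollout sketch against the new five-tuple vector API, assuming the same `CartPole-v1` setup as above and a `reset()` that returns `(observations, infos)` per the 0.26-style API; the loop body and the `dones` name are illustrative, not part of this commit:

```python
import numpy as np
import gym

# Same setup as the docs example above.
envs = gym.vector.make("CartPole-v1", num_envs=3)

# Assumption: reset() follows the new API and returns (observations, infos).
observations, infos = envs.reset()

for _ in range(100):
    # The vector env exposes a batched action space, so sample() yields
    # one action per sub-environment.
    actions = envs.action_space.sample()
    observations, rewards, termination, truncation, infos = envs.step(actions)
    # A sub-environment's episode is over when either flag is set.
    dones = np.logical_or(termination, truncation)

envs.close()
```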

@@ -132,28 +132,28 @@ class ClipReward(gym.RewardWrapper):
 Some users may want a wrapper which will automatically reset its wrapped environment when that environment reaches the done state. An advantage of this wrapper is that it will never produce undefined behavior as standard gymnasium environments do when stepping beyond the done state.
-When calling step causes `self.env.step()` to return `done=True`,
+When calling step causes `self.env.step()` to return `(terminated or truncated)=True`,
 `self.env.reset()` is called,
 and the return format of `self.step()` is as follows:
 ```python
-new_obs, terminal_reward, terminated, truncated, info
+new_obs, final_reward, final_terminated, final_truncated, info
 ```
 `new_obs` is the first observation after calling `self.env.reset()`,
-`terminal_reward` is the reward after calling `self.env.step()`,
+`final_reward` is the reward after calling `self.env.step()`,
 prior to calling `self.env.reset()`.
-`terminated or truncated` is always `True`.
+The expression `(final_terminated or final_truncated)` is always `True`.
 `info` is a dict containing all the keys from the info dict returned by
-the call to `self.env.reset()`, with additional keys `terminal_observation`
+the call to `self.env.reset()`, with additional keys `final_observation`
 containing the observation returned by the last call to `self.env.step()`
-and `terminal_info` containing the info dict returned by the last call
+and `final_info` containing the info dict returned by the last call
 to `self.env.step()`.
-If `done` is not true when `self.env.step()` is called, `self.step()` returns
+If `(terminated or truncated)` is not true when `self.env.step()` is called, `self.step()` returns
 ```python
 obs, reward, terminated, truncated, info
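The behavior described above maps onto a small amount of wrapper code. The following is a hedged sketch reconstructed from this description, not the library's implementation; the `AutoReset` class name is illustrative, and it assumes the 0.26-style `reset()` that returns `(obs, info)`:

```python
import gym


class AutoReset(gym.Wrapper):
    """Illustrative sketch of the auto-reset behavior described above."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # The episode just ended: reset, and stash the final step's
            # observation and info dict under the documented keys.
            new_obs, new_info = self.env.reset()
            new_info["final_observation"] = obs
            new_info["final_info"] = info
            obs, info = new_obs, new_info
        # On episode end this is (new_obs, final_reward, final_terminated,
        # final_truncated, info); otherwise it is the unmodified step tuple.
        return obs, reward, terminated, truncated, info
```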
@@ -180,7 +180,7 @@ that when `self.env.step()` returns `done`, a
 new observation from after calling `self.env.reset()` is returned
 by `self.step()` alongside the terminal reward and done state from the
 previous episode. If you need the terminal state from the previous
-episode, you need to retrieve it via the `terminal_observation` key
+episode, you need to retrieve it via the `final_observation` key
 in the info dict. Make sure you know what you're doing if you
 use this wrapper!
 ```
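In practice, retrieving the previous episode's terminal state through the info dict looks like the following; a minimal usage sketch assuming `gym.wrappers.AutoResetWrapper` and `CartPole-v1`, with the loop and variable names being illustrative:

```python
import gym

env = gym.wrappers.AutoResetWrapper(gym.make("CartPole-v1"))
obs, info = env.reset()

while True:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        # `obs` is already the first observation of the next episode.
        # The finished episode's last observation and info live in `info`.
        final_obs = info["final_observation"]
        final_info = info["final_info"]
        break

env.close()
```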