mirror of
https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-08-23 23:12:46 +00:00
docs: fix typo (#1219)
@@ -144,7 +144,7 @@ The :meth:`step` method usually contains most of the logic for your environment,
 
 For our environment, several things need to happen during the step function:
 
-- We use the self._action_to_direction to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grd, we clip the agen't location to stay within bounds.
+- We use the self._action_to_direction to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grid, we clip the agent's location to stay within bounds.
 - We compute the agent's reward by checking if the agent's current position is equal to the target's location.
 - Since the environment doesn't truncate internally (we can apply a time limit wrapper to the environment during :meth:make), we permanently set truncated to False.
 - We once again use _get_obs and _get_info to obtain the agent's observation and auxiliary information.
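Reviewer note: the four bullets in the hunk above can be sketched as a small, dependency-free class. This is only an illustration under stated assumptions, not the tutorial's code: the class name `GridWorldSketch`, the direction mapping, the 0/1 reward values, and the use of plain tuples with `min`/`max` for clipping are all hypothetical choices made here so the example runs without Gymnasium or NumPy.

```python
class GridWorldSketch:
    """Hypothetical stand-in for the grid environment described in the diff."""

    def __init__(self, size=5, agent=(0, 0), target=(4, 4)):
        self.size = size
        self._agent_location = agent
        self._target_location = target
        # Assumed mapping from discrete actions to (dx, dy) grid directions.
        self._action_to_direction = {
            0: (1, 0),   # right
            1: (0, 1),   # up
            2: (-1, 0),  # left
            3: (0, -1),  # down
        }

    def _get_obs(self):
        return {"agent": self._agent_location, "target": self._target_location}

    def _get_info(self):
        # Assumed auxiliary info: Manhattan distance between agent and target.
        ax, ay = self._agent_location
        tx, ty = self._target_location
        return {"distance": abs(ax - tx) + abs(ay - ty)}

    def step(self, action):
        # 1. Convert the discrete action to a grid direction and move,
        #    clipping each coordinate to [0, size - 1] to stay in bounds.
        dx, dy = self._action_to_direction[action]
        x, y = self._agent_location
        x = min(max(x + dx, 0), self.size - 1)
        y = min(max(y + dy, 0), self.size - 1)
        self._agent_location = (x, y)

        # 2. Reward depends on whether the agent is on the target
        #    (treating that as episode termination is an assumption).
        terminated = self._agent_location == self._target_location
        reward = 1 if terminated else 0

        # 3. The environment never truncates itself; a time-limit wrapper can.
        truncated = False

        # 4. Reuse the helpers for the observation and auxiliary info.
        return self._get_obs(), reward, terminated, truncated, self._get_info()
```

Calling `GridWorldSketch().step(2)` from the default start position leaves the agent at `(0, 0)`, because moving left from the edge is clipped back into the grid.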
@@ -55,7 +55,7 @@ In the script above, for the :class:`RecordVideo` wrapper, we specify three diff
 
 For the :class:`RecordEpisodicStatistics`, we only need to specify the buffer lengths, this is the max length of the internal ``time_queue``, ``return_queue`` and ``length_queue``. Rather than collect the data for each episode individually, we can use the data queues to print the information at the end of the evaluation.
 
-For speed ups in evaluating environments, it is possible to implement this with vector environments to in order to evaluate ``N`` episodes at the same time in parallel rather than series.
+For speed ups in evaluating environments, it is possible to implement this with vector environments in order to evaluate ``N`` episodes at the same time in parallel rather than series.
 ```
 
 ## Recording the Agent during Training
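Reviewer note: the "buffer length" idea in the hunk above (bounded queues that are read once at the end of evaluation) can be illustrated with `collections.deque`. This is a sketch of the concept only and makes no claim about how the wrapper is implemented. The class `EpisodeStatsSketch` and its `record_step` method are hypothetical names, and the ``time_queue`` is left out for brevity.

```python
from collections import deque


class EpisodeStatsSketch:
    """Hypothetical bounded buffers for per-episode returns and lengths."""

    def __init__(self, buffer_length=100):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self.return_queue = deque(maxlen=buffer_length)
        self.length_queue = deque(maxlen=buffer_length)
        self._episode_return = 0.0
        self._episode_length = 0

    def record_step(self, reward, done):
        # Accumulate within the episode; push to the queues when it ends.
        self._episode_return += reward
        self._episode_length += 1
        if done:
            self.return_queue.append(self._episode_return)
            self.length_queue.append(self._episode_length)
            self._episode_return = 0.0
            self._episode_length = 0


# Three made-up episodes with a buffer of two: the first episode is evicted.
stats = EpisodeStatsSketch(buffer_length=2)
for episode_rewards in ([1.0, 1.0], [0.5], [2.0, 2.0, 2.0]):
    for i, r in enumerate(episode_rewards):
        stats.record_step(r, done=(i == len(episode_rewards) - 1))

# Read the queues once, after evaluation has finished.
print("returns:", list(stats.return_queue))
print("lengths:", list(stats.length_queue))
```

Choosing a buffer length at least as large as the number of evaluation episodes keeps every episode's statistics available when the queues are finally read.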