docs: fix typo (#1219)

Author: Neven Lukić
Date: 2024-10-19 00:34:53 +02:00
Committed by: GitHub
Parent: 9cf678ee32
Commit: 8ab56a4668
2 changed files with 2 additions and 2 deletions


@@ -144,7 +144,7 @@ The :meth:`step` method usually contains most of the logic for your environment,
 For our environment, several things need to happen during the step function:
-- We use the ``self._action_to_direction`` to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grd, we clip the agen't location to stay within bounds.
+- We use the ``self._action_to_direction`` to convert the discrete action (e.g., 2) to a grid direction with our agent location. To prevent the agent from going out of bounds of the grid, we clip the agent's location to stay within bounds.
 - We compute the agent's reward by checking if the agent's current position is equal to the target's location.
 - Since the environment doesn't truncate internally (we can apply a time limit wrapper to the environment during :meth:`make`), we permanently set ``truncated`` to ``False``.
 - We once again use ``_get_obs`` and ``_get_info`` to obtain the agent's observation and auxiliary information (a code sketch of this step logic follows the hunk).
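To make the hunk above concrete, below is a minimal sketch of what such a ``step`` method might look like. Only ``self._action_to_direction``, ``_get_obs`` and ``_get_info`` are named in the text; the ``size``, ``_agent_location`` and ``_target_location`` attributes, the observation/action spaces, and the reward values are illustrative assumptions, not the tutorial's exact code.

```python
import numpy as np
import gymnasium as gym


class GridWorldEnv(gym.Env):
    """Sketch of the step logic described above (attribute names assumed)."""

    def __init__(self, size=5):
        self.size = size  # assumed: side length of the square grid
        self._agent_location = np.array([0, 0])
        self._target_location = np.array([size - 1, size - 1])
        self.observation_space = gym.spaces.Dict(
            {
                "agent": gym.spaces.Box(0, size - 1, shape=(2,), dtype=int),
                "target": gym.spaces.Box(0, size - 1, shape=(2,), dtype=int),
            }
        )
        self.action_space = gym.spaces.Discrete(4)
        # Map each discrete action to a unit movement on the grid.
        self._action_to_direction = {
            0: np.array([1, 0]),   # right
            1: np.array([0, 1]),   # up
            2: np.array([-1, 0]),  # left
            3: np.array([0, -1]),  # down
        }

    def _get_obs(self):
        return {"agent": self._agent_location, "target": self._target_location}

    def _get_info(self):
        # Manhattan distance to the target as auxiliary information.
        return {"distance": int(np.abs(self._agent_location - self._target_location).sum())}

    def step(self, action):
        # Convert the discrete action (e.g., 2) to a grid direction.
        direction = self._action_to_direction[action]
        # Clip the agent's location so it cannot leave the grid.
        self._agent_location = np.clip(self._agent_location + direction, 0, self.size - 1)
        # Reward is based on whether the agent has reached the target.
        terminated = np.array_equal(self._agent_location, self._target_location)
        reward = 1 if terminated else 0
        # No internal truncation; a TimeLimit wrapper can be applied via make().
        return self._get_obs(), reward, terminated, False, self._get_info()
```

Handling time limits with ``gymnasium.wrappers.TimeLimit`` (or by registering the environment with ``max_episode_steps``) keeps truncation external, which is why ``truncated`` is hard-coded to ``False`` here.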


@@ -55,7 +55,7 @@ In the script above, for the :class:`RecordVideo` wrapper, we specify three diff
 For the :class:`RecordEpisodeStatistics`, we only need to specify the buffer length, which is the max length of the internal ``time_queue``, ``return_queue`` and ``length_queue``. Rather than collecting the data for each episode individually, we can use the data queues to print the information at the end of the evaluation (see the sketch after this hunk).
-For speed ups in evaluating environments, it is possible to implement this with vector environments to in order to evaluate ``N`` episodes at the same time in parallel rather than series.
+For speed ups in evaluating environments, it is possible to implement this with vector environments in order to evaluate ``N`` episodes at the same time in parallel rather than series.
 ```
 ## Recording the Agent during Training
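As referenced above, here is a hedged sketch of how the two wrappers might be combined for an evaluation run. The environment id, folder name, and random policy are placeholders, and it assumes the "three different values" for :class:`RecordVideo` are ``video_folder``, ``name_prefix`` and ``episode_trigger``, with ``buffer_length`` as the :class:`RecordEpisodeStatistics` buffer size.

```python
import gymnasium as gym
from gymnasium.wrappers import RecordEpisodeStatistics, RecordVideo

num_eval_episodes = 4

# rgb_array rendering is needed so RecordVideo can capture frames.
env = gym.make("CartPole-v1", render_mode="rgb_array")  # placeholder env id
env = RecordVideo(
    env,
    video_folder="eval-videos",       # where the recorded videos are saved
    name_prefix="eval",               # filename prefix for each video
    episode_trigger=lambda ep: True,  # record every evaluation episode
)
env = RecordEpisodeStatistics(env, buffer_length=num_eval_episodes)

for _ in range(num_eval_episodes):
    obs, info = env.reset()
    episode_over = False
    while not episode_over:
        action = env.action_space.sample()  # stand-in for a trained policy
        obs, reward, terminated, truncated, info = env.step(action)
        episode_over = terminated or truncated
env.close()

# The queues hold one entry per evaluated episode.
print("Episode times:", list(env.time_queue))
print("Episode returns:", list(env.return_queue))
print("Episode lengths:", list(env.length_queue))
```

Printing from the queues once at the end, rather than per episode, matches the evaluation pattern the paragraph describes.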