Fix tutorials inconsistencies (#44)

This commit is contained in:
Vairo Di Pasquale
2022-10-11 13:37:57 +02:00
committed by GitHub
parent 5ab0ea8364
commit ab03f684a9
2 changed files with 2 additions and 1 deletions

View File

@@ -1,5 +1,5 @@
# Handling Time Limits # Handling Time Limits
In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The `done` signal received (in previous versions of gymnasium < 0.26) from `env.step` indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to `termination` or `truncation`. In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The `done` signal received (in previous versions of OpenAI Gym < 0.26) from `env.step` indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to `termination` or `truncation`.
### Termination ### Termination
Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment definition. Examples are - task success, task failure, robot falling down etc. Notably, this also includes episodes ending in finite-horizon environments due to a time-limit inherent to the environment. Note that to preserve Markov property, a representation of the remaining time must be present in the agent's observation in finite-horizon environments. [(Reference)](https://arxiv.org/abs/1712.00378) Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment definition. Examples are - task success, task failure, robot falling down etc. Notably, this also includes episodes ending in finite-horizon environments due to a time-limit inherent to the environment. Note that to preserve Markov property, a representation of the remaining time must be present in the agent's observation in finite-horizon environments. [(Reference)](https://arxiv.org/abs/1712.00378)

View File

@@ -13,6 +13,7 @@ Gymnasium provides two types of vectorized environments:
- `gymnasium.vector.AsyncVectorEnv`, where the different copies of the environment are executed in parallel using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). This creates one process per copy. - `gymnasium.vector.AsyncVectorEnv`, where the different copies of the environment are executed in parallel using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). This creates one process per copy.
Similar to `gymnasium.make`, you can run a vectorized version of a registered environment using the `gymnasium.vector.make` function. This runs multiple copies of the same environment (in parallel, by default). Similar to `gymnasium.make`, you can run a vectorized version of a registered environment using the `gymnasium.vector.make` function. This runs multiple copies of the same environment (in parallel, by default).
The following example runs 3 copies of the ``CartPole-v1`` environment in parallel, taking as input a vector of 3 binary actions (one for each copy of the environment), and returning an array of 3 observations stacked along the first dimension, with an array of rewards returned by each copy, and an array of booleans indicating if the episode in each parallel environment has ended. The following example runs 3 copies of the ``CartPole-v1`` environment in parallel, taking as input a vector of 3 binary actions (one for each copy of the environment), and returning an array of 3 observations stacked along the first dimension, with an array of rewards returned by each copy, and an array of booleans indicating if the episode in each parallel environment has ended.