diff --git a/docs/content/handling_timelimits.md b/docs/content/handling_timelimits.md
index bc821614a..ea392b12c 100644
--- a/docs/content/handling_timelimits.md
+++ b/docs/content/handling_timelimits.md
@@ -1,5 +1,5 @@
 # Handling Time Limits
-In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The `done` signal received (in previous versions of gymnasium < 0.26) from `env.step` indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to `termination` or `truncation`.
+In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The `done` signal received (in previous versions of OpenAI Gym < 0.26) from `env.step` indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to `termination` or `truncation`.
 ### Termination
 Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment definition. Examples are - task success, task failure, robot falling down etc. Notably, this also includes episodes ending in finite-horizon environments due to a time-limit inherent to the environment. Note that to preserve Markov property, a representation of the remaining time must be present in the agent's observation in finite-horizon environments. [(Reference)](https://arxiv.org/abs/1712.00378)
diff --git a/docs/content/vectorising.md b/docs/content/vectorising.md
index 6aca1c65f..d402d2104 100644
--- a/docs/content/vectorising.md
+++ b/docs/content/vectorising.md
@@ -13,6 +13,7 @@ Gymnasium provides two types of vectorized environments:
 - `gymnasium.vector.AsyncVectorEnv`, where the different copies of the environment are executed in parallel using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html). This creates one process per copy.
+
 Similar to `gymnasium.make`, you can run a vectorized version of a registered environment using the `gymnasium.vector.make` function. This runs multiple copies of the same environment (in parallel, by default). The following example runs 3 copies of the ``CartPole-v1`` environment in parallel, taking as input a vector of 3 binary actions (one for each copy of the environment), and returning an array of 3 observations stacked along the first dimension, with an array of rewards returned by each copy, and an array of booleans indicating if the episode in each parallel environment has ended.