mirror of
https://github.com/Farama-Foundation/Gymnasium.git
synced 2025-08-31 10:09:53 +00:00
Deploying to gh-pages from @ Farama-Foundation/Gymnasium@ab03f684a9 🚀
@@ -391,7 +391,7 @@
<section id="handling-time-limits">
<h1>Handling Time Limits<a class="headerlink" href="#handling-time-limits" title="Permalink to this heading">#</a></h1>
-<p>In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The <code class="docutils literal notranslate"><span class="pre">done</span></code> signal received (in previous versions of gymnasium &lt; 0.26) from <code class="docutils literal notranslate"><span class="pre">env.step</span></code> indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to <code class="docutils literal notranslate"><span class="pre">termination</span></code> or <code class="docutils literal notranslate"><span class="pre">truncation</span></code>.</p>
+<p>In using Gymnasium environments with reinforcement learning code, a common problem observed is how time limits are incorrectly handled. The <code class="docutils literal notranslate"><span class="pre">done</span></code> signal received (in previous versions of OpenAI Gym &lt; 0.26) from <code class="docutils literal notranslate"><span class="pre">env.step</span></code> indicated whether an episode has ended. However, this signal did not distinguish whether the episode ended due to <code class="docutils literal notranslate"><span class="pre">termination</span></code> or <code class="docutils literal notranslate"><span class="pre">truncation</span></code>.</p>
<section id="termination">
<h2>Termination<a class="headerlink" href="#termination" title="Permalink to this heading">#</a></h2>
<p>Termination refers to the episode ending after reaching a terminal state that is defined as part of the environment definition. Examples include task success, task failure, and a robot falling down. Notably, this also includes episodes ending in finite-horizon environments due to a time limit inherent to the environment. Note that to preserve the Markov property, a representation of the remaining time must be present in the agent’s observation in finite-horizon environments. <a class="reference external" href="https://arxiv.org/abs/1712.00378">(Reference)</a></p>
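<p>The point about including the remaining time in the observation can be illustrated with a minimal sketch. It assumes a fixed horizon known to the caller and a flat, list-like observation; the helper <code class="docutils literal notranslate"><span class="pre">add_remaining_time</span></code> and its parameters are hypothetical names used for illustration and are not part of Gymnasium.</p>

```python
from typing import List, Sequence


def add_remaining_time(
    observation: Sequence[float], step: int, horizon: int
) -> List[float]:
    """Append the fraction of the episode's time budget that is still left.

    Hypothetical helper: `step` is the number of environment steps already
    taken in the current episode, `horizon` is the fixed episode length.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if not 0 <= step <= horizon:
        raise ValueError("step must lie in [0, horizon]")
    # 1.0 right after reset, 0.0 once the time budget is used up.
    remaining = (horizon - step) / horizon
    return [*observation, remaining]


# The same raw observation seen early and late in an episode becomes
# distinguishable once the remaining time is part of it.
early = add_remaining_time([0.5, -1.0], step=1, horizon=10)
late = add_remaining_time([0.5, -1.0], step=9, horizon=10)
```

<p>With the extra entry, an agent can in principle learn that the same raw state has a different value near the end of the horizon than at the start, which is the reason the paragraph above asks for the remaining time to be observable.</p>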
File diff suppressed because one or more lines are too long