This commit is contained in:
Justin Terry
2021-08-16 19:48:38 -04:00
parent 1da3d4f8e9
commit 76dd8d0c1c

@@ -28,10 +28,10 @@ The output should look something like this
The commonly used methods are:
`reset()` resets the environment to its initial state and returns the observation corresponding to the initial state
`step(action)` takes an action as an input and implements that action in the environement. This method returns a set of four values
`step(action)` takes an action as an input and implements that action in the environment. This method returns a set of four values
`render()` renders the environment
- `observation` (**object**) : an environment specific object repesentation your observation of the environment after the step is taken. Its often aliased as the next state after the action has been taken
- `observation` (**object**) : an environment-specific object representing your observation of the environment after the step is taken. It's often aliased as the next state after the action has been taken
- `reward`(**float**) : immediate reward achieved by the previous action. The actual value and range vary between environments, but the final goal is always to increase your total reward
- `done`(**boolean**): whether it's time to `reset` the environment again. Most (but not all) tasks are divided up into well-defined episodes, and `done` being `True` indicates the episode has terminated. (For example, perhaps the pole tipped too far, or you lost your last life.)
- `info`(**dict**) : general information helpful for debugging, or additional information depending on the environment, such as the raw probabilities behind the environment's last state change
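The methods and return values above can be combined into a short interaction loop. This is a minimal sketch assuming the classic Gym API (where `step()` returns the four values listed; Gym versions 0.26+ changed these signatures) and `CartPole-v1` as an example environment:

```python
import gym

# Create an example environment; CartPole-v1 is one assumption, any
# registered environment id would work the same way.
env = gym.make("CartPole-v1")

observation = env.reset()  # observation corresponding to the initial state
done = False
total_reward = 0.0

while not done:
    # A random action, for illustration; an agent would choose one here.
    action = env.action_space.sample()
    # step() applies the action and returns the four values described above.
    observation, reward, done, info = env.step(action)
    total_reward += reward  # the goal is to maximize this

env.close()
```

Calling `env.render()` inside the loop would display the environment at each step; it is left out here so the sketch runs headlessly.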