* Conditionally select different arguments for ffmpeg, and add a meaningful error for the third-party application in charge of encoding
* Consistency with other logger() calls
Co-authored-by: J K Terry <justinkterry@gmail.com>
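For context, a minimal sketch of the idea behind the ffmpeg change (illustrative only; the function name, flags, and error message below are assumptions, not the recorder's actual code): pick the encoder arguments based on the output format, and raise a clear error when the encoder binary is unavailable.

```python
import shutil

def build_encoder_cmd(output_path, frames_per_sec, backend="ffmpeg"):
    """Illustrative sketch: conditional ffmpeg arguments plus a meaningful error."""
    if shutil.which(backend) is None:
        raise RuntimeError(
            f"Could not find '{backend}' on the PATH. Video recording requires it; "
            "install it (e.g. `apt-get install ffmpeg`) or disable recording."
        )
    # Common arguments: quiet output, overwrite, frame rate, frames piped on stdin.
    cmd = [backend, "-nostats", "-loglevel", "error", "-y",
           "-r", str(frames_per_sec), "-i", "-"]
    if output_path.endswith(".gif"):
        # GIF output: use a palette filter instead of an H.264 codec.
        cmd += ["-filter_complex",
                "split [a][b];[a] palettegen [p];[b][p] paletteuse"]
    else:
        # MP4 (default): encode with libx264 in a widely supported pixel format.
        cmd += ["-vcodec", "libx264", "-pix_fmt", "yuv420p"]
    return cmd + [output_path]
```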
RiverSwim, a simple hard-exploration environment, has been added to the list of third-party environments.
Co-authored-by: J K Terry <justinkterry@gmail.com>
* Fixing Warning issue
Fixed an issue causing a warning message because of the conversion from Python's standard `float64` to `np.float32`.
* car_racing.py Warning Fix
Fixed a bug generating a warning when converting from standard `float64` to `numpy.float32`.
* Fixing Warning issue
Fixed a bug generating a warning when converting from standard `float64` to `numpy.float32`.
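The shape of that fix, sketched below, is only an illustration (where exactly the cast happens in `car_racing.py` is an assumption): building a float32 `Box` from plain Python floats makes gym warn about lowered bound precision, and casting the bounds to `np.float32` up front avoids it.

```python
import numpy as np
from gym import spaces

# Plain Python floats are 64-bit, so gym warns:
# "Box bound precision lowered by casting to float32"
action_space_warns = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

# Casting the bounds explicitly keeps everything in float32 and silences the warning.
action_space_ok = spaces.Box(
    low=np.float32(-1.0), high=np.float32(1.0), shape=(3,), dtype=np.float32
)
```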
* bracket missing
* Bracket misplaced
* reverting to an older Python 3.6 version to test whether the build failure is caused by it
* revert 3.7 to 3.7.3 for the build
* revert python 3.8 version to 3.8.1
* do not install mujoco on 3.8 and 3.9
* enable mujoco for 3.7
* use a regex to handle Python-version-dependent package installation
* try only one python version at a time
* switch to possibly more popular python tag for 3.6
* disable mujoco
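A rough Python illustration of the regex idea from the CI commits above (the real logic lives in the CI configuration, so the version pattern and variable names here are assumptions): match the interpreter version tag and only install mujoco-py where it is enabled.

```python
import re
import sys

# Build the "major.minor" tag for the current interpreter, e.g. "3.7".
version_tag = f"{sys.version_info.major}.{sys.version_info.minor}"

# Per the commits above, mujoco is enabled on 3.6/3.7 and skipped on 3.8/3.9.
install_mujoco = re.fullmatch(r"3\.[67]", version_tag) is not None
print(f"Python {version_tag}: install mujoco-py -> {install_mujoco}")
```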
**Issue:** The current `reward_threshold` for `FrozenLake-v0` and `FrozenLake8x8-v0` is too high to be attained.
Commit: df515de07d @joschu
**Solution:** Reduce the `reward_threshold` values to make them attainable.
**Reference:** Code to compute the theoretical optimal expected return:
```python
import gym
env = gym.make('FrozenLake-v0')
print(env.observation_space.n) # 16
print(env.action_space.n) # 4
print(env.spec.reward_threshold) # 0.78, should be smaller
print(env.spec.max_episode_steps) # 100
import numpy as np
v = np.zeros((101, 16), dtype=float)
q = np.zeros((101, 16, 4), dtype=float)
pi = np.zeros((101, 16), dtype=float)
for t in range(99, -1, -1):  # backward
    for s in range(16):
        for a in range(4):
            for p, next_s, r, d in env.P[s][a]:
                q[t, s, a] += p * (r + (1. - float(d)) * v[t+1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0]) # ~0.74 < 0.78
```
```python
import gym
env = gym.make('FrozenLake8x8-v0')
print(env.observation_space.n) # 64
print(env.action_space.n) # 4
print(env.spec.reward_threshold) # 0.99, should be smaller
print(env.spec.max_episode_steps) # 200
import numpy as np
v = np.zeros((201, 64), dtype=float)
q = np.zeros((201, 64, 4), dtype=float)
pi = np.zeros((201, 64), dtype=float)
for t in range(199, -1, -1):  # backward
    for s in range(64):
        for a in range(4):
            for p, next_s, r, d in env.P[s][a]:
                q[t, s, a] += p * (r + (1. - float(d)) * v[t+1, next_s])
        v[t, s] = q[t, s].max()
        pi[t, s] = q[t, s].argmax()
print(v[0, 0]) # ~0.91 < 0.99
```
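One way the fix could look, sketched with illustrative numbers (the ids and thresholds below are assumptions; the real change would edit the existing registrations, and the final values are the maintainers' choice, needing only to stay below the ~0.74 and ~0.91 optima computed above):

```python
from gym.envs.registration import register

# Hypothetical re-registrations with attainable thresholds.
register(
    id='FrozenLakeLowerThreshold-v0',      # illustrative id, not a real gym id
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '4x4'},
    max_episode_steps=100,
    reward_threshold=0.70,                 # below the ~0.74 optimum
)

register(
    id='FrozenLake8x8LowerThreshold-v0',   # illustrative id, not a real gym id
    entry_point='gym.envs.toy_text:FrozenLakeEnv',
    kwargs={'map_name': '8x8'},
    max_episode_steps=200,
    reward_threshold=0.85,                 # below the ~0.91 optimum
)
```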