* update per-algorithm READMEs to reflect new way of running algorithms
* adding a link to repo-wide README
* updated README files and deepq.train_cartpole example
* updated benchmark pages with final rewards
* use htmlpreview to render pages
* updated README to reflect ppo1 being obsolete
* removed navbars from published benchmark pages
* fixed link in README
* exported rl-algs
* more stuff from rl-algs
* run slow tests
* re-exported rl_algs
* re-exported rl_algs - fixed problems with serialization test and test_cartpole
* replaced atari_arg_parser with common_arg_parser
* run.py can run algos from both baselines and rl_algs
* added approximate humanoid reward with ppo2 into the README for reference
* dummy commit to RUN BENCHMARKS
* very dummy commit to RUN BENCHMARKS
* serialize variables as a dict, not as a list
* running_mean_std uses tensorflow variables
* fixed import in vec_normalize
* dummy commit to RUN BENCHMARKS
* flake8 complaints
* save all variables to make sure we save the vec_normalize normalization
* benchmarks on ppo2 only RUN BENCHMARKS
* make_atari_env compatible with mpi
* run ppo_mpi benchmarks only RUN BENCHMARKS
* hardcode names of retro environments
* add defaults
* changed default ppo2 lr schedule to linear RUN BENCHMARKS
* non-tf normalization benchmark RUN BENCHMARKS
* use ncpu=1 for mujoco sessions - gives a bit of a performance speedup
* reverted running_mean_std to use property decorators for mean, var, count
* reverted VecNormalize to use RunningMeanStd (no tf)
* profiling wip
* use VecNormalize with regular RunningMeanStd
* added acer runner (missing import)
* flake8 complaints
* added a note in README about TfRunningMeanStd and serialization of VecNormalize
* dummy commit to RUN BENCHMARKS
* merged benchmarks branch
* import rl-algs from 2e3a166 commit
* extra import of the baselines badge
* exported commit with identity test
* proper rng seeding in the test_identity
* import internal
* adding missing tile_images.py
* simple .travis.yml file
* added static syntax checks of common to .travis.yml
* dockerizing the build
* fix Dockerfile, adding build shield
* cleaning up workdir in Dockerfile and .travis.yml
* .travis.yml fixed common -> baselines/common for style check
* changes to README.md files with more detailed installation instructions
* md-fying the changes better
* link on the word homebrew in readme.md
* typos in README.md
* README.md
* removed extra comma sign
* removed sudo from brew command
The training loop used the rollout step variable `t` rather than the
training step variable `t_train` to decide when to adapt the scale of
the parameter space noise.
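
To illustrate why the choice of counter matters, here is a minimal sketch of such a loop. It is not the repository's training code: the function name `adaptation_steps`, its parameters, and the loop layout (a rollout phase followed by a training phase in each cycle) are assumptions made for this example. It records the training steps at which the noise scale would be adapted, using either `t_train` or the rollout counter `t`.

```python
def adaptation_steps(nb_cycles, nb_rollout_steps, nb_train_steps,
                     interval, use_train_step):
    """Return (cycle, t_train) pairs at which the noise scale would be adapted.

    Hypothetical helper: it only mimics the counters, not the learning itself.
    """
    t = 0  # global rollout step counter, advanced only during rollouts
    adapted = []
    for cycle in range(nb_cycles):
        # Rollout phase: the environment would be stepped here.
        for _ in range(nb_rollout_steps):
            t += 1
        # Training phase: t stays constant, t_train advances.
        for t_train in range(nb_train_steps):
            step = t_train if use_train_step else t
            if step % interval == 0:
                adapted.append((cycle, t_train))
    return adapted


# With t_train, adaptation happens once every `interval` training steps.
fixed = adaptation_steps(2, 100, 50, 50, use_train_step=True)

# With t, the check is frozen for the whole training phase: it fires on
# every training step (t divisible by interval) or on none at all.
buggy_all = adaptation_steps(2, 100, 50, 50, use_train_step=False)
buggy_none = adaptation_steps(2, 30, 50, 50, use_train_step=False)
```

Under these assumptions, `t` does not change inside the training phase, so the condition is either true for every training step of a cycle or false for all of them, whereas `t_train` gives the intended periodic adaptation.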
Flush the buffer after `pickle.dump`; otherwise the resulting zip archive might be incomplete (reproducible when the state consists of a single integer).
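
The pattern can be sketched as follows. This is an illustration under assumptions, not the repository's code: the helper names `dump_compressed` and `load_compressed` and the archive member name `"data"` are hypothetical. The pickle is written to a file that is still open when `zipfile` reads it back by path, so small payloads can remain in Python's write buffer unless `flush()` is called first.

```python
import os
import pickle
import tempfile
import zipfile


def dump_compressed(obj, path):
    """Pickle obj into a temporary file, then store that file in a zip archive."""
    with tempfile.TemporaryDirectory() as tmpdir:
        raw_path = os.path.join(tmpdir, "data.pkl")
        with open(raw_path, "wb") as raw:
            pickle.dump(obj, raw)
            # Without this flush, a tiny pickle (e.g. a single integer) may
            # still sit in the buffer, and the zip would contain a truncated
            # or empty file.
            raw.flush()
            with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
                zf.write(raw_path, "data")


def load_compressed(path):
    """Read the pickled object back from the zip archive."""
    with zipfile.ZipFile(path, "r") as zf:
        with zf.open("data") as f:
            return pickle.load(f)
```

Closing the file before creating the archive would have the same effect; the explicit `flush()` is what matters when the file has to stay open, as with a temporary file that is deleted on close.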