Files

pzhokhov 8c2aea2add refactor a2c, acer, acktr, ppo2, deepq, and trpo_mpi (#490)

* exported rl-algs

* more stuff from rl-algs

* run slow tests

* re-exported rl_algs

* re-exported rl_algs - fixed problems with serialization test and test_cartpole

* replaced atari_arg_parser with common_arg_parser

* run.py can run algos from both baselines and rl_algs

* added approximate humanoid reward with ppo2 into the README for reference

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* very dummy commit to RUN BENCHMARKS

* serialize variables as a dict, not as a list

* running_mean_std uses tensorflow variables

* fixed import in vec_normalize

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* flake8 complaints

* save all variables to make sure we save the vec_normalize normalization

* benchmarks on ppo2 only RUN BENCHMARKS

* make_atari_env compatible with mpi

* run ppo_mpi benchmarks only RUN BENCHMARKS

* hardcode names of retro environments

* add defaults

* changed default ppo2 lr schedule to linear RUN BENCHMARKS

* non-tf normalization benchmark RUN BENCHMARKS

* use ncpu=1 for mujoco sessions - gives a bit of a performance speedup

* reverted running_mean_std to use property decorators for mean, var, count

* reverted VecNormalize to use RunningMeanStd (no tf)

* reverted VecNormalize to use RunningMeanStd (no tf)

* profiling wip

* use VecNormalize with regular RunningMeanStd

* added acer runner (missing import)

* flake8 complaints

* added a note in README about TfRunningMeanStd and serialization of VecNormalize

* dummy commit to RUN BENCHMARKS

* merged benchmarks branch

2018-08-13 09:56:44 -07:00

__init__.py

add __init__.py

2017-08-27 22:36:24 -07:00

cnn_policy.py

Lots of cleanups

2018-01-25 18:54:24 -08:00

mlp_policy.py

fix trpo_mpi bug where logstd wasn’t included

2018-01-25 21:17:40 -08:00

pposgd_simple.py

Import internal changes (#422)

2018-06-06 11:39:13 -07:00

README.md

Import internal changes (#422)

2018-06-06 11:39:13 -07:00

run_atari.py

refactor a2c, acer, acktr, ppo2, deepq, and trpo_mpi (#490)

2018-08-13 09:56:44 -07:00

run_humanoid.py

Import internal changes (#422)

2018-06-06 11:39:13 -07:00

run_mujoco.py

Lots of cleanups

2018-01-25 18:54:24 -08:00

run_robotics.py

Import internal changes (#422)

2018-06-06 11:39:13 -07:00

README.md

PPOSGD

- Original paper: https://arxiv.org/abs/1707.06347
- Baselines blog post: https://blog.openai.com/openai-baselines-ppo/
- `mpirun -np 8 python -m baselines.ppo1.run_atari` runs the algorithm on an Atari game for 40M frames = 10M timesteps (each timestep spans 4 emulator frames). See help (`-h`) for more options.
- `python -m baselines.ppo1.run_mujoco` runs the algorithm for 1M frames on a MuJoCo environment.
- Train the MuJoCo 3D humanoid (with optimal-ish hyperparameters): `mpirun -np 16 python -m baselines.ppo1.run_humanoid --model-path=/path/to/model`
- Render the 3D humanoid: `python -m baselines.ppo1.run_humanoid --play --model-path=/path/to/model`
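
The Atari figure above equates 40M frames with 10M timesteps, which implies that each agent timestep covers 4 emulator frames. The sketch below only illustrates that arithmetic. The helper name `frames_to_timesteps` and the default `frame_skip=4` are assumptions made for this example, not part of the baselines API.

```python
def frames_to_timesteps(frames: int, frame_skip: int = 4) -> int:
    """Convert emulator frames to agent timesteps for a fixed frame skip.

    frame_skip=4 is an assumption inferred from "40M frames = 10M timesteps";
    this helper is hypothetical and does not exist in baselines.
    """
    if frame_skip <= 0:
        raise ValueError("frame_skip must be positive")
    return frames // frame_skip


# Budget quoted for the Atari run: 40M frames.
atari_frames = 40_000_000
atari_timesteps = frames_to_timesteps(atari_frames)
print(atari_timesteps)  # 10000000
```

Keeping the two units separate matters when comparing results: a learning curve plotted against frames is stretched by the frame-skip factor relative to one plotted against timesteps.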