Baselines
OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.
These algorithms will make it easier for the research community to replicate, refine, and identify new ideas, and will create good baselines to build research on top of. Our DQN implementation and its variants are roughly on par with the scores in published papers. We expect they will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones.
Prerequisites
Baselines requires python3 (>=3.5) with the development headers. You'll also need the system packages CMake, OpenMPI and zlib. They can be installed as follows:
Ubuntu
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
Mac OS X
Installation of system packages on Mac requires Homebrew. With Homebrew installed, run the following:
brew install cmake openmpi
Virtual environment
For general python package sanity, it is a good idea to use virtual environments (virtualenvs) to make sure packages from different projects do not interfere with each other. You can install virtualenv (which is itself a pip package) via:
pip install virtualenv
Virtualenvs are essentially folders that contain copies of the python executable and all python packages. To create a virtualenv called venv with python3, run:
virtualenv /path/to/venv --python=python3
To activate a virtualenv:
. /path/to/venv/bin/activate
A more thorough tutorial on virtualenvs and their options can be found here
Installation
- Clone the repo and cd into it:
git clone https://github.com/openai/baselines.git
cd baselines
- If you don't have TensorFlow installed already, install your favourite flavor of TensorFlow. In most cases,
pip install tensorflow-gpu # if you have a CUDA-compatible gpu and proper drivers
or
pip install tensorflow
should be sufficient. Refer to TensorFlow installation guide for more details.
- Install the baselines package:
pip install -e .
MuJoCo
Some of the baselines examples use the MuJoCo (Multi-Joint dynamics with Contact) physics simulator, which is proprietary and requires binaries and a license (a temporary 30-day license can be obtained from www.mujoco.org). Instructions on setting up MuJoCo can be found here
Testing the installation
All unit tests in baselines can be run using the pytest runner:
pip install pytest
pytest
Training models
Most of the algorithms in the baselines repo are used as follows:
python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]
Example 1. PPO with MuJoCo Humanoid
For instance, to train a fully-connected network controlling the MuJoCo humanoid using PPO2 for 20M timesteps:
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
Note that for MuJoCo environments the fully-connected network is the default, so we can omit --network=mlp
The hyperparameters of both the network and the learning algorithm can be controlled via the command line, for instance:
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
will set the entropy coefficient to 0.1, construct a fully connected network with 3 layers of 32 hidden units each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same)
See the docstrings in common/models.py for descriptions of the network parameters for each type of model, and the docstring of baselines/ppo2/ppo2.py learn() for a description of the ppo2 hyperparameters.
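The same hyperparameters can also be passed programmatically. Below is a minimal sketch, assuming the learn() entry point of baselines/ppo2/ppo2.py and the mlp builder in common/models.py accept these keyword arguments as described in their docstrings; exact signatures may differ between versions.

# Minimal sketch (not the canonical entry point): call ppo2.learn directly
# with the same hyperparameters as the command line above.
import gym
from baselines.ppo2 import ppo2
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

# ppo2 expects a vectorized environment; wrap a single Humanoid instance.
env = DummyVecEnv([lambda: gym.make('Humanoid-v2')])

model = ppo2.learn(
    network='mlp',
    env=env,
    total_timesteps=int(2e7),
    ent_coef=0.1,           # entropy coefficient
    value_network='copy',   # separate value network with the same structure
    num_hidden=32,          # forwarded to the mlp builder in common/models.py
    num_layers=3,
)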
Example 2. DQN on Atari
DQN with Atari is at this point a classic benchmark. To run the baselines implementation of DQN on Atari Pong:
python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
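The same can be done from Python. A hedged sketch, assuming the deepq.learn API and the Atari wrappers in baselines.common.atari_wrappers behave as in the current code (argument names may vary between versions):

# Rough sketch: train DQN on Pong programmatically instead of via baselines.run.
from baselines import deepq
from baselines.common.atari_wrappers import make_atari, wrap_deepmind

# Apply the standard DeepMind-style preprocessing (warping, reward clipping, frame stacking).
env = wrap_deepmind(make_atari('PongNoFrameskip-v4'), frame_stack=True, scale=True)

model = deepq.learn(
    env,
    network='conv_only',     # convolutional network suitable for Atari frames
    total_timesteps=int(1e6),
)
env.close()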
Saving, loading and visualizing models
The algorithms' serialization API is not properly unified yet; however, there is a simple method to save / restore trained models. The --save_path and --load_path command-line options save the tensorflow state to a given path after training and load it from a given path before training, respectively.
Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learned.
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2
This should get the mean reward per episode to about 20. To load and visualize the model, we'll do the following: load the model, train it for 0 steps, and then visualize:
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
NOTE: At the moment, MuJoCo training uses the VecNormalize wrapper for the environment, which is not saved correctly; loading models trained on MuJoCo will therefore not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in baselines/common/vec_env/vec_normalize.py. This way, the mean and std of the environment-normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way, hence it is not enabled by default.
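For reference, here is a rough sketch of what --load_path plus --play do internally, assuming the Atari env helper in baselines.common.cmd_util and the load_path argument of ppo2.learn (both may differ between versions):

# Hedged sketch: rebuild the same kind of vectorized Atari env used during training,
# load the saved weights without further training, then roll the policy out and render.
import os.path as osp
from baselines.common.cmd_util import make_atari_env
from baselines.common.vec_env.vec_frame_stack import VecFrameStack
from baselines.ppo2 import ppo2

env = VecFrameStack(make_atari_env('PongNoFrameskip-v4', num_env=1, seed=0), nstack=4)
model = ppo2.learn(network='cnn', env=env, total_timesteps=0,
                   load_path=osp.expanduser('~/models/pong_20M_ppo2'))  # path saved above

obs = env.reset()
while True:
    actions = model.step(obs)[0]    # model.step returns (actions, values, states, neglogpacs)
    obs, rewards, dones, infos = env.step(actions)
    env.render()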
Using baselines with TensorBoard
Baselines logger can save data in the TensorBoard format. To do so, set the environment variables OPENAI_LOG_FORMAT and OPENAI_LOGDIR:
export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' # formats are comma-separated, but for tensorboard you only really need the last one
export OPENAI_LOGDIR=path/to/tensorboard/data
And you can now start TensorBoard with:
tensorboard --logdir=$OPENAI_LOGDIR
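The logger can also be configured from Python before training starts. A small sketch, assuming the configure/logkv/dumpkvs functions of baselines.logger (their effect here mirrors the environment variables above):

# Hedged sketch: configure the baselines logger programmatically instead of via
# OPENAI_LOG_FORMAT / OPENAI_LOGDIR, then log a scalar that TensorBoard can display.
from baselines import logger

logger.configure(dir='path/to/tensorboard/data',
                 format_strs=['stdout', 'log', 'csv', 'tensorboard'])

for step in range(10):
    logger.logkv('example_metric', step * 0.1)  # any scalar logged with logkv
    logger.dumpkvs()                            # flush the current row to all formats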
Subpackages
Benchmarks
Results of benchmarks on MuJoCo (1M timesteps) and Atari (10M timesteps) are available here for MuJoCo and here for Atari, respectively. Note that these results may not be from the latest version of the code; the particular commit hash with which the results were obtained is specified on the benchmarks page.
To cite this repository in publications:
@misc{baselines,
author = {Dhariwal, Prafulla and Hesse, Christopher and Klimov, Oleg and Nichol, Alex and Plappert, Matthias and Radford, Alec and Schulman, John and Sidor, Szymon and Wu, Yuhuai and Zhokhov, Peter},
title = {OpenAI Baselines},
year = {2017},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/openai/baselines}},
}