39 Commits

Author SHA1 Message Date
pzhokhov
a07fad9066 change rms 2 tfrms switch in vec_normalize to be more explicit (#886)
* change rms 2 tfrms switch in vec_normalize to be more explicit

* modify the vec_normalize / use_tf logic a little bit

* typo

* use_tf = False by default
2019-04-26 16:14:21 -07:00
Darío Hereñú
b1644157d6 Fixed typo on #092 (#824) 2019-04-01 15:41:52 -07:00
Christopher Hesse
8607dca99e Update README.md 2018-11-21 14:57:10 -08:00
pzhokhov
c14d307834 move viz docs to a notebook entirely (#704)
* viz docs

* writing visualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ignoring things like dataframe, docstring, etc.)

* rephrased viz.md a little bit

* more examples of viz code usage in the docs

* replaced visualization doc with notebook
2018-11-07 17:19:42 -08:00
pzhokhov
c74ce02b9d visualization code docs / bugfixes (#701)
* viz docs

* writing visualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ignoring things like dataframe, docstring, etc.)

* rephrased viz.md a little bit
2018-11-05 14:31:15 -08:00
Erik Doffagne
7bfbcf177e Fixed typos in README (#635) 2018-10-04 10:31:22 -07:00
pzhokhov
394339deb5 Update README.md 2018-10-03 20:53:58 -07:00
Peter Zhokhov
34ae3194b4 add a note about DQN algorithms not performing well 2018-09-27 12:51:43 -07:00
pzhokhov
0e7048b89f Update README.md 2018-09-19 15:04:54 -07:00
pzhokhov
75983bab64 Update README.md 2018-09-19 15:04:01 -07:00
Peter Zhokhov
5c62f5c7dd added peterz to baselines authorlist 2018-09-11 12:44:51 -07:00
Daniel Angelov
58b1021b28 Add tensorboard start command for convenience (#569) 2018-09-07 17:04:02 -07:00
Peter Zhokhov
be9118bcd8 git subrepo pull (merge) baselines
subrepo:
  subdir:   "baselines"
  merged:   "f2a9b8f2"
upstream:
  origin:   "git@github.com:openai/baselines.git"
  branch:   "master"
  commit:   "cc4215ef"
git-subrepo:
  version:  "0.4.0"
  origin:   "git@github.com:ingydotnet/git-subrepo.git"
  commit:   "74339e8"
2018-09-06 10:18:13 -07:00
pzhokhov
02a5e7aed5 fixes to readme and baselines/run.py (#80)
* fixes to readme and baselines/run.py

* polish installation section of baselines README

* polish installation section of baselines README
2018-09-06 10:18:13 -07:00
uronce-cc
43ed76944b Fix mean reward per episode after training Pong. (#562)
* Fix mean reward per episode after training Pong.

* Fix typo.
2018-09-05 15:06:29 -07:00
wangjksjtu
e92a6ad8f4 Update README.md (#537)
1. Delete repetitive section
2. Align the commands
2018-08-27 12:35:48 -07:00
HelgeS
92b9a37257 Updated example commands to run ppo2 (#534)
The headline mentions PPO, but the command was for A2C
2018-08-23 15:58:27 -07:00
pzhokhov
353bb15e90 deduplicate algorithms in rl-algs and baselines (#18)
* move vec_env

* cleaning up rl_common

* tests are passing (but most tests are deleted, as they moved to baselines)

* add benchmark runner for smoke tests

* removed duplicated algos

* route references to rl_algs.a2c to baselines.a2c

* route references to rl_algs.a2c to baselines.a2c

* unify conftest.py

* removing references to duplicated algs from codegen

* removing references to duplicated algs from codegen

* alex's changes to dummy_vec_env

* fixed test_carpole[deepq] testcase by decreasing the number of training steps... alex's changes seem to have fixed the bug and made it train better, but at seed=0 there is a dip in the training curve at 30k steps that fails the test

* codegen tests with atol=1e-6 seem to be unstable

* rl_common.vec_env -> baselines.common.vec_env mass replace

* fixed reference in trpo_mpi

* a2c.util references

* restored rl_algs.bench in sonic_prob

* fix reference in ci/runtests.sh

* simplified expression in baselines/common/cmd_util

* further increased rtol to 1e-3 in codegen tests

* switched vecenvs to use SimpleImageViewer from gym instead of cv2

* make run.py --play option work with num_envs > 1

* make rosenbrock test reproducible

* git subrepo pull (merge) baselines

subrepo:
  subdir:   "baselines"
  merged:   "e23524a5"
upstream:
  origin:   "git@github.com:openai/baselines.git"
  branch:   "master"
  commit:   "bcde04e7"
git-subrepo:
  version:  "0.4.0"
  origin:   "git@github.com:ingydotnet/git-subrepo.git"
  commit:   "74339e8"

* updated baselines README (num-timesteps --> num_timesteps)

* typo in deepq/README.md
2018-08-17 13:54:11 -07:00
Peter Zhokhov
b222dd0610 updated links in README to point to master 2018-08-13 16:01:24 -07:00
pzhokhov
1870685071 Publish benchmark results (#502)
* updated benchmark pages with final rewards

* use htmlpreview to render pages

* use htmlpreview to render pages

* use htmlpreview to render pages

* updated README to reflect ppo1 being obsolete

* removed navbars from published benchmark pages

* fixed link in README
2018-08-13 15:59:43 -07:00
pzhokhov
8c2aea2add refactor a2c, acer, acktr, ppo2, deepq, and trpo_mpi (#490)
* exported rl-algs

* more stuff from rl-algs

* run slow tests

* re-exported rl_algs

* re-exported rl_algs - fixed problems with serialization test and test_cartpole

* replaced atari_arg_parser with common_arg_parser

* run.py can run algos from both baselines and rl_algs

* added approximate humanoid reward with ppo2 into the README for reference

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* very dummy commit to RUN BENCHMARKS

* serialize variables as a dict, not as a list

* running_mean_std uses tensorflow variables

* fixed import in vec_normalize

* dummy commit to RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* flake8 complaints

* save all variables to make sure we save the vec_normalize normalization

* benchmarks on ppo2 only RUN BENCHMARKS

* make_atari_env compatible with mpi

* run ppo_mpi benchmarks only RUN BENCHMARKS

* hardcode names of retro environments

* add defaults

* changed default ppo2 lr schedule to linear RUN BENCHMARKS

* non-tf normalization benchmark RUN BENCHMARKS

* use ncpu=1 for mujoco sessions - gives a bit of a performance speedup

* reverted running_mean_std to use property decorators for mean, var, count

* reverted VecNormalize to use RunningMeanStd (no tf)

* reverted VecNormalize to use RunningMeanStd (no tf)

* profiling wip

* use VecNormalize with regular RunningMeanStd

* added acer runner (missing import)

* flake8 complaints

* added a note in README about TfRunningMeanStd and serialization of VecNormalize

* dummy commit to RUN BENCHMARKS

* merged benchmarks branch
2018-08-13 09:56:44 -07:00
pzhokhov
24fe3d6576 Import internal repo (#409)
* import rl-algs from 2e3a166 commit

* extra import of the baselines badge

* exported commit with identity test

* proper rng seeding in the test_identity
2018-05-21 15:24:00 -07:00
pzhokhov
9cf95a0054 setup travis ci build (#388)
* simple .travis.yml file

* added static syntax checks of common to .travis.yml

* dockerizing the build

* fix Dockerfile, adding build shield

* cleaning up workdir in Dockerfile and .travis.yml

* .travis.yml fixed common -> baselines/common for style check
2018-05-03 09:43:28 -07:00
pzhokhov
2b0283b9db Readme.md detailed installation instructions (#377)
* changes to README.md files with more detailed installation instructions

* md-fying the changes better

* link on the word homebrew in readme.md

* typos in README.md

* README.md

* removed extra comma sign

* removed sudo from brew command
2018-04-25 17:40:48 -07:00
Matthias Plappert
b71152eea0 Adds support for Hindsight Experience Replay (HER) (#299)
* Add Hindsight Experience Replay (HER)

* Minor improvements
2018-02-26 17:40:16 +01:00
John Schulman
459f007bcc Merge pull request #260 from uidilr/master
Add GAIL
2018-01-25 20:54:20 -08:00
John Schulman
9fa8e1baf1 Lots of cleanups
Fixes for new gym version
Add @olegklimov and @unixpickle to authors list
2018-01-25 18:54:24 -08:00
Yusuke Nakata
d8cce2309f Add GAIL 2018-01-23 12:02:03 +09:00
John Schulman
2dd7d307d7 Add ACER, PPO2, and results_plotter.py 2017-11-16 10:02:32 -08:00
John Schulman
bb40378118 change atari preprocessing to use faster opencv
some logger changes
2017-10-25 09:21:29 -04:00
John Schulman
aa6e58bdf1 fix readmes 2017-08-27 22:22:14 -07:00
John Schulman
3f676f7d1e ACKTR + A2C 2017-08-18 09:25:39 -07:00
Matthias Plappert
882251878f Parameter space noise for DQN and DDPG (#75)
* Export param noise

* Update documentation

* Final finishing touches
2017-07-27 08:10:59 -07:00
Jonas Schneider
5dc00628fe readme fiddling 2017-07-20 09:00:24 -07:00
John Schulman
da99706046 ppo and trpo 2017-07-20 08:52:35 -07:00
cxx
5e73387494 Fix README since BreakOut pretrained model doesn't match the correct tensor shape. Therefore, Pong is used instead. 2017-06-16 15:38:42 +08:00
Tiago Carvalho
1f3c3e33e7 Update README.md 2017-05-31 12:14:28 +01:00
Olivier Moindrot
d2c51f5933 Correct path to script "download_model"
`python -m baselines.deepq.experiments.download_model` becomes `python -m baselines.deepq.experiments.atari.download_model`
2017-05-24 13:13:30 -07:00
Szymon Sidor
958810ed1e Initial commit 2017-05-24 02:34:20 -07:00