baselines

Author	SHA1	Message	Date
pzhokhov	a07fad9066	change rms 2 tfrms switch in vec_normalize to be more explicit (#886) * change rms 2 tfrms switch in vec_normalize to be more explicit * modify the vec_normalize / use_tf logic a little bit * typo * use_tf = False by default	2019-04-26 16:14:21 -07:00
Darío Hereñú	b1644157d6	Fixed typo on #092 (#824)	2019-04-01 15:41:52 -07:00
Christopher Hesse	8607dca99e	Update README.md	2018-11-21 14:57:10 -08:00
pzhokhov	c14d307834	move viz docs to a notebook entirely (#704) * viz docs * writing vizualization docs * documenting plot_util * docstrings in plot_util * autopep8 and flake8 * spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc) * rephrased viz.md a little bit * more examples of viz code usage in the docs * replaced vizualization doc with notebook	2018-11-07 17:19:42 -08:00
pzhokhov	c74ce02b9d	visualization code docs / bugfixes (#701) * viz docs * writing vizualization docs * documenting plot_util * docstrings in plot_util * autopep8 and flake8 * spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc) * rephrased viz.md a little bit	2018-11-05 14:31:15 -08:00
Erik Doffagne	7bfbcf177e	Fixed typos in README (#635)	2018-10-04 10:31:22 -07:00
pzhokhov	394339deb5	Update README.md	2018-10-03 20:53:58 -07:00
Peter Zhokhov	34ae3194b4	add a note about DQN algorithms not performing well	2018-09-27 12:51:43 -07:00
pzhokhov	0e7048b89f	Update README.md	2018-09-19 15:04:54 -07:00
pzhokhov	75983bab64	Update README.md	2018-09-19 15:04:01 -07:00
Peter Zhokhov	5c62f5c7dd	added peterz to baselines authorlist	2018-09-11 12:44:51 -07:00
Daniel Angelov	58b1021b28	Add tensorboard start command for convenience (#569)	2018-09-07 17:04:02 -07:00
Peter Zhokhov	be9118bcd8	git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "f2a9b8f2" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "cc4215ef" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-06 10:18:13 -07:00
pzhokhov	02a5e7aed5	fixes to readme and baselines/run.py (#80) * fixes to readme and baselines/run.py * polish installation section of baselines README * polish installation section of baselines README	2018-09-06 10:18:13 -07:00
uronce-cc	43ed76944b	Fix mean reward per episode after training Pong. (#562) * Fix mean reward per episode after training Pong. * Fix typo.	2018-09-05 15:06:29 -07:00
wangjksjtu	e92a6ad8f4	Update README.md (#537) １. Delete repetitive section 2. Align the commands	2018-08-27 12:35:48 -07:00
HelgeS	92b9a37257	Updated example commands to run ppo2 (#534) The headline mentions PPO, but the command was for A2C	2018-08-23 15:58:27 -07:00
pzhokhov	353bb15e90	deduplicate algorithms in rl-algs and baselines (#18) * move vec_env * cleaning up rl_common * tests are passing (but mosts tests are deleted as moved to baselines) * add benchmark runner for smoke tests * removed duplicated algos * route references to rl_algs.a2c to baselines.a2c * route references to rl_algs.a2c to baselines.a2c * unify conftest.py * removing references to duplicated algs from codegen * removing references to duplicated algs from codegen * alex's changes to dummy_vec_env * fixed test_carpole[deepq] testcase by decreasing number of training steps... alex's changes seemed to have fixed the bug and make it train better, but at seed=0 there is a dip in the training curve at 30k steps that fails the test * codegen tests with atol=1e-6 seem to be unstable * rl_common.vec_env -> baselines.common.vec_env mass replace * fixed reference in trpo_mpi * a2c.util references * restored rl_algs.bench in sonic_prob * fix reference in ci/runtests.sh * simplifed expression in baselines/common/cmd_util * further increased rtol to 1e-3 in codegen tests * switched vecenvs to use SimpleImageViewer from gym instead of cv2 * make run.py --play option work with num_envs > 1 * make rosenbrock test reproducible * git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "e23524a5" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "bcde04e7" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8" * updated baselines README (num-timesteps --> num_timesteps) * typo in deepq/README.md	2018-08-17 13:54:11 -07:00
Peter Zhokhov	b222dd0610	updated links in README to point to master	2018-08-13 16:01:24 -07:00
pzhokhov	1870685071	Publish benchmark results (#502) * updated benchmark pages with final rewards * use htmlpreview to render pages * use htmlpreview to render pages * use htmlpreview to render pages * updated README to reflect ppo1 being obsolete * removed navbars from published benchmark pages * fixed link in README	2018-08-13 15:59:43 -07:00
pzhokhov	8c2aea2add	refactor a2c, acer, acktr, ppo2, deepq, and trpo_mpi (#490) * exported rl-algs * more stuff from rl-algs * run slow tests * re-exported rl_algs * re-exported rl_algs - fixed problems with serialization test and test_cartpole * replaced atari_arg_parser with common_arg_parser * run.py can run algos from both baselines and rl_algs * added approximate humanoid reward with ppo2 into the README for reference * dummy commit to RUN BENCHMARKS * dummy commit to RUN BENCHMARKS * dummy commit to RUN BENCHMARKS * dummy commit to RUN BENCHMARKS * very dummy commit to RUN BENCHMARKS * serialize variables as a dict, not as a list * running_mean_std uses tensorflow variables * fixed import in vec_normalize * dummy commit to RUN BENCHMARKS * dummy commit to RUN BENCHMARKS * flake8 complaints * save all variables to make sure we save the vec_normalize normalization * benchmarks on ppo2 only RUN BENCHMARKS * make_atari_env compatible with mpi * run ppo_mpi benchmarks only RUN BENCHMARKS * hardcode names of retro environments * add defaults * changed default ppo2 lr schedule to linear RUN BENCHMARKS * non-tf normalization benchmark RUN BENCHMARKS * use ncpu=1 for mujoco sessions - gives a bit of a performance speedup * reverted running_mean_std to user property decorators for mean, var, count * reverted VecNormalize to use RunningMeanStd (no tf) * reverted VecNormalize to use RunningMeanStd (no tf) * profiling wip * use VecNormalize with regular RunningMeanStd * added acer runner (missing import) * flake8 complaints * added a note in README about TfRunningMeanStd and serialization of VecNormalize * dummy commit to RUN BENCHMARKS * merged benchmarks branch	2018-08-13 09:56:44 -07:00
pzhokhov	24fe3d6576	Import internal repo (#409) * import rl-algs from 2e3a166 commit * extra import of the baselines badge * exported commit with identity test * proper rng seeding in the test_identity	2018-05-21 15:24:00 -07:00
pzhokhov	9cf95a0054	setup travis ci build (#388) * simple .travis.yml file * added static syntax checks of common to .travis.yml * dockerizing the build * fix Dockerfile, adding build shield * cleaning up workdir in Dockerfile and .travis.yml * .travis.yml fixed common -> baselines/common for style check	2018-05-03 09:43:28 -07:00
pzhokhov	2b0283b9db	Readme.md detailed installation instructions (#377) * changes to README.md files with more detailed installation instructions * md-fying the changes better * link on the word homebrew in readme.md * typos in README.md * README.md * removed extra comma sign * removed sudo from brew command	2018-04-25 17:40:48 -07:00
Matthias Plappert	b71152eea0	Adds support for Hindsight Experience Replay (HER) (#299) * Add Hindsight Experience Replay (HER) * Minor improvements	2018-02-26 17:40:16 +01:00
John Schulman	459f007bcc	Merge pull request #260 from uidilr/master Add GAIL	2018-01-25 20:54:20 -08:00
John Schulman	9fa8e1baf1	Lots of cleanups Fixes for new gym version Add @olegklimov and @unixpickle to authors list	2018-01-25 18:54:24 -08:00
Yusuke Nakata	d8cce2309f	Add GAIL	2018-01-23 12:02:03 +09:00
John Schulman	2dd7d307d7	Add ACER, PPO2, and results_plotter.py	2017-11-16 10:02:32 -08:00
John Schulman	bb40378118	change atari preprocessing to use faster opencv some logger changes	2017-10-25 09:21:29 -04:00
John Schulman	aa6e58bdf1	fix readmes	2017-08-27 22:22:14 -07:00
John Schulman	3f676f7d1e	ACKTR + A2C	2017-08-18 09:25:39 -07:00
Matthias Plappert	882251878f	Parameter space noise for DQN and DDPG (#75) * Export param noise * Update documentation * Final finishing touches	2017-07-27 08:10:59 -07:00
Jonas Schneider	5dc00628fe	readme fiddling	2017-07-20 09:00:24 -07:00
John Schulman	da99706046	ppo and trpo	2017-07-20 08:52:35 -07:00
cxx	5e73387494	Fix README since BreakOut pretrained model doesn't match the correct tensor shape. Therefore, Pong is used instead.	2017-06-16 15:38:42 +08:00
Tiago Carvalho	1f3c3e33e7	Update README.md	2017-05-31 12:14:28 +01:00
Olivier Moindrot	d2c51f5933	Correct path to script "download_model" `python -m baselines.deepq.experiments.download_model` becomes `python -m baselines.deepq.experiments.atari.download_model`	2017-05-24 13:13:30 -07:00
Szymon Sidor	958810ed1e	Initial commit	2017-05-24 02:34:20 -07:00

39 Commits