baselines

Author	SHA1	Message	Date
Peter Zhokhov	1fc5e137b2	Merge branch 'master' of github.com:openai/baselines into peterz_viz	2018-10-31 12:03:25 -07:00
pzhokhov	ab59de6922	mpi-less baselines (#689 ) * make baselines run without mpi wip * squash-merged latest master * further removing MPI references where unnecessary * more MPI removal * syntax and flake8 * MpiAdam becomes regular Adam if Mpi not present * autopep8 * add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole * mpiless ddpg	2018-10-31 11:15:41 -07:00
Mathieu Poliquin	a071fa7630	Add retro to ppo2 defaults (#682 ) * Adds retro to ppo2 defaults Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes * ppo2 retro defaults to atari	2018-10-30 10:17:46 -07:00
Mathieu Poliquin	637bf55da7	Use deepmind wrapper for retro (#685 ) * Use deepmind wrapper for retro * moved wrap_deepmind_retro after Monitor wrapper	2018-10-30 10:16:15 -07:00
AurelianTactics	165c622572	DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError (#680 ) * DDPG has unused 'seed' argument DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for: ``` from baselines.common import set_global_seeds ... def learn(...): ... set_global_seeds(seed) ``` DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds. * DDPG: duplicate variable assignment variable nb_actions assigned same value twice in space of 10 lines nb_actions = env.action_space.shape[-1] * DDPG: noise_type 'normal_x' and 'ou_x' cause assert noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2' cause an assert message and DDPG not to run. Issue is noise following block: ''' if self.action_noise is not None and apply_noise: noise = self.action_noise() assert noise.shape == action.shape action += noise ''' noise is not nested: [number_of_actions] actions is nested: [[number_of_actions]] Can either nest noise or unnest actions * Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert" * DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block: ''' if self.action_noise is not None and apply_noise: noise = self.action_noise() assert noise.shape == action.shape action += noise ''' noise is not nested: [number_of_actions] action is nested: [[number_of_actions]] Hence the shapes do not pass the assert line even though the action += noise line is correct	2018-10-30 10:13:39 -07:00
Peter Zhokhov	6c194a8b15	documenting plot_util	2018-10-30 09:45:51 -07:00
Peter Zhokhov	0d0701f594	writing vizualization docs	2018-10-29 16:15:42 -07:00
Peter Zhokhov	be433fdb83	viz docs	2018-10-29 15:53:50 -07:00
Peter Zhokhov	93c7cc202c	Merge branch 'master' of github.com:openai/baselines	2018-10-29 15:25:38 -07:00
Peter Zhokhov	de36116e3b	update tensorflow version check regex to parse version like 1.2.3rc4 (previously only 1.2.3-rc4)	2018-10-29 15:25:31 -07:00
Mathieu Poliquin	e2b41828af	Set 'cnn' as default network for retro (#683 )	2018-10-29 13:30:41 -07:00
pzhokhov	8e56ddeac2	Multidiscrete action space compatibility for policy gradient-based methods (#677 ) * multidiscrete space compatibility * flake8 and syntax	2018-10-24 11:01:59 -07:00
Juliano Laganá	c3bd8cea66	Adds description of param_noise parameter in deepq.learn method (#675 )	2018-10-24 10:00:31 -07:00
AurelianTactics	84ea7aa1fd	DDPG has unused 'seed' argument (#676 ) DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for: ``` from baselines.common import set_global_seeds ... def learn(...): ... set_global_seeds(seed) ``` DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.	2018-10-24 09:59:46 -07:00
Peter Zhokhov	88300ed54c	fix raise NotImplemented() complaints of latest flake8	2018-10-24 09:57:57 -07:00
pzhokhov	583ba082a2	Update cmd_util.py	2018-10-23 11:22:27 -07:00
pzhokhov	014a5597b1	refactor ACER (#664 ) * make acer use vecframestack * acer passes mnist test with 20k steps * acer with non-image observations and tests * flake8 * test acer serialization with non-recurrent policies	2018-10-23 10:01:25 -07:00
Isaac Poulton	4ed1350326	Fixed TypeError on creating atari vec envs (#671 )	2018-10-23 10:00:09 -07:00
Rishabh Jangir	8513d73355	HER : new functionality, enables demo based training (#474 ) * Add, initialize, normalize and sample from a demo buffer * Modify losses and add cloning loss * Add demo file parameter to train.py * Introduce new params in config.py for demo based training * Change logger.warning to logger.warn in rollout.py;bug * Add data generation file for Fetch environments * Update README file	2018-10-22 19:04:40 -07:00
Xingdong Zuo	c28acb2203	[Clean-up]: delete `running_stat` and `filters` as they are replaced by `running_mean_std` and not used anymore (#614 ) * Delete filters.py * Delete running_stat.py	2018-10-22 19:01:26 -07:00
pzhokhov	c5d9c4a1b2	wrap retro envs correctly for other (non-deepq) algorithms (#669 ) * wrap retro envs correctly for other (non-deepq) algorithms * flake and csh comments * flake and csh comments	2018-10-22 18:36:39 -07:00
pzhokhov	c0fa11a3a7	minor fixes from internal (#665 ) * sync internal changes. Make ddpg work with vecenvs * B -> nenvs for consistency with other algos, small cleanups * eval_done[d]==True -> eval_done[d] * flake8 and numpy.random.random_integers deprecation warning * Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch	2018-10-22 09:15:04 -07:00
Peter Zhokhov	bd390c2ade	updated docstring for deepq	2018-10-19 17:50:54 -07:00
pzhokhov	d0cc325e14	store session at policy creation time (#655 ) * sync internal changes. Make ddpg work with vecenvs * B -> nenvs for consistency with other algos, small cleanups * eval_done[d]==True -> eval_done[d] * flake8 and numpy.random.random_integers deprecation warning * store session at policy creation time * coexistence tests * fix a typo * autopep8 * ... and flake8 * updated todo links in test_serialization	2018-10-19 08:54:21 -07:00
pzhokhov	fc7f9cec49	disable gym subpackages in setup.py (#661 ) * disable gym subpackages in setup.py * include gym[atari] in test requirements * gym[atari] -> atari-py in test requirements	2018-10-18 16:07:14 -07:00
Matthew Rahtz	3677dc1b23	Set allow_growth=True for MuJoCo session (#643 )	2018-10-18 13:54:39 -07:00
Matthew Rahtz	ef96f3835b	Drop S and M args so that --play works (#636 )	2018-10-16 16:28:23 -07:00
pzhokhov	a03dacd68d	sync internal changes. Make ddpg work with vecenvs (#654 ) * sync internal changes. Make ddpg work with vecenvs * B -> nenvs for consistency with other algos, small cleanups * eval_done[d]==True -> eval_done[d] * flake8 and numpy.random.random_integers deprecation warning	2018-10-16 16:26:46 -07:00
Tianhong Dai	e57f81becc	revise the readme of ddpg (#653 )	2018-10-16 16:22:06 -07:00
Peter Zhokhov	28aca637d0	update benchmark results	2018-10-09 09:48:31 -07:00
Erik Doffagne	7bfbcf177e	Fixed typos in README (#635 )	2018-10-04 10:31:22 -07:00
pzhokhov	394339deb5	Update README.md	2018-10-03 20:53:58 -07:00
pzhokhov	10c205c159	Debug codegen ppo (#123 ) * disabled tests, running benchmarks only * dummy commit to RUN BENCHMARKS * benchmark ppo_metal; disable all but Bullet benchmarks * ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS * run benchmarks on Roboschool instead RUN BENCHMARKS * run ppo_metal on Roboschool as well RUN BENCHMARKS * install roboschool in cron rcall user_config * dummy commit to RUN BENCHMARKS * import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS * re-enable tests, flake8 * get entropy from a distribution in Pred RUN BENCHMARKS * gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS * provide default value for cg2/bmv_net_ops.py * dummy commit to RUN BENCHMARKS * make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS * syntax error in run-benchmarks-new.py RUN BENCHMARKS * syntax error in run-benchmarks-new.py RUN BENCHMARKS * path relative to codegen/training for gin files RUN BENCHMARKS * another reconcilliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS * value_network=copy for ppo2 on roboschool RUN BENCHMARKS * make None seed work with torch seeding RUN BENCHMARKS * try sequential batches with ppo2 RUN BENCHMARKS * try ppo without advantage normalization RUN BENCHMARKS * use Distribution to compute ema NLL RUN BENCHMARKS * autopep8 * clip gradient norm in algo_agent RUN BENCHMARKS * try ppo2 without vfloss clipping RUN BENCHMARKS * trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS * set gamma=0 in ppo2 RUN BENCHMARKS * try with ppo2 with single minibatch RUN BENCHMARKS * try with nminibatches=4, value_network=copy RUN BENCHMARKS * try with nminibatches=1 take two RUN BENCHMARKS * try initialization for vf=0.01 RUN BENCHMARKS * fix the problem with min_istart >= max_istart * i have no idea RUN BENCHMARKS * fix non-shared variance between old and new RUN BENCHMARKS * restored baselines.common.policies * 16 minibatches in ppo_roboschool.gin * fixing results of merge * cleanups * cleanups * fix run-benchmarks-new RUN BENCHMARKS Roboschool8M * fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M * fix test failures * moved gin requirement to codegen/setup.py * remove duplicated build_softq in get_algo.py * linting * run softq on continuous action spaces RUN BENCHMARKS Roboschool8M	2018-10-03 14:38:32 -07:00
pzhokhov	62fe7c4717	disable async acktr (#129 ) * disable async acktr * linting * linting * linting	2018-10-03 14:38:32 -07:00
Xingyou Song	fbdf55ffee	Xsong lqr ddpg (#125 ) * allows vec_envs to work * allows vec_envs to work * fixed branch with correct ddpg * running experiments jointly now * changed to subproc * changed to subproc * changed to subproc * small fix md * removed placeholder * removed placeholder * added ppotest * probably fixed ddpg hyperparam issues * checkpoint * edited readme * added orthogonal * added orthogonal * added ddpg-vecenv * reverted ddpg to old baselines	2018-10-03 14:38:32 -07:00
Christopher Hesse	9ee804c384	minor change to install.py and baselines run.py (#121 )	2018-10-03 14:38:32 -07:00
John Schulman	4cf7dc9644	Big refactor (#124 ) * massive revision inspired by soup: algo folder works * porting rl commands, WIP * various * git subrepo push --remote=git@github.com:openai/codegen.git --branch=refactor codegen subrepo: subdir: "codegen" merged: "aa27e069" upstream: origin: "git@github.com:openai/codegen.git" branch: "refactor" commit: "aa27e069" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8" * various * rewrite RL stuff in new framework * fix almost everything * woohoo tests pass * more tests * reformatting * fixes * write tests for embeddings * re-remove cg2 * pylint * minor * move smooth_helpers import; seems to cause nondeterministic failure in parallel pytest	2018-10-03 14:38:32 -07:00
Xingyou Song	e820b86fdc	ppo2 now has eval stats (#120 ) * ppo2 now has eval stats * fixed spaces * fixed kwargs ordering * whitespace fix	2018-10-03 14:38:32 -07:00
pzhokhov	858afa8d7e	Refactor DDPG (#111 ) * run ddpg on Mujoco benchmark RUN BENCHMARKS * autopep8 * fixed all syntax in refactored ddpg * a little bit more refactoring * autopep8 * identity test with ddpg WIP * enable test_identity with ddpg * refactored ddpg RUN BENCHMARKS * autopep8 * include ddpg into style check * fixing tests RUN BENCHMARKS * set default seed to None RUN BENCHMARKS * run tests and benchmarks in separate buildkite steps RUN BENCHMARKS * cleanup pdb usage * flake8 and cleanups * re-enabled all benchmarks in run-benchmarks-new.py * flake8 complaints * deepq model builder compatible with network functions returning single tensor * remove ddpg test with test_discrete_identity * make ppo_metal use make_vec_env instead of make_atari_env * make ppo_metal use make_vec_env instead of make_atari_env * fixed syntax in ppo_metal.run_atari	2018-10-03 14:38:32 -07:00
pzhokhov	4121d9c1a8	fix DQN learning bug (#632 ) * Update run.py * Update utils.py * Update utils.py	2018-10-03 14:37:40 -07:00
Peter Zhokhov	34ae3194b4	add a note about DQN algorithms not performing well	2018-09-27 12:51:43 -07:00
Thomas Simonini	4402b8eba6	Updated A2C and PPO2 comments (#612 ) * Updated A2C and PPO2 comments * Fixed format errors to respect PEP 8 style guide	2018-09-24 09:54:41 -07:00
ahuhn	555a5cbbb2	Adding num_env to readme example (#609 ) * Adding num_env to readme example * Updated readme example fix	2018-09-21 17:22:56 -07:00
Thomas Simonini	8158f35611	Wrote some comments to explain the A2C and PPO2 implementation (#607 ) * added comments in A2C and PPO2 * Fixed format errors to respect PEP 8 style guide	2018-09-21 13:12:31 -07:00
cclauss	a7fd8a4477	Run flake8 to find syntax errors and undefined names (#439 ) __E901,E999,F821,F822,F823__ are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. The other flake8 issues are merely "style violations" -- useful for readability but they do not effect runtime safety. This PR therefore recommends a flake8 run of those tests on the entire codebase. * F821: undefined name `name` * F822: undefined name `name` in `__all__` * F823: local variable `name` referenced before assignment * E901: SyntaxError or IndentationError * E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree	2018-09-20 16:40:03 -07:00
John Schulman	e791565a60	Codegen more abstract abstract classes 3a (#106 ) * Soup code, arch search on CIFAR-10 * Oh I understood how choice_sequence() worked * Undo some pointless changes * Some beautification 1 * Some beautification 2 * An attempt to debug test_get_algo_outputs() number 70, unsuccessful. * Code style warning * Code style warnings, more * wip * wip * wip * fix almost everything; soup machine still broken * revert mpi_eda changes * minor fixes	2018-09-20 16:19:07 -07:00
XFFXFF	7859f603cd	prioritized experience replay bug (#527 )	2018-09-20 16:16:44 -07:00
pzhokhov	0f4ae2fb2a	refactor acktr (#560 ) * refactor acktr * setup.cfg now tests style/syntax in acktr as well * flake8 complaints * added note about continuous action spaces for acktr into the README.md	2018-09-20 16:05:26 -07:00
pzhokhov	0e7048b89f	Update README.md	2018-09-19 15:04:54 -07:00
pzhokhov	75983bab64	Update README.md	2018-09-19 15:04:01 -07:00
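
Commit 165c622572 above describes an AssertionError in DDPG when actor noise is used: the noise has shape [number_of_actions] while the action has shape [[number_of_actions]], so the shape assert fails even though the addition itself would work. The following is a minimal NumPy sketch of that mismatch only. It is not the DDPG code; the action dimension of 3 and all variable values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical action dimension, chosen only for illustration.
nb_actions = 3

# "action is nested: [[number_of_actions]]" -> shape (1, nb_actions)
action = np.zeros((1, nb_actions))

# "noise is not nested: [number_of_actions]" -> shape (nb_actions,)
noise = np.full(nb_actions, 0.2)

# This is the comparison the assert in the commit message performs.
# The shapes differ, so an assert on it would raise AssertionError.
shapes_match = noise.shape == action.shape

# The addition still works, because NumPy broadcasts (3,) against (1, 3).
noisy_action = action + noise

# One of the two remedies the commit message mentions: nest the noise
# so that it has the same shape as the action.
nested_noise = noise.reshape(action.shape)
nested_shapes_match = nested_noise.shape == action.shape
```

Here `shapes_match` is False while `nested_shapes_match` is True, which is the situation the commit message describes: the assert is stricter than the arithmetic it guards.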


258 Commits