* viz docs
* writing visualization docs
* documenting plot_util
* docstrings in plot_util
* autopep8 and flake8
* spelling (using the default vim spellchecker and ignoring terms like dataframe, docstring, etc.)
* rephrased viz.md a little bit
* more examples of viz code usage in the docs
* Fix: Return the result of rendering from dummyvecenv
* Add: Add a video recorder wrapper for vecenv
* Change: Use VecVideoRecorder with --video_monitor flag
* Change: Overwrite the metadata only when it isn't defined
* Add: Define __del__ so the file is closed correctly on exit
* Fix: Bump epidode_id in reset()
* Fix: Use hasattr to check the existence of .metadata
* Fix: Make directory when it doesn't exist
* Change: Keep recording for `video_length` steps, then close
Because reset() does not behave the way it does in a normal gym.Env
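Since a VecEnv's reset() is not called at episode boundaries, the recorder stops on a frame count rather than an episode end. A minimal sketch of that logic (the class and method names here are hypothetical, not the actual VecVideoRecorder implementation):

```python
class FixedLengthRecorder:
    """Sketch: record for `video_length` steps after a trigger, then stop.

    Illustrative only -- `start_recording`/`on_step` are hypothetical
    helpers, not baselines API.
    """

    def __init__(self, video_length):
        self.video_length = video_length
        self.recording = False
        self.recorded_frames = 0

    def start_recording(self):
        self.recording = True
        self.recorded_frames = 0

    def on_step(self):
        # Called every env step.  Because VecEnv.reset() is not a
        # per-episode boundary, we stop on a frame count instead.
        if not self.recording:
            return False
        self.recorded_frames += 1
        if self.recorded_frames >= self.video_length:
            self.recording = False  # close the video writer here
        return True
```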
* Add: Allow specifying video_length via a command-line argument
* Delete: Delete default value, None, of video_callable
* Change: Use self.recorded_frames and self.recording to manage intervals
* Add: Log the status of video recording
* Fix: Fix saving path
* Change: Place metadata in the base VecEnv
* Delete: Delete unused imports
* Fix: epidode_id => step_id
* Fix: Refine the flag name
* Change: Unify the flag name following the previous change
* [WIP] Add: Add a test of VecVideoRecorder
* Fix: Use PongNoFrameskip-v0 because SimpleEnv doesn't have render()
* Change: Use TemporaryDirectory
* Fix: minimal successful test
* Add: Test against parallel environments
* Add: Test against different types of VecEnvs
* Change: Test against different lengths and intervals of video capture
* Delete: Reduce the number of tests
* Change: Test if the output video is not empty
* Add: Add some comments
* Fix: Fix the flag name
* Add: Add docstrings
* Fix: Install ffmpeg in testing container for VecVideoRecorder's test
* Fix: Delete unused things
* Fix: Replace `video_callable` with `record_video_trigger`
* Fix: Improve the explanation of `record_video_trigger` argument
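`record_video_trigger` is a callable that takes the current step id and returns True when a new clip should start. A typical trigger looks like this (`make_trigger` and its `interval` parameter are illustrative, not baselines API):

```python
def make_trigger(interval):
    """Build a record_video_trigger: start a clip whenever the current
    step id is a multiple of `interval`."""
    return lambda step_id: step_id % interval == 0

# Start a clip at step 0 and every 2000 steps thereafter.
trigger = make_trigger(2000)
```

Per the commits above, the wrapper then records `video_length` frames from each triggering step; the exact VecVideoRecorder constructor signature may differ from this sketch.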
* Fix: Close owning vecenv in VecVideoRecorder.close to resolve memory leak
* make baselines run without mpi wip
* squash-merged latest master
* further removing MPI references where unnecessary
* more MPI removal
* syntax and flake8
* MpiAdam becomes regular Adam if Mpi not present
* autopep8
* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole
* mpiless ddpg
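The "MpiAdam becomes regular Adam if MPI not present" change boils down to an optional import with a single-process fallback. A sketch under that assumption (not the actual baselines code):

```python
try:
    from mpi4py import MPI  # optional dependency
except ImportError:
    MPI = None  # fall back to single-process behavior

def world_size():
    """Number of workers to average over; 1 when MPI is unavailable."""
    return 1 if MPI is None else MPI.COMM_WORLD.Get_size()

def average_scalar(x):
    # With MPI: allreduce-sum then divide by the worker count.
    # Without MPI the value is unchanged, so MpiAdam-style updates
    # reduce to plain, single-process Adam.
    if MPI is None:
        return x
    return MPI.COMM_WORLD.allreduce(x, op=MPI.SUM) / world_size()
```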
* Adds retro to ppo2 defaults
Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes
* ppo2 retro defaults to atari
* DDPG has unused 'seed' argument
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:
```
from baselines.common import set_global_seeds
...
def learn(...):
    ...
    set_global_seeds(seed)
```
DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
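The fix is to mirror the other algorithms and actually consume the argument. A sketch, where `set_global_seeds` below is a minimal stand-in (the real baselines.common helper also seeds TensorFlow):

```python
import random

import numpy as np

def set_global_seeds(seed):
    """Minimal stand-in for baselines.common.set_global_seeds."""
    if seed is None:
        return
    random.seed(seed)
    np.random.seed(seed)

def learn(env=None, seed=None, **kwargs):
    # The missing piece in DDPG: use `seed` instead of silently
    # ignoring it.
    set_global_seeds(seed)
```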
* DDPG: duplicate variable assignment
The variable nb_actions is assigned the same value twice within the space of 10 lines:
nb_actions = env.action_space.shape[-1]
* DDPG: noise_type 'normal_x' and 'ou_x' cause assert
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to action noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Either nest the noise or unnest the action
* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"
* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to action noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes fail the assert even though the `action += noise` line itself is correct
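The mismatch is easy to demonstrate directly (4 actions chosen for illustration):

```python
import numpy as np

noise = np.zeros(4)        # action_noise() output: shape (4,)
action = np.zeros((1, 4))  # vectorized action:     shape (1, 4)

# The assert fails even though the addition itself is fine:
assert noise.shape != action.shape  # (4,) vs (1, 4)
summed = action + noise             # numpy broadcasting handles this
assert summed.shape == (1, 4)

# Either nest the noise or unnest the action to satisfy the assert:
assert noise[None, :].shape == action.shape
assert noise.shape == action[0].shape
```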
* make acer use vecframestack
* acer passes mnist test with 20k steps
* acer with non-image observations and tests
* flake8
* test acer serialization with non-recurrent policies
* Add, initialize, normalize and sample from a demo buffer
* Modify losses and add cloning loss
* Add demo file parameter to train.py
* Introduce new params in config.py for demo based training
* Change logger.warning to logger.warn in rollout.py (bug fix)
* Add data generation file for Fetch environments
* Update README file
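The demonstration-based training changes above amount to sampling part of each batch from a demo buffer and adding a behavior-cloning term to the loss. A minimal numpy sketch; all names and parameters here are illustrative, not the actual config.py options:

```python
import numpy as np

def sample_batch(replay, demo, batch_size, demo_batch_size, rng):
    """Mix regular replay samples with demonstration samples."""
    idx_r = rng.integers(0, len(replay), batch_size - demo_batch_size)
    idx_d = rng.integers(0, len(demo), demo_batch_size)
    return [replay[i] for i in idx_r] + [demo[i] for i in idx_d]

def cloning_loss(policy_actions, demo_actions):
    # Mean squared error pushing the policy toward demonstrated actions;
    # added on top of the usual actor loss.
    return float(np.mean((policy_actions - demo_actions) ** 2))
```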
* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
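The numpy.random.random_integers deprecation is resolved by switching to np.random.randint, minding the off-by-one in the upper bound (random_integers samples the inclusive range [low, high]; randint samples the half-open [low, high)):

```python
import numpy as np

def random_integers(low, high, size=None):
    """Drop-in replacement for the deprecated np.random.random_integers:
    shift the exclusive upper bound of randint by one."""
    return np.random.randint(low, high + 1, size=size)
```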
* Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch
* store session at policy creation time
* coexistence tests
* fix a typo
* autopep8
* ... and flake8
* updated todo links in test_serialization
* disabled tests, running benchmarks only
* dummy commit to RUN BENCHMARKS
* benchmark ppo_metal; disable all but Bullet benchmarks
* ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS
* run benchmarks on Roboschool instead RUN BENCHMARKS
* run ppo_metal on Roboschool as well RUN BENCHMARKS
* install roboschool in cron rcall user_config
* dummy commit to RUN BENCHMARKS
* import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS
* re-enable tests, flake8
* get entropy from a distribution in Pred RUN BENCHMARKS
* gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS
* provide default value for cg2/bmv_net_ops.py
* dummy commit to RUN BENCHMARKS
* make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* path relative to codegen/training for gin files RUN BENCHMARKS
* another reconciliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS
* value_network=copy for ppo2 on roboschool RUN BENCHMARKS
* make None seed work with torch seeding RUN BENCHMARKS
* try sequential batches with ppo2 RUN BENCHMARKS
* try ppo without advantage normalization RUN BENCHMARKS
* use Distribution to compute ema NLL RUN BENCHMARKS
* autopep8
* clip gradient norm in algo_agent RUN BENCHMARKS
* try ppo2 without vfloss clipping RUN BENCHMARKS
* trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS
* set gamma=0 in ppo2 RUN BENCHMARKS
* try with ppo2 with single minibatch RUN BENCHMARKS
* try with nminibatches=4, value_network=copy RUN BENCHMARKS
* try with nminibatches=1 take two RUN BENCHMARKS
* try initialization for vf=0.01 RUN BENCHMARKS
* fix the problem with min_istart >= max_istart
* i have no idea RUN BENCHMARKS
* fix non-shared variance between old and new RUN BENCHMARKS
* restored baselines.common.policies
* 16 minibatches in ppo_roboschool.gin
* fixing results of merge
* cleanups
* cleanups
* fix run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix test failures
* moved gin requirement to codegen/setup.py
* remove duplicated build_softq in get_algo.py
* linting
* run softq on continuous action spaces RUN BENCHMARKS Roboschool8M