baselines

Author	SHA1	Message	Date
Peter Zhokhov	75200671c4	fix tests - add matplotlib to setup_requires, put mpi4py import in try-except	2019-05-03 16:29:10 -07:00
Peter Zhokhov	46fa1b6453	merge master	2019-05-03 15:57:31 -07:00
Peter Zhokhov	a1a9bd6174	Merge branch 'internal' of github.com:openai/baselines into internal	2019-05-03 15:56:04 -07:00
John Schulman	ef7ac116cb	(onp, np) -> (np, jp), switch jax code to use mark_slow decorator (#363 ) switch to mark_slow decorator	2019-05-03 15:54:27 -07:00
pzhokhov	1fa6ac38f1	JRL PPO test with delayed identity env (#355 ) * add a custom delay to identity_env * min reward 0.8 in delayed identity test * seed the tests, perfect score on delayed_identity_test * delay=1 in delayed_identity_test * flake8 complaints * increased number of steps in fixed_seq_test * seed identity tests to ensure reproducibility * docstrings	2019-05-03 15:54:26 -07:00
Karl Cobbe	07536451ee	Procgen fixes (#352 ) * tweak * documentation * rely on log_comm, remove mpi averaging from wrappers * pass comm for ppo2 initialization * ppo2 logging * experiment tweaks * auto launch tensorboard when using local backend * graph tweaks * pass caller to config * configure logger and tensorboard * make parent dir if necessary * parentdir tweak	2019-05-03 15:54:26 -07:00
Greg Brockman	64dfabb8eb	Add initializer for process-level setup in SubprocVecEnv (#276 ) * Add initializer for process-level setup in SubprocVecEnv Use case: run logger.configure() in each subprocess * Add option to force dummy vec env	2019-05-03 15:54:26 -07:00
John Schulman	f5daca8c22	delete unnecessary stuff (#338 )	2019-05-03 15:54:25 -07:00
pzhokhov	8e0282ee94	ci/runtests.sh - pass all folders to pytest (#342 ) * ci/runtests.sh - pass all folders to pytest * mpi_optimizer_test precision 1e-4 * fixes to tests * search for tests in the entire jax folder, also remove unnecessary humor	2019-05-03 15:54:25 -07:00
Karl Cobbe	ddcab1606d	Procgen Benchmark Updates (#328 ) * directory cleanup * logging, num_experiments * fixes * cleanup * gin fixes * fix local max gpu * resid nx * tweak * num machines and download params * rename * cleanup * create workbench * more reorg * fix * more logging wrappers * lint fix * restore train procgen * restore train procgen * pylint fix * better wrapping * whackamole walls * config sweep * tweak * args sweep * tweak * test workers * mpi_weight * train test comm and high difficulty fix * enjoy show returns * better joint training * tweak * Add —update to args and add gin-config to requirements.txt * add username to download_file * removing gin, procgen_parser * removing gin * procgen args * config fixes * cleanup * cleanup * procgen args fix * fix * rcall syncing * lint * rename mpi_weight * begin composable game * more composable game * tweak * background alpha * use username for sync * fixes * microbatch fix * lure composable game * merge * proc trans update * proc trans update (#307) * finetuning experiment * Change is_local to use `use_rcall` and fix error of `enjoy.py` with multiple ends * graphing help * add --local * change args_dict['env_name'] to ENV_NAME * finetune experiments * tweak * tweak * reorg wrappers, remove is_local * workdir/local fixes * move finetune experiments * default dir and graphing * more graphing * fix * pooled syncing * tweaks * dir fix * tweak * wrapper mpi fix * wind and turrets * composability cleanup * radius cleanup * composable reorg * laser gates * composable tweaks * soft walls * tweak * begin swamp * more swamp * more swamp * fix * hidden mines * use maze layout * tweak * laser gate tweaks * tweaks * tweaks * lure/propel updates * composable midnight * composable coinmaze * composability difficulty * tweak * add step to save_params * composable offsets * composable boxpush * composable combiner * tweak * tweak * always choose correct number of mechanics * fix * rcall local fix * add steps when dump and save parmas * loading rank 1,2,3.. error fix * add experiments.py * fix loading latest weight with no -rest * support more complex run_id and add more examples * fix typo * move post_run_id into experiments.py * add hp_search example * error fix * joint experiments in progress * joint hp finished * typo * error fix * edit experiments * Save experiments set up in code and save weights per step (#319) * add step to save_params * add steps when dump and save parmas * loading rank 1,2,3.. error fix * add experiments.py * fix loading latest weight with no -rest * support more complex run_id and add more examples * fix typo * move post_run_id into experiments.py * add hp_search example * error fix * joint experiments in progress * joint hp finished * typo * error fix * edit experiments * tweaks * graph exp WIP * depth tweaks * move save_all * fix * restore_dir name * restore depth * choose max mechanics * use override mode * tweak frogger * lstm default * fix * patience is composable * hunter is composable * fixed asset seed cleanup * minesweeper is composable * eggcatch is composable * tweak * applesort is composable * chaser game * begin lighter * lighter game * tractor game * boxgather game * plumber game * hitcher game * doorbell game * lawnmower game * connecter game * cannonaim * outrun game * encircle game * spinner game * tweak * tweak * detonator game * driller * driller * mixer * conveyor * conveyor game * joint pcg experiments * fixes * pcg sweep experiment * cannonaim fix * combiner fix * store save time * laseraim fix * lightup fix * detonator tweaks * detonator fixes * driller fix * lawnmower calibration * spinner calibration * propel fix * train experiment * print load time * system independent hashing * remove gin configurable * task ids fix * test_pcg experiment * connecter dense reward * hard_pcg * num train comms * mpi splits envs * tweaks * tweaks * graph tweaks * graph tweaks * lint fix * fix tests * load bugfix * difficulty timeout tweak * tweaks * more graphing * graph tweaks * tweak * download file fix * pcg train envs list * cleanup * tweak * manually name impala layers * tweak * expect fps * backend arg * args tweak * workbench cleanup * move graph files * workbench cleanup * split env name by comma * workbench cleanup * ema graph * remove Dict * use tf.io.gfile * comments for auto-killing jobs * lint fix * write latest file when not saving all and load it when step=None	2019-05-03 15:54:24 -07:00
Christopher Hesse	bc4eef6053	fix tests (#335 )	2019-05-03 15:54:24 -07:00
John Schulman	967fc8c37f	Fixed sequence env minor (#333 ) minor changes to FixedSequenceEnv to allow full score	2019-05-03 15:54:24 -07:00
pzhokhov	a93dde3b2b	extra functionality in baselines.common.plot_util (#310 ) * get plot_util from mt_experiments branch * add labels * unit tests for plot_util	2019-05-03 15:54:23 -07:00
John Schulman	b83a66527d	Add jrl19 as backend for workbench (#324 ) enable jrl in workbench minor logger changes	2019-05-03 15:54:23 -07:00
John Schulman	07cbf1e26a	Grad clipping in MpiAdamOptimizer, transformer changes (#304 ) * transformer mnist experiments * version that only builds one model * work on inverted mnist * Add grad clipping to MpiAdamOptimizer * various * transformer changes, loading * get rid of soft labels * transformer baseline * minor * experiments involving all possible training sets * vary training * minor * get ready for fine-tuning expers * lint * minor	2019-05-03 15:54:23 -07:00
Karl Cobbe	5082e5d34b	Workbench (#303 ) * begin workbench * cleanup * begin procgen config integration * arg tweaks * more args * parameter saving * begin procgen enjoy * tweaks * more workbench * more args sync/restore * cleanup * merge in master * rework args priority * more workbench * more loggign * impala cnn * impala lstm * tweak * tweaks * rl19 time logging * misc fixes * faster pipeline * update local.py * sess and log config tweaks * num processes * logging tweaks * difficulty reward wrapper * logging fixes * gin tweaks * tweak * fix * task id * param loading * more variable loading * entrypoint * tweak * ksync * restore lstm * begin rl19 support * tweak * rl19 rnn * more rl19 integration * fix * cleanup * restore rl19 rnn * cleanup * cleanup * wrappers.get_log_info * cleanup * cleanup * directory cleanup * logging, num_experiments * fixes * cleanup * gin fixes * fix local max gpu * resid nx * num machines and download params * rename * cleanup * create workbench * more reorg * fix * more logging wrappers * lint fix * restore train procgen * restore train procgen * pylint fix * better wrapping * config sweep * args sweep * test workers * mpi_weight * train test comm and high difficulty fix * enjoy show returns * removing gin, procgen_parser * removing gin * procgen args * config fixes * cleanup * cleanup * procgen args fix * fix * rcall syncing * lint * rename mpi_weight * use username for sync * fixes * microbatch fix	2019-05-03 15:54:22 -07:00
Christopher Hesse	376fd88bb8	fix vec monitor infos	2019-05-03 15:54:22 -07:00
pzhokhov	3301089b48	remove bullet extra, constrain gym version to be >= 0.10.0 (#885 ) * remove bullet extra, constrain gym version to be >= 0.10.0 * constrain gym version from above	2019-04-26 16:14:49 -07:00
pzhokhov	a07fad9066	change rms 2 tfrms switch in vec_normalize to be more explicit (#886 ) * change rms 2 tfrms switch in vec_normalize to be more explicit * modify the vec_normalize / use_tf logic a little bit * typo * use_tf = False by default	2019-04-26 16:14:21 -07:00
Taeyeong Jeong	5d8041d18e	Fix indexing LazyFrames (#875 ) Indexing LazyFrames with index i should return the single channel frame	2019-04-19 15:00:09 -07:00
Peter Zhokhov	fa37beb52e	fix commit on atari bms page to point to a public commit	2019-04-06 20:03:32 -07:00
Peter Zhokhov	8a97e0df10	fix shuffling bug in ppo1	2019-04-05 15:23:46 -07:00
pzhokhov	fabbf2c611	short-circuit framestack wrapper with size 1 (#871 )	2019-04-05 15:18:15 -07:00
Xingdong Zuo	5d285b318f	[Update misc_util.py]: clean up unused helper functions (#751 ) * Update misc_util.py * Update misc_util.py	2019-04-05 15:16:26 -07:00
Tim Zaman	49a99c7d23	Add eps to normalization (#797 )	2019-04-05 14:46:01 -07:00
Peter Zhokhov	c79b3373bf	parse colon-separated env_id's	2019-04-05 14:43:09 -07:00
Peter Zhokhov	96b6a31848	Merge branch 'internal' of github.com:openai/baselines into internal	2019-04-05 14:11:09 -07:00
Jacob Hilton	0a48a1fda9	Merge branch 'master' of github.com:openai/baselines into internal	2019-04-03 16:21:48 -07:00
Christopher Hesse	ea20c8a034	add score calculator wrapper, forward property lookups on vecenv wrap… (#300 ) * add score calculator wrapper, forward property lookups on vecenv wrapper, misc cleanup * tests * pylint	2019-04-03 16:20:42 -07:00
pzhokhov	a08af5d07d	make tests use single-threaded session for determinism of KfacOptimizer (#298 ) * make tests use single-threaded session for determinism of KfacOptimizer * updated comment in kfac.py * remove unused sess_config	2019-04-03 16:20:42 -07:00
Oleg Klimov	cc88c8e4c0	remove tensorflow dependency from VecEnv	2019-04-03 16:20:42 -07:00
pzhokhov	f2654082b2	Symshapes - gives codegen ability to evaluate same algo on envs with different ob/ac shapes (#262 ) * finish cherry-pick td3 test commit * removed graph simplification error ingore * merge delayed logger config * merge updated baselines logger * lazy_mpi load * cleanups * use lazy mpi imports in codegen * more lazy mpi * don't pretend that class is a module, just use it as a class * mass-replace mpi4py imports * flake8 * fix previous lazy_mpi imports * removed extra printouts from TdLayer op * silly recursion * running codegen cc experiment * wip * more wip * use actor is input for critic targets, instead of the action taken * batch size 100 * tweak update parameters * tweaking td3 runs * wip * use nenvs=2 for contcontrol (to be comparable with ppo_metal) * wip. Doubts about usefulness of actor in critic target * delayed actor in ActorLoss * score is average of last 100 * skip lack of losses or too many action distributions * 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer) * syntax * microfixes * minifixes * run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion) * random physics for mujoco * random parts sizes with range 0.4 * add notebook with results into x/peterz * variations of ant * roboschool use gym.make kwargs * use float as lowest score after rank transform * rcall from master * wip * re-enable dynamic routing * wip * squash-merge master, resolve conflicts * remove erroneous file * restore normal MPI imports * move wrappers around a little bit * autopep8 * cleanups * cleanup mpi_eda, autopep8 * make activation function of action distribution customizable * cleanups; preparation for a pr * syntax * merge latest master, resolve conflicts * wrap MPI import with try/except * allow import of modules through env id im baselines cmd_util * flake8 complaints * only wrap box action spaces with ClipActionsWrapper * flake8 * fixes to algo_prob according to Oleg's suggestions * use apply_without_scope flag in ActorLoss * remove extra line in algo/core.py * multi-task support * autopep8 * symbolic suffix-shapes (not B,T yet) * test_with_mpi -> with_mpi rename * remove extra blank lines in algo/core * remove extra blank lines in algo/core * remove more blank lines * symbolify shapes in existing algorithms * minor output changes * cleaning up merge conflicts * cleaning up merge conflicts * cleaning up more merge conflicts * restore mpi_map.py from master	2019-04-03 16:20:42 -07:00
Karl Cobbe	dadc2c2eb6	Rl19 metalearning (#261 ) * rl19 metalearning and dict obs * master merge arch fix * lint fixes * view fixes * load vars tweaks * user config cleanup * documentation and revisions * pass train comm to rl19 * cleanup	2019-04-03 16:20:42 -07:00
pzhokhov	d9702e7ccb	codegen continuous control experiment pr (#256 ) * finish cherry-pick td3 test commit * removed graph simplification error ingore * merge delayed logger config * merge updated baselines logger * lazy_mpi load * cleanups * use lazy mpi imports in codegen * more lazy mpi * don't pretend that class is a module, just use it as a class * mass-replace mpi4py imports * flake8 * fix previous lazy_mpi imports * removed extra printouts from TdLayer op * silly recursion * running codegen cc experiment * wip * more wip * use actor is input for critic targets, instead of the action taken * batch size 100 * tweak update parameters * tweaking td3 runs * wip * use nenvs=2 for contcontrol (to be comparable with ppo_metal) * wip. Doubts about usefulness of actor in critic target * delayed actor in ActorLoss * score is average of last 100 * skip lack of losses or too many action distributions * 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer) * syntax * microfixes * minifixes * run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion) * squash-merge master, resolve conflicts * remove erroneous file * restore normal MPI imports * move wrappers around a little bit * autopep8 * cleanups * cleanup mpi_eda, autopep8 * make activation function of action distribution customizable * cleanups; preparation for a pr * syntax * merge latest master, resolve conflicts * wrap MPI import with try/except * allow import of modules through env id im baselines cmd_util * flake8 complaints * only wrap box action spaces with ClipActionsWrapper * flake8 * fixes to algo_prob according to Oleg's suggestions * use apply_without_scope flag in ActorLoss * remove extra line in algo/core.py	2019-04-03 16:20:42 -07:00
Christopher Hesse	f641810ef9	update dmlab30 env (#258 )	2019-04-03 16:20:42 -07:00
Peter Zhokhov	3265098cc6	Merge branch 'master' of github.com:openai/baselines into internal	2019-04-01 16:26:25 -07:00
Sridhar Thiagarajan	6d1c6c78d3	Interface for U.make_session changed (#865 )	2019-04-01 16:24:02 -07:00
JongGyun Kim	62a9c76f18	fix the definition of `TfInput.make_feed_dict`. (#812 )	2019-04-01 15:49:25 -07:00
Hao-Chih, Lin	282c9cc91f	fix small bug in plot_results() (#864 ) Remove the comma behind the last input argument	2019-04-01 15:48:35 -07:00
Peter Zhokhov	096f4d9cf0	neaten up stacking logic in mujoco_dset in gail	2019-04-01 15:47:13 -07:00
Mingfei	16136ddca7	fix bugs: obs_ph normalization in adversary.py (#823 ) * fix bugs: obs_ph normalization in adversary.py * fix bug in reshape obs and acs in Mujobo_Dset	2019-04-01 15:44:31 -07:00
Darío Hereñú	b1644157d6	Fixed typo on #092 (#824 )	2019-04-01 15:41:52 -07:00
Yu Feng	58541db226	MPI refer to workers as ranks, not threads. (#833 )	2019-04-01 15:38:45 -07:00
zlsh80826	c02b575f01	ppo2: use time.perf_counter() instead of time.time() for time measurement (#847 )	2019-04-01 15:37:32 -07:00
Pastafarianist	897fa31548	Avoid using default config while requesting available GPUs (#863 )	2019-03-29 13:25:56 -07:00
Brett Daley	d51f8be8f9	Report episode rewards/length in A2C and ACKTR (#856 )	2019-03-28 09:21:48 -07:00
Jacob Hilton	3f2f45acef	Merge pull request #860 from openai/build-retro-env-framestack-fix run.py framestack bug fix	2019-03-25 14:33:15 -07:00
Jacob Hilton	b64974eb90	build_env now doesn't apply frame stack to retro games twice	2019-03-24 12:27:14 -07:00
pzhokhov	1b092434fc	remove f-strings for python 3.5 compatibility (#854 )	2019-03-16 11:54:47 -07:00
Peter Zhokhov	1259f6ab25	check for environment being vectorized in the play logic in run.py	2019-03-11 17:44:03 -07:00

398 Commits