baselines

Author	SHA1	Message	Date
XFFXFF	7859f603cd	prioritized experience replay bug (#527 )	2018-09-20 16:16:44 -07:00
pzhokhov	0f4ae2fb2a	refactor acktr (#560 ) * refactor acktr * setup.cfg now tests style/syntax in acktr as well * flake8 complaints * added note about continuous action spaces for acktr into the README.md	2018-09-20 16:05:26 -07:00
pzhokhov	0e7048b89f	Update README.md	2018-09-19 15:04:54 -07:00
pzhokhov	75983bab64	Update README.md	2018-09-19 15:04:01 -07:00
Alfredo Canziani	85be74500d	Add possibility of plotting timesteps vs episodes (#578 ) * Add possibility of plotting timesteps vs episodes * Remove leftover from personal project patch * Auto plt.tight_layout() on resize window event Calls `plt.tight_layout()` if a `resize_event` is issued. This means that the plot will look good even after the user has resized the plotting window.	2018-09-19 09:43:45 -07:00
Geoffrey Irving	115b59d28b	Merge pull request #598 from openai/irving-rc Fix setup.py for tensorflow -rc versions	2018-09-18 15:52:57 -07:00
Xingdong Zuo	d34049cab4	Update running_mean_std.py (#585 )	2018-09-18 14:14:38 -07:00
pzhokhov	59662fff78	rename entcoeff to ent_coef in trpo_mpi for compatibility with other algos (#581 )	2018-09-18 14:13:05 -07:00
Geoffrey Irving	a42c4eb2bb	Fix setup.py for tensorflow -rc versions	2018-09-18 11:35:43 -07:00
R1ckF	68a29d0ab3	--play now works with LSTM (#595 )	2018-09-17 14:33:39 -07:00
Xingdong Zuo	0c6f357936	Delete identity_env.py (#588 )	2018-09-17 09:53:34 -07:00
pzhokhov	4dc697e670	codegen test fixes (#95 ) * fix discovered test failures * autopep8 * test indices up to 123 * testing from index 124 on * add scope to logstd * fix flakiness in test_train_mle * autopep8	2018-09-14 15:43:50 -07:00
Peter Zhokhov	e790f5214b	define mean for CategoricalPd (as softmax of logits)	2018-09-14 15:43:50 -07:00
pzhokhov	fe06c6b4db	continuous action spaces for codegen + some benchmarking (#82 ) * add some docstrings * start making big changes * state machine redesign * sampling seems to work * some reorg * fixed sampling of real vals * json conversion * made it possible to register new commands got nontrivial version of Pred working * consolidate command definitions * add more macro blocks * revived visualization * rename Userdata -> CmdInterpreter make AlgoSmInstance subclass of SmInstance that uses appropriate userdata argument * replace userdata by ci when appropriate * minor test fixes * revamped handmade dir, can run ppo_metal * seed to avoid random test failure * implement AlgoAgent * Autogenerated object that performs all ops and macros * more CmdRecorder changes * move files around * move MatchProb and JtftProb * remove obsolete * fix tests involving AlgoAgent (pending the next commit on ppo_metal code) * ppo_metal: reduce duplication in policy_gen, make sess an attribute of PpoAgent and StochasticPolicy instead of using get_default_session everywhere. 
* maze_env reformatting, move algo_search script (but stil broken) * move agent.py * fix test on handcrafted agents * tuning/fixing ppo_metal baseline * minor * Fix ppo_metal baseline * Don’t set epcount, tcount unless they’re being used * get rid of old ppo_metal baseline * fixes for handmade/run.py tuning * fix codegen ppo * fix handmade ppo hps * fix test, go back to safe_div * switch to more complex filtering * make sure all handcrafted algos have finite probability * train to maximize logprob of provided samples Trex changes to avoid segfault * AlgoSm also includes global hyperparams * don’t duplicate global hyperparam defaults * create generic_ob_ac_space function * use sorted list of outkeys * revive tsne * todo changes * determinism test * todo + test fix * remove a few deprecated files, rename other tests so they don’t run automatically, fix real test failure * continuous control with codegen * continuous control with codegen * implement continuous action space algodistr * ppo with trex RUN BENCHMARKS * wrap trex in a monitor * dummy commit to RUN BENCHMARKS * adding monitor to trex env RUN BENCHMARKS * adding monitor to trex RUN BENCHMARKS * include monitor into trex env RUN BENCHMARKS * generate nll and predmean using Distribution node * dummy commit to RUN BENCHMARKS * include pybullet into baselines optional dependencies * dummy commit to RUN BENCHMARKS * install games for cron rcall user RUN BENCHMARKS * add --yes flag to install.py in rcall config for cron user RUN BENCHMARKS * both continuous and discrete versions seem to run * fixes to monitor to work with vecenv-like info and rewards RUN BENCHMARKS * dummy commit to RUN BENCHMARKS * removed shape check from one-hot encoding logic in distributions.CategoricalPd * reset logger configuration in codegen/handmade/run.py to be in-line with baselines RUN BENCHMARKS * merged peterz_codegen_benchmarks RUN BENCHMARKS * skip tests RUN BENCHMARKS * working on test failures * save benchmark dicts RUN BENCHMARK 
* merged peterz_codegen_benchmark RUN BENCHMARKS * add get_git_commit_message to the baselines.common.console_util * dummy commit to RUN BENCHMARKS * merged fixes from peterz_codegen_benchmark RUN BENCHMARKS * fixing failure in test_algo_nll WIP * test_algo_nll passes with both ppo and softq * re-enabled tests * run trex on gpus for 100k total (horizon=100k / 16) RUN BENCHMARKS * merged latest peterz_codegen_benchmarks RUN BENCHMARKS * fixing codegen test failures (logging-related) * fixed name collision in run-benchmarks-new.py RUN BENCHMARKS * fixed name collision in run-benchmarks-new.py RUN BENCHMARKS * fixed import in node_filters.py * test_algo_search passes * some cleanup * dummy commit to RUN BENCHMARKS * merge fast fail for subprocvecenv RUN BENCHMARKS * use SubprocVecEnv in sonic_prob * added deprecation note to shmem_vec_env * allow indexing of distributions * add timeout to pipeline.yaml * typo in pipeline.yml * run tests with --forked option * resolved merge conflict in rl_algs.bench.benchmarks * re-enable parallel tests * fix remaining merge conflicts and syntax * Update trex_prob.py * fixes to ResultsWriter * take baselines/run.py from peterz_codegen branch * actually save stuff to file in VecMonitor RUN BENCHMARKS * enable parallel tests * merge stricter flake8 * merge peterz_codegen_benchmark, resolve conflicts * autopep8 * remove traces of Monitor from trex env, check shapes before encoding in CategoricalPd * asserts and warnings to make q -> distribution change more explicit * fixed assert in CategoricalPd * add header to vec_monitor output file RUN BENCHMARKS * make VecMonitor write header to the output file * remove deprecation message from shmem_vec_env RUN BENCHMARKS * autopep8 * proper shape test in distributions.py * ResultsWriter can take dict headers * dummy commit to RUN BENCHMARKS * replace assert len(qs)==1 with warning RUN BENCHMARKS * removed pdb from ppo2 RUN BENCHMARKS	2018-09-14 15:43:49 -07:00
Peter Zhokhov	1f99a562e3	autopep8	2018-09-11 13:21:52 -07:00
Peter Zhokhov	4e2a888273	Merge commit 'refs/subrepo/baselines/fetch' into subrepo/baselines	2018-09-11 13:19:39 -07:00
Peter Zhokhov	c5b2918607	git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "2742f819" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "5c5a9f4b" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-11 13:18:43 -07:00
Peter Zhokhov	3bf31a4330	git subrepo commit (merge) baselines subrepo: subdir: "baselines" merged: "0846932a" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "c5d6f299" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-11 13:18:43 -07:00
pzhokhov	9070ee7ef3	tighten flake8, autopep8 to fix trailing whitespaces and blank lines with whitespaces (#87 )	2018-09-11 13:18:43 -07:00
Peter Zhokhov	e56803491f	git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "5c6a1fd9" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "23b23332" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-11 13:18:42 -07:00
pzhokhov	b3bc25d99a	add fast failure when calling methods on a closed subprocvecenv (#84 )	2018-09-11 13:18:42 -07:00
Peter Zhokhov	5c5a9f4b31	autopep8 on deepq/experiments	2018-09-11 12:47:50 -07:00
Peter Zhokhov	5183fa9f29	autopep8 on deepq/experiments	2018-09-11 12:47:50 -07:00
Peter Zhokhov	3bf35cb468	added peterz to baselines authorlist	2018-09-11 12:44:51 -07:00
Peter Zhokhov	5c62f5c7dd	added peterz to baselines authorlist	2018-09-11 12:44:51 -07:00
Peter Zhokhov	29bf587d15	Merge branch 'master' of github.com:openai/baselines	2018-09-11 12:40:29 -07:00
Peter Zhokhov	c5d6f2996c	Merge branch 'master' of github.com:openai/baselines	2018-09-11 12:40:29 -07:00
Peter Zhokhov	06bdc2860c	docstrings about vecenvs	2018-09-11 12:40:23 -07:00
pzhokhov	adaa8aefa8	baselines issue #564 (#574 ) * fixes to enjoy_cartpole, enjoy_mountaincar.py * fixed {train,enjoy}_pong, removed enjoy_retro * set number of timesteps to 1e7 in train_pong * flake8 complaints * use synchronous version of acktr in test_env_after_learn * flake8	2018-09-10 11:50:59 -07:00
pzhokhov	23b2333238	baselines issue #564 (#574 ) * fixes to enjoy_cartpole, enjoy_mountaincar.py * fixed {train,enjoy}_pong, removed enjoy_retro * set number of timesteps to 1e7 in train_pong * flake8 complaints * use synchronous version of acktr in test_env_after_learn * flake8	2018-09-10 11:50:59 -07:00
Peter Zhokhov	8614c4ddbf	flake8	2018-09-10 10:41:29 -07:00
Peter Zhokhov	59a7ffb84d	fix tests of test_env_after_learn	2018-09-10 10:32:42 -07:00
Daniel Angelov	58b1021b28	Add tensorboard start command for convenience (#569 )	2018-09-07 17:04:02 -07:00
Peter Zhokhov	a60e88bff9	git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "8785db28" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "35e95ee8" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-07 16:35:00 -07:00
pzhokhov	75b93b890e	implement pdfromlatent in BernoulliPdType (#81 ) * implement pdfromlatent in BernoulliPdType * remove env.close() at the end of algorithms * test case for environment after learn * closing env in run.py * fixes for acktr and trpo_mpi * add make_session with new graph for every call in test_env_after_learn * remove extra prints from test_env_after_learn	2018-09-07 16:35:00 -07:00
John Schulman	565b2153d7	Add lots of docstrings (#76 ) * Add lots of docstrings Change hyperparameter transformations for slightly better efficiency and to avoid circular dependency. Now all parameters are stored in a “human-readable” form. * improve pretty-print of nodes and trees * newlines at end-of-file, return graph in render(), assert_valid() fix * split run_algo_search.py into several simpler scripts * add joint_train option to get_prob * minor changes to soln_db and embedding script * Arguments: -> Args: * fix replay, part 1 * fix behavior when using unpickled algos * re-add retrieve_weights * make training scripts more consistent * lint * lint * lint + remove rendering some rendering functionality from trex env as it’s also elsewhere * get rid of warnings * refactor functionality for getting final q-function and losses. revive code for removing useless terms & tests for simplification. * fix vecenv closing * finish removing algo folder (most useful functionality has been moved out of it) * control verbosity of trex * fix tests * rename spec => choice_spec, some comments, asserts, debug prints * fix some tests	2018-09-07 16:34:59 -07:00
Peter Zhokhov	35e95ee85a	fix python 3.5 string format compatibility	2018-09-06 12:00:19 -07:00
Isaac Lascasas	ad219e205d	VecNormalize: set env. returns to zero on resets. (#556 ) * VecNormalize: set env. returns to zero on resets. * VecNormalize: returns reset in step_wait after ret_rms.update.	2018-09-06 10:21:50 -07:00
Peter Zhokhov	be9118bcd8	git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "f2a9b8f2" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "cc4215ef" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-06 10:18:13 -07:00
pzhokhov	02a5e7aed5	fixes to readme and baselines/run.py (#80 ) * fixes to readme and baselines/run.py * polish installation section of baselines README * polish installation section of baselines README	2018-09-06 10:18:13 -07:00
pzhokhov	87ac8bc317	install roboschool in install.py (#55 ) * putting instructions from README.md into a script * install roboschool as a part of setup.py * install roboschool from install.py * export pkg_config_path * remove compilation step from roboschool/setup.py * removed roboschool install from games install due to extra compilation step * removed unused import from roboschool/setup.py	2018-09-06 10:18:13 -07:00
Tom	cc4215ef4b	refactor common.models via registering reflection (#565 )	2018-09-06 10:16:06 -07:00
Clayton Thorrez	1e9051e87e	fixed warning (#464 )	2018-09-05 15:12:01 -07:00
uronce-cc	43ed76944b	Fix mean reward per episode after training Pong. (#562 ) * Fix mean reward per episode after training Pong. * Fix typo.	2018-09-05 15:06:29 -07:00
Peter Zhokhov	7f08c675bb	git subrepo pull (merge) baselines subrepo: subdir: "baselines" merged: "39f8be8f" upstream: origin: "git@github.com:openai/baselines.git" branch: "master" commit: "0a40206c" git-subrepo: version: "0.4.0" origin: "git@github.com:ingydotnet/git-subrepo.git" commit: "74339e8"	2018-09-04 10:23:40 -07:00
pzhokhov	b3f966aa02	use env.render in dummy_vec_env.render when num_envs == 1 (#74 ) * use env.render in dummy_vec_env.render when num_envs == 1 * use shorter super() syntax per Alex's suggestion	2018-09-04 10:23:40 -07:00
pzhokhov	51cefc933b	make load_variables compatible with old list format (#71 ) * make load_variables compatible with old list format * cosmetic fixes	2018-09-04 10:23:39 -07:00
Christopher Hesse	7bccb2969f	baselines: default logger similar to configure() logger, rcall: don't call logger.configure() for new rl_algs * error if logger looks wrong * check version of logger, call logger.configure() on import * remove changes entry * add version to rl-algs * fix typo * add comment * switch version to string * set logger env variable	2018-09-04 10:23:39 -07:00
uronce-cc	0a40206c6c	ncpu needs to be an integer. (#558 )	2018-08-31 09:02:18 -07:00
Alfredo Canziani	1937826784	Fix alien syntax and apply PEP 8 style (#554 )	2018-08-30 17:21:25 -07:00

262 Commits