continuous action spaces for codegen + some benchmarking (#82)

* add some docstrings

* start making big changes

* state machine redesign

* sampling seems to work

* some reorg

* fixed sampling of real vals

* json conversion

* made it possible to register new commands
got nontrivial version of Pred working

* consolidate command definitions

* add more macro blocks

* revived visualization

* rename Userdata -> CmdInterpreter
make AlgoSmInstance subclass of SmInstance that uses appropriate userdata argument

* replace userdata by ci when appropriate

* minor test fixes

* revamped handmade dir, can run ppo_metal

* seed to avoid random test failure

* implement AlgoAgent

* Autogenerated object that performs all ops and macros

* more CmdRecorder changes

* move files around

* move MatchProb and JtftProb

* remove obsolete

* fix tests involving AlgoAgent (pending the next commit on ppo_metal code)

* ppo_metal: reduce duplication in policy_gen, make sess an attribute of PpoAgent and StochasticPolicy instead of using get_default_session everywhere.

* maze_env reformatting, move algo_search script (but still broken)

* move agent.py

* fix test on handcrafted agents

* tuning/fixing ppo_metal baseline

* minor

* Fix ppo_metal baseline

* Don’t set epcount, tcount unless they’re being used

* get rid of old ppo_metal baseline

* fixes for handmade/run.py tuning

* fix codegen ppo

* fix handmade ppo hps

* fix test, go back to safe_div

* switch to more complex filtering

* make sure all handcrafted algos have finite probability

* train to maximize logprob of provided samples
Trex changes to avoid segfault

* AlgoSm also includes global hyperparams

* don’t duplicate global hyperparam defaults

* create generic_ob_ac_space function

* use sorted list of outkeys

* revive tsne

* todo changes

* determinism test

* todo + test fix

* remove a few deprecated files, rename other tests so they don’t run automatically, fix real test failure

* continuous control with codegen

* continuous control with codegen

* implement continuous action space algodistr

* ppo with trex RUN BENCHMARKS

* wrap trex in a monitor

* dummy commit to RUN BENCHMARKS

* adding monitor to trex env RUN BENCHMARKS

* adding monitor to trex RUN BENCHMARKS

* include monitor into trex env RUN BENCHMARKS

* generate nll and predmean using Distribution node

* dummy commit to RUN BENCHMARKS

* include pybullet into baselines optional dependencies

* dummy commit to RUN BENCHMARKS

* install games for cron rcall user RUN BENCHMARKS

* add --yes flag to install.py in rcall config for cron user RUN BENCHMARKS

* both continuous and discrete versions seem to run

* fixes to monitor to work with vecenv-like info and rewards RUN BENCHMARKS

* dummy commit to RUN BENCHMARKS

* removed shape check from one-hot encoding logic in distributions.CategoricalPd

* reset logger configuration in codegen/handmade/run.py to be in-line with baselines RUN BENCHMARKS

* merged peterz_codegen_benchmarks RUN BENCHMARKS

* skip tests RUN BENCHMARKS

* working on test failures

* save benchmark dicts RUN BENCHMARK

* merged peterz_codegen_benchmark RUN BENCHMARKS

* add get_git_commit_message to the baselines.common.console_util

* dummy commit to RUN BENCHMARKS

* merged fixes from peterz_codegen_benchmark RUN BENCHMARKS

* fixing failure in test_algo_nll WIP

* test_algo_nll passes with both ppo and softq

* re-enabled tests

* run trex on gpus for 100k total (horizon=100k / 16) RUN BENCHMARKS

* merged latest peterz_codegen_benchmarks RUN BENCHMARKS

* fixing codegen test failures (logging-related)

* fixed name collision in run-benchmarks-new.py RUN BENCHMARKS

* fixed name collision in run-benchmarks-new.py RUN BENCHMARKS

* fixed import in node_filters.py

* test_algo_search passes

* some cleanup

* dummy commit to RUN BENCHMARKS

* merge fast fail for subprocvecenv RUN BENCHMARKS

* use SubprocVecEnv in sonic_prob

* added deprecation note to shmem_vec_env

* allow indexing of distributions

* add timeout to pipeline.yaml

* typo in pipeline.yml

* run tests with --forked option

* resolved merge conflict in rl_algs.bench.benchmarks

* re-enable parallel tests

* fix remaining merge conflicts and syntax

* Update trex_prob.py

* fixes to ResultsWriter

* take baselines/run.py from peterz_codegen branch

* actually save stuff to file in VecMonitor RUN BENCHMARKS

* enable parallel tests

* merge stricter flake8

* merge peterz_codegen_benchmark, resolve conflicts

* autopep8

* remove traces of Monitor from trex env, check shapes before encoding in CategoricalPd

* asserts and warnings to make q -> distribution change more explicit

* fixed assert in CategoricalPd

* add header to vec_monitor output file RUN BENCHMARKS

* make VecMonitor write header to the output file

* remove deprecation message from shmem_vec_env RUN BENCHMARKS

* autopep8

* proper shape test in distributions.py

* ResultsWriter can take dict headers

* dummy commit to RUN BENCHMARKS

* replace assert len(qs)==1 with warning RUN BENCHMARKS

* removed pdb from ppo2 RUN BENCHMARKS

Author: pzhokhov (committed by Peter Zhokhov)
Date: 2018-09-12 10:14:41 -07:00
Parent: 1f99a562e3
Commit: fe06c6b4db

9 changed files with 132 additions and 49 deletions

									
28 baselines/run.py
@@ -154,9 +154,6 @@ def get_default_network(env_type):
     else:
         return 'mlp'
-    raise ValueError('Unknown env_type {}'.format(env_type))
 def get_alg_module(alg, submodule=None):
     submodule = submodule or alg
     try:
@@ -182,16 +179,21 @@ def get_learn_function_defaults(alg, env_type):
     return kwargs
-def parse(v):
-    '''
-    convert value of a command-line arg to a python object if possible, othewise, keep as string
-    '''
-    assert isinstance(v, str)
-    try:
-        return eval(v)
-    except (NameError, SyntaxError):
-        return v
+def parse_cmdline_kwargs(args):
+    '''
+    convert a list of '='-spaced command-line arguments to a dictionary, evaluating python objects when possible
+    '''
+    def parse(v):
+        assert isinstance(v, str)
+        try:
+            return eval(v)
+        except (NameError, SyntaxError):
+            return v
+    return {k: parse(v) for k,v in parse_unknown_args(args).items()}
 def main():
@@ -199,7 +201,7 @@ def main():
     arg_parser = common_arg_parser()
     args, unknown_args = arg_parser.parse_known_args()
-    extra_args = {k: parse(v) for k, v in parse_unknown_args(unknown_args).items()}
+    extra_args = parse_cmdline_kwargs(unknown_args)
     if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
         rank = 0
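For readers who want to see what the refactored helper does with typical arguments, here is a minimal standalone sketch. `parse_cmdline_kwargs` mirrors the function in the diff above. The diff does not show the body of `parse_unknown_args`, so the version here is a simplified, hypothetical stand-in that assumes every argument has the form `--key=value`. The argument names in the example are illustrative only.

```python
def parse_unknown_args(args):
    """Hypothetical stand-in for the helper called in the diff.

    Assumes every argument looks like '--key=value' and returns
    a dict mapping key -> raw string value.
    """
    retval = {}
    for arg in args:
        if arg.startswith('--') and '=' in arg:
            key, value = arg[2:].split('=', 1)
            retval[key] = value
    return retval


def parse_cmdline_kwargs(args):
    """Convert '--key=value' arguments to a dict, evaluating values
    as Python expressions when possible (same logic as in the diff)."""
    def parse(v):
        assert isinstance(v, str)
        try:
            # Numbers, tuples, etc. become Python objects.
            return eval(v)
        except (NameError, SyntaxError):
            # Bare words such as 'mlp' are not valid expressions
            # in this scope, so they stay as strings.
            return v

    return {k: parse(v) for k, v in parse_unknown_args(args).items()}


# Illustrative argument names; not taken from the commit.
kwargs = parse_cmdline_kwargs(['--num_timesteps=1e6', '--network=mlp', '--nsteps=128'])
print(kwargs)
```

Because `eval` executes whatever expression it is given, this pattern is only reasonable for trusted command lines. `ast.literal_eval` is a stricter alternative, but it raises `ValueError` for a bare word such as `mlp`, so the `except` clause would need to catch that as well.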
