* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
* store session at policy creation time
* coexistence tests
* fix a typo
* autopep8
* ... and flake8
* updated todo links in test_serialization
* disabled tests, running benchmarks only
* dummy commit to RUN BENCHMARKS
* benchmark ppo_metal; disable all but Bullet benchmarks
* ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS
* run benchmarks on Roboschool instead RUN BENCHMARKS
* run ppo_metal on Roboschool as well RUN BENCHMARKS
* install roboschool in cron rcall user_config
* dummy commit to RUN BENCHMARKS
* import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS
* re-enable tests, flake8
* get entropy from a distribution in Pred RUN BENCHMARKS
* gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS
* provide default value for cg2/bmv_net_ops.py
* dummy commit to RUN BENCHMARKS
* make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* path relative to codegen/training for gin files RUN BENCHMARKS
* another reconciliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS
* value_network=copy for ppo2 on roboschool RUN BENCHMARKS
* make None seed work with torch seeding RUN BENCHMARKS
* try sequential batches with ppo2 RUN BENCHMARKS
* try ppo without advantage normalization RUN BENCHMARKS
* use Distribution to compute ema NLL RUN BENCHMARKS
* autopep8
* clip gradient norm in algo_agent RUN BENCHMARKS
* try ppo2 without vfloss clipping RUN BENCHMARKS
* trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS
* set gamma=0 in ppo2 RUN BENCHMARKS
* try with ppo2 with single minibatch RUN BENCHMARKS
* try with nminibatches=4, value_network=copy RUN BENCHMARKS
* try with nminibatches=1 take two RUN BENCHMARKS
* try initialization for vf=0.01 RUN BENCHMARKS
* fix the problem with min_istart >= max_istart
* I have no idea RUN BENCHMARKS
* fix non-shared variance between old and new RUN BENCHMARKS
* restored baselines.common.policies
* 16 minibatches in ppo_roboschool.gin
* fixing results of merge
* cleanups
* cleanups
* fix run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix test failures
* moved gin requirement to codegen/setup.py
* remove duplicated build_softq in get_algo.py
* linting
* run softq on continuous action spaces RUN BENCHMARKS Roboschool8M
* run ddpg on Mujoco benchmark RUN BENCHMARKS
* autopep8
* fixed all syntax in refactored ddpg
* a little bit more refactoring
* autopep8
* identity test with ddpg WIP
* enable test_identity with ddpg
* refactored ddpg RUN BENCHMARKS
* autopep8
* include ddpg into style check
* fixing tests RUN BENCHMARKS
* set default seed to None RUN BENCHMARKS
* run tests and benchmarks in separate buildkite steps RUN BENCHMARKS
* cleanup pdb usage
* flake8 and cleanups
* re-enabled all benchmarks in run-benchmarks-new.py
* flake8 complaints
* deepq model builder compatible with network functions returning single tensor
* remove ddpg test with test_discrete_identity
* make ppo_metal use make_vec_env instead of make_atari_env
* make ppo_metal use make_vec_env instead of make_atari_env
* fixed syntax in ppo_metal.run_atari
__E901, E999, F821, F822, F823__ are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. The other flake8 issues are merely "style violations" -- useful for readability, but they do not affect runtime safety. This PR therefore recommends running flake8 with these checks enabled on the entire codebase.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable `name` referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
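
The runtime-halting nature of these codes can be illustrated with a couple of hypothetical snippets (not from this PR) — an F821 surfaces as a NameError the moment the code path is exercised, and an E999 means the file cannot even be parsed:

```python
# F821: `total` is never defined, so calling this function raises NameError.
def uses_undefined_name():
    return total + 1  # noqa: F821 (deliberate, for illustration)

try:
    uses_undefined_name()
except NameError as exc:
    print("F821 halts the runtime:", exc)

# E999: source that fails to compile into an Abstract Syntax Tree at all.
broken_source = "def broken(:\n    pass\n"
try:
    compile(broken_source, "<example>", "exec")
except SyntaxError as exc:
    print("E999 halts at parse time:", type(exc).__name__)
```

A repo-wide check restricted to these codes can be run with `flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics`.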
* Soup code, arch search on CIFAR-10
* Oh I understood how choice_sequence() worked
* Undo some pointless changes
* Some beautification 1
* Some beautification 2
* An attempt to debug test_get_algo_outputs() number 70, unsuccessful.
* Code style warning
* Code style warnings, more
* wip
* wip
* wip
* fix almost everything; soup machine still broken
* revert mpi_eda changes
* minor fixes
* refactor acktr
* setup.cfg now tests style/syntax in acktr as well
* flake8 complaints
* added note about continuous action spaces for acktr into the README.md
* Add possibility of plotting timesteps vs episodes
* Remove leftover from personal project patch
* Auto plt.tight_layout() on resize window event
Calls `plt.tight_layout()` if a `resize_event` is issued.
This means that the plot will look good even after the user has resized the plotting window.
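
A minimal sketch of that hook, assuming standard matplotlib (the callback name and plotted data here are illustrative, not the PR's actual code):

```python
import matplotlib
matplotlib.use("Agg")  # any backend works; Agg keeps the sketch headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="episode_reward")
ax.set_xlabel("timesteps")
ax.legend()

def _on_resize(event):
    # Re-run the layout engine so labels and legends stay visible
    # after the user resizes the plotting window.
    event.canvas.figure.tight_layout()
    event.canvas.draw_idle()

cid = fig.canvas.mpl_connect("resize_event", _on_resize)
```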
* fix discovered test failures
* autopep8
* test indices up to 123
* testing from index 124 on
* add scope to logstd
* fix flakiness in test_train_mle
* autopep8
* add some docstrings
* start making big changes
* state machine redesign
* sampling seems to work
* some reorg
* fixed sampling of real vals
* json conversion
* made it possible to register new commands
got nontrivial version of Pred working
* consolidate command definitions
* add more macro blocks
* revived visualization
* rename Userdata -> CmdInterpreter
make AlgoSmInstance subclass of SmInstance that uses appropriate userdata argument
* replace userdata by ci when appropriate
* minor test fixes
* revamped handmade dir, can run ppo_metal
* seed to avoid random test failure
* implement AlgoAgent
* Autogenerated object that performs all ops and macros
* more CmdRecorder changes
* move files around
* move MatchProb and JtftProb
* remove obsolete
* fix tests involving AlgoAgent (pending the next commit on ppo_metal code)
* ppo_metal: reduce duplication in policy_gen, make sess an attribute of PpoAgent and StochasticPolicy instead of using get_default_session everywhere.
* maze_env reformatting, move algo_search script (but still broken)
* move agent.py
* fix test on handcrafted agents
* tuning/fixing ppo_metal baseline
* minor
* Fix ppo_metal baseline
* Don’t set epcount, tcount unless they’re being used
* get rid of old ppo_metal baseline
* fixes for handmade/run.py tuning
* fix codegen ppo
* fix handmade ppo hps
* fix test, go back to safe_div
* switch to more complex filtering
* make sure all handcrafted algos have finite probability
* train to maximize logprob of provided samples
Trex changes to avoid segfault
* AlgoSm also includes global hyperparams
* don’t duplicate global hyperparam defaults
* create generic_ob_ac_space function
* use sorted list of outkeys
* revive tsne
* todo changes
* determinism test
* todo + test fix
* remove a few deprecated files, rename other tests so they don’t run automatically, fix real test failure
* continuous control with codegen
* continuous control with codegen
* implement continuous action space algodistr
* ppo with trex RUN BENCHMARKS
* wrap trex in a monitor
* dummy commit to RUN BENCHMARKS
* adding monitor to trex env RUN BENCHMARKS
* adding monitor to trex RUN BENCHMARKS
* include monitor into trex env RUN BENCHMARKS
* generate nll and predmean using Distribution node
* dummy commit to RUN BENCHMARKS
* include pybullet into baselines optional dependencies
* dummy commit to RUN BENCHMARKS
* install games for cron rcall user RUN BENCHMARKS
* add --yes flag to install.py in rcall config for cron user RUN BENCHMARKS
* both continuous and discrete versions seem to run
* fixes to monitor to work with vecenv-like info and rewards RUN BENCHMARKS
* dummy commit to RUN BENCHMARKS
* removed shape check from one-hot encoding logic in distributions.CategoricalPd
* reset logger configuration in codegen/handmade/run.py to be in-line with baselines RUN BENCHMARKS
* merged peterz_codegen_benchmarks RUN BENCHMARKS
* skip tests RUN BENCHMARKS
* working on test failures
* save benchmark dicts RUN BENCHMARK
* merged peterz_codegen_benchmark RUN BENCHMARKS
* add get_git_commit_message to the baselines.common.console_util
* dummy commit to RUN BENCHMARKS
* merged fixes from peterz_codegen_benchmark RUN BENCHMARKS
* fixing failure in test_algo_nll WIP
* test_algo_nll passes with both ppo and softq
* re-enabled tests
* run trex on gpus for 100k total (horizon=100k / 16) RUN BENCHMARKS
* merged latest peterz_codegen_benchmarks RUN BENCHMARKS
* fixing codegen test failures (logging-related)
* fixed name collision in run-benchmarks-new.py RUN BENCHMARKS
* fixed name collision in run-benchmarks-new.py RUN BENCHMARKS
* fixed import in node_filters.py
* test_algo_search passes
* some cleanup
* dummy commit to RUN BENCHMARKS
* merge fast fail for subprocvecenv RUN BENCHMARKS
* use SubprocVecEnv in sonic_prob
* added deprecation note to shmem_vec_env
* allow indexing of distributions
* add timeout to pipeline.yaml
* typo in pipeline.yml
* run tests with --forked option
* resolved merge conflict in rl_algs.bench.benchmarks
* re-enable parallel tests
* fix remaining merge conflicts and syntax
* Update trex_prob.py
* fixes to ResultsWriter
* take baselines/run.py from peterz_codegen branch
* actually save stuff to file in VecMonitor RUN BENCHMARKS
* enable parallel tests
* merge stricter flake8
* merge peterz_codegen_benchmark, resolve conflicts
* autopep8
* remove traces of Monitor from trex env, check shapes before encoding in CategoricalPd
* asserts and warnings to make q -> distribution change more explicit
* fixed assert in CategoricalPd
* add header to vec_monitor output file RUN BENCHMARKS
* make VecMonitor write header to the output file
* remove deprecation message from shmem_vec_env RUN BENCHMARKS
* autopep8
* proper shape test in distributions.py
* ResultsWriter can take dict headers
* dummy commit to RUN BENCHMARKS
* replace assert len(qs)==1 with warning RUN BENCHMARKS
* removed pdb from ppo2 RUN BENCHMARKS