* lazy_mpi load
* cleanups
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* silly recursion
* try os.environ hack
* better prefix test, work with mpich
* restored MPI imports
* removed commented import in test_with_mpi
* restored codegen from master
* remove lazy mpi
* restored changes from rl-algs
* remove extra files
* port mpi fix to shmem vec env
* increase the mpi test default timeout
* address Chris' comments
* delayed logger configuration
* fix typo
* setters and getters for Logger.DEFAULT as well
* do away with fancy property stuff - unable to get it to work with class level methods
* grammar and spaces
* spaces
* use get_current function instead of reading Logger.CURRENT
* autopep8
* play with resnet
* feed_dict version
* coinrun prob and more stats
* fixes to get_choices_specs & hp search
* minor prob fixes
* minor fixes
* minor
* alternative version of rl_algo stuff
* pylint fixes
* fix bugs, move node_filters to soup
* changed how get_algo works
* change how get_algo works, probably broke all tests
* continue previous refactor
* get eval_agent running again
* fixing tests
* fix tests
* fix more tests
* clean up cma stuff
* fix experiment
* minor changes to eval_agent to make ppo_metal use gpu
* make dict space work
* modify mac makefile to use conda
* recurrent layers
* play with bn and resnets
* minor hp changes
* minor
* got rid of use_fb argument and jtft (joint-train-fine-tune) functionality
built test phase directly into AlgoProb
* make new rl algos generateable
* pylint; start fixing tests
* fixing tests
* more test fixes
* pylint
* fix search
* work on search
* hack around infinite loop caused by scan
* algo search fixes
* misc changes for search expt
* enable annealing, overriding options of Op
* pylint fixes
* identity op
* achieve use_last_output through masking so it automatically works in other distributions
* fix tests
* minor
* discrete
* use_last_output to be just a preference, not a hard constraint
* pred delay, pruning
* require nontrivial inputs
* aliases for get_sm
* add probname to probs
* fixes
* small fixes
* fix tests
* fix tests
* fix tests
* minor
* test scripts
* dualgru network improvements
* minor
* work on mysterious bugs
* rcall gpu-usage command for kube
* use cache dir that’s not in code folder, so that it doesn’t get removed by rcall code rsync
* add power mode to gpu usage
* make sure train/test actually different
* remove VR for now
* minor fixes
* simplify soln_db
* minor
* big refactor of mpi eda
* improve mpieda for multitask
* - get rid of timelimit hack
- add __del__ to cleanup SubprocVecEnv
* get multitask working better
* fixes
* working on atari, various
* annotate ops with whether they’re parametrized
* minor
* gym version
* rand atari prob
* minor
* SolnDb bugfix and name change
* pyspy script
* switch conv layers
* fix roboschool/bullet3
* nenvs assertion
* fix rand atari
* get rid of blanket exception catching
fix soln_db bug
* fix rand_atari
* dynamic routing as cmdline arg
* slight modifications to test_mpi_map and pyspy-all
* max_tries argument for run_until_successs
* dedup option in train_mle
* simplify soln_db
* increase atari horizon for 1 experiment
* start implementing reward increment
* ent multiplier
* create cc dsl
other misc fixes
* cc ops
* q_func -> qs in rl_algos_cc.py
* fix PredictDistr
* rl_ops_cc fixes, MakeAction op
* augment algo agent to support cc stuff
* work on ddpg experiments
* fix blocking
temporarily change logger
* allow layer scaling
* pylint fixes
* spawn_method
* isolate ddpg hacks
* improve pruning
* use spawn for subproc
* remove use of python -c in rcall
* fix pylint warning
* fix static
* maybe fix local backend
* switch to DummyVecEnv
* making some fixes via pylint
* pylint fixes
* fixing tests
* fix tests
* fix tests
* write scaffolding for SSL in Codegen
* logger fix
* fix error
* add EMA op to sl_ops
* save many changes
* save
* add upsampler
* add sl ops, enhance state machine
* get ssl search working — some gross hacking
* fix session/graph issue
* fix importing
* work on mle
* - scale embeddings in gru model
- better exception handling in sl_prob
- use emas for test/val
- use non-contrib batch_norm layer
* improve logging
* option to average before dumping in logger
* default arguments, etc
* new ddpg and identity test
* concat fix
* minor
* move realistic ssl stuff to third-party (underscore to dash)
* fixes
* remove realistic_ssl_evaluation
* pylint fixes
* use gym master
* try again
* pass around args without gin
* fix tests
* separate line to install gym
* rename failing tests that should be ignored
* add data aug
* ssl improvements
* use fixed time limit
* try to fix baselines tests
* add score_floor, max_walltime, fiddle with lr decay
* realistic_ssl
* autopep8
* various ssl
- enable blocking grad for simplification
- kl
- multiple final prediction
* fix pruning
* misc ssl stuff
* bring back linear schedule, don’t use allgather for collecting stats
(i’ve been getting nondeterministic errors from the old code)
* save/load weights in SSL, big stepsize
* cleanup SslProb
* fix
* get rid of kl coef
* fix simplification, lower lr
* search over hps
* minor fixes
* minor
* static analysis
* move files and rename things for improved consistency.
still broken, and just saving before making nontrivial changes
* various
* make tests pass
* move coinrun_train to codegen since it depends on codegen
* fixes
* pylint fixes
* improve tests
fix some things
* improve tests
* lint
* fix up db_info.py, tests
* mostly restore master version of envs directory, except for makefile changes
* fix tests
* improve printing
* minor fixes
* fix fixmes
* pruning test
* fixes
* lint
* write new test that makes tf graphs of random algos; fix some bugs it caught
* add --delete flag to rcall upload-code command
* lint
* get cifar10 lazily for testing purposes
* disable codegen ci tests for now
* clean up rl_ops
* rename spec classes
* td3 with identity test
* identity tests without gin files
* remove gin.configurable from AlgoAgent
* comments about reduction in rl_ops_cc
* address @pzhokhov comments
* fix tests
* more linting
* better tests
* clean up filtering a bit
* fix concat
* Added required arguments to the policy builder in the ACER model to
fix the issue #783
* Changed the step model from nbatch to nenvs
* Updated nsteps to be 1.
* Recognize nightly tf builds
* Use LooseVersion instead of StrictVersion to recognize nightly build numbers
Nightly version numbers have the form `1.3.0.dev20181215`, which is not a valid version number for `StrictVersion`; `LooseVersion` still accepts it.
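The difference can be illustrated without importing `distutils` (a minimal sketch: the two regular expressions are transcribed from memory of `distutils.version` and should be treated as assumptions, and `is_strict` / `loose_components` are hypothetical helper names, not part of this change):

```python
import re

# Assumed to mirror StrictVersion: "N.N[.N]" plus an optional a/b pre-release tag.
STRICT_RE = re.compile(r"^(\d+)\.(\d+)(\.(\d+))?([ab](\d+))?$")
# Assumed to mirror LooseVersion: split into runs of digits, runs of letters, and dots.
COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.)")


def is_strict(version):
    """Return True if the string fits the strict pattern."""
    return STRICT_RE.match(version) is not None


def loose_components(version):
    """Split a version string into int and str components, dropping the dots."""
    parts = [p for p in COMPONENT_RE.split(version) if p and p != "."]
    return [int(p) if p.isdigit() else p for p in parts]


nightly = "1.3.0.dev20181215"
print(is_strict(nightly))         # the ".dev..." suffix does not fit the strict pattern
print(loose_components(nightly))  # numeric prefix still parses and can be compared
```

Because the loose form keeps the numeric prefix as integers, a nightly build can still be compared against a minimum release such as `[1, 2, 0]`.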
* joshim5 changes (width and height to WarpFrame wrapper)
* match network output with action distribution via a linear layer only if necessary (#167)
* support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper
* Support color in other wrappers
* Updated per Peter's suggestions
* fixing test failures
* ppo2 with microbatches (#168)
* pass microbatch_size to the model during construction
* microbatch fixes and test (#169)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* subclassing the model to make microbatched version of model WIP
* made microbatched model a subclass of ppo2 Model
* flake8 complaint
* mpi-less ppo2 (resolving merge conflict)
* flake8 and mpi4py imports in ppo2/model.py
* more un-mpying
* merge master
* updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip
* update viewer yaml to use persistent volume claims
* move plot_util to baselines.common, update links
* use 1Tb hard drive for results viewer
* small updates to benchmark visualizer code
* autopep8
* autopep8
* any folder can be a benchmark
* massage games image a little bit
* fixed --preload option in app.py
* remove preload from run_viewer.sh
* remove pdb breakpoints
* update bench-viewer.yaml
* fixed bug (#185)
* fixed bug
it's wrong to do the else statement, because no other nodes would start.
* changed the fix slightly
* Refactor her phase 1 (#194)
* add monitor to the rollout envs in her RUN BENCHMARKS her
* Slice -> Slide in her benchmarks RUN BENCHMARKS her
* run her benchmark for 200 epochs
* dummy commit to RUN BENCHMARKS her
* her benchmark for 500 epochs RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* disable saving of policies in her benchmark RUN BENCHMARKS her
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* launcher refactor wip
* wip
* her works on FetchReach
* her runner refactor RUN BENCHMARKS Fetch1M
* unit test for her
* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present
* pickle-based serialization in her
* remove extra import from subproc_vec_env.py
* investigating differences in rollout.py
* try with old rollout code RUN BENCHMARKS her
* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her
* dummy commit to RUN BENCHMARKS her
* set info_values in rollout worker in her RUN BENCHMARKS her
* bug in rollout_new.py RUN BENCHMARKS her
* fixed bug in rollout_new.py RUN BENCHMARKS her
* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her
* updated buffer sizes RUN BENCHMARKS her
* fixed loading/saving via joblib
* dust off learning from demonstrations in HER, docs, refactor
* add deprecation notice on her play and plot files
* address comments by Matthias
* Added parameter documentation
This parameter was thus far not documented and is non-intuitive when unfamiliar with tf.
* DDPG has unused 'seed' argument
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:
```
from baselines.common import set_global_seeds
...
def learn(...):
    ...
    set_global_seeds(seed)
```
DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
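A minimal sketch of why the missing call matters. The `set_global_seeds` below is a hypothetical stand-in that seeds only Python's `random` module, not the real `baselines.common` helper, and both `learn` functions are toy placeholders rather than DDPG itself:

```python
import random


def set_global_seeds(seed):
    """Hypothetical stand-in: seeds only Python's `random` module."""
    if seed is not None:
        random.seed(seed)


def learn(seed=None, nsteps=5):
    """Toy learner that honors its seed argument."""
    set_global_seeds(seed)
    return [random.random() for _ in range(nsteps)]


def learn_ignoring_seed(seed=None, nsteps=5):
    """Toy learner that accepts `seed` but never uses it (the reported problem)."""
    return [random.random() for _ in range(nsteps)]


print(learn(seed=0) == learn(seed=0))                              # reproducible
print(learn_ignoring_seed(seed=0) == learn_ignoring_seed(seed=0))  # not reproducible
```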
* DDPG: duplicate variable assignment
The variable nb_actions is assigned the same value twice within 10 lines:
nb_actions = env.action_space.shape[-1]
* DDPG: noise_type 'normal_x' and 'ou_x' cause assert
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') trigger an assert and DDPG does not run. The issue is the following block:
'''
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
'''
noise is not nested: [number_of_actions]
actions is nested: [[number_of_actions]]
Can either nest noise or unnest actions
* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"
* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') raise an AssertionError and DDPG does not run. The issue is the following block:
'''
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
'''
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes do not pass the assert line even though the action += noise line is correct
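The shape mismatch can be reproduced in isolation with NumPy (a minimal sketch: `nb_actions = 3` and the array values are made up for illustration, and the variable names other than `action` and `noise` are hypothetical):

```python
import numpy as np

nb_actions = 3  # hypothetical action dimension

action = np.zeros((1, nb_actions))  # nested: shape (1, nb_actions)
noise = np.full(nb_actions, 0.1)    # not nested: shape (nb_actions,)

# The shapes differ, so `assert noise.shape == action.shape` would fail.
shapes_match = noise.shape == action.shape

# Broadcasting still makes the addition itself well defined.
noisy_action = action + noise

# Option 1: nest the noise so that it matches the action.
nested_noise = noise.reshape(action.shape)

# Option 2: unnest the action so that it matches the noise.
flat_action = action[0]

print(shapes_match, noisy_action.shape, nested_noise.shape, flat_action.shape)
```

Either option makes the shapes agree; which one fits depends on what the surrounding code expects `action` to look like afterwards.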
* Removing Print Spam from Wrapper
Prints a line every time a video is saved or not saved. Seems unnecessary.