* add a custom delay to identity_env
* min reward 0.8 in delayed identity test
* seed the tests, perfect score on delayed_identity_test
* delay=1 in delayed_identity_test
* flake8 complaints
* increased number of steps in fixed_seq_test
* seed identity tests to ensure reproducibility
* docstrings
* tweak
* documentation
* rely on log_comm, remove mpi averaging from wrappers
* pass comm for ppo2 initialization
* ppo2 logging
* experiment tweaks
* auto launch tensorboard when using local backend
* graph tweaks
* pass caller to config
* configure logger and tensorboard
* make parent dir if necessary
* parentdir tweak
* ci/runtests.sh - pass all folders to pytest
* mpi_optimizer_test precision 1e-4
* fixes to tests
* search for tests in the entire jax folder, also remove unnecessary humor
* transformer mnist experiments
* version that only builds one model
* work on inverted mnist
* Add grad clipping to MpiAdamOptimizer
* various
* transformer changes, loading
* get rid of soft labels
* transformer baseline
* minor
* experiments involving all possible training sets
* vary training
* minor
* get ready for fine-tuning experiments
* lint
* minor
* make the rms-to-tfrms switch in vec_normalize more explicit
* modify the vec_normalize / use_tf logic a little bit
* typo
* use_tf = False by default
* finish cherry-pick td3 test commit
* removed graph simplification error ignore
* merge delayed logger config
* merge updated baselines logger
* lazy_mpi load
* cleanups
* use lazy mpi imports in codegen
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* removed extra printouts from TdLayer op
* silly recursion
* running codegen cc experiment
* wip
* more wip
* use actor as input for critic targets, instead of the action taken
* batch size 100
* tweak update parameters
* tweaking td3 runs
* wip
* use nenvs=2 for contcontrol (to be comparable with ppo_metal)
* wip. Doubts about usefulness of actor in critic target
* delayed actor in ActorLoss
* score is average of last 100
* skip lack of losses or too many action distributions
* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)
* syntax
* microfixes
* minifixes
* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)
* random physics for mujoco
* random parts sizes with range 0.4
* add notebook with results into x/peterz
* variations of ant
* roboschool use gym.make kwargs
* use float as lowest score after rank transform
* rcall from master
* wip
* re-enable dynamic routing
* wip
* squash-merge master, resolve conflicts
* remove erroneous file
* restore normal MPI imports
* move wrappers around a little bit
* autopep8
* cleanups
* cleanup mpi_eda, autopep8
* make activation function of action distribution customizable
* cleanups; preparation for a pr
* syntax
* merge latest master, resolve conflicts
* wrap MPI import with try/except
* allow import of modules through env id in baselines cmd_util
* flake8 complaints
* only wrap box action spaces with ClipActionsWrapper
* flake8
* fixes to algo_prob according to Oleg's suggestions
* use apply_without_scope flag in ActorLoss
* remove extra line in algo/core.py
* multi-task support
* autopep8
* symbolic suffix-shapes (not B,T yet)
* test_with_mpi -> with_mpi rename
* remove extra blank lines in algo/core
* remove more blank lines
* symbolify shapes in existing algorithms
* minor output changes
* cleaning up merge conflicts
* cleaning up more merge conflicts
* restore mpi_map.py from master