* make baselines run without mpi wip
* squash-merged latest master
* further removing MPI references where unnecessary
* more MPI removal
* syntax and flake8
* MpiAdam becomes regular Adam if Mpi not present
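A minimal sketch of the fallback idea, assuming mpi4py as the optional dependency; the class below is illustrative, not the actual baselines API:
```
import numpy as np

try:
    from mpi4py import MPI
except ImportError:
    MPI = None

class MpiAdamSketch:
    def __init__(self, size, stepsize=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
        self.m = np.zeros(size, 'float32')
        self.v = np.zeros(size, 'float32')
        self.t = 0
        self.stepsize, self.beta1, self.beta2, self.epsilon = stepsize, beta1, beta2, epsilon
        self.comm = MPI.COMM_WORLD if MPI is not None else None

    def step(self, localg):
        # Average gradients across workers only when MPI is available;
        # otherwise this is a plain, bias-corrected Adam update.
        if self.comm is not None:
            globalg = np.zeros_like(localg)
            self.comm.Allreduce(localg, globalg, op=MPI.SUM)
            globalg /= self.comm.Get_size()
        else:
            globalg = localg
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * globalg
        self.v = self.beta2 * self.v + (1 - self.beta2) * (globalg * globalg)
        a = self.stepsize * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
        return -a * self.m / (np.sqrt(self.v) + self.epsilon)
```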
* autopep8
* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole
* mpiless ddpg
* Adds retro to ppo2 defaults
Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes
* ppo2 retro defaults to atari
* DDPG has unused 'seed' argument
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR all contain code of the form:
```
from baselines.common import set_global_seeds
...
def learn(...):
...
set_global_seeds(seed)
```
DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
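A minimal sketch of the corresponding fix in DDPG, assuming the same helper; the learn signature here is abbreviated for illustration:
```
from baselines.common import set_global_seeds

def learn(network, env, seed=None, **kwargs):
    # Seed Python, numpy, and TensorFlow before building the graph,
    # mirroring what the other algorithms already do.
    set_global_seeds(seed)
    ...
```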
* DDPG: duplicate variable assignment
The variable nb_actions is assigned the same value twice within the space of 10 lines:
`nb_actions = env.action_space.shape[-1]`
* DDPG: noise_type 'normal_x' and 'ou_x' cause assert
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to action noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Can either nest noise or unnest action (see the sketch below).
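A minimal sketch of one possible fix, keeping the stored noise shape and comparing against a single row of the batched action instead (variable names follow the quoted block):
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    # action has shape (1, nb_actions) while noise has shape (nb_actions,),
    # so assert against one row and let NumPy broadcast the addition.
    assert noise.shape == action[0].shape
    action += noise
```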
* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"
* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to action noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes fail the assert even though the action += noise line is correct.
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR all contain code of the form:
```
from baselines.common import set_global_seeds
...
def learn(...):
...
set_global_seeds(seed)
```
DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
* make acer use vecframestack
* acer passes mnist test with 20k steps
* acer with non-image observations and tests
* flake8
* test acer serialization with non-recurrent policies
* Add, initialize, normalize and sample from a demo buffer
* Modify losses and add cloning loss
* Add demo file parameter to train.py
* Introduce new params in config.py for demo based training
* Change logger.warning to logger.warn in rollout.py; bug
* Add data generation file for Fetch environments
* Update README file
* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
* Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch
* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
* store session at policy creation time
* coexistence tests
* fix a typo
* autopep8
* ... and flake8
* updated todo links in test_serialization
* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
* disabled tests, running benchmarks only
* dummy commit to RUN BENCHMARKS
* benchmark ppo_metal; disable all but Bullet benchmarks
* ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS
* run benchmarks on Roboschool instead RUN BENCHMARKS
* run ppo_metal on Roboschool as well RUN BENCHMARKS
* install roboschool in cron rcall user_config
* dummy commit to RUN BENCHMARKS
* import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS
* re-enable tests, flake8
* get entropy from a distribution in Pred RUN BENCHMARKS
* gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS
* provide default value for cg2/bmv_net_ops.py
* dummy commit to RUN BENCHMARKS
* make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* path relative to codegen/training for gin files RUN BENCHMARKS
* another reconciliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS
* value_network=copy for ppo2 on roboschool RUN BENCHMARKS
* make None seed work with torch seeding RUN BENCHMARKS
* try sequential batches with ppo2 RUN BENCHMARKS
* try ppo without advantage normalization RUN BENCHMARKS
* use Distribution to compute ema NLL RUN BENCHMARKS
* autopep8
* clip gradient norm in algo_agent RUN BENCHMARKS
* try ppo2 without vfloss clipping RUN BENCHMARKS
* trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS
* set gamma=0 in ppo2 RUN BENCHMARKS
* try with ppo2 with single minibatch RUN BENCHMARKS
* try with nminibatches=4, value_network=copy RUN BENCHMARKS
* try with nminibatches=1 take two RUN BENCHMARKS
* try initialization for vf=0.01 RUN BENCHMARKS
* fix the problem with min_istart >= max_istart
* i have no idea RUN BENCHMARKS
* fix non-shared variance between old and new RUN BENCHMARKS
* restored baselines.common.policies
* 16 minibatches in ppo_roboschool.gin
* fixing results of merge
* cleanups
* cleanups
* fix run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix test failures
* moved gin requirement to codegen/setup.py
* remove duplicated build_softq in get_algo.py
* linting
* run softq on continuous action spaces RUN BENCHMARKS Roboschool8M
* run ddpg on Mujoco benchmark RUN BENCHMARKS
* autopep8
* fixed all syntax in refactored ddpg
* a little bit more refactoring
* autopep8
* identity test with ddpg WIP
* enable test_identity with ddpg
* refactored ddpg RUN BENCHMARKS
* autopep8
* include ddpg into style check
* fixing tests RUN BENCHMARKS
* set default seed to None RUN BENCHMARKS
* run tests and benchmarks in separate buildkite steps RUN BENCHMARKS
* cleanup pdb usage
* flake8 and cleanups
* re-enabled all benchmarks in run-benchmarks-new.py
* flake8 complaints
* deepq model builder compatible with network functions returning single tensor
* remove ddpg test with test_discrete_identity
* make ppo_metal use make_vec_env instead of make_atari_env
* make ppo_metal use make_vec_env instead of make_atari_env
* fixed syntax in ppo_metal.run_atari
__E901,E999,F821,F822,F823__ are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. The other flake8 issues are merely "style violations" -- useful for readability, but they do not affect runtime safety. This PR therefore recommends running flake8 restricted to those checks on the entire codebase (a sketch of such a run follows the list below).
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable `name` referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
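A minimal sketch of such a check, assuming flake8 is installed; the wrapper script is illustrative, and the `--select` flag restricts flake8 to exactly the codes listed above:
```
import subprocess
import sys

# Scan the whole codebase but report only the "showstopper" codes listed
# above; propagate flake8's exit status so CI fails when any are found.
result = subprocess.run([
    sys.executable, "-m", "flake8", ".",
    "--select=E901,E999,F821,F822,F823",
    "--count", "--show-source", "--statistics",
])
sys.exit(result.returncode)
```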
* Soup code, arch search on CIFAR-10
* Oh I understood how choice_sequence() worked
* Undo some pointless changes
* Some beautification 1
* Some beautification 2
* An attempt to debug test_get_algo_outputs() number 70, unsuccessful.
* Code style warning
* Code style warnings, more
* wip
* wip
* wip
* fix almost everything; soup machine still broken
* revert mpi_eda changes
* minor fixes
* refactor acktr
* setup.cfg now tests style/syntax in acktr as well
* flake8 complaints
* added note about continuous action spaces for acktr into the README.md