* viz docs
* writing visualization docs
* documenting plot_util
* docstrings in plot_util
* autopep8 and flake8
* spelling (using the default vim spellchecker and ignoring terms like dataframe, docstring, etc.)
* rephrased viz.md a little bit
* more examples of viz code usage in the docs
* Fix: Return the result of rendering from dummyvecenv
* Add: Add a video recorder wrapper for vecenv
* Change: Use VecVideoRecorder with --video_monitor flag
* Change: Overwrite the metadata only when it isn't defined
* Add: Define __del__ so the file is closed correctly on exit
* Fix: Bump epidode_id in reset()
* Fix: Use hasattr to check the existence of .metadata
* Fix: Make directory when it doesn't exist
* Change: Keep recording for `video_length` steps, then close
Because reset() does not behave the way it does in a normal gym.Env
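Since a VecEnv's reset() is not called at episode boundaries, the recorder stops on a frame count rather than an episode end. A minimal sketch of that logic (the class and method names here are hypothetical, not the actual VecVideoRecorder implementation):

```python
class FixedLengthRecorder:
    """Sketch: record for `video_length` steps after a trigger, then stop.

    Illustrative only -- `start_recording`/`on_step` are hypothetical
    helpers, not baselines API.
    """

    def __init__(self, video_length):
        self.video_length = video_length
        self.recording = False
        self.recorded_frames = 0

    def start_recording(self):
        self.recording = True
        self.recorded_frames = 0

    def on_step(self):
        # Called every env step.  Because VecEnv.reset() is not a
        # per-episode boundary, we stop on a frame count instead.
        if not self.recording:
            return False
        self.recorded_frames += 1
        if self.recorded_frames >= self.video_length:
            self.recording = False  # close the video writer here
        return True
```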
* Add: Allow specifying video_length via a command-line argument
* Delete: Delete default value, None, of video_callable
* Change: Use self.recorded_frames and self.recording to manage intervals
* Add: Log the status of video recording
* Fix: Fix saving path
* Change: Place metadata in the base VecEnv
* Delete: Delete unused imports
* Fix: epidode_id => step_id
* Fix: Refine the flag name
* Change: Unify the flag name following the previous change
* [WIP] Add: Add a test of VecVideoRecorder
* Fix: Use PongNoFrameskip-v0 because SimpleEnv doesn't have render()
* Change: Use TemporaryDirectory
* Fix: minimal successful test
* Add: Test against parallel environments
* Add: Test against different types of VecEnvs
* Change: Test against different lengths and intervals of video capture
* Delete: Reduce the number of tests
* Change: Test if the output video is not empty
* Add: Add some comments
* Fix: Fix the flag name
* Add: Add docstrings
* Fix: Install ffmpeg in testing container for VecVideoRecorder's test
* Fix: Delete unused things
* Fix: Replace `video_callable` with `record_video_trigger`
* Fix: Improve the explanation of `record_video_trigger` argument
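`record_video_trigger` is a callable that takes the current step id and returns True when a new clip should start. A typical trigger looks like this (`make_trigger` and its `interval` parameter are illustrative, not baselines API):

```python
def make_trigger(interval):
    """Build a record_video_trigger: start a clip whenever the current
    step id is a multiple of `interval`."""
    return lambda step_id: step_id % interval == 0

# Start a clip at step 0 and every 2000 steps thereafter.
trigger = make_trigger(2000)
```

Per the commits above, the wrapper then records `video_length` frames from each triggering step; the exact VecVideoRecorder constructor signature may differ from this sketch.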
* Fix: Close owning vecenv in VecVideoRecorder.close to resolve memory leak
* make baselines run without mpi wip
* squash-merged latest master
* further removing MPI references where unnecessary
* more MPI removal
* syntax and flake8
* MpiAdam becomes regular Adam if Mpi not present
* autopep8
* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole
* mpiless ddpg
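The "MpiAdam becomes regular Adam if MPI not present" change boils down to an optional import with a single-process fallback. A sketch under that assumption (not the actual baselines code):

```python
try:
    from mpi4py import MPI  # optional dependency
except ImportError:
    MPI = None  # fall back to single-process behavior

def world_size():
    """Number of workers to average over; 1 when MPI is unavailable."""
    return 1 if MPI is None else MPI.COMM_WORLD.Get_size()

def average_scalar(x):
    # With MPI: allreduce-sum then divide by the worker count.
    # Without MPI the value is unchanged, so MpiAdam-style updates
    # reduce to plain, single-process Adam.
    if MPI is None:
        return x
    return MPI.COMM_WORLD.allreduce(x, op=MPI.SUM) / world_size()
```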
* Adds retro to ppo2 defaults
Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes
* ppo2 retro defaults to atari
* DDPG has unused 'seed' argument
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:
```
from baselines.common import set_global_seeds
...
def learn(...):
    ...
    set_global_seeds(seed)
```
DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
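The fix is to mirror the other algorithms and actually consume the argument. A sketch, where `set_global_seeds` below is a minimal stand-in (the real baselines.common helper also seeds TensorFlow):

```python
import random

import numpy as np

def set_global_seeds(seed):
    """Minimal stand-in for baselines.common.set_global_seeds."""
    if seed is None:
        return
    random.seed(seed)
    np.random.seed(seed)

def learn(env=None, seed=None, **kwargs):
    # The missing piece in DDPG: use `seed` instead of silently
    # ignoring it.
    set_global_seeds(seed)
```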
* DDPG: duplicate variable assignment
The variable nb_actions is assigned the same value twice within the space of 10 lines:
nb_actions = env.action_space.shape[-1]
* DDPG: noise_type 'normal_x' and 'ou_x' cause assert
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to action noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Either nest the noise or unnest the action
* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"
* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError
noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to action noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes fail the assert even though the `action += noise` line itself is correct
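The mismatch is easy to demonstrate directly (4 actions chosen for illustration):

```python
import numpy as np

noise = np.zeros(4)        # action_noise() output: shape (4,)
action = np.zeros((1, 4))  # vectorized action:     shape (1, 4)

# The assert fails even though the addition itself is fine:
assert noise.shape != action.shape  # (4,) vs (1, 4)
summed = action + noise             # numpy broadcasting handles this
assert summed.shape == (1, 4)

# Either nest the noise or unnest the action to satisfy the assert:
assert noise[None, :].shape == action.shape
assert noise.shape == action[0].shape
```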
* make acer use vecframestack
* acer passes mnist test with 20k steps
* acer with non-image observations and tests
* flake8
* test acer serialization with non-recurrent policies
* Add, initialize, normalize and sample from a demo buffer
* Modify losses and add cloning loss
* Add demo file parameter to train.py
* Introduce new params in config.py for demo based training
* Change logger.warning to logger.warn in rollout.py (bug fix)
* Add data generation file for Fetch environments
* Update README file
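The demonstration-based training changes above amount to sampling part of each batch from a demo buffer and adding a behavior-cloning term to the loss. A minimal numpy sketch; all names and parameters here are illustrative, not the actual config.py options:

```python
import numpy as np

def sample_batch(replay, demo, batch_size, demo_batch_size, rng):
    """Mix regular replay samples with demonstration samples."""
    idx_r = rng.integers(0, len(replay), batch_size - demo_batch_size)
    idx_d = rng.integers(0, len(demo), demo_batch_size)
    return [replay[i] for i in idx_r] + [demo[i] for i in idx_d]

def cloning_loss(policy_actions, demo_actions):
    # Mean squared error pushing the policy toward demonstrated actions;
    # added on top of the usual actor loss.
    return float(np.mean((policy_actions - demo_actions) ** 2))
```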
* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
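The numpy.random.random_integers deprecation is resolved by switching to np.random.randint, minding the off-by-one in the upper bound (random_integers samples the inclusive range [low, high]; randint samples the half-open [low, high)):

```python
import numpy as np

def random_integers(low, high, size=None):
    """Drop-in replacement for the deprecated np.random.random_integers:
    shift the exclusive upper bound of randint by one."""
    return np.random.randint(low, high + 1, size=size)
```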
* Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch
* store session at policy creation time
* coexistence tests
* fix a typo
* autopep8
* ... and flake8
* updated todo links in test_serialization
* disabled tests, running benchmarks only
* dummy commit to RUN BENCHMARKS
* benchmark ppo_metal; disable all but Bullet benchmarks
* ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS
* run benchmarks on Roboschool instead RUN BENCHMARKS
* run ppo_metal on Roboschool as well RUN BENCHMARKS
* install roboschool in cron rcall user_config
* dummy commit to RUN BENCHMARKS
* import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS
* re-enable tests, flake8
* get entropy from a distribution in Pred RUN BENCHMARKS
* gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS
* provide default value for cg2/bmv_net_ops.py
* dummy commit to RUN BENCHMARKS
* make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* syntax error in run-benchmarks-new.py RUN BENCHMARKS
* path relative to codegen/training for gin files RUN BENCHMARKS
* another reconciliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS
* value_network=copy for ppo2 on roboschool RUN BENCHMARKS
* make None seed work with torch seeding RUN BENCHMARKS
* try sequential batches with ppo2 RUN BENCHMARKS
* try ppo without advantage normalization RUN BENCHMARKS
* use Distribution to compute ema NLL RUN BENCHMARKS
* autopep8
* clip gradient norm in algo_agent RUN BENCHMARKS
* try ppo2 without vfloss clipping RUN BENCHMARKS
* trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS
* set gamma=0 in ppo2 RUN BENCHMARKS
* try with ppo2 with single minibatch RUN BENCHMARKS
* try with nminibatches=4, value_network=copy RUN BENCHMARKS
* try with nminibatches=1 take two RUN BENCHMARKS
* try initialization for vf=0.01 RUN BENCHMARKS
* fix the problem with min_istart >= max_istart
* i have no idea RUN BENCHMARKS
* fix non-shared variance between old and new RUN BENCHMARKS
* restored baselines.common.policies
* 16 minibatches in ppo_roboschool.gin
* fixing results of merge
* cleanups
* cleanups
* fix run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M
* fix test failures
* moved gin requirement to codegen/setup.py
* remove duplicated build_softq in get_algo.py
* linting
* run softq on continuous action spaces RUN BENCHMARKS Roboschool8M