* fix #795: Making tf_util._Function consistent
The fix uses the placeholder name to cross-reference passed kwargs values, as tf_util.function expects.
The givens are also applied before the passed parameters, so explicitly passed values take precedence over the givens, as intended.
* test: Adding test for issue #795
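The feed order described in the #795 fix can be sketched in plain Python. `FakePlaceholder`, `short_name` and `build_feed_dict` are hypothetical stand-ins, not the tf_util code, and matching kwargs on the placeholder's base name (scope and `:0` suffix stripped) is an assumption made for illustration.

```python
class FakePlaceholder:
    """Hypothetical stand-in for a TF placeholder; only .name is used here."""

    def __init__(self, name):
        self.name = name  # TF-style tensor name, e.g. "scope/obs:0"


def short_name(placeholder):
    # "scope/obs:0" -> "obs": drop the output index and any name scope
    return placeholder.name.split(":")[0].split("/")[-1]


def build_feed_dict(inputs, givens, args, kwargs):
    """Sketch of the feed order from the fix (assumed, not the real code)."""
    feed = dict(givens)  # 1. givens (default values) go in first
    for ph, value in zip(inputs, args):  # 2. positional args override them
        feed[ph] = value
    for ph in inputs[len(args):]:  # 3. remaining inputs are matched by name
        name = short_name(ph)
        if name in kwargs:
            feed[ph] = kwargs[name]
    return feed
```

Because the givens are written first, any value passed by the caller, positionally or by keyword, overwrites the default for that placeholder; with the opposite order, a given would silently mask the caller's value.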
* Added required arguments to the policy builder in the ACER model to
fix issue #783
* Changed the step model from nbatch to nenvs
* Updated nsteps to be 1.
* Recognize nightly tf builds
* Use LooseVersion instead of StrictVersion to recognize nightly build numbers
Nightly version numbers have the form `1.3.0.dev20181215`, which is not a valid version number for `StrictVersion`, while `LooseVersion` still recognizes it.
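`distutils` is deprecated and was removed in Python 3.12, so the sketch below re-implements the two parsing rules with `re` to show why the nightly string is rejected by one class and accepted by the other. The regexes mirror the distutils patterns as we understand them; `is_strict_version` and `loose_components` are illustrative helpers, not library functions.

```python
import re

# StrictVersion-style: "X.Y" or "X.Y.Z", optionally followed by "aN" or "bN".
STRICT_RE = re.compile(r"^(\d+)\.(\d+)(\.(\d+))?([ab](\d+))?$")

# LooseVersion-style: split into runs of digits, runs of letters, and dots.
LOOSE_COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.)", re.IGNORECASE)


def is_strict_version(vstring):
    return STRICT_RE.match(vstring) is not None


def loose_components(vstring):
    parts = []
    for token in LOOSE_COMPONENT_RE.split(vstring):
        if not token or token == ".":
            continue  # drop empty strings and the dot separators
        parts.append(int(token) if token.isdigit() else token)
    return parts
```

For example, `is_strict_version("1.3.0.dev20181215")` is false because of the `.dev…` suffix, while `loose_components` yields `[1, 3, 0, "dev", 20181215]`, whose leading numbers can still be compared against a minimum such as `[1, 12]`.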
* joshim5 changes (width and height to WarpFrame wrapper)
* match network output with action distribution via a linear layer only if necessary (#167)
* support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper
* Support color in other wrappers
* Updated per Peter's suggestions
* fixing test failures
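The width, height and color options for WarpFrame can be sketched with NumPy alone. This is an assumption-laden illustration, not the wrapper's code: `warp_frame` is a hypothetical function, and nearest-neighbour indexing stands in for the OpenCV resize a real wrapper would more likely use.

```python
import numpy as np


def warp_frame(frame, width=84, height=84, grayscale=True):
    """Resize an (H, W, 3) uint8 RGB frame; optionally convert to grayscale.

    Grayscale output keeps a trailing channel axis of size 1 so that the
    observation is always 3-D, whichever option is chosen.
    """
    if grayscale:
        # ITU-R BT.601 luma weights (the ones OpenCV uses for RGB -> gray)
        weights = np.array([0.299, 0.587, 0.114])
        frame = (frame.astype(np.float64) @ weights)[..., None]
    src_h, src_w = frame.shape[:2]
    # Nearest-neighbour sampling: pick evenly spaced source rows/columns
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    resized = frame[rows[:, None], cols[None, :]]
    return np.round(resized).astype(np.uint8)
```

Keeping the channel axis for grayscale frames means downstream frame stacking can concatenate along the last axis in both modes.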
* ppo2 with microbatches (#168)
* pass microbatch_size to the model during construction
* microbatch fixes and test (#169)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* subclassing the model to make microbatched version of model WIP
* made microbatched model a subclass of ppo2 Model
* flake8 complaint
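The microbatch idea can be illustrated on a least-squares loss. This is a sketch under stated assumptions, not the ppo2 model: `mse_grad` and `microbatched_grad` are hypothetical, and we assume the batch divides evenly into microbatches so that averaging per-microbatch gradients reproduces the full-batch gradient while only one microbatch is processed at a time.

```python
import numpy as np


def mse_grad(w, x, y):
    """Gradient of mean((x @ w - y) ** 2) with respect to w."""
    residual = x @ w - y
    return 2.0 * x.T @ residual / len(y)


def microbatched_grad(w, x, y, microbatch_size):
    """Average per-microbatch gradients instead of one full-batch pass.

    Assumes len(y) is divisible by microbatch_size, so every microbatch
    carries equal weight and the average matches the full-batch gradient.
    """
    assert len(y) % microbatch_size == 0
    grads = [
        mse_grad(w, x[i:i + microbatch_size], y[i:i + microbatch_size])
        for i in range(0, len(y), microbatch_size)
    ]
    return np.mean(grads, axis=0)
```

With unequal microbatch sizes, a plain mean would over-weight the smaller chunk; weighting each gradient by its microbatch length would fix that.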
* mpi-less ppo2 (resolving merge conflict)
* flake8 and mpi4py imports in ppo2/model.py
* more un-mpying
* merge master
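Making ppo2 run without MPI usually comes down to guarding the import and falling back to single-process behaviour. The sketch below shows that common pattern; `world_rank_and_size` and `average_across_workers` are hypothetical helper names, and whether ppo2 uses exactly this guard is an assumption.

```python
try:
    from mpi4py import MPI
except ImportError:  # run single-process when mpi4py is not installed
    MPI = None


def world_rank_and_size():
    """Return (rank, size), falling back to a single process without MPI."""
    if MPI is None:
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


def average_across_workers(value):
    """Average a float over all MPI workers; identity without MPI."""
    _, size = world_rank_and_size()
    if size == 1:
        return value
    return MPI.COMM_WORLD.allreduce(value, op=MPI.SUM) / size
```

Code paths that only log on rank 0, or average statistics across workers, can call these helpers unconditionally and behave the same way with or without mpi4py installed.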
* updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip
* update viewer yaml to use persistent volume claims
* move plot_util to baselines.common, update links
* use 1 TB hard drive for results viewer
* small updates to benchmark visualizer code
* autopep8
* autopep8
* any folder can be a benchmark
* massage games image a little bit
* fixed --preload option in app.py
* remove preload from run_viewer.sh
* remove pdb breakpoints
* update bench-viewer.yaml
* fixed bug (#185)
* fixed bug
Taking the else branch is wrong, because then no other nodes would start.
* changed the fix slightly
* Refactor her phase 1 (#194)
* add monitor to the rollout envs in her RUN BENCHMARKS her
* Slice -> Slide in her benchmarks RUN BENCHMARKS her
* run her benchmark for 200 epochs
* dummy commit to RUN BENCHMARKS her
* her benchmark for 500 epochs RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* disable saving of policies in her benchmark RUN BENCHMARKS her
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* launcher refactor wip
* wip
* her works on FetchReach
* her runner refactor RUN BENCHMARKS Fetch1M
* unit test for her
* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present
* pickle-based serialization in her
* remove extra import from subproc_vec_env.py
* investigating differences in rollout.py
* try with old rollout code RUN BENCHMARKS her
* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her
* dummy commit to RUN BENCHMARKS her
* set info_values in rollout worker in her RUN BENCHMARKS her
* bug in rollout_new.py RUN BENCHMARKS her
* fixed bug in rollout_new.py RUN BENCHMARKS her
* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her
* updated buffer sizes RUN BENCHMARKS her
* fixed loading/saving via joblib
* dust off learning from demonstrations in HER, docs, refactor
* add deprecation notice on her play and plot files
* address comments by Matthias
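The note about not using the last step follows from VecEnvs resetting finished environments inside `step()`: the observation returned together with `done=True` already belongs to the next episode. The toy below demonstrates this; `ToyAutoResetVecEnv` and `collect_episode` are hypothetical and only mimic that auto-reset behaviour, not the HER rollout worker.

```python
class ToyAutoResetVecEnv:
    """Hypothetical single-env VecEnv-like object with fixed-length episodes.

    step() resets the environment itself when an episode ends, so the
    observation returned alongside done=True is the first observation of
    the next episode, not the terminal one.
    """

    def __init__(self, episode_length):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return [self.t]

    def step(self, action):
        self.t += 1
        done = self.t == self.episode_length
        if done:
            self.t = 0  # automatic reset inside step()
        return [self.t], [0.0], [done], [{}]


def collect_episode(env, max_steps):
    """Record observations, discarding the one returned with done=True."""
    obs = env.reset()
    observations = [obs[0]]
    for _ in range(max_steps):
        obs, _, done, _ = env.step(None)
        if done[0]:
            break  # obs[0] already belongs to the next episode
        observations.append(obs[0])
    return observations
```

Without the `done` check, the episode would end with the post-reset observation `0`, which is exactly the kind of spurious transition the commit avoids.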
* Added parameter documentation
This parameter was previously undocumented and is unintuitive to anyone unfamiliar with TensorFlow.
* Added parameter documentation
* DDPG has unused 'seed' argument
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:
```
from baselines.common import set_global_seeds
...
def learn(...):
    ...
    set_global_seeds(seed)
```
DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
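The missing seeding step can be sketched as follows. This `set_global_seeds` is a hypothetical minimal version that seeds only Python's `random` and NumPy's global generator (the baselines helper may also seed TensorFlow), and `learn` is a toy entry point that shows a `seed` argument actually being honoured.

```python
import random

import numpy as np


def set_global_seeds(seed):
    """Hypothetical minimal version: seeds Python's and NumPy's global RNGs."""
    random.seed(seed)
    np.random.seed(seed)


def learn(seed=None, total_timesteps=3):
    """Toy DDPG-style entry point that applies `seed` before training."""
    if seed is not None:
        set_global_seeds(seed)  # the step the DDPG learn() was missing
    # Stand-in for exploration noise drawn during training
    return [np.random.normal() + random.random() for _ in range(total_timesteps)]
```

Skipping the call when `seed is None` is a design choice here: it leaves any seeding done by the caller untouched instead of reseeding from OS entropy.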
* DDPG: duplicate variable assignment
The variable nb_actions is assigned the same value twice within 10 lines:
`nb_actions = env.action_space.shape[-1]`
* DDPG: noise_type 'normal_x' and 'ou_x' cause assert
noise_type default 'adaptive-param_0.2' works, but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') trigger an assert and stop DDPG from running. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Either noise can be nested or action unnested.
* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"
* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError
noise_type default 'adaptive-param_0.2' works, but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') trigger an AssertionError and stop DDPG from running. The issue is the following block:
```
if self.action_noise is not None and apply_noise:
    noise = self.action_noise()
    assert noise.shape == action.shape
    action += noise
```
noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes fail the assert even though the `action += noise` line is correct, since broadcasting adds the noise to the single nested action.
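The shape mismatch and one way around it can be reproduced with NumPy. `add_action_noise` is a hypothetical helper, and checking only the trailing action dimension is one possible fix chosen for illustration, not necessarily the change that was merged.

```python
import numpy as np


def add_action_noise(action, noise):
    """Add an unbatched noise sample of shape (nb_actions,) to a batched
    action of shape (batch, nb_actions).

    Only the trailing action dimension is compared; NumPy broadcasting then
    adds the same noise vector to every row of the batch.
    """
    assert noise.shape[-1] == action.shape[-1]
    return action + noise
```

With `action = np.zeros((1, 3))` and `noise = np.full(3, 0.1)`, the original `noise.shape == action.shape` check fails ((3,) versus (1, 3)), yet `add_action_noise(action, noise)` returns a (1, 3) array filled with 0.1.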
* Removing Print Spam from Wrapper
The wrapper printed a line every time a video was or was not saved, which seems unnecessary.
* viz docs
* writing visualization docs
* documenting plot_util
* docstrings in plot_util
* autopep8 and flake8
* spelling (using the default vim spellchecker and ignoring words like dataframe, docstring, etc.)
* rephrased viz.md a little bit
* more examples of viz code usage in the docs
* replaced visualization doc with notebook
* Fix: Return the result of rendering from dummyvecenv
* Add: Add a video recorder wrapper for vecenv
* Change: Use VecVideoRecorder with --video_monitor flag
* Change: Overwrite the metadata only when it isn't defined
* Add: Define __del__ so the file is closed correctly on exit
* Fix: Bump episode_id in reset()
* Fix: Use hasattr to check the existence of .metadata
* Fix: Make directory when it doesn't exist
* Change: Keep recording for `video_length` steps, then close
Because reset() on a VecEnv does not behave like reset() on a normal gym.Env
* Add: Allow specifying video_length from a command-line argument
* Delete: Delete default value, None, of video_callable
* Change: Use self.recorded_frames and self.recording to manage intervals
* Add: Log the status of video recording
* Fix: Fix saving path
* Change: Place metadata in the base VecEnv
* Delete: Delete unused imports
* Fix: episode_id => step_id
* Fix: Refine the flag name
* Change: Unify the flag name following the previous change
* [WIP] Add: Add a test of VecVideoRecorder
* Fix: Use PongNoFrameskip-v0 because SimpleEnv doesn't have render()
* Change: Use TemporaryDirectory
* Fix: minimal successful test
* Add: Test against parallel environments
* Add: Test against different types of VecEnvs
* Change: Test against different length and interval of video capture
* Delete: Reduce the number of tests
* Change: Test that the output video is not empty
* Add: Add some comments
* Fix: Fix the flag name
* Add: Add docstrings
* Fix: Install ffmpeg in testing container for VecVideoRecorder's test
* Fix: Delete unused things
* Fix: Replace `video_callable` with `record_video_trigger`
* Fix: Improve the explanation of `record_video_trigger` argument
* Fix: Close the owned vecenv in VecVideoRecorder.close to resolve a memory leak
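The recording bookkeeping described above (start on a trigger, record `video_length` frames, then stop without waiting for `reset()`) can be modelled as a small state machine. `VideoRecordingSchedule` is hypothetical and captures no frames; it only tracks when a VecVideoRecorder-style wrapper would be recording, and counting steps from 1 is an assumption.

```python
class VideoRecordingSchedule:
    """Hypothetical sketch of trigger/length bookkeeping for a vec-env recorder."""

    def __init__(self, record_video_trigger, video_length=200):
        self.record_video_trigger = record_video_trigger
        self.video_length = video_length
        self.step_id = 0
        self.recording = False
        self.recorded_frames = 0
        self.start_step = None
        self.videos = []  # (start_step, frame_count) for each finished video

    def step(self):
        """Call once per vec-env step; returns True if this frame is captured."""
        self.step_id += 1
        if not self.recording and self.record_video_trigger(self.step_id):
            self.recording = True
            self.recorded_frames = 0
            self.start_step = self.step_id
        if not self.recording:
            return False
        self.recorded_frames += 1
        if self.recorded_frames >= self.video_length:
            # Stop after video_length frames; sub-envs reset on their own
            self.videos.append((self.start_step, self.recorded_frames))
            self.recording = False
        return True
```

A trigger such as `lambda step_id: step_id % 2000 == 0` would start a clip every 2000 vec-env steps; because the trigger is only consulted while not recording, clips never overlap.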
* make baselines run without mpi wip
* squash-merged latest master
* further removing MPI references where unnecessary
* more MPI removal
* syntax and flake8
* MpiAdam becomes regular Adam if MPI is not present
* autopep8
* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole
* mpiless ddpg
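The "MpiAdam becomes regular Adam" change can be pictured as an Adam optimizer whose only MPI-specific step is averaging the gradient across workers. The class below is a hypothetical NumPy sketch, not baselines' MpiAdam; with `comm=None` the allreduce is skipped and it is plain Adam.

```python
import numpy as np


class Adam:
    """Adam update whose gradient is averaged across workers when a comm is given."""

    def __init__(self, size, beta1=0.9, beta2=0.999, epsilon=1e-8, comm=None):
        self.beta1, self.beta2, self.epsilon = beta1, beta2, epsilon
        self.m = np.zeros(size)  # first-moment estimate
        self.v = np.zeros(size)  # second-moment estimate
        self.t = 0
        self.comm = comm  # e.g. mpi4py's MPI.COMM_WORLD, or None

    def update(self, params, local_grad, stepsize=1e-3):
        if self.comm is not None:
            from mpi4py import MPI  # only needed when running under MPI
            # Sum gradients from all workers, then divide to get the mean
            global_grad = np.zeros_like(local_grad)
            self.comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
            local_grad = global_grad / self.comm.Get_size()
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * local_grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * local_grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - stepsize * m_hat / (np.sqrt(v_hat) + self.epsilon)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `stepsize` against its gradient.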
* Adds retro to ppo2 defaults
Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes
* ppo2 retro defaults to atari
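Delegating the retro defaults to the Atari ones can be expressed as one defaults function calling another. The hyperparameter names and values below are illustrative placeholders, not the shipped ppo2 defaults.

```python
def atari():
    """Placeholder Atari hyperparameters (illustrative values only)."""
    return dict(nsteps=128, nminibatches=4, noptepochs=4, gamma=0.99, lam=0.95)


def retro():
    # Retro games reuse the Atari defaults until they get their own tuning.
    return atari()
```

Returning a fresh dict on every call lets callers override individual entries for one run without mutating the defaults seen by later runs.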