Commit Graph

295 Commits

Author SHA1 Message Date
pzhokhov
001597586d updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip

* update viewer yaml to use persistent volume claims

* move plot_util to baselines.common, update links

* use 1Tb hard drive for results viewer

* small updates to benchmark vizualizer code

* autopep8

* autopep8

* any folder can be a benchmark

* massage games image a little bit

* fixed --preload option in app.py

* remove preload from run_viewer.sh

* remove pdb breakpoints

* update bench-viewer.yaml
2018-11-26 16:42:20 -08:00
Peter Zhokhov
1ddab4bdb5 Merge branch 'master' of github.com:openai/baselines into internal 2018-11-14 14:54:16 -08:00
sedand
d3fed181b5 Fixed comment on example usage in jupyter-notebook (#396)
Cause of error: Import name must be results_plotter, not log_viewer.
2018-11-14 14:50:59 -08:00
Roman Ring
339d5640b9 add docs for layer_norm param in DQN baseline (#107) 2018-11-14 12:22:42 -08:00
Buck Shlegeris
a75bc37a40 fix typo in a comment (#161) 2018-11-14 12:20:55 -08:00
Peter Zhokhov
87b3a04a38 autopep8 2018-11-14 12:16:53 -08:00
Brent Komer
c5b1a1b643 typo fix (#230) 2018-11-13 13:08:32 -08:00
JohannesAck
c59a10947d Parameter documentation for tf_util.function (#349)
* Added parameter documentation

This parameter was thus far not documented and is non-intuitive when unfamiliar with tf.

* Added parameter documentation
2018-11-13 13:03:48 -08:00
Peter Zhokhov
776a134218 merge master 2018-11-13 11:24:57 -08:00
James Alan Preiss
5cd66010dc case-insensitive sort for human-readable logger (#289) 2018-11-13 11:09:11 -08:00
Xiaoquan Kong
0a13da8dfe Change variable name from inpt to input_ (#297) 2018-11-13 11:08:21 -08:00
Vladislav Zavadskyy
18b6390be6 Typo fix (#287) 2018-11-13 11:03:55 -08:00
pzhokhov
52255beda5 microbatches in ppo2, custom frame size in WarpFrame, matching fc layer only when needed (#707)
* joshim5 changes (width and height to WarpFrame wrapper)

* match network output with action distribution via a linear layer only if necessary (#167)

* support color vs. grayscale option in WarpFrame wrapper (#166)

* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peters suggestions

* fixing test failures

* ppo2 with microbatches (#168)

* pass microbatch_size to the model during construction

* microbatch fixes and test (#169)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* Peterz joshim5 subclass ppo2 model (#170)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* subclassing the model to make microbatched version of model WIP

* made microbatched model a subclass of ppo2 Model

* flake8 complaint

* mpi-less ppo2 (resolving merge conflict)

* flake8 and mpi4py imports in ppo2/model.py

* more un-mpying
2018-11-09 11:18:05 -08:00
Peter Zhokhov
0b8126f949 more un-mpying 2018-11-09 10:08:39 -08:00
Peter Zhokhov
84323c3d49 flake8 and mpi4py imports in ppo2/model.py 2018-11-09 09:32:59 -08:00
Peter Zhokhov
5a2b96abdd Merge branch 'master' of github.com:openai/baselines into internal 2018-11-08 10:36:54 -08:00
Peter Zhokhov
57c23cddd6 mpi-less ppo2 (resolving merge conflict) 2018-11-08 10:36:36 -08:00
pzhokhov
310fbadba3 Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* subclassing the model to make microbatched version of model WIP

* made microbatched model a subclass of ppo2 Model

* flake8 complaint
2018-11-08 10:20:49 -08:00
pzhokhov
c424f9889d microbatch fixes and test (#169)
* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix
2018-11-08 10:20:02 -08:00
peter
a1cef656b8 pass microbatch_size to the model during construction 2018-11-08 10:20:02 -08:00
pzhokhov
b0589da817 ppo2 with microbatches (#168) 2018-11-08 10:20:02 -08:00
AurelianTactics
d80acbb4d1 Removing print spam from Wrapper (#705)
* DDPG has unused 'seed' argument

DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
   set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.

* DDPG: duplicate variable assignment

variable nb_actions assigned same value twice in space of 10 lines
nb_actions = env.action_space.shape[-1]

* DDPG: noise_type 'normal_x' and 'ou_x' cause assert

noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2' cause an assert message and DDPG not to run. Issue is noise following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
actions is nested: [[number_of_actions]]
Can either nest noise or unnest actions

* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"

* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError

noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes do not pass the assert line even though the action += noise line is correct

* Removing Print Spam from Wrapper

Prints a line every time a video is saved or not saved. Seems unnecessary.
2018-11-08 10:13:07 -08:00
pzhokhov
556b198454 Internal minifixes (#694)
* joshim5 changes (width and height to WarpFrame wrapper)

* match network output with action distribution via a linear layer only if necessary (#167)

* support color vs. grayscale option in WarpFrame wrapper (#166)

* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peters suggestions

* fixing test failures
2018-11-08 10:11:45 -08:00
pzhokhov
cc88804042 Update viz.ipynb 2018-11-07 17:20:52 -08:00
pzhokhov
c14d307834 move viz docs to a notebook entirely (#704)
* viz docs

* writing vizualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc)

* rephrased viz.md a little bit

* more examples of viz code usage in the docs

* replaced vizualization doc with notebook
2018-11-07 17:19:42 -08:00
pzhokhov
0b71d4c6c4 remove unused args of DDPG class (#702) 2018-11-07 17:19:25 -08:00
Peter Zhokhov
021533be6c Merge branch 'master' of github.com:openai/baselines into internal 2018-11-07 16:37:31 -08:00
pzhokhov
7bb405c7a7 Update viz.md 2018-11-07 14:25:35 -08:00
pzhokhov
8b95576a92 more viz + build fixes (#703)
* viz docs

* writing vizualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc)

* rephrased viz.md a little bit

* more examples of viz code usage in the docs
2018-11-06 17:02:20 -08:00
pzhokhov
67a1222267 Merge branch 'master' into internal 2018-11-06 10:26:14 -08:00
Peter Zhokhov
9d4fb76ef0 making num_envs and video length smaller in test_video_recorder to prevent hanging on travis 2018-11-06 09:58:43 -08:00
Peter Zhokhov
664ec6faf0 catch bugfixes in gym 2018-11-05 19:19:39 -08:00
Peter Zhokhov
3917321fbe revert over-spellchecking 2018-11-05 17:00:40 -08:00
coord.e
6e607efa90 Add video recorder (#666)
* Fix: Return the result of rendering from dummyvecenv

* Add: Add a video recorder wrapper for vecenv

* Change: Use VecVideoRecorder with --video_monitor flag

* Change: Overwrite the metadata only when it isn't defined

* Add: Define __del__ to make the file correctly closed in exit

* Fix: Bump epidode_id in reset()

* Fix: Use hasattr to check the existence of .metadata

* Fix: Make directory when it doesn't exist

* Change: Kepp recording for `video_length` steps, then close

Because reset() is not what it is in normal gym.Env

* Add: Enable to specify video_length from command line argument

* Delete: Delete default value, None, of video_callable

* Change: Use self.recorded_frames and self.recording to manage intervals

* Add: Log the status of video recording

* Fix: Fix saving path

* Change: Place metadata in the base VecEnv

* Delete: Delete unused imports

* Fix: epidode_id => step_id

* Fix: Refine the flag name

* Change: Unify the flag name folloing to previous change

* [WIP] Add: Add a test of VecVideoRecorder

* Fix: Use PongNoFrameskip-v0 because SimpleEnv doesn't have render()

* Change; Use TemporaryDirectory

* Fix: minimal successful test

* Add: Test against parallel environments

* Add: Test against different type of VecEnvs

* Change: Test against different length and interval of video capture

* Delete: Reduce the number of tests

* Change: Test if the output video is not empty

* Add: Add some comments

* Fix: Fix the flag name

* Add: Add docstrings

* Fix: Install ffmpeg in testing container for VecVideoRecorder's test

* Fix: Delete unused things

* Fix: Replace `video_callable` with `record_video_trigger`

* Fix: Improve the explanation of `record_video_trigger` argument

* Fix: Close owning vecenv in VecVideoRecorder.close to resolve memory
leak
2018-11-05 14:32:17 -08:00
pzhokhov
c74ce02b9d visualization code docs / bugfixes (#701)
* viz docs

* writing vizualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ingoring things like dataframe, docstring and etc)

* rephrased viz.md a little bit
2018-11-05 14:31:15 -08:00
Peter Zhokhov
739ab6fa0e Merge branch 'internal' of github.com:openai/baselines into internal 2018-11-05 14:07:52 -08:00
Peter Zhokhov
6fd2270c47 fixing test failures 2018-10-31 14:11:26 -07:00
Joshua Meier
63151af41a support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peters suggestions
2018-10-31 14:11:26 -07:00
pzhokhov
e619e42364 match network output with action distribution via a linear layer only if necessary (#167) 2018-10-31 14:11:26 -07:00
Peter Zhokhov
5dbe4c2462 Merge branch 'master' of github.com:openai/baselines into internal 2018-10-31 13:58:29 -07:00
pzhokhov
ab59de6922 mpi-less baselines (#689)
* make baselines run without mpi wip

* squash-merged latest master

* further removing MPI references where unnecessary

* more MPI removal

* syntax and flake8

* MpiAdam becomes regular Adam if Mpi not present

* autopep8

* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole

* mpiless ddpg
2018-10-31 11:15:41 -07:00
Peter Zhokhov
5878eb3862 joshim5 changes (width and height to WarpFrame wrapper) 2018-10-30 18:02:03 -07:00
Mathieu Poliquin
a071fa7630 Add retro to ppo2 defaults (#682)
* Adds retro to ppo2 defaults

Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes

* ppo2 retro defaults to atari
2018-10-30 10:17:46 -07:00
Mathieu Poliquin
637bf55da7 Use deepmind wrapper for retro (#685)
* Use deepmind wrapper for retro

* moved wrap_deepmind_retro after Monitor wrapper
2018-10-30 10:16:15 -07:00
AurelianTactics
165c622572 DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError (#680)
* DDPG has unused 'seed' argument

DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
   set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.

* DDPG: duplicate variable assignment

variable nb_actions assigned same value twice in space of 10 lines
nb_actions = env.action_space.shape[-1]

* DDPG: noise_type 'normal_x' and 'ou_x' cause assert

noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2' cause an assert message and DDPG not to run. Issue is noise following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
actions is nested: [[number_of_actions]]
Can either nest noise or unnest actions

* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"

* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError

noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes do not pass the assert line even though the action += noise line is correct
2018-10-30 10:13:39 -07:00
Peter Zhokhov
93c7cc202c Merge branch 'master' of github.com:openai/baselines 2018-10-29 15:25:38 -07:00
Peter Zhokhov
de36116e3b update tensorflow version check regex to parse version like 1.2.3rc4 (previously only 1.2.3-rc4) 2018-10-29 15:25:31 -07:00
Mathieu Poliquin
e2b41828af Set 'cnn' as default network for retro (#683) 2018-10-29 13:30:41 -07:00
pzhokhov
8e56ddeac2 Multidiscrete action space compatibility for policy gradient-based methods (#677)
* multidiscrete space compatibility

* flake8 and syntax
2018-10-24 11:01:59 -07:00
Juliano Laganá
c3bd8cea66 Adds description of param_noise parameter in deepq.learn method (#675) 2018-10-24 10:00:31 -07:00