Commit Graph

273 Commits

Author SHA1 Message Date
Peter Zhokhov
87b3a04a38 autopep8 2018-11-14 12:16:53 -08:00
Brent Komer
c5b1a1b643 typo fix (#230) 2018-11-13 13:08:32 -08:00
JohannesAck
c59a10947d Parameter documentation for tf_util.function (#349)
* Added parameter documentation

This parameter was thus far not documented and is non-intuitive when unfamiliar with tf.

* Added parameter documentation
2018-11-13 13:03:48 -08:00
James Alan Preiss
5cd66010dc case-insensitive sort for human-readable logger (#289) 2018-11-13 11:09:11 -08:00
Xiaoquan Kong
0a13da8dfe Change variable name from inpt to input_ (#297) 2018-11-13 11:08:21 -08:00
Vladislav Zavadskyy
18b6390be6 Typo fix (#287) 2018-11-13 11:03:55 -08:00
pzhokhov
52255beda5 microbatches in ppo2, custom frame size in WarpFrame, matching fc layer only when needed (#707)
* joshim5 changes (width and height to WarpFrame wrapper)

* match network output with action distribution via a linear layer only if necessary (#167)

* support color vs. grayscale option in WarpFrame wrapper (#166)

* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peter's suggestions

* fixing test failures

* ppo2 with microbatches (#168)

* pass microbatch_size to the model during construction

* microbatch fixes and test (#169)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* Peterz joshim5 subclass ppo2 model (#170)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* subclassing the model to make microbatched version of model WIP

* made microbatched model a subclass of ppo2 Model

* flake8 complaint

* mpi-less ppo2 (resolving merge conflict)

* flake8 and mpi4py imports in ppo2/model.py

* more un-mpying
2018-11-09 11:18:05 -08:00
AurelianTactics
d80acbb4d1 Removing print spam from Wrapper (#705)
* DDPG has unused 'seed' argument

DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
   set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.

* DDPG: duplicate variable assignment

variable nb_actions assigned same value twice in space of 10 lines
nb_actions = env.action_space.shape[-1]

* DDPG: noise_type 'normal_x' and 'ou_x' cause assert

noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') trigger an assertion failure, so DDPG does not run. The issue is in the following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
actions is nested: [[number_of_actions]]
Can either nest noise or unnest actions

* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"

* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError

noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes do not pass the assert line even though the action += noise line is correct

* Removing Print Spam from Wrapper

Prints a line every time a video is saved or not saved. Seems unnecessary.
2018-11-08 10:13:07 -08:00
pzhokhov
556b198454 Internal minifixes (#694)
* joshim5 changes (width and height to WarpFrame wrapper)

* match network output with action distribution via a linear layer only if necessary (#167)

* support color vs. grayscale option in WarpFrame wrapper (#166)

* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peter's suggestions

* fixing test failures
2018-11-08 10:11:45 -08:00
pzhokhov
cc88804042 Update viz.ipynb 2018-11-07 17:20:52 -08:00
pzhokhov
c14d307834 move viz docs to a notebook entirely (#704)
* viz docs

* writing visualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ignoring things like dataframe, docstring, etc.)

* rephrased viz.md a little bit

* more examples of viz code usage in the docs

* replaced visualization doc with notebook
2018-11-07 17:19:42 -08:00
pzhokhov
0b71d4c6c4 remove unused args of DDPG class (#702) 2018-11-07 17:19:25 -08:00
pzhokhov
7bb405c7a7 Update viz.md 2018-11-07 14:25:35 -08:00
pzhokhov
8b95576a92 more viz + build fixes (#703)
* viz docs

* writing visualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ignoring things like dataframe, docstring, etc.)

* rephrased viz.md a little bit

* more examples of viz code usage in the docs
2018-11-06 17:02:20 -08:00
Peter Zhokhov
9d4fb76ef0 making num_envs and video length smaller in test_video_recorder to prevent hanging on travis 2018-11-06 09:58:43 -08:00
Peter Zhokhov
664ec6faf0 catch bugfixes in gym 2018-11-05 19:19:39 -08:00
Peter Zhokhov
3917321fbe revert over-spellchecking 2018-11-05 17:00:40 -08:00
coord.e
6e607efa90 Add video recorder (#666)
* Fix: Return the result of rendering from dummyvecenv

* Add: Add a video recorder wrapper for vecenv

* Change: Use VecVideoRecorder with --video_monitor flag

* Change: Overwrite the metadata only when it isn't defined

* Add: Define __del__ to make the file correctly closed in exit

* Fix: Bump epidode_id in reset()

* Fix: Use hasattr to check the existence of .metadata

* Fix: Make directory when it doesn't exist

* Change: Keep recording for `video_length` steps, then close

Because reset() is not what it is in normal gym.Env

* Add: Enable to specify video_length from command line argument

* Delete: Delete default value, None, of video_callable

* Change: Use self.recorded_frames and self.recording to manage intervals

* Add: Log the status of video recording

* Fix: Fix saving path

* Change: Place metadata in the base VecEnv

* Delete: Delete unused imports

* Fix: epidode_id => step_id

* Fix: Refine the flag name

* Change: Unify the flag name following the previous change

* [WIP] Add: Add a test of VecVideoRecorder

* Fix: Use PongNoFrameskip-v0 because SimpleEnv doesn't have render()

* Change: Use TemporaryDirectory

* Fix: minimal successful test

* Add: Test against parallel environments

* Add: Test against different type of VecEnvs

* Change: Test against different length and interval of video capture

* Delete: Reduce the number of tests

* Change: Test if the output video is not empty

* Add: Add some comments

* Fix: Fix the flag name

* Add: Add docstrings

* Fix: Install ffmpeg in testing container for VecVideoRecorder's test

* Fix: Delete unused things

* Fix: Replace `video_callable` with `record_video_trigger`

* Fix: Improve the explanation of `record_video_trigger` argument

* Fix: Close owning vecenv in VecVideoRecorder.close to resolve memory leak
2018-11-05 14:32:17 -08:00
pzhokhov
c74ce02b9d visualization code docs / bugfixes (#701)
* viz docs

* writing visualization docs

* documenting plot_util

* docstrings in plot_util

* autopep8 and flake8

* spelling (using default vim spellchecker and ignoring things like dataframe, docstring, etc.)

* rephrased viz.md a little bit
2018-11-05 14:31:15 -08:00
pzhokhov
ab59de6922 mpi-less baselines (#689)
* make baselines run without mpi wip

* squash-merged latest master

* further removing MPI references where unnecessary

* more MPI removal

* syntax and flake8

* MpiAdam becomes regular Adam if Mpi not present

* autopep8

* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole

* mpiless ddpg
2018-10-31 11:15:41 -07:00
Mathieu Poliquin
a071fa7630 Add retro to ppo2 defaults (#682)
* Adds retro to ppo2 defaults

Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes

* ppo2 retro defaults to atari
2018-10-30 10:17:46 -07:00
Mathieu Poliquin
637bf55da7 Use deepmind wrapper for retro (#685)
* Use deepmind wrapper for retro

* moved wrap_deepmind_retro after Monitor wrapper
2018-10-30 10:16:15 -07:00
AurelianTactics
165c622572 DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError (#680)
* DDPG has unused 'seed' argument

DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
   set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.

* DDPG: duplicate variable assignment

variable nb_actions assigned same value twice in space of 10 lines
nb_actions = env.action_space.shape[-1]

* DDPG: noise_type 'normal_x' and 'ou_x' cause assert

noise_type default 'adaptive-param_0.2' works, but the arguments that switch from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') trigger an assertion failure, so DDPG does not run. The issue is in the following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
actions is nested: [[number_of_actions]]
Can either nest noise or unnest actions

* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"

* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError

noise_type default 'adaptive-param_0.2' works but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an assert message and DDPG not to run. Issue is the following block:
'''
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
'''

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes do not pass the assert line even though the action += noise line is correct
2018-10-30 10:13:39 -07:00
Peter Zhokhov
93c7cc202c Merge branch 'master' of github.com:openai/baselines 2018-10-29 15:25:38 -07:00
Peter Zhokhov
de36116e3b update tensorflow version check regex to parse version like 1.2.3rc4 (previously only 1.2.3-rc4) 2018-10-29 15:25:31 -07:00
Mathieu Poliquin
e2b41828af Set 'cnn' as default network for retro (#683) 2018-10-29 13:30:41 -07:00
pzhokhov
8e56ddeac2 Multidiscrete action space compatibility for policy gradient-based methods (#677)
* multidiscrete space compatibility

* flake8 and syntax
2018-10-24 11:01:59 -07:00
Juliano Laganá
c3bd8cea66 Adds description of param_noise parameter in deepq.learn method (#675) 2018-10-24 10:00:31 -07:00
AurelianTactics
84ea7aa1fd DDPG has unused 'seed' argument (#676)
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
   set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
2018-10-24 09:59:46 -07:00
Peter Zhokhov
88300ed54c fix raise NotImplemented() complaints of latest flake8 2018-10-24 09:57:57 -07:00
pzhokhov
583ba082a2 Update cmd_util.py 2018-10-23 11:22:27 -07:00
pzhokhov
014a5597b1 refactor ACER (#664)
* make acer use vecframestack

* acer passes mnist test with 20k steps

* acer with non-image observations and tests

* flake8

* test acer serialization with non-recurrent policies
2018-10-23 10:01:25 -07:00
Isaac Poulton
4ed1350326 Fixed TypeError on creating atari vec envs (#671) 2018-10-23 10:00:09 -07:00
Rishabh Jangir
8513d73355 HER : new functionality, enables demo based training (#474)
* Add, initialize, normalize and sample from a demo buffer

* Modify losses and add cloning loss

* Add demo file parameter to train.py

* Introduce new params in config.py for demo based training

* Change logger.warning to logger.warn in rollout.py (bug fix)

* Add data generation file for Fetch environments

* Update README file
2018-10-22 19:04:40 -07:00
Xingdong Zuo
c28acb2203 [Clean-up]: delete running_stat and filters as they are replaced by running_mean_std and not used anymore (#614)
* Delete filters.py

* Delete running_stat.py
2018-10-22 19:01:26 -07:00
pzhokhov
c5d9c4a1b2 wrap retro envs correctly for other (non-deepq) algorithms (#669)
* wrap retro envs correctly for other (non-deepq) algorithms

* flake and csh comments

* flake and csh comments
2018-10-22 18:36:39 -07:00
pzhokhov
c0fa11a3a7 minor fixes from internal (#665)
* sync internal changes. Make ddpg work with vecenvs

* B -> nenvs for consistency with other algos, small cleanups

* eval_done[d]==True -> eval_done[d]

* flake8 and numpy.random.random_integers deprecation warning

* Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch
2018-10-22 09:15:04 -07:00
Peter Zhokhov
bd390c2ade updated docstring for deepq 2018-10-19 17:50:54 -07:00
pzhokhov
d0cc325e14 store session at policy creation time (#655)
* sync internal changes. Make ddpg work with vecenvs

* B -> nenvs for consistency with other algos, small cleanups

* eval_done[d]==True -> eval_done[d]

* flake8 and numpy.random.random_integers deprecation warning

* store session at policy creation time

* coexistence tests

* fix a typo

* autopep8

* ... and flake8

* updated todo links in test_serialization
2018-10-19 08:54:21 -07:00
pzhokhov
fc7f9cec49 disable gym subpackages in setup.py (#661)
* disable gym subpackages in setup.py

* include gym[atari] in test requirements

* gym[atari] -> atari-py in test requirements
2018-10-18 16:07:14 -07:00
Matthew Rahtz
3677dc1b23 Set allow_growth=True for MuJoCo session (#643) 2018-10-18 13:54:39 -07:00
Matthew Rahtz
ef96f3835b Drop S and M args so that --play works (#636) 2018-10-16 16:28:23 -07:00
pzhokhov
a03dacd68d sync internal changes. Make ddpg work with vecenvs (#654)
* sync internal changes. Make ddpg work with vecenvs

* B -> nenvs for consistency with other algos, small cleanups

* eval_done[d]==True -> eval_done[d]

* flake8 and numpy.random.random_integers deprecation warning
2018-10-16 16:26:46 -07:00
Tianhong Dai
e57f81becc revise the readme of ddpg (#653) 2018-10-16 16:22:06 -07:00
Peter Zhokhov
28aca637d0 update benchmark results 2018-10-09 09:48:31 -07:00
Erik Doffagne
7bfbcf177e Fixed typos in README (#635) 2018-10-04 10:31:22 -07:00
pzhokhov
394339deb5 Update README.md 2018-10-03 20:53:58 -07:00
pzhokhov
10c205c159 Debug codegen ppo (#123)
* disabled tests, running benchmarks only

* dummy commit to RUN BENCHMARKS

* benchmark ppo_metal; disable all but Bullet benchmarks

* ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS

* run benchmarks on Roboschool instead RUN BENCHMARKS

* run ppo_metal on Roboschool as well RUN BENCHMARKS

* install roboschool in cron rcall user_config

* dummy commit to RUN BENCHMARKS

* import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS

* re-enable tests, flake8

* get entropy from a distribution in Pred RUN BENCHMARKS

* gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS

* provide default value for cg2/bmv_net_ops.py

* dummy commit to RUN BENCHMARKS

* make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS

* syntax error in run-benchmarks-new.py RUN BENCHMARKS

* syntax error in run-benchmarks-new.py RUN BENCHMARKS

* path relative to codegen/training for gin files RUN BENCHMARKS

* another reconciliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS

* value_network=copy for ppo2 on roboschool RUN BENCHMARKS

* make None seed work with torch seeding RUN BENCHMARKS

* try sequential batches with ppo2 RUN BENCHMARKS

* try ppo without advantage normalization RUN BENCHMARKS

* use Distribution to compute ema NLL RUN BENCHMARKS

* autopep8

* clip gradient norm in algo_agent RUN BENCHMARKS

* try ppo2 without vfloss clipping RUN BENCHMARKS

* trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS

* set gamma=0 in ppo2 RUN BENCHMARKS

* try with ppo2 with single minibatch RUN BENCHMARKS

* try with nminibatches=4, value_network=copy RUN BENCHMARKS

* try with nminibatches=1 take two RUN BENCHMARKS

* try initialization for vf=0.01 RUN BENCHMARKS

* fix the problem with min_istart >= max_istart

* i have no idea RUN BENCHMARKS

* fix non-shared variance between old and new RUN BENCHMARKS

* restored baselines.common.policies

* 16 minibatches in ppo_roboschool.gin

* fixing results of merge

* cleanups

* cleanups

* fix run-benchmarks-new RUN BENCHMARKS Roboschool8M

* fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M

* fix test failures

* moved gin requirement to codegen/setup.py

* remove duplicated build_softq in get_algo.py

* linting

* run softq on continuous action spaces RUN BENCHMARKS Roboschool8M
2018-10-03 14:38:32 -07:00
pzhokhov
62fe7c4717 disable async acktr (#129)
* disable async acktr

* linting

* linting

* linting
2018-10-03 14:38:32 -07:00
Xingyou Song
fbdf55ffee Xsong lqr ddpg (#125)
* allows vec_envs to work

* allows vec_envs to work

* fixed branch with correct ddpg

* running experiments jointly now

* changed to subproc

* changed to subproc

* changed to subproc

* small fix md

* removed placeholder

* removed placeholder

* added ppotest

* probably fixed ddpg hyperparam issues

* checkpoint

* edited readme

* added orthogonal

* added orthogonal

* added ddpg-vecenv

* reverted ddpg to old baselines
2018-10-03 14:38:32 -07:00