Commit Graph

258 Commits

Author SHA1 Message Date
Peter Zhokhov
1fc5e137b2 Merge branch 'master' of github.com:openai/baselines into peterz_viz 2018-10-31 12:03:25 -07:00
pzhokhov
ab59de6922 mpi-less baselines (#689)
* make baselines run without mpi wip

* squash-merged latest master

* further removing MPI references where unnecessary

* more MPI removal

* syntax and flake8

* MpiAdam becomes regular Adam if Mpi not present

* autopep8

* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole

* mpiless ddpg
2018-10-31 11:15:41 -07:00
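The bullet "MpiAdam becomes regular Adam if Mpi not present" describes an optional-MPI fallback. The sketch below illustrates that pattern; it is not the baselines code. `allreduce_mean` and `SketchAdam` are hypothetical names, and the multi-process path assumes the optional `mpi4py` package. When `mpi4py` is missing, gradients are used as-is and the update reduces to plain Adam.

```python
import numpy as np

try:
    from mpi4py import MPI  # optional dependency
except ImportError:
    MPI = None


def allreduce_mean(local_grad, comm=None):
    """Average a gradient across MPI workers; without MPI, return it unchanged."""
    if comm is None and MPI is not None:
        comm = MPI.COMM_WORLD
    if comm is None:
        return local_grad  # no MPI: single-process (regular Adam) path
    global_grad = np.zeros_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
    return global_grad / comm.Get_size()


class SketchAdam:
    """Hypothetical Adam over a flat parameter vector."""

    def __init__(self, size, beta1=0.9, beta2=0.999, epsilon=1e-8, comm=None):
        self.beta1, self.beta2, self.epsilon = beta1, beta2, epsilon
        self.comm = comm
        self.m = np.zeros(size)
        self.v = np.zeros(size)
        self.t = 0

    def update(self, params, local_grad, stepsize):
        g = allreduce_mean(np.asarray(local_grad, dtype=np.float64), self.comm)
        self.t += 1
        # Bias-corrected step size folded into a single scalar
        a = stepsize * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        return params - a * self.m / (np.sqrt(self.v) + self.epsilon)
```

Keeping the MPI import behind `try`/`except ImportError` lets the same optimizer class serve both single-process and multi-process runs.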
Mathieu Poliquin
a071fa7630 Add retro to ppo2 defaults (#682)
* Adds retro to ppo2 defaults

Created defaults for retro, copied from Atari defaults for now. Tested with SuperMarioBros-Nes

* ppo2 retro defaults to atari
2018-10-30 10:17:46 -07:00
Mathieu Poliquin
637bf55da7 Use deepmind wrapper for retro (#685)
* Use deepmind wrapper for retro

* moved wrap_deepmind_retro after Monitor wrapper
2018-10-30 10:16:15 -07:00
AurelianTactics
165c622572 DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError (#680)
* DDPG has unused 'seed' argument

DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
    set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.

* DDPG: duplicate variable assignment

The variable nb_actions is assigned the same value twice within 10 lines:
nb_actions = env.action_space.shape[-1]

* DDPG: noise_type 'normal_x' and 'ou_x' cause assert

noise_type default 'adaptive-param_0.2' works, but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
```

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Either noise can be nested or action unnested.

* Revert "DDPG: noise_type 'normal_x' and 'ou_x' cause assert"

* DDPG: noise_type 'normal_x' and 'ou_x' cause AssertionError

noise_type default 'adaptive-param_0.2' works, but the arguments that change from parameter noise to actor noise (like 'normal_0.2' and 'ou_0.2') cause an AssertionError and DDPG does not run. The issue is the following block:
```
        if self.action_noise is not None and apply_noise:
            noise = self.action_noise()
            assert noise.shape == action.shape
            action += noise
```

noise is not nested: [number_of_actions]
action is nested: [[number_of_actions]]
Hence the shapes fail the assert even though the `action += noise` line itself is correct.
2018-10-30 10:13:39 -07:00
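A minimal NumPy sketch of the shape mismatch described above, assuming a single environment, an actor output of shape `(1, nb_actions)` and noise of shape `(nb_actions,)`. The relaxed check at the end is one possible fix, not necessarily the one merged in #680.

```python
import numpy as np

nb_actions = 3
# Batched actor output for one environment: shape (1, nb_actions)
action = np.zeros((1, nb_actions))
# Noise from the noise process: shape (nb_actions,)
noise = np.random.normal(0.0, 0.2, size=nb_actions)

# The strict check fails: (3,) is not equal to (1, 3)
shapes_match = noise.shape == action.shape

# Broadcasting still makes the in-place addition valid
action += noise

# One possible relaxed check: compare only the trailing (action) dimension
assert noise.shape[-1] == action.shape[-1]
```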
Peter Zhokhov
6c194a8b15 documenting plot_util 2018-10-30 09:45:51 -07:00
Peter Zhokhov
0d0701f594 writing visualization docs 2018-10-29 16:15:42 -07:00
Peter Zhokhov
be433fdb83 viz docs 2018-10-29 15:53:50 -07:00
Peter Zhokhov
93c7cc202c Merge branch 'master' of github.com:openai/baselines 2018-10-29 15:25:38 -07:00
Peter Zhokhov
de36116e3b update tensorflow version check regex to parse versions like 1.2.3rc4 (previously only 1.2.3-rc4) 2018-10-29 15:25:31 -07:00
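The commit message does not show the regex itself. As a hypothetical sketch, a pattern with an optional hyphen before the `rc` suffix accepts both spellings:

```python
import re

# Hypothetical pattern, not the one in the commit: MAJOR.MINOR.PATCH with an
# optional release-candidate suffix written either as "rc4" or "-rc4".
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-?rc(\d+))?$")


def parse_version(version):
    """Return (major, minor, patch, rc), where rc is None for final releases."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor, patch, rc = match.groups()
    return int(major), int(minor), int(patch), (int(rc) if rc is not None else None)
```

Parsing into integer tuples also lets versions compare numerically, so `1.10.0` sorts after `1.9.0`.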
Mathieu Poliquin
e2b41828af Set 'cnn' as default network for retro (#683) 2018-10-29 13:30:41 -07:00
pzhokhov
8e56ddeac2 Multidiscrete action space compatibility for policy gradient-based methods (#677)
* multidiscrete space compatibility

* flake8 and syntax
2018-10-24 11:01:59 -07:00
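Supporting a gym `MultiDiscrete` action space in policy gradient methods needs a distribution over several independent discrete choices. A minimal sketch, assuming the policy emits one flat logits vector that is split into independent categoricals; the function names are hypothetical, not the baselines API:

```python
import numpy as np


def split_logits(flat_logits, nvec):
    # One chunk of logits per sub-action; nvec[i] is the number of choices in dimension i
    return np.split(flat_logits, np.cumsum(nvec)[:-1])


def log_softmax(x):
    z = x - np.max(x)  # subtract the max for numerical stability
    return z - np.log(np.sum(np.exp(z)))


def multidiscrete_neglogp(flat_logits, nvec, actions):
    # Independent categoricals: the joint negative log-probability is the sum over dimensions
    chunks = split_logits(flat_logits, nvec)
    return -sum(log_softmax(l)[a] for l, a in zip(chunks, actions))


def multidiscrete_mode(flat_logits, nvec):
    # Most likely choice in each dimension
    return np.array([int(np.argmax(l)) for l in split_logits(flat_logits, nvec)])
```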
Juliano Laganá
c3bd8cea66 Adds description of param_noise parameter in deepq.learn method (#675) 2018-10-24 10:00:31 -07:00
AurelianTactics
84ea7aa1fd DDPG has unused 'seed' argument (#676)
DeepQ, PPO2, ACER, trpo_mpi, A2C, and ACKTR have the code for:

```
from baselines.common import set_global_seeds
...
def learn(...):
...
    set_global_seeds(seed)
```

DDPG has the argument 'seed=None' but doesn't have the two lines of code needed to set the global seeds.
2018-10-24 09:59:46 -07:00
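The two missing lines amount to seeding every random number generator at the start of `learn`. A minimal sketch, assuming only Python's and NumPy's global generators need seeding; `set_global_seeds_sketch` is a hypothetical stand-in for `baselines.common.set_global_seeds`, which may seed additional libraries such as TensorFlow:

```python
import random

import numpy as np


def set_global_seeds_sketch(seed):
    # Hypothetical stand-in: seeds Python's and NumPy's global generators only.
    # seed=None reseeds both from OS entropy, so runs are not reproducible.
    random.seed(seed)
    np.random.seed(seed)


def learn(seed=None, **kwargs):
    # Seed first, before any environment or network is built
    set_global_seeds_sketch(seed)
    # Placeholder for training: draw the random numbers training would consume
    return np.random.rand(3), random.random()
```

Calling `learn(seed=0)` twice yields identical draws, which is the behaviour the unused `seed` argument was meant to provide.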
Peter Zhokhov
88300ed54c fix raise NotImplemented() complaints of latest flake8 2018-10-24 09:57:57 -07:00
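`NotImplemented` is a constant meant to be returned from binary special methods such as `__eq__`; it is not an exception class. `raise NotImplemented()` therefore fails with a `TypeError` (the constant is not callable) instead of signalling an unimplemented method. Abstract methods should raise `NotImplementedError`. The class and function names below are illustrative:

```python
class Runner:
    def run(self):
        # Abstract method: subclasses must override this
        raise NotImplementedError


def raise_constant():
    # Wrong: NotImplemented is a singleton constant, so calling it raises TypeError
    raise NotImplemented()
```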
pzhokhov
583ba082a2 Update cmd_util.py 2018-10-23 11:22:27 -07:00
pzhokhov
014a5597b1 refactor ACER (#664)
* make acer use vecframestack

* acer passes mnist test with 20k steps

* acer with non-image observations and tests

* flake8

* test acer serialization with non-recurrent policies
2018-10-23 10:01:25 -07:00
Isaac Poulton
4ed1350326 Fixed TypeError on creating atari vec envs (#671) 2018-10-23 10:00:09 -07:00
Rishabh Jangir
8513d73355 HER: new functionality, enables demo-based training (#474)
* Add, initialize, normalize and sample from a demo buffer

* Modify losses and add cloning loss

* Add demo file parameter to train.py

* Introduce new params in config.py for demo based training

* Change logger.warning to logger.warn in rollout.py; bug

* Add data generation file for Fetch environments

* Update README file
2018-10-22 19:04:40 -07:00
Xingdong Zuo
c28acb2203 [Clean-up]: delete running_stat and filters as they are replaced by running_mean_std and not used anymore (#614)
* Delete filters.py

* Delete running_stat.py
2018-10-22 19:01:26 -07:00
pzhokhov
c5d9c4a1b2 wrap retro envs correctly for other (non-deepq) algorithms (#669)
* wrap retro envs correctly for other (non-deepq) algorithms

* flake and csh comments

* flake and csh comments
2018-10-22 18:36:39 -07:00
pzhokhov
c0fa11a3a7 minor fixes from internal (#665)
* sync internal changes. Make ddpg work with vecenvs

* B -> nenvs for consistency with other algos, small cleanups

* eval_done[d]==True -> eval_done[d]

* flake8 and numpy.random.random_integers deprecation warning

* Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch
2018-10-22 09:15:04 -07:00
Peter Zhokhov
bd390c2ade updated docstring for deepq 2018-10-19 17:50:54 -07:00
pzhokhov
d0cc325e14 store session at policy creation time (#655)
* sync internal changes. Make ddpg work with vecenvs

* B -> nenvs for consistency with other algos, small cleanups

* eval_done[d]==True -> eval_done[d]

* flake8 and numpy.random.random_integers deprecation warning

* store session at policy creation time

* coexistence tests

* fix a typo

* autopep8

* ... and flake8

* updated todo links in test_serialization
2018-10-19 08:54:21 -07:00
pzhokhov
fc7f9cec49 disable gym subpackages in setup.py (#661)
* disable gym subpackages in setup.py

* include gym[atari] in test requirements

* gym[atari] -> atari-py in test requirements
2018-10-18 16:07:14 -07:00
Matthew Rahtz
3677dc1b23 Set allow_growth=True for MuJoCo session (#643) 2018-10-18 13:54:39 -07:00
Matthew Rahtz
ef96f3835b Drop S and M args so that --play works (#636) 2018-10-16 16:28:23 -07:00
pzhokhov
a03dacd68d sync internal changes. Make ddpg work with vecenvs (#654)
* sync internal changes. Make ddpg work with vecenvs

* B -> nenvs for consistency with other algos, small cleanups

* eval_done[d]==True -> eval_done[d]

* flake8 and numpy.random.random_integers deprecation warning
2018-10-16 16:26:46 -07:00
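The deprecated `numpy.random.random_integers(low, high)` samples from the closed interval [low, high], while its usual replacement, `numpy.random.randint`, samples from the half-open interval [low, high). A minimal sketch of the substitution (the bounds are arbitrary):

```python
import numpy as np

low, high = 1, 6
# Old, deprecated and inclusive of high: np.random.random_integers(low, high, size=1000)
# randint excludes its upper bound, so pass high + 1 to keep the same range
samples = np.random.randint(low, high + 1, size=1000)
```

Forgetting the `+ 1` silently drops the top value from the range, which is the main pitfall of this migration.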
Tianhong Dai
e57f81becc revise the readme of ddpg (#653) 2018-10-16 16:22:06 -07:00
Peter Zhokhov
28aca637d0 update benchmark results 2018-10-09 09:48:31 -07:00
Erik Doffagne
7bfbcf177e Fixed typos in README (#635) 2018-10-04 10:31:22 -07:00
pzhokhov
394339deb5 Update README.md 2018-10-03 20:53:58 -07:00
pzhokhov
10c205c159 Debug codegen ppo (#123)
* disabled tests, running benchmarks only

* dummy commit to RUN BENCHMARKS

* benchmark ppo_metal; disable all but Bullet benchmarks

* ppo2, codegen ppo and ppo_metal on Bullet RUN BENCHMARKS

* run benchmarks on Roboschool instead RUN BENCHMARKS

* run ppo_metal on Roboschool as well RUN BENCHMARKS

* install roboschool in cron rcall user_config

* dummy commit to RUN BENCHMARKS

* import roboschool in codegen/contcontrol_prob.py RUN BENCHMARKS

* re-enable tests, flake8

* get entropy from a distribution in Pred RUN BENCHMARKS

* gin for hyperparameter injection; try codegen ppo close to baselines ppo RUN BENCHMARKS

* provide default value for cg2/bmv_net_ops.py

* dummy commit to RUN BENCHMARKS

* make tests and benchmarks parallel; use relative path to gin file for rcall compatibility RUN BENCHMARKS

* syntax error in run-benchmarks-new.py RUN BENCHMARKS

* syntax error in run-benchmarks-new.py RUN BENCHMARKS

* path relative to codegen/training for gin files RUN BENCHMARKS

* another reconciliation attempt between codegen ppo and baselines ppo RUN BENCHMARKS

* value_network=copy for ppo2 on roboschool RUN BENCHMARKS

* make None seed work with torch seeding RUN BENCHMARKS

* try sequential batches with ppo2 RUN BENCHMARKS

* try ppo without advantage normalization RUN BENCHMARKS

* use Distribution to compute ema NLL RUN BENCHMARKS

* autopep8

* clip gradient norm in algo_agent RUN BENCHMARKS

* try ppo2 without vfloss clipping RUN BENCHMARKS

* trying with gamma=0.0 - assumption is, both algos should be equally bad RUN BENCHMARKS

* set gamma=0 in ppo2 RUN BENCHMARKS

* try with ppo2 with single minibatch RUN BENCHMARKS

* try with nminibatches=4, value_network=copy RUN BENCHMARKS

* try with nminibatches=1 take two RUN BENCHMARKS

* try initialization for vf=0.01 RUN BENCHMARKS

* fix the problem with min_istart >= max_istart

* i have no idea RUN BENCHMARKS

* fix non-shared variance between old and new RUN BENCHMARKS

* restored baselines.common.policies

* 16 minibatches in ppo_roboschool.gin

* fixing results of merge

* cleanups

* cleanups

* fix run-benchmarks-new RUN BENCHMARKS Roboschool8M

* fix syntax in run-benchmarks-new RUN BENCHMARKS Roboschool8M

* fix test failures

* moved gin requirement to codegen/setup.py

* remove duplicated build_softq in get_algo.py

* linting

* run softq on continuous action spaces RUN BENCHMARKS Roboschool8M
2018-10-03 14:38:32 -07:00
pzhokhov
62fe7c4717 disable async acktr (#129)
* disable async acktr

* linting

* linting

* linting
2018-10-03 14:38:32 -07:00
Xingyou Song
fbdf55ffee Xsong lqr ddpg (#125)
* allows vec_envs to work

* allows vec_envs to work

* fixed branch with correct ddpg

* running experiments jointly now

* changed to subproc

* changed to subproc

* changed to subproc

* small fix md

* removed placeholder

* removed placeholder

* added ppotest

* probably fixed ddpg hyperparam issues

* checkpoint

* edited readme

* added orthogonal

* added orthogonal

* added ddpg-vecenv

* reverted ddpg to old baselines
2018-10-03 14:38:32 -07:00
Christopher Hesse
9ee804c384 minor change to install.py and baselines run.py (#121) 2018-10-03 14:38:32 -07:00
John Schulman
4cf7dc9644 Big refactor (#124)
* massive revision inspired by soup: algo folder works

* porting rl commands, WIP

* various

* git subrepo push --remote=git@github.com:openai/codegen.git --branch=refactor codegen

subrepo:
  subdir:   "codegen"
  merged:   "aa27e069"
upstream:
  origin:   "git@github.com:openai/codegen.git"
  branch:   "refactor"
  commit:   "aa27e069"
git-subrepo:
  version:  "0.4.0"
  origin:   "git@github.com:ingydotnet/git-subrepo.git"
  commit:   "74339e8"

* various

* rewrite RL stuff in new framework

* fix almost everything

* woohoo tests pass

* more tests

* reformatting

* fixes

* write tests for embeddings

* re-remove cg2

* pylint

* minor

* move smooth_helpers import; seems to cause nondeterministic failure in parallel pytest
2018-10-03 14:38:32 -07:00
Xingyou Song
e820b86fdc ppo2 now has eval stats (#120)
* ppo2 now has eval stats

* fixed spaces

* fixed kwargs ordering

* whitespace fix
2018-10-03 14:38:32 -07:00
pzhokhov
858afa8d7e Refactor DDPG (#111)
* run ddpg on Mujoco benchmark RUN BENCHMARKS

* autopep8

* fixed all syntax in refactored ddpg

* a little bit more refactoring

* autopep8

* identity test with ddpg WIP

* enable test_identity with ddpg

* refactored ddpg RUN BENCHMARKS

* autopep8

* include ddpg into style check

* fixing tests RUN BENCHMARKS

* set default seed to None RUN BENCHMARKS

* run tests and benchmarks in separate buildkite steps RUN BENCHMARKS

* cleanup pdb usage

* flake8 and cleanups

* re-enabled all benchmarks in run-benchmarks-new.py

* flake8 complaints

* deepq model builder compatible with network functions returning single tensor

* remove ddpg test with test_discrete_identity

* make ppo_metal use make_vec_env instead of make_atari_env

* make ppo_metal use make_vec_env instead of make_atari_env

* fixed syntax in ppo_metal.run_atari
2018-10-03 14:38:32 -07:00
pzhokhov
4121d9c1a8 fix DQN learning bug (#632)
* Update run.py

* Update utils.py

* Update utils.py
2018-10-03 14:37:40 -07:00
Peter Zhokhov
34ae3194b4 add a note about DQN algorithms not performing well 2018-09-27 12:51:43 -07:00
Thomas Simonini
4402b8eba6 Updated A2C and PPO2 comments (#612)
* Updated A2C and PPO2 comments

* Fixed format errors to respect PEP 8 style guide
2018-09-24 09:54:41 -07:00
ahuhn
555a5cbbb2 Adding num_env to readme example (#609)
* Adding num_env to readme example

* Updated readme example fix
2018-09-21 17:22:56 -07:00
Thomas Simonini
8158f35611 Wrote some comments to explain the A2C and PPO2 implementation (#607)
* added comments in A2C and PPO2

* Fixed format errors to respect PEP 8 style guide
2018-09-21 13:12:31 -07:00
cclauss
a7fd8a4477 Run flake8 to find syntax errors and undefined names (#439)
__E901,E999,F821,F822,F823__ are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. The other flake8 issues are merely "style violations" -- useful for readability, but they do not affect runtime safety. This PR therefore recommends a flake8 run of those tests on the entire codebase.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable `name` referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
2018-09-20 16:40:03 -07:00
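To show why these codes are showstoppers, the sketch below (plain Python, no flake8 needed) compiles two small illustrative snippets. An undefined name (F821) compiles fine and fails only when the function runs, while a syntax error (E999) fails at compile time. A static run such as `flake8 --select=E901,E999,F821,F822,F823` reports both before any code executes.

```python
# F821: undefined name. The module compiles; the error surfaces only at call time.
undefined_name_src = "def f():\n    return undefined_name + 1\n"
namespace = {}
exec(compile(undefined_name_src, "<f821>", "exec"), namespace)
try:
    namespace["f"]()
    f821_error = None
except NameError as exc:
    f821_error = exc

# E999: syntax error. Compilation itself fails.
try:
    compile("def g(:\n    pass\n", "<e999>", "exec")
    e999_error = None
except SyntaxError as exc:
    e999_error = exc
```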
John Schulman
e791565a60 Codegen more abstract abstract classes 3a (#106)
* Soup code, arch search on CIFAR-10

* Oh I understood how choice_sequence() worked

* Undo some pointless changes

* Some beautification 1

* Some beautification 2

* An attempt to debug test_get_algo_outputs() number 70, unsuccessful.

* Code style warning

* Code style warnings, more

* wip

* wip

* wip

* fix almost everything; soup machine still broken

* revert mpi_eda changes

* minor fixes
2018-09-20 16:19:07 -07:00
XFFXFF
7859f603cd prioritized experience replay bug (#527) 2018-09-20 16:16:44 -07:00
pzhokhov
0f4ae2fb2a refactor acktr (#560)
* refactor acktr

* setup.cfg now tests style/syntax in acktr as well

* flake8 complaints

* added note about continuous action spaces for acktr into the README.md
2018-09-20 16:05:26 -07:00
pzhokhov
0e7048b89f Update README.md 2018-09-19 15:04:54 -07:00
pzhokhov
75983bab64 Update README.md 2018-09-19 15:04:01 -07:00