Compare commits


78 Commits

Author SHA1 Message Date
Peter Zhokhov
1ab9fae0b5 test fixes 2019-05-03 16:36:03 -07:00
Peter Zhokhov
75200671c4 fix tests - add matplotlib to setup_requires, put mpi4py import in try-except 2019-05-03 16:29:10 -07:00
Peter Zhokhov
46fa1b6453 merge master 2019-05-03 15:57:31 -07:00
Peter Zhokhov
a1a9bd6174 Merge branch 'internal' of github.com:openai/baselines into internal 2019-05-03 15:56:04 -07:00
John Schulman
ef7ac116cb (onp, np) -> (np, jp), switch jax code to use mark_slow decorator (#363)
switch to mark_slow decorator
2019-05-03 15:54:27 -07:00
pzhokhov
1fa6ac38f1 JRL PPO test with delayed identity env (#355)
* add a custom delay to identity_env

* min reward 0.8 in delayed identity test

* seed the tests, perfect score on delayed_identity_test

* delay=1 in delayed_identity_test

* flake8 complaints

* increased number of steps in fixed_seq_test

* seed identity tests to ensure reproducibility

* docstrings
2019-05-03 15:54:26 -07:00
Karl Cobbe
07536451ee Procgen fixes (#352)
* tweak

* documentation

* rely on log_comm, remove mpi averaging from wrappers

* pass comm for ppo2 initialization

* ppo2 logging

* experiment tweaks

* auto launch tensorboard when using local backend

* graph tweaks

* pass caller to config

* configure logger and tensorboard

* make parent dir if necessary

* parentdir tweak
2019-05-03 15:54:26 -07:00
Greg Brockman
64dfabb8eb Add initializer for process-level setup in SubprocVecEnv (#276)
* Add initializer for process-level setup in SubprocVecEnv

Use case: run logger.configure() in each subprocess

* Add option to force dummy vec env
2019-05-03 15:54:26 -07:00
John Schulman
f5daca8c22 delete unnecessary stuff (#338) 2019-05-03 15:54:25 -07:00
pzhokhov
8e0282ee94 ci/runtests.sh - pass all folders to pytest (#342)
* ci/runtests.sh - pass all folders to pytest

* mpi_optimizer_test precision 1e-4

* fixes to tests

* search for tests in the entire jax folder, also remove unnecessary humor
2019-05-03 15:54:25 -07:00
Karl Cobbe
ddcab1606d Procgen Benchmark Updates (#328)
* directory cleanup

* logging, num_experiments

* fixes

* cleanup

* gin fixes

* fix local max gpu

* resid nx

* tweak

* num machines and download params

* rename

* cleanup

* create workbench

* more reorg

* fix

* more logging wrappers

* lint fix

* restore train procgen

* restore train procgen

* pylint fix

* better wrapping

* whackamole walls

* config sweep

* tweak

* args sweep

* tweak

* test workers

* mpi_weight

* train test comm and high difficulty fix

* enjoy show returns

* better joint training

* tweak

* Add --update to args and add gin-config to requirements.txt

* add username to download_file

* removing gin, procgen_parser

* removing gin

* procgen args

* config fixes

* cleanup

* cleanup

* procgen args fix

* fix

* rcall syncing

* lint

* rename mpi_weight

* begin composable game

* more composable game

* tweak

* background alpha

* use username for sync

* fixes

* microbatch fix

* lure composable game

* merge

* proc trans update

* proc trans update (#307)

* finetuning experiment

* Change is_local to use `use_rcall` and fix error of `enjoy.py` with multiple ends

* graphing help

* add --local

* change args_dict['env_name'] to ENV_NAME

* finetune experiments

* tweak

* tweak

* reorg wrappers, remove is_local

* workdir/local fixes

* move finetune experiments

* default dir and graphing

* more graphing

* fix

* pooled syncing

* tweaks

* dir fix

* tweak

* wrapper mpi fix

* wind and turrets

* composability cleanup

* radius cleanup

* composable reorg

* laser gates

* composable tweaks

* soft walls

* tweak

* begin swamp

* more swamp

* more swamp

* fix

* hidden mines

* use maze layout

* tweak

* laser gate tweaks

* tweaks

* tweaks

* lure/propel updates

* composable midnight

* composable coinmaze

* composability difficulty

* tweak

* add step to save_params

* composable offsets

* composable boxpush

* composable combiner

* tweak

* tweak

* always choose correct number of mechanics

* fix

* rcall local fix

* add steps when dump and save parmas

* loading rank 1,2,3.. error fix

* add experiments.py

* fix loading latest weight with no -rest

* support more complex run_id and add more examples

* fix typo

* move post_run_id into experiments.py

* add hp_search example

* error fix

* joint experiments in progress

* joint hp finished

* typo

* error fix

* edit experiments

* Save experiments set up in code and  save weights per step (#319)

* add step to save_params

* add steps when dump and save parmas

* loading rank 1,2,3.. error fix

* add experiments.py

* fix loading latest weight with no -rest

* support more complex run_id and add more examples

* fix typo

* move post_run_id into experiments.py

* add hp_search example

* error fix

* joint experiments in progress

* joint hp finished

* typo

* error fix

* edit experiments

* tweaks

* graph exp WIP

* depth tweaks

* move save_all

* fix

* restore_dir name

* restore depth

* choose max mechanics

* use override mode

* tweak frogger

* lstm default

* fix

* patience is composable

* hunter is composable

* fixed asset seed cleanup

* minesweeper is composable

* eggcatch is composable

* tweak

* applesort is composable

* chaser game

* begin lighter

* lighter game

* tractor game

* boxgather game

* plumber game

* hitcher game

* doorbell game

* lawnmower game

* connecter game

* cannonaim

* outrun game

* encircle game

* spinner game

* tweak

* tweak

* detonator game

* driller

* driller

* mixer

* conveyor

* conveyor game

* joint pcg experiments

* fixes

* pcg sweep experiment

* cannonaim fix

* combiner fix

* store save time

* laseraim fix

* lightup fix

* detonator tweaks

* detonator fixes

* driller fix

* lawnmower calibration

* spinner calibration

* propel fix

* train experiment

* print load time

* system independent hashing

* remove gin configurable

* task ids fix

* test_pcg experiment

* connecter dense reward

* hard_pcg

* num train comms

* mpi splits envs

* tweaks

* tweaks

* graph tweaks

* graph tweaks

* lint fix

* fix tests

* load bugfix

* difficulty timeout tweak

* tweaks

* more graphing

* graph tweaks

* tweak

* download file fix

* pcg train envs list

* cleanup

* tweak

* manually name impala layers

* tweak

* expect fps

* backend arg

* args tweak

* workbench cleanup

* move graph files

* workbench cleanup

* split env name by comma

* workbench cleanup

* ema graph

* remove Dict

* use tf.io.gfile

* comments for auto-killing jobs

* lint fix

* write latest file when not saving all and load it when step=None
2019-05-03 15:54:24 -07:00
Christopher Hesse
bc4eef6053 fix tests (#335) 2019-05-03 15:54:24 -07:00
John Schulman
967fc8c37f Fixed sequence env minor (#333)
minor changes to FixedSequenceEnv to allow full score
2019-05-03 15:54:24 -07:00
pzhokhov
a93dde3b2b extra functionality in baselines.common.plot_util (#310)
* get plot_util from mt_experiments branch

* add labels

* unit tests for plot_util
2019-05-03 15:54:23 -07:00
John Schulman
b83a66527d Add jrl19 as backend for workbench (#324)
enable jrl in workbench
minor logger changes
2019-05-03 15:54:23 -07:00
John Schulman
07cbf1e26a Grad clipping in MpiAdamOptimizer, transformer changes (#304)
* transformer mnist experiments

* version that only builds one model

* work on inverted mnist

* Add grad clipping to MpiAdamOptimizer

* various

* transformer changes, loading

* get rid of soft labels

* transformer baseline

* minor

* experiments involving all possible training sets

* vary training

* minor

* get ready for fine-tuning expers

* lint

* minor
2019-05-03 15:54:23 -07:00
Karl Cobbe
5082e5d34b Workbench (#303)
* begin workbench

* cleanup

* begin procgen config integration

* arg tweaks

* more args

* parameter saving

* begin procgen enjoy

* tweaks

* more workbench

* more args sync/restore

* cleanup

* merge in master

* rework args priority

* more workbench

* more loggign

* impala cnn

* impala lstm

* tweak

* tweaks

* rl19 time logging

* misc fixes

* faster pipeline

* update local.py

* sess and log config tweaks

* num processes

* logging tweaks

* difficulty reward wrapper

* logging fixes

* gin tweaks

* tweak

* fix

* task id

* param loading

* more variable loading

* entrypoint

* tweak

* ksync

* restore lstm

* begin rl19 support

* tweak

* rl19 rnn

* more rl19 integration

* fix

* cleanup

* restore rl19 rnn

* cleanup

* cleanup

* wrappers.get_log_info

* cleanup

* cleanup

* directory cleanup

* logging, num_experiments

* fixes

* cleanup

* gin fixes

* fix local max gpu

* resid nx

* num machines and download params

* rename

* cleanup

* create workbench

* more reorg

* fix

* more logging wrappers

* lint fix

* restore train procgen

* restore train procgen

* pylint fix

* better wrapping

* config sweep

* args sweep

* test workers

* mpi_weight

* train test comm and high difficulty fix

* enjoy show returns

* removing gin, procgen_parser

* removing gin

* procgen args

* config fixes

* cleanup

* cleanup

* procgen args fix

* fix

* rcall syncing

* lint

* rename mpi_weight

* use username for sync

* fixes

* microbatch fix
2019-05-03 15:54:22 -07:00
Christopher Hesse
376fd88bb8 fix vec monitor infos 2019-05-03 15:54:22 -07:00
Peter Zhokhov
96b6a31848 Merge branch 'internal' of github.com:openai/baselines into internal 2019-04-05 14:11:09 -07:00
Jacob Hilton
0a48a1fda9 Merge branch 'master' of github.com:openai/baselines into internal 2019-04-03 16:21:48 -07:00
Christopher Hesse
ea20c8a034 add score calculator wrapper, forward property lookups on vecenv wrap… (#300)
* add score calculator wrapper, forward property lookups on vecenv wrapper, misc cleanup

* tests

* pylint
2019-04-03 16:20:42 -07:00
pzhokhov
a08af5d07d make tests use single-threaded session for determinism of KfacOptimizer (#298)
* make tests use single-threaded session for determinism of KfacOptimizer

* updated comment in kfac.py

* remove unused sess_config
2019-04-03 16:20:42 -07:00
Oleg Klimov
cc88c8e4c0 remove tensorflow dependency from VecEnv 2019-04-03 16:20:42 -07:00
pzhokhov
f2654082b2 Symshapes - gives codegen ability to evaluate same algo on envs with different ob/ac shapes (#262)
* finish cherry-pick td3 test commit

* removed graph simplification error ingore

* merge delayed logger config

* merge updated baselines logger

* lazy_mpi load

* cleanups

* use lazy mpi imports in codegen

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* removed extra printouts from TdLayer op

* silly recursion

* running codegen cc experiment

* wip

* more wip

* use actor is input for critic targets, instead of the action taken

* batch size 100

* tweak update parameters

* tweaking td3 runs

* wip

* use nenvs=2 for contcontrol (to be comparable with ppo_metal)

* wip. Doubts about usefulness of actor in critic target

* delayed actor in ActorLoss

* score is average of last 100

* skip lack of losses or too many action distributions

* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)

* syntax

* microfixes

* minifixes

* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)

* random physics for mujoco

* random parts sizes with range 0.4

* add notebook with results into x/peterz

* variations of ant

* roboschool use gym.make kwargs

* use float as lowest score after rank transform

* rcall from master

* wip

* re-enable dynamic routing

* wip

* squash-merge master, resolve conflicts

* remove erroneous file

* restore normal MPI imports

* move wrappers around a little bit

* autopep8

* cleanups

* cleanup mpi_eda, autopep8

* make activation function of action distribution customizable

* cleanups; preparation for a pr

* syntax

* merge latest master, resolve conflicts

* wrap MPI import with try/except

* allow import of modules through env id im baselines cmd_util

* flake8 complaints

* only wrap box action spaces with ClipActionsWrapper

* flake8

* fixes to algo_prob according to Oleg's suggestions

* use apply_without_scope flag in ActorLoss

* remove extra line in algo/core.py

* multi-task support

* autopep8

* symbolic suffix-shapes (not B,T yet)

* test_with_mpi -> with_mpi rename

* remove extra blank lines in algo/core

* remove extra blank lines in algo/core

* remove more blank lines

* symbolify shapes in existing algorithms

* minor output changes

* cleaning up merge conflicts

* cleaning up merge conflicts

* cleaning up more merge conflicts

* restore mpi_map.py from master
2019-04-03 16:20:42 -07:00
Karl Cobbe
dadc2c2eb6 Rl19 metalearning (#261)
* rl19 metalearning and dict obs

* master merge arch fix

* lint fixes

* view fixes

* load vars tweaks

* user config cleanup

* documentation and revisions

* pass train comm to rl19

* cleanup
2019-04-03 16:20:42 -07:00
pzhokhov
d9702e7ccb codegen continuous control experiment pr (#256)
* finish cherry-pick td3 test commit

* removed graph simplification error ingore

* merge delayed logger config

* merge updated baselines logger

* lazy_mpi load

* cleanups

* use lazy mpi imports in codegen

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* removed extra printouts from TdLayer op

* silly recursion

* running codegen cc experiment

* wip

* more wip

* use actor is input for critic targets, instead of the action taken

* batch size 100

* tweak update parameters

* tweaking td3 runs

* wip

* use nenvs=2 for contcontrol (to be comparable with ppo_metal)

* wip. Doubts about usefulness of actor in critic target

* delayed actor in ActorLoss

* score is average of last 100

* skip lack of losses or too many action distributions

* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)

* syntax

* microfixes

* minifixes

* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)

* squash-merge master, resolve conflicts

* remove erroneous file

* restore normal MPI imports

* move wrappers around a little bit

* autopep8

* cleanups

* cleanup mpi_eda, autopep8

* make activation function of action distribution customizable

* cleanups; preparation for a pr

* syntax

* merge latest master, resolve conflicts

* wrap MPI import with try/except

* allow import of modules through env id im baselines cmd_util

* flake8 complaints

* only wrap box action spaces with ClipActionsWrapper

* flake8

* fixes to algo_prob according to Oleg's suggestions

* use apply_without_scope flag in ActorLoss

* remove extra line in algo/core.py
2019-04-03 16:20:42 -07:00
Christopher Hesse
f641810ef9 update dmlab30 env (#258) 2019-04-03 16:20:42 -07:00
Peter Zhokhov
3265098cc6 Merge branch 'master' of github.com:openai/baselines into internal 2019-04-01 16:26:25 -07:00
Peter Zhokhov
5bc6f53960 merged master 2019-03-11 17:31:03 -07:00
Peter Zhokhov
fa5cb1e1f5 merged master 2019-02-27 15:05:24 -08:00
Peter Zhokhov
6dedd5d241 flake8 complaints in baselines/her 2019-02-26 16:51:11 -08:00
Peter Zhokhov
5c7da772a4 Merge branch 'master' of github.com:openai/games
2019-02-26 16:51:11 -08:00
Christopher Hesse
a4188f4b36 minor changes to baselines (#243)
* minor changes to baselines

* fix spaces reference

* remove flake8 disable comments and fix import

* okay maybe don't add spec to vec_env
2019-02-26 15:43:24 -08:00
John Schulman
fb6fd51fe6 Rl19 (#232)
* everyrl initial commit

* add keep_buf argument to VecMonitor

* logger changes: set_comm and fix to mpi_mean functionality

* if filename not provided, don't create ResultsWriter

* change variable syncing function to simplify its usage. now you should initialize from all mpi processes

* everyrl coinrun changes

* tf_distr changes, bugfix

* get_one

* bring back get_next to temporarily restore code

* lint fixes

* fix test

* rename profile function

* rename gaussian

* fix coinrun training script

* rl19

* remove everyrl dir which appeared in the merge for some reason

* readme

* fiddle with ddpg

* make ddpg work

* steps_total argument

* gpu count

* clean up hyperparams and shape math

* logging + saving

* configuration stuff

* fixes, smoke tests

* fix stats

* make load_results return dicts -- easier to create the same kind of objects with some other mechanism for passing to downstream functions

* benchmarks

* fix tests

* add dqn to tests, fix it

* minor

* turned annotated transformer (pytorch) into a script

* more refactoring

* jax stuff

* cluster

* minor

* copy & paste alec code

* sign error

* add huber, rename some parameters, snapshotting off by default

* remove jax stuff

* minor

* move maze env

* minor

* remove trailing spaces

* remove trailing space

* lint

* fix test breakage due to gym update

* rename function

* move maze back to codegen

* get recurrent ppo working

* enable both lstm and gru

* script to print table of benchmark results

* various

* fix dqn

* add fixup initializer, remove lastrew

* organize logging stats

* fix silly bug

* refactor models

* fix mpi usage

* check sync

* minor

* change vf coef, hps

* clean up slicing in ppo

* minor fixes

* caching transformer

* docstrings

* xf fixes

* get rid of 'B' and 'BT' arguments

* minor

* transformer example

* remove output_kind from base class until we have a better idea how to use it

* add comments, revert maze stuff

* flake8

* codegen lint

* fix codegen tests

* responded to peter's comments

* lint fixes
2019-02-26 15:43:24 -08:00
Christopher Hesse
ecf5394226 misc changes to vecenvs and run.py for benchmarks (#236)
* misc changes to vecenvs and run.py for benchmarks

* dont seed global gen

* update more references to assert_venvs_equal
2019-02-26 15:43:24 -08:00
Christopher Hesse
0dcaafd717 change random seeding to work with new gym version (#231)
* change random seeding to work with new gym version

* move seeding to seed() method

* fix mnistenv

* actually try some of the tests before pushing

* more deterministic fixed seq
2019-02-26 15:43:24 -08:00
John Schulman
82ebd4a153 Everyrl initial commit & a few minor baselines changes (#226)
* everyrl initial commit

* add keep_buf argument to VecMonitor

* logger changes: set_comm and fix to mpi_mean functionality

* if filename not provided, don't create ResultsWriter

* change variable syncing function to simplify its usage. now you should initialize from all mpi processes

* everyrl coinrun changes

* tf_distr changes, bugfix

* get_one

* bring back get_next to temporarily restore code

* lint fixes

* fix test

* rename profile function

* rename gaussian

* fix coinrun training script
2019-02-26 15:43:24 -08:00
Peter Zhokhov
cd8d3389ba remove forked argument in front of tests - does not play nicely with subprocvecenv in spawned processes; analog of forked in ddpg/test_smoke 2019-01-24 17:49:02 -08:00
Peter Zhokhov
0c949b0680 flake8; removed special logic for discrete spaces in dummy_vec_env 2019-01-24 15:57:18 -08:00
Peter Zhokhov
0e0dd77f61 mpi test fixes 2019-01-24 15:46:58 -08:00
Peter Zhokhov
e868bdaa1a allow for non-mpi tests 2019-01-24 14:35:41 -08:00
Peter Zhokhov
547764efc9 flake8 fix 2019-01-24 14:33:50 -08:00
Peter Zhokhov
bb05b9ee88 removed unnecessary OrderedDict requirement in subproc_vec_env 2019-01-24 14:29:35 -08:00
Karl Cobbe
1d56af90d3 Vecenv refactor (#223)
* update karl util

* restore pvi flag

* change rcall auto cpu behavior, move gin.configurable, add os.makedirs

* vecenv refactor

* aux buf index fix

* add num aux obs

* reset level with enter

* restore high difficulty flag

* bugfix

* restore train_coinrun.py

* tweaks

* renaming

* renaming

* better arguments handling

* more options

* options cleanup

* game data refactor

* more options

* args for train_procgen

* add close handler to interactive base class

* use debug build if debug=True, fix range on aux_obs

* add ProcGenEnv to __init__.py, add missing imports to procgen.py

* export RemoveDictWrapper and build, update train_procgen.py, move assets download into env creation and replace init_assets_and_build with just build

* fix formatting issues

* only call global init once

* fix path in setup.py

* revert part of makefile

* ignore IDE files and folders

* vec remove dict

* export VecRemoveDictObs

* remove RemoveDictWrapper

* remove IDE files

* move shared .h and .cpp files to common folder, update build to use those, dedupe env.cpp

* fix missing header

* try unified build function

* remove old scripts dir

* add comment on build

* upload libenv with render fixes

* tell qthreads to die when we unload the library

* pyglet.app.run is garbage

* static fixes

* whoops

* actually vsync is on

* cleanup

* cleanup

* extern C for libenv interface

* parse util rcall arg

* high difficulty fix

* game type enums

* ProcGenEnv subclasses

* game type cleanup

* unrecognized key

* unrecognized game type

* parse util reorg

* args management

* typo fix

* GinParser

* arg tweaks

* tweak

* restore start_level/num_levels setting

* fix create_procgen_env interface

* build fix

* procgen args in init signature

* fix

* build fix

* fix logger usage in ppo_metal/run_retro
2019-01-24 14:29:35 -08:00
pzhokhov
d760c363bc make default logger configuration the same as call to logger.configure() (#222) 2019-01-24 14:29:35 -08:00
Christopher Hesse
4ee173c30b baselines: export vecenvs from folder (#221)
* baselines: export vecenvs from folder

* put missing function back in

* add missing imports

* more imports

* longer mpi timeout?
2019-01-24 14:29:35 -08:00
John Schulman
ef1e80621a whitespace + RUN BENCHMARKS 2019-01-24 14:29:35 -08:00
John Schulman
3d800a99dc more timesteps in humanoid run 2019-01-24 14:29:35 -08:00
John Schulman
27b8644936 remove clip_frac schedule from ppo2 2019-01-24 14:29:35 -08:00
John Schulman
45063be393 change humanoid hyperparameters, get rid of clip_Frac annealing, as it's apparently dangerous 2019-01-24 14:29:35 -08:00
Christopher Hesse
8c547e5973 use spawn for shmem vec env as well (#2) (#219)
* lazy_mpi load

* cleanups

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* silly recursion

* try os.environ hack

* better prefix test, work with mpich

* restored MPI imports

* removed commented import in test_with_mpi

* restored codegen from master

* remove lazy mpi

* restored changes from rl-algs

* remove extra files

* port mpi fix to shmem vec env

* increase the mpi test default timeout
2019-01-24 14:29:35 -08:00
pzhokhov
a538e3c8f7 disable mpi in subprocesses (#213)
* lazy_mpi load

* cleanups

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* silly recursion

* try os.environ hack

* better prefix test, work with mpich

* restored MPI imports

* removed commented import in test_with_mpi

* restored codegen from master

* remove lazy mpi

* restored changes from rl-algs

* remove extra files

* address Chris' comments
2019-01-24 14:29:35 -08:00
pzhokhov
3a8f35a7e9 delayed logger configuration (#208)
* delayed logger configuration

* fix typo

* setters and getters for Logger.DEFAULT as well

* do away with fancy property stuff - unable to get it to work with class level methods

* grammar and spaces

* spaces

* use get_current function instead of reading Logger.CURRENT

* autopep8
2019-01-24 14:29:35 -08:00
John Schulman
370ee27750 1.5 months of codegen changes (#196)
* play with resnet

* feed_dict version

* coinrun prob and more stats

* fixes to get_choices_specs & hp search

* minor prob fixes

* minor fixes

* minor

* alternative version of rl_algo stuff

* pylint fixes

* fix bugs, move node_filters to soup

* changed how get_algo works

* change how get_algo works, probably broke all tests

* continue previous refactor

* get eval_agent running again

* fixing tests

* fix tests

* fix more tests

* clean up cma stuff

* fix experiment

* minor changes to eval_agent to make ppo_metal use gpu

* make dict space work

* modify mac makefile to use conda

* recurrent layers

* play with bn and resnets

* minor hp changes

* minor

* got rid of use_fb argument and jtft (joint-train-fine-tune) functionality
built test phase directly into AlgoProb

* make new rl algos generateable

* pylint; start fixing tests

* fixing tests

* more test fixes

* pylint

* fix search

* work on search

* hack around infinite loop caused by scan

* algo search fixes

* misc changes for search expt

* enable annealing, overriding options of Op

* pylint fixes

* identity op

* achieve use_last_output through masking so it automatically works in other distributions

* fix tests

* minor

* discrete

* use_last_output to be just a preference, not a hard constraint

* pred delay, pruning

* require nontrivial inputs

* aliases for get_sm

* add probname to probs

* fixes

* small fixes

* fix tests

* fix tests

* fix tests

* minor

* test scripts

* dualgru network improvements

* minor

* work on mysterious bugs

* rcall gpu-usage command for kube

* use cache dir that’s not in code folder, so that it doesn’t get removed by rcall code rsync

* add power mode to gpu usage

* make sure train/test actually different

* remove VR for now

* minor fixes

* simplify soln_db

* minor

* big refactor of mpi eda

* improve mpieda for multitask

* - get rid of timelimit hack
- add __del__ to cleanup SubprocVecEnv

* get multitask working better

* fixes

* working on atari, various

* annotate ops with whether they’re parametrized

* minor

* gym version

* rand atari prob

* minor

* SolnDb bugfix and name change

* pyspy script

* switch conv layers

* fix roboschool/bullet3

* nenvs assertion

* fix rand atari

* get rid of blanket exception catching
fix soln_db bug

* fix rand_atari

* dynamic routing as cmdline arg

* slight modifications to test_mpi_map and pyspy-all

* max_tries argument for run_until_successs

* dedup option in train_mle

* simplify soln_db

* increase atari horizon for 1 experiment

* start implementing reward increment

* ent multiplier

* create cc dsl
other misc fixes

* cc ops

* q_func -> qs in rl_algos_cc.py

* fix PredictDistr

* rl_ops_cc fixes, MakeAction op

* augment algo agent to support cc stuff

* work on ddpg experiments

* fix blocking
temporarily change logger

* allow layer scaling

* pylint fixes

* spawn_method

* isolate ddpg hacks

* improve pruning

* use spawn for subproc

* remove use of python -c in rcall

* fix pylint warning

* fix static

* maybe fix local backend

* switch to DummyVecEnv

* making some fixes via pylint

* pylint fixes

* fixing tests

* fix tests

* fix tests

* write scaffolding for SSL in Codegen

* logger fix

* fix error

* add EMA op to sl_ops

* save many changes

* save

* add upsampler

* add sl ops, enhance state machine

* get ssl search working — some gross hacking

* fix session/graph issue

* fix importing

* work on mle

* - scale embeddings in gru model
- better exception handling in sl_prob
- use emas for test/val
- use non-contrib batch_norm layer

* improve logging

* option to average before dumping in logger

* default arguments, etc

* new ddpg and identity test

* concat fix

* minor

* move realistic ssl stuff to third-party (underscore to dash)

* fixes

* remove realistic_ssl_evaluation

* pylint fixes

* use gym master

* try again

* pass around args without gin

* fix tests

* separate line to install gym

* rename failing tests that should be ignored

* add data aug

* ssl improvements

* use fixed time limit

* try to fix baselines tests

* add score_floor, max_walltime, fiddle with lr decay

* realistic_ssl

* autopep8

* various ssl
- enable blocking grad for simplification
- kl
- multiple final prediction

* fix pruning

* misc ssl stuff

* bring back linear schedule, don’t use allgather for collecting stats
(i’ve been getting nondeterministic errors from the old code)

* save/load weights in SSL, big stepsize

* cleanup SslProb

* fix

* get rid of kl coef

* fix simplification, lower lr

* search over hps

* minor fixes

* minor

* static analysis

* move files and rename things for improved consistency.
still broken, and just saving before making nontrivial changes

* various

* make tests pass

* move coinrun_train to codegen since it depends on codegen

* fixes

* pylint fixes

* improve tests
fix some things

* improve tests

* lint

* fix up db_info.py, tests

* mostly restore master version of envs directory, except for makefile changes

* fix tests

* improve printing

* minor fixes

* fix fixmes

* pruning test

* fixes

* lint

* write new test that makes tf graphs of random algos; fix some bugs it caught

* add --delete flag to rcall upload-code command

* lint

* get cifar10 lazily for testing purposes

* disable codegen ci tests for now

* clean up rl_ops

* rename spec classes

* td3 with identity test

* identity tests without gin files

* remove gin.configurable from AlgoAgent

* comments about reduction in rl_ops_cc

* address @pzhokhov comments

* fix tests

* more linting

* better tests

* clean up filtering a bit

* fix concat
2019-01-24 14:29:35 -08:00
Peter Zhokhov
8fe79aa76d Merge branch 'master' of github.com:openai/baselines into internal 2019-01-24 14:28:35 -08:00
pzhokhov
152971d6d4 Refactor her phase 1 (#194)
* add monitor to the rollout envs in her RUN BENCHMARKS her

* Slice -> Slide in her benchmarks RUN BENCHMARKS her

* run her benchmark for 200 epochs

* dummy commit to RUN BENCHMARKS her

* her benchmark for 500 epochs RUN BENCHMARKS her

* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her

* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her

* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her

* disable saving of policies in her benchmark RUN BENCHMARKS her

* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch

* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch

* launcher refactor wip

* wip

* her works on FetchReach

* her runner refactor RUN BENCHMARKS Fetch1M

* unit test for her

* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present

* pickle-based serialization in her

* remove extra import from subproc_vec_env.py

* investigating differences in rollout.py

* try with old rollout code RUN BENCHMARKS her

* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her

* dummy commit to RUN BENCHMARKS her

* set info_values in rollout worker in her RUN BENCHMARKS her

* bug in rollout_new.py RUN BENCHMARKS her

* fixed bug in rollout_new.py RUN BENCHMARKS her

* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her

* updated buffer sizes RUN BENCHMARKS her

* fixed loading/saving via joblib

* dust off learning from demonstrations in HER, docs, refactor

* add deprecation notice on her play and plot files

* address comments by Matthias
2018-12-18 17:47:36 -08:00
Peter Zhokhov
c4afffbb39 Merge branch 'master' of github.com:openai/baselines into internal 2018-11-29 17:31:58 -08:00
Peter Zhokhov
5b74b437d8 Merge branch 'master' of github.com:openai/baselines into internal 2018-11-26 16:43:10 -08:00
Srizzle
6509a51b96 fixed bug (#185)
* fixed bug 

it's wrong to do the else statement, because no other nodes would start.

* changed the fix slightly
2018-11-26 16:42:21 -08:00
pzhokhov
001597586d updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip

* update viewer yaml to use persistent volume claims

* move plot_util to baselines.common, update links

* use 1Tb hard drive for results viewer

* small updates to benchmark vizualizer code

* autopep8

* autopep8

* any folder can be a benchmark

* massage games image a little bit

* fixed --preload option in app.py

* remove preload from run_viewer.sh

* remove pdb breakpoints

* update bench-viewer.yaml
2018-11-26 16:42:20 -08:00
Peter Zhokhov
1ddab4bdb5 Merge branch 'master' of github.com:openai/baselines into internal 2018-11-14 14:54:16 -08:00
Peter Zhokhov
776a134218 merge master 2018-11-13 11:24:57 -08:00
Peter Zhokhov
0b8126f949 more un-mpying 2018-11-09 10:08:39 -08:00
Peter Zhokhov
84323c3d49 flake8 and mpi4py imports in ppo2/model.py 2018-11-09 09:32:59 -08:00
Peter Zhokhov
5a2b96abdd Merge branch 'master' of github.com:openai/baselines into internal 2018-11-08 10:36:54 -08:00
Peter Zhokhov
57c23cddd6 mpi-less ppo2 (resolving merge conflict) 2018-11-08 10:36:36 -08:00
pzhokhov
310fbadba3 Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* subclassing the model to make microbatched version of model WIP

* made microbatched model a subclass of ppo2 Model

* flake8 complaint
2018-11-08 10:20:49 -08:00
pzhokhov
c424f9889d microbatch fixes and test (#169)
* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix
2018-11-08 10:20:02 -08:00
peter
a1cef656b8 pass microbatch_size to the model during construction 2018-11-08 10:20:02 -08:00
pzhokhov
b0589da817 ppo2 with microbatches (#168) 2018-11-08 10:20:02 -08:00
Peter Zhokhov
021533be6c Merge branch 'master' of github.com:openai/baselines into internal 2018-11-07 16:37:31 -08:00
pzhokhov
67a1222267 Merge branch 'master' into internal 2018-11-06 10:26:14 -08:00
Peter Zhokhov
739ab6fa0e Merge branch 'internal' of github.com:openai/baselines into internal 2018-11-05 14:07:52 -08:00
Peter Zhokhov
6fd2270c47 fixing test failures 2018-10-31 14:11:26 -07:00
Joshua Meier
63151af41a support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peters suggestions
2018-10-31 14:11:26 -07:00
pzhokhov
e619e42364 match network output with action distribution via a linear layer only if necessary (#167) 2018-10-31 14:11:26 -07:00
Peter Zhokhov
5dbe4c2462 Merge branch 'master' of github.com:openai/baselines into internal 2018-10-31 13:58:29 -07:00
Peter Zhokhov
5878eb3862 joshim5 changes (width and height to WarpFrame wrapper) 2018-10-30 18:02:03 -07:00

@@ -275,133 +275,6 @@ class BernoulliPd(Pd):
    def fromflat(cls, flat):
        return cls(flat)


def _np_cast(x, dtype):
    """Numpy cast, equivalent to tf.cast"""
    return x.astype(dtype)


def decode_tuple_sample(pdtypes, x):
    """
    Cast and convert a sample from its dense concatenated state back to constituent parts.

    Arguments
    ---------
    :param pdtypes: list<PdType>, a TuplePdType's child PdTypes.
    :param x: np.ndarray or tf.Tensor.
        Shape is [..., sum(pdtype.sample_shape for pdtype in pdtypes)]
    :return output, list<np.ndarray> or list<tf.Tensor>, the split and correctly cast
        policy samples.
    """
    if isinstance(x, np.ndarray):
        cast_fn = _np_cast
        numpy_casting = True
    else:
        cast_fn = tf.cast
        numpy_casting = False
    so_far = 0
    xs = []
    for pdtype in pdtypes:
        sample_size = pdtype.sample_shape()[0] if len(pdtype.sample_shape()) > 0 else 1
        if len(pdtype.sample_shape()) == 0:
            slided_x = x[..., so_far]
        else:
            slided_x = x[..., so_far:so_far + sample_size]
        desired_dtype = pdtype.sample_dtype()
        if numpy_casting:
            desired_dtype = desired_dtype.as_numpy_dtype
        if desired_dtype != x.dtype:  # only cast when the slice's dtype differs from the child's sample dtype
            slided_x = cast_fn(slided_x, desired_dtype)
        xs.append(slided_x)
        so_far += sample_size
    return xs
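
# Illustrative sketch (assumes the usual baselines CategoricalPdType and
# DiagGaussianPdType as children): decode_tuple_sample walks the concatenated
# trailing axis, taking one slot for scalar-sample pdtypes and
# sample_shape()[0] slots otherwise, casting each slice back to its child's
# sample dtype.
#
#   child_pdtypes = [CategoricalPdType(3), DiagGaussianPdType(2)]
#   dense = np.zeros((5, 3), dtype=np.float32)      # 1 slot + 2 slots on the last axis
#   first, second = decode_tuple_sample(child_pdtypes, dense)
#   # first  -> shape (5,),  cast to the categorical (integer) sample dtype
#   # second -> shape (5, 2), left as float32
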

class TuplePd(Pd):
    def __init__(self, sample_dtype, pdtypes, logits):
        self.pdtypes = pdtypes
        self.sample_dtype = sample_dtype
        self.pds = []
        so_far = 0
        for pdtype in self.pdtypes:
            param_shape = pdtype.param_shape()[0]
            self.pds.append(pdtype.pdfromflat(logits[..., so_far:so_far + param_shape]))
            so_far += param_shape

    def flatparam(self):
        return tf.concat([pd.flatparam() for pd in self.pds], axis=-1)

    def mode(self):
        return self.tuple_sample_concat([pd.mode() for pd in self.pds])

    def tuple_sample_concat(self, samples):
        out = []
        for sample, pdtype in zip(samples, self.pdtypes):
            if len(pdtype.sample_shape()) == 0:
                sample = tf.expand_dims(sample, axis=-1)
            if sample.dtype != self.sample_dtype:
                sample = tf.cast(sample, self.sample_dtype)
            out.append(sample)
        return tf.concat(out, axis=-1)

    def sample(self):
        return self.tuple_sample_concat([pd.sample() for pd in self.pds])

    def neglogp(self, x):
        return tf.add_n([pd.neglogp(xi) for pd, xi in zip(self.pds, decode_tuple_sample(self.pdtypes, x))])

    def entropy(self):
        return tf.add_n([pd.entropy() for pd in self.pds])

def _dtype_promotion(old, new):
    """
    Find the highest precision common ground between two tensorflow datatypes.
    If old is None, it is ignored.
    """
    if old is None or (new.is_floating and old.is_integer):
        return new
    if old.is_floating and new.is_integer:
        return old
    if (old.is_floating and new.is_floating) or (old.is_integer and new.is_integer):
        # take the largest type (e.g. float64 over float32)
        return old if old.size > new.size else new
    raise ValueError("No idea how to promote {} and {}.".format(old, new))

class TuplePdType(PdType):
    def __init__(self, space):
        self.internal_pdtypes = [make_pdtype(space) for space in space.spaces]

    def decode_sample(self, x):
        return decode_tuple_sample(self.internal_pdtypes, x)

    def pdclass(self):
        return TuplePd

    def pdfromflat(self, flat):
        return TuplePd(self.sample_dtype(), self.internal_pdtypes, flat)

    def param_shape(self):
        return [sum([pdtype.param_shape()[0]
                     for pdtype in self.internal_pdtypes])]

    def sample_shape(self):
        return [sum([pdtype.sample_shape()[0] if len(pdtype.sample_shape()) > 0 else 1
                     for pdtype in self.internal_pdtypes])]

    def sample_dtype(self):
        dtype = None
        for pdtype in self.internal_pdtypes:
            dtype = _dtype_promotion(dtype, pdtype.sample_dtype())
        return dtype
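
# Illustrative sketch (assumes gym is importable and the standard Discrete/Box
# PdTypes): how a Tuple action space would flow through make_pdtype below.
# A Tuple of Discrete(3) and a 2-dim Box yields a TuplePdType whose dense
# samples carry 1 + 2 = 3 trailing slots, with the sample dtype promoted to a
# float type by _dtype_promotion.
#
#   example_space = gym.spaces.Tuple((gym.spaces.Discrete(3),
#                                     gym.spaces.Box(low=-1.0, high=1.0, shape=(2,))))
#   example_pdtype = make_pdtype(example_space)
#   assert example_pdtype.sample_shape() == [3]
#   parts = example_pdtype.decode_sample(np.zeros((5, 3), dtype=np.float32))
#   # parts[0] -> shape (5,), integer-typed; parts[1] -> shape (5, 2), float32
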

def make_pdtype(ac_space):
    from gym import spaces
    if isinstance(ac_space, spaces.Box):
@@ -413,12 +286,9 @@ def make_pdtype(ac_space):
        return MultiCategoricalPdType(ac_space.nvec)
    elif isinstance(ac_space, spaces.MultiBinary):
        return BernoulliPdType(ac_space.n)
    elif isinstance(ac_space, spaces.Tuple):
        return TuplePdType(ac_space)
    else:
        raise NotImplementedError


def shape_el(v, i):
    maybe = v.get_shape()[i]
    if maybe is not None: