* initial implementaion of ppo2_rnn.
* set lstm memory as tf.GraphKeys.LOCAL_VARIABLES.
* replace dones with tf.placeholder_with_default.
* improves for 'play' option.
* removed unnecessary TODO .
* improve lstm code.
* move learning rate placeholer to optimizer scope.
* support the microbatched model.
* sync cnn lstm layer with originals.
* add cnn_lnlstm layer.
* fix a case when `states` is None.
* add initial_state variable to help test.
* make ppo2 rnn test available.
* rename 'obs' with 'observations'.
rename 'transition' with 'transitions'.
fix forgetting `dones` in the replay buffer.
fix a misuse of `states` and `next_states` in the replay buffer.
* make initialization once.
make `test_fixed_sequence` compatible with ppo2.
* adjust input shape.
* fix checking of a model input args in `simple_test` function.
* disable warning on purpose.
* support the play.
* improve scopes to compatible with multiple models (i.e, other tensorflow global/local variables)
* clean the scope of ppo2 policy model.
* name the memory variable of PPO RNNs more describly
* wrap the initializations in ppo2.
* remove redundant lines.
* update `REAMD.md`.
* add RNN layers.
* add the result of HalfCheeta-v2 env experiment.
* correct a typo.
* add RNN class.
* rename `nlstm` with `num_units` in RNN builder functions.
* remove state saving.
* reuse RNNs in a2c.utils.
* revert baselines/run.py.
* replace `ppo2.step()` with original interface.
* revert `baselines/common/tests/util.py`.
* remove redundant lines.
* revert `baselines/common/test/util.py` to b875fb7.
* remove `states` variable.
* move RNN class to `baselines/ppo2/layers.py' and revert `baselines/common/models.py` to 858afa8.
* rename `model.step_as_dict` with `model.step_with_dict`.
* removed `ppo_lstm_mlp`.
* fix 02e26fd.
* joshim5 changes (width and height to WarpFrame wrapper)
* match network output with action distribution via a linear layer only if necessary (#167)
* support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper
* Support color in other wrappers
* Updated per Peters suggestions
* fixing test failures
* ppo2 with microbatches (#168)
* pass microbatch_size to the model during construction
* microbatch fixes and test (#169)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* subclassing the model to make microbatched version of model WIP
* made microbatched model a subclass of ppo2 Model
* flake8 complaint
* mpi-less ppo2 (resolving merge conflict)
* flake8 and mpi4py imports in ppo2/model.py
* more un-mpying
* merge master
* updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip
* update viewer yaml to use persistent volume claims
* move plot_util to baselines.common, update links
* use 1Tb hard drive for results viewer
* small updates to benchmark vizualizer code
* autopep8
* autopep8
* any folder can be a benchmark
* massage games image a little bit
* fixed --preload option in app.py
* remove preload from run_viewer.sh
* remove pdb breakpoints
* update bench-viewer.yaml
* fixed bug (#185)
* fixed bug
it's wrong to do the else statement, because no other nodes would start.
* changed the fix slightly
* Refactor her phase 1 (#194)
* add monitor to the rollout envs in her RUN BENCHMARKS her
* Slice -> Slide in her benchmarks RUN BENCHMARKS her
* run her benchmark for 200 epochs
* dummy commit to RUN BENCHMARKS her
* her benchmark for 500 epochs RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* disable saving of policies in her benchmark RUN BENCHMARKS her
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* launcher refactor wip
* wip
* her works on FetchReach
* her runner refactor RUN BENCHMARKS Fetch1M
* unit test for her
* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present
* pickle-based serialization in her
* remove extra import from subproc_vec_env.py
* investigating differences in rollout.py
* try with old rollout code RUN BENCHMARKS her
* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her
* dummy commit to RUN BENCHMARKS her
* set info_values in rollout worker in her RUN BENCHMARKS her
* bug in rollout_new.py RUN BENCHMARKS her
* fixed bug in rollout_new.py RUN BENCHMARKS her
* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her
* updated buffer sizes RUN BENCHMARKS her
* fixed loading/saving via joblib
* dust off learning from demonstrations in HER, docs, refactor
* add deprecation notice on her play and plot files
* address comments by Matthias
* 1.5 months of codegen changes (#196)
* play with resnet
* feed_dict version
* coinrun prob and more stats
* fixes to get_choices_specs & hp search
* minor prob fixes
* minor fixes
* minor
* alternative version of rl_algo stuff
* pylint fixes
* fix bugs, move node_filters to soup
* changed how get_algo works
* change how get_algo works, probably broke all tests
* continue previous refactor
* get eval_agent running again
* fixing tests
* fix tests
* fix more tests
* clean up cma stuff
* fix experiment
* minor changes to eval_agent to make ppo_metal use gpu
* make dict space work
* modify mac makefile to use conda
* recurrent layers
* play with bn and resnets
* minor hp changes
* minor
* got rid of use_fb argument and jtft (joint-train-fine-tune) functionality
built test phase directly into AlgoProb
* make new rl algos generateable
* pylint; start fixing tests
* fixing tests
* more test fixes
* pylint
* fix search
* work on search
* hack around infinite loop caused by scan
* algo search fixes
* misc changes for search expt
* enable annealing, overriding options of Op
* pylint fixes
* identity op
* achieve use_last_output through masking so it automatically works in other distributions
* fix tests
* minor
* discrete
* use_last_output to be just a preference, not a hard constraint
* pred delay, pruning
* require nontrivial inputs
* aliases for get_sm
* add probname to probs
* fixes
* small fixes
* fix tests
* fix tests
* fix tests
* minor
* test scripts
* dualgru network improvements
* minor
* work on mysterious bugs
* rcall gpu-usage command for kube
* use cache dir that’s not in code folder, so that it doesn’t get removed by rcall code rsync
* add power mode to gpu usage
* make sure train/test actually different
* remove VR for now
* minor fixes
* simplify soln_db
* minor
* big refactor of mpi eda
* improve mpieda for multitask
* - get rid of timelimit hack
- add __del__ to cleanup SubprocVecEnv
* get multitask working better
* fixes
* working on atari, various
* annotate ops with whether they’re parametrized
* minor
* gym version
* rand atari prob
* minor
* SolnDb bugfix and name change
* pyspy script
* switch conv layers
* fix roboschool/bullet3
* nenvs assertion
* fix rand atari
* get rid of blanket exception catching
fix soln_db bug
* fix rand_atari
* dynamic routing as cmdline arg
* slight modifications to test_mpi_map and pyspy-all
* max_tries argument for run_until_successs
* dedup option in train_mle
* simplify soln_db
* increase atari horizon for 1 experiment
* start implementing reward increment
* ent multiplier
* create cc dsl
other misc fixes
* cc ops
* q_func -> qs in rl_algos_cc.py
* fix PredictDistr
* rl_ops_cc fixes, MakeAction op
* augment algo agent to support cc stuff
* work on ddpg experiments
* fix blocking
temporarily change logger
* allow layer scaling
* pylint fixes
* spawn_method
* isolate ddpg hacks
* improve pruning
* use spawn for subproc
* remove use of python -c in rcall
* fix pylint warning
* fix static
* maybe fix local backend
* switch to DummyVecEnv
* making some fixes via pylint
* pylint fixes
* fixing tests
* fix tests
* fix tests
* write scaffolding for SSL in Codegen
* logger fix
* fix error
* add EMA op to sl_ops
* save many changes
* save
* add upsampler
* add sl ops, enhance state machine
* get ssl search working — some gross hacking
* fix session/graph issue
* fix importing
* work on mle
* - scale embeddings in gru model
- better exception handling in sl_prob
- use emas for test/val
- use non-contrib batch_norm layer
* improve logging
* option to average before dumping in logger
* default arguments, etc
* new ddpg and identity test
* concat fix
* minor
* move realistic ssl stuff to third-party (underscore to dash)
* fixes
* remove realistic_ssl_evaluation
* pylint fixes
* use gym master
* try again
* pass around args without gin
* fix tests
* separate line to install gym
* rename failing tests that should be ignored
* add data aug
* ssl improvements
* use fixed time limit
* try to fix baselines tests
* add score_floor, max_walltime, fiddle with lr decay
* realistic_ssl
* autopep8
* various ssl
- enable blocking grad for simplification
- kl
- multiple final prediction
* fix pruning
* misc ssl stuff
* bring back linear schedule, don’t use allgather for collecting stats
(i’ve been getting nondeterministic errors from the old code)
* save/load weights in SSL, big stepsize
* cleanup SslProb
* fix
* get rid of kl coef
* fix simplification, lower lr
* search over hps
* minor fixes
* minor
* static analysis
* move files and rename things for improved consistency.
still broken, and just saving before making nontrivial changes
* various
* make tests pass
* move coinrun_train to codegen since it depends on codegen
* fixes
* pylint fixes
* improve tests
fix some things
* improve tests
* lint
* fix up db_info.py, tests
* mostly restore master version of envs directory, except for makefile changes
* fix tests
* improve printing
* minor fixes
* fix fixmes
* pruning test
* fixes
* lint
* write new test that makes tf graphs of random algos; fix some bugs it caught
* add —delete flag to rcall upload-code command
* lint
* get cifar10 lazily for testing purposes
* disable codegen ci tests for now
* clean up rl_ops
* rename spec classes
* td3 with identity test
* identity tests without gin files
* remove gin.configurable from AlgoAgent
* comments about reduction in rl_ops_cc
* address @pzhokhov comments
* fix tests
* more linting
* better tests
* clean up filtering a bit
* fix concat
* delayed logger configuration (#208)
* delayed logger configuration
* fix typo
* setters and getters for Logger.DEFAULT as well
* do away with fancy property stuff - unable to get it to work with class level methods
* grammar and spaces
* spaces
* use get_current function instead of reading Logger.CURRENT
* autopep8
* disable mpi in subprocesses (#213)
* lazy_mpi load
* cleanups
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* silly recursion
* try os.environ hack
* better prefix test, work with mpich
* restored MPI imports
* removed commented import in test_with_mpi
* restored codegen from master
* remove lazy mpi
* restored changes from rl-algs
* remove extra files
* address Chris' comments
* use spawn for shmem vec env as well (#2) (#219)
* lazy_mpi load
* cleanups
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* silly recursion
* try os.environ hack
* better prefix test, work with mpich
* restored MPI imports
* removed commented import in test_with_mpi
* restored codegen from master
* remove lazy mpi
* restored changes from rl-algs
* remove extra files
* port mpi fix to shmem vec env
* increase the mpi test default timeout
* change humanoid hyperparameters, get rid of clip_Frac annealing, as it's apparently dangerous
* remove clip_frac schedule from ppo2
* more timesteps in humanoid run
* whitespace + RUN BENCHMARKS
* baselines: export vecenvs from folder (#221)
* baselines: export vecenvs from folder
* put missing function back in
* add missing imports
* more imports
* longer mpi timeout?
* make default logger configuration the same as call to logger.configure() (#222)
* Vecenv refactor (#223)
* update karl util
* restore pvi flag
* change rcall auto cpu behavior, move gin.configurable, add os.makedirs
* vecenv refactor
* aux buf index fix
* add num aux obs
* reset level with enter
* restore high difficulty flag
* bugfix
* restore train_coinrun.py
* tweaks
* renaming
* renaming
* better arguments handling
* more options
* options cleanup
* game data refactor
* more options
* args for train_procgen
* add close handler to interactive base class
* use debug build if debug=True, fix range on aux_obs
* add ProcGenEnv to __init__.py, add missing imports to procgen.py
* export RemoveDictWrapper and build, update train_procgen.py, move assets download into env creation and replace init_assets_and_build with just build
* fix formatting issues
* only call global init once
* fix path in setup.py
* revert part of makefile
* ignore IDE files and folders
* vec remove dict
* export VecRemoveDictObs
* remove RemoveDictWrapper
* remove IDE files
* move shared .h and .cpp files to common folder, update build to use those, dedupe env.cpp
* fix missing header
* try unified build function
* remove old scripts dir
* add comment on build
* upload libenv with render fixes
* tell qthreads to die when we unload the library
* pyglet.app.run is garbage
* static fixes
* whoops
* actually vsync is on
* cleanup
* cleanup
* extern C for libenv interface
* parse util rcall arg
* high difficulty fix
* game type enums
* ProcGenEnv subclasses
* game type cleanup
* unrecognized key
* unrecognized game type
* parse util reorg
* args management
* typo fix
* GinParser
* arg tweaks
* tweak
* restore start_level/num_levels setting
* fix create_procgen_env interface
* build fix
* procgen args in init signature
* fix
* build fix
* fix logger usage in ppo_metal/run_retro
* removed unnecessary OrderedDict requirement in subproc_vec_env
* flake8 fix
* allow for non-mpi tests
* mpi test fixes
* flake8; removed special logic for discrete spaces in dummy_vec_env
* remove forked argument in front of tests - does not play nicely with subprocvecenv in spawned processes; analog of forked in ddpg/test_smoke
* Everyrl initial commit & a few minor baselines changes (#226)
* everyrl initial commit
* add keep_buf argument to VecMonitor
* logger changes: set_comm and fix to mpi_mean functionality
* if filename not provided, don't create ResultsWriter
* change variable syncing function to simplify its usage. now you should initialize from all mpi processes
* everyrl coinrun changes
* tf_distr changes, bugfix
* get_one
* bring back get_next to temporarily restore code
* lint fixes
* fix test
* rename profile function
* rename gaussian
* fix coinrun training script
* change random seeding to work with new gym version (#231)
* change random seeding to work with new gym version
* move seeding to seed() method
* fix mnistenv
* actually try some of the tests before pushing
* more deterministic fixed seq
* misc changes to vecenvs and run.py for benchmarks (#236)
* misc changes to vecenvs and run.py for benchmarks
* dont seed global gen
* update more references to assert_venvs_equal
* Rl19 (#232)
* everyrl initial commit
* add keep_buf argument to VecMonitor
* logger changes: set_comm and fix to mpi_mean functionality
* if filename not provided, don't create ResultsWriter
* change variable syncing function to simplify its usage. now you should initialize from all mpi processes
* everyrl coinrun changes
* tf_distr changes, bugfix
* get_one
* bring back get_next to temporarily restore code
* lint fixes
* fix test
* rename profile function
* rename gaussian
* fix coinrun training script
* rl19
* remove everyrl dir which appeared in the merge for some reason
* readme
* fiddle with ddpg
* make ddpg work
* steps_total argument
* gpu count
* clean up hyperparams and shape math
* logging + saving
* configuration stuff
* fixes, smoke tests
* fix stats
* make load_results return dicts -- easier to create the same kind of objects with some other mechanism for passing to downstream functions
* benchmarks
* fix tests
* add dqn to tests, fix it
* minor
* turned annotated transformer (pytorch) into a script
* more refactoring
* jax stuff
* cluster
* minor
* copy & paste alec code
* sign error
* add huber, rename some parameters, snapshotting off by default
* remove jax stuff
* minor
* move maze env
* minor
* remove trailing spaces
* remove trailing space
* lint
* fix test breakage due to gym update
* rename function
* move maze back to codegen
* get recurrent ppo working
* enable both lstm and gru
* script to print table of benchmark results
* various
* fix dqn
* add fixup initializer, remove lastrew
* organize logging stats
* fix silly bug
* refactor models
* fix mpi usage
* check sync
* minor
* change vf coef, hps
* clean up slicing in ppo
* minor fixes
* caching transformer
* docstrings
* xf fixes
* get rid of 'B' and 'BT' arguments
* minor
* transformer example
* remove output_kind from base class until we have a better idea how to use it
* add comments, revert maze stuff
* flake8
* codegen lint
* fix codegen tests
* responded to peter's comments
* lint fixes
* minor changes to baselines (#243)
* minor changes to baselines
* fix spaces reference
* remove flake8 disable comments and fix import
* okay maybe don't add spec to vec_env
* Merge branch 'master' of github.com:openai/games
the commit.
* flake8 complaints in baselines/her
* fix#795: Making tf_util._Function consistent
The fix involves using the placeholder name to crossreference passed
kwargs values, just like the tf_util.function expects. Also, the givens
are updated before the parameters to make it behave like it's supposed
to.
* test: Adding test for issue #795
* Added required arguments to the policy builder in the ACER model to
fix the issue #783
* Changed the step model from nbatch to nenvs
* Updated nsteps to be 1.
* Recognize nightly tf builds
* Use LooseVersion instead of StrictVersion to recongnize nightly build numbers
Nightly version numbers are of the form `1.3.0.dev20181215` but it's not a valid version number for `StrictVersion`, while `LooseVersion` still recognizes it.
* joshim5 changes (width and height to WarpFrame wrapper)
* match network output with action distribution via a linear layer only if necessary (#167)
* support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper
* Support color in other wrappers
* Updated per Peters suggestions
* fixing test failures
* ppo2 with microbatches (#168)
* pass microbatch_size to the model during construction
* microbatch fixes and test (#169)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* subclassing the model to make microbatched version of model WIP
* made microbatched model a subclass of ppo2 Model
* flake8 complaint
* mpi-less ppo2 (resolving merge conflict)
* flake8 and mpi4py imports in ppo2/model.py
* more un-mpying
* merge master
* updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip
* update viewer yaml to use persistent volume claims
* move plot_util to baselines.common, update links
* use 1Tb hard drive for results viewer
* small updates to benchmark vizualizer code
* autopep8
* autopep8
* any folder can be a benchmark
* massage games image a little bit
* fixed --preload option in app.py
* remove preload from run_viewer.sh
* remove pdb breakpoints
* update bench-viewer.yaml
* fixed bug (#185)
* fixed bug
it's wrong to do the else statement, because no other nodes would start.
* changed the fix slightly
* Refactor her phase 1 (#194)
* add monitor to the rollout envs in her RUN BENCHMARKS her
* Slice -> Slide in her benchmarks RUN BENCHMARKS her
* run her benchmark for 200 epochs
* dummy commit to RUN BENCHMARKS her
* her benchmark for 500 epochs RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* disable saving of policies in her benchmark RUN BENCHMARKS her
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* launcher refactor wip
* wip
* her works on FetchReach
* her runner refactor RUN BENCHMARKS Fetch1M
* unit test for her
* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present
* pickle-based serialization in her
* remove extra import from subproc_vec_env.py
* investigating differences in rollout.py
* try with old rollout code RUN BENCHMARKS her
* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her
* dummy commit to RUN BENCHMARKS her
* set info_values in rollout worker in her RUN BENCHMARKS her
* bug in rollout_new.py RUN BENCHMARKS her
* fixed bug in rollout_new.py RUN BENCHMARKS her
* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her
* updated buffer sizes RUN BENCHMARKS her
* fixed loading/saving via joblib
* dust off learning from demonstrations in HER, docs, refactor
* add deprecation notice on her play and plot files
* address comments by Matthias
* joshim5 changes (width and height to WarpFrame wrapper)
* match network output with action distribution via a linear layer only if necessary (#167)
* support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper
* Support color in other wrappers
* Updated per Peters suggestions
* fixing test failures
* ppo2 with microbatches (#168)
* pass microbatch_size to the model during construction
* microbatch fixes and test (#169)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* subclassing the model to make microbatched version of model WIP
* made microbatched model a subclass of ppo2 Model
* flake8 complaint
* mpi-less ppo2 (resolving merge conflict)
* flake8 and mpi4py imports in ppo2/model.py
* more un-mpying
* merge master
* updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip
* update viewer yaml to use persistent volume claims
* move plot_util to baselines.common, update links
* use 1Tb hard drive for results viewer
* small updates to benchmark vizualizer code
* autopep8
* autopep8
* any folder can be a benchmark
* massage games image a little bit
* fixed --preload option in app.py
* remove preload from run_viewer.sh
* remove pdb breakpoints
* update bench-viewer.yaml
* fixed bug (#185)
* fixed bug
it's wrong to do the else statement, because no other nodes would start.
* changed the fix slightly
* Added parameter documentation
This parameter was thus far not documented and is non-intuitive when unfamiliar with tf.
* Added parameter documentation