* joshim5 changes (width and height to WarpFrame wrapper)
* match network output with action distribution via a linear layer only if necessary (#167)
* support color vs. grayscale option in WarpFrame wrapper (#166)
* support color vs. grayscale option in WarpFrame wrapper
* Support color in other wrappers
* Updated per Peter's suggestions
* fixing test failures
* ppo2 with microbatches (#168)
* pass microbatch_size to the model during construction
* microbatch fixes and test (#169)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* Peterz joshim5 subclass ppo2 model (#170)
* microbatch fixes and test
* tiny cleanup
* added assertions to the test
* vpg-related fix
* subclassing the model to make microbatched version of model WIP
* made microbatched model a subclass of ppo2 Model
* flake8 complaint
* mpi-less ppo2 (resolving merge conflict)
* flake8 and mpi4py imports in ppo2/model.py
* more un-mpying
* merge master
* updates to the benchmark viewer code + autopep8 (#184)
* viz docs and syntactic sugar wip
* update viewer yaml to use persistent volume claims
* move plot_util to baselines.common, update links
* use 1Tb hard drive for results viewer
* small updates to benchmark visualizer code
* autopep8
* autopep8
* any folder can be a benchmark
* massage games image a little bit
* fixed --preload option in app.py
* remove preload from run_viewer.sh
* remove pdb breakpoints
* update bench-viewer.yaml
* fixed bug (#185)
* fixed bug
it's wrong to take the else branch, because then no other nodes would start.
* changed the fix slightly
* Refactor her phase 1 (#194)
* add monitor to the rollout envs in her RUN BENCHMARKS her
* Slice -> Slide in her benchmarks RUN BENCHMARKS her
* run her benchmark for 200 epochs
* dummy commit to RUN BENCHMARKS her
* her benchmark for 500 epochs RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her
* disable saving of policies in her benchmark RUN BENCHMARKS her
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch
* launcher refactor wip
* wip
* her works on FetchReach
* her runner refactor RUN BENCHMARKS Fetch1M
* unit test for her
* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present
* pickle-based serialization in her
* remove extra import from subproc_vec_env.py
* investigating differences in rollout.py
* try with old rollout code RUN BENCHMARKS her
* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her
* dummy commit to RUN BENCHMARKS her
* set info_values in rollout worker in her RUN BENCHMARKS her
* bug in rollout_new.py RUN BENCHMARKS her
* fixed bug in rollout_new.py RUN BENCHMARKS her
* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her
* updated buffer sizes RUN BENCHMARKS her
* fixed loading/saving via joblib
* dust off learning from demonstrations in HER, docs, refactor
* add deprecation notice on her play and plot files
* address comments by Matthias
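The rollout fix above (not using the last step, because the VecEnv calls reset and returns the observation after reset) depends on VecEnv auto-reset semantics. A minimal toy sketch of that behavior follows; `CountdownEnv` and `AutoResetVecEnv` are hypothetical classes for illustration, not baselines classes, and they return plain lists instead of arrays.

```python
from typing import List, Tuple


class CountdownEnv:
    """Hypothetical toy env: the observation counts down from `length` to 0."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.t = length

    def reset(self) -> int:
        self.t = self.length
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t -= 1
        done = self.t == 0
        return self.t, 1.0, done


class AutoResetVecEnv:
    """Steps several envs in lockstep and resets any env that reports done."""

    def __init__(self, envs: List[CountdownEnv]) -> None:
        self.envs = envs

    def reset(self) -> List[int]:
        return [env.reset() for env in self.envs]

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], List[bool]]:
        obs, rews, dones = [], [], []
        for env, action in zip(self.envs, actions):
            ob, rew, done = env.step(action)
            if done:
                # The terminal observation is discarded: the caller only sees
                # the first observation of the next episode.
                ob = env.reset()
            obs.append(ob)
            rews.append(rew)
            dones.append(done)
        return obs, rews, dones


venv = AutoResetVecEnv([CountdownEnv(2)])
print(venv.reset())    # [2]
print(venv.step([0]))  # ([1], [1.0], [False])
print(venv.step([0]))  # ([2], [1.0], [True]) (obs is post-reset, not 0)
```

Because the observation paired with `done=True` already belongs to the next episode, a rollout worker that stored it as the episode's final state would record a mismatched transition.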
* 1.5 months of codegen changes (#196)
* play with resnet
* feed_dict version
* coinrun prob and more stats
* fixes to get_choices_specs & hp search
* minor prob fixes
* minor fixes
* minor
* alternative version of rl_algo stuff
* pylint fixes
* fix bugs, move node_filters to soup
* changed how get_algo works
* change how get_algo works, probably broke all tests
* continue previous refactor
* get eval_agent running again
* fixing tests
* fix tests
* fix more tests
* clean up cma stuff
* fix experiment
* minor changes to eval_agent to make ppo_metal use gpu
* make dict space work
* modify mac makefile to use conda
* recurrent layers
* play with bn and resnets
* minor hp changes
* minor
* got rid of use_fb argument and jtft (joint-train-fine-tune) functionality
built test phase directly into AlgoProb
* make new rl algos generateable
* pylint; start fixing tests
* fixing tests
* more test fixes
* pylint
* fix search
* work on search
* hack around infinite loop caused by scan
* algo search fixes
* misc changes for search expt
* enable annealing, overriding options of Op
* pylint fixes
* identity op
* achieve use_last_output through masking so it automatically works in other distributions
* fix tests
* minor
* discrete
* use_last_output to be just a preference, not a hard constraint
* pred delay, pruning
* require nontrivial inputs
* aliases for get_sm
* add probname to probs
* fixes
* small fixes
* fix tests
* fix tests
* fix tests
* minor
* test scripts
* dualgru network improvements
* minor
* work on mysterious bugs
* rcall gpu-usage command for kube
* use cache dir that’s not in code folder, so that it doesn’t get removed by rcall code rsync
* add power mode to gpu usage
* make sure train/test actually different
* remove VR for now
* minor fixes
* simplify soln_db
* minor
* big refactor of mpi eda
* improve mpieda for multitask
* - get rid of timelimit hack
- add __del__ to cleanup SubprocVecEnv
* get multitask working better
* fixes
* working on atari, various
* annotate ops with whether they’re parametrized
* minor
* gym version
* rand atari prob
* minor
* SolnDb bugfix and name change
* pyspy script
* switch conv layers
* fix roboschool/bullet3
* nenvs assertion
* fix rand atari
* get rid of blanket exception catching
fix soln_db bug
* fix rand_atari
* dynamic routing as cmdline arg
* slight modifications to test_mpi_map and pyspy-all
* max_tries argument for run_until_success
* dedup option in train_mle
* simplify soln_db
* increase atari horizon for 1 experiment
* start implementing reward increment
* ent multiplier
* create cc dsl
other misc fixes
* cc ops
* q_func -> qs in rl_algos_cc.py
* fix PredictDistr
* rl_ops_cc fixes, MakeAction op
* augment algo agent to support cc stuff
* work on ddpg experiments
* fix blocking
temporarily change logger
* allow layer scaling
* pylint fixes
* spawn_method
* isolate ddpg hacks
* improve pruning
* use spawn for subproc
* remove use of python -c in rcall
* fix pylint warning
* fix static
* maybe fix local backend
* switch to DummyVecEnv
* making some fixes via pylint
* pylint fixes
* fixing tests
* fix tests
* fix tests
* write scaffolding for SSL in Codegen
* logger fix
* fix error
* add EMA op to sl_ops
* save many changes
* save
* add upsampler
* add sl ops, enhance state machine
* get ssl search working (some gross hacking)
* fix session/graph issue
* fix importing
* work on mle
* - scale embeddings in gru model
- better exception handling in sl_prob
- use emas for test/val
- use non-contrib batch_norm layer
* improve logging
* option to average before dumping in logger
* default arguments, etc
* new ddpg and identity test
* concat fix
* minor
* move realistic ssl stuff to third-party (underscore to dash)
* fixes
* remove realistic_ssl_evaluation
* pylint fixes
* use gym master
* try again
* pass around args without gin
* fix tests
* separate line to install gym
* rename failing tests that should be ignored
* add data aug
* ssl improvements
* use fixed time limit
* try to fix baselines tests
* add score_floor, max_walltime, fiddle with lr decay
* realistic_ssl
* autopep8
* various ssl
- enable blocking grad for simplification
- kl
- multiple final prediction
* fix pruning
* misc ssl stuff
* bring back linear schedule, don’t use allgather for collecting stats
(I've been getting nondeterministic errors from the old code)
* save/load weights in SSL, big stepsize
* cleanup SslProb
* fix
* get rid of kl coef
* fix simplification, lower lr
* search over hps
* minor fixes
* minor
* static analysis
* move files and rename things for improved consistency.
still broken, and just saving before making nontrivial changes
* various
* make tests pass
* move coinrun_train to codegen since it depends on codegen
* fixes
* pylint fixes
* improve tests
fix some things
* improve tests
* lint
* fix up db_info.py, tests
* mostly restore master version of envs directory, except for makefile changes
* fix tests
* improve printing
* minor fixes
* fix fixmes
* pruning test
* fixes
* lint
* write new test that makes tf graphs of random algos; fix some bugs it caught
* add --delete flag to rcall upload-code command
* lint
* get cifar10 lazily for testing purposes
* disable codegen ci tests for now
* clean up rl_ops
* rename spec classes
* td3 with identity test
* identity tests without gin files
* remove gin.configurable from AlgoAgent
* comments about reduction in rl_ops_cc
* address @pzhokhov comments
* fix tests
* more linting
* better tests
* clean up filtering a bit
* fix concat
* delayed logger configuration (#208)
* delayed logger configuration
* fix typo
* setters and getters for Logger.DEFAULT as well
* do away with fancy property stuff - unable to get it to work with class level methods
* grammar and spaces
* spaces
* use get_current function instead of reading Logger.CURRENT
* autopep8
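The delayed logger configuration above (configure the default logger on first use, and read it through a `get_current` function instead of the `Logger.CURRENT` attribute) can be sketched roughly as follows. This is a simplified stand-in, not the baselines logger: the `_configure_default_logger` helper, the `outputs` field and the `records` list are assumptions for illustration.

```python
from typing import List, Optional


class Logger:
    # Class-level slots, mirroring the idea of Logger.DEFAULT / Logger.CURRENT.
    DEFAULT: Optional["Logger"] = None
    CURRENT: Optional["Logger"] = None

    def __init__(self, outputs: List[str]) -> None:
        self.outputs = outputs
        self.records: List[str] = []

    def log(self, msg: str) -> None:
        self.records.append(msg)


def _configure_default_logger() -> None:
    # Hypothetical default: same effect as calling configure() with no arguments.
    Logger.DEFAULT = Logger.CURRENT = Logger(outputs=["stdout"])


def configure(outputs: List[str]) -> None:
    Logger.CURRENT = Logger(outputs=outputs)


def get_current() -> Logger:
    # Delayed configuration: nothing is set up at import time; the default
    # logger is created only the first time someone asks for it.
    if Logger.CURRENT is None:
        _configure_default_logger()
    return Logger.CURRENT


def log(msg: str) -> None:
    get_current().log(msg)
```

Routing every access through a plain module-level function avoids needing a property on the class itself, consistent with dropping the property-based approach for class-level methods.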
* disable mpi in subprocesses (#213)
* lazy_mpi load
* cleanups
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* silly recursion
* try os.environ hack
* better prefix test, work with mpich
* restored MPI imports
* removed commented import in test_with_mpi
* restored codegen from master
* remove lazy mpi
* restored changes from rl-algs
* remove extra files
* address Chris' comments
* use spawn for shmem vec env as well (#2) (#219)
* lazy_mpi load
* cleanups
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* silly recursion
* try os.environ hack
* better prefix test, work with mpich
* restored MPI imports
* removed commented import in test_with_mpi
* restored codegen from master
* remove lazy mpi
* restored changes from rl-algs
* remove extra files
* port mpi fix to shmem vec env
* increase the mpi test default timeout
* change humanoid hyperparameters, get rid of clip_frac annealing, as it's apparently dangerous
* remove clip_frac schedule from ppo2
* more timesteps in humanoid run
* whitespace + RUN BENCHMARKS
* baselines: export vecenvs from folder (#221)
* baselines: export vecenvs from folder
* put missing function back in
* add missing imports
* more imports
* longer mpi timeout?
* make default logger configuration the same as call to logger.configure() (#222)
* Vecenv refactor (#223)
* update karl util
* restore pvi flag
* change rcall auto cpu behavior, move gin.configurable, add os.makedirs
* vecenv refactor
* aux buf index fix
* add num aux obs
* reset level with enter
* restore high difficulty flag
* bugfix
* restore train_coinrun.py
* tweaks
* renaming
* renaming
* better arguments handling
* more options
* options cleanup
* game data refactor
* more options
* args for train_procgen
* add close handler to interactive base class
* use debug build if debug=True, fix range on aux_obs
* add ProcGenEnv to __init__.py, add missing imports to procgen.py
* export RemoveDictWrapper and build, update train_procgen.py, move assets download into env creation and replace init_assets_and_build with just build
* fix formatting issues
* only call global init once
* fix path in setup.py
* revert part of makefile
* ignore IDE files and folders
* vec remove dict
* export VecRemoveDictObs
* remove RemoveDictWrapper
* remove IDE files
* move shared .h and .cpp files to common folder, update build to use those, dedupe env.cpp
* fix missing header
* try unified build function
* remove old scripts dir
* add comment on build
* upload libenv with render fixes
* tell qthreads to die when we unload the library
* pyglet.app.run is garbage
* static fixes
* whoops
* actually vsync is on
* cleanup
* cleanup
* extern C for libenv interface
* parse util rcall arg
* high difficulty fix
* game type enums
* ProcGenEnv subclasses
* game type cleanup
* unrecognized key
* unrecognized game type
* parse util reorg
* args management
* typo fix
* GinParser
* arg tweaks
* tweak
* restore start_level/num_levels setting
* fix create_procgen_env interface
* build fix
* procgen args in init signature
* fix
* build fix
* fix logger usage in ppo_metal/run_retro
* removed unnecessary OrderedDict requirement in subproc_vec_env
* flake8 fix
* allow for non-mpi tests
* mpi test fixes
* flake8; removed special logic for discrete spaces in dummy_vec_env
* remove forked argument in front of tests - does not play nicely with subprocvecenv in spawned processes; analog of forked in ddpg/test_smoke
* Everyrl initial commit & a few minor baselines changes (#226)
* everyrl initial commit
* add keep_buf argument to VecMonitor
* logger changes: set_comm and fix to mpi_mean functionality
* if filename not provided, don't create ResultsWriter
* change variable syncing function to simplify its usage. now you should initialize from all mpi processes
* everyrl coinrun changes
* tf_distr changes, bugfix
* get_one
* bring back get_next to temporarily restore code
* lint fixes
* fix test
* rename profile function
* rename gaussian
* fix coinrun training script
* change random seeding to work with new gym version (#231)
* change random seeding to work with new gym version
* move seeding to seed() method
* fix mnistenv
* actually try some of the tests before pushing
* more deterministic fixed seq
* misc changes to vecenvs and run.py for benchmarks (#236)
* misc changes to vecenvs and run.py for benchmarks
* don't seed global gen
* update more references to assert_venvs_equal
* Rl19 (#232)
* everyrl initial commit
* add keep_buf argument to VecMonitor
* logger changes: set_comm and fix to mpi_mean functionality
* if filename not provided, don't create ResultsWriter
* change variable syncing function to simplify its usage. now you should initialize from all mpi processes
* everyrl coinrun changes
* tf_distr changes, bugfix
* get_one
* bring back get_next to temporarily restore code
* lint fixes
* fix test
* rename profile function
* rename gaussian
* fix coinrun training script
* rl19
* remove everyrl dir which appeared in the merge for some reason
* readme
* fiddle with ddpg
* make ddpg work
* steps_total argument
* gpu count
* clean up hyperparams and shape math
* logging + saving
* configuration stuff
* fixes, smoke tests
* fix stats
* make load_results return dicts -- easier to create the same kind of objects with some other mechanism for passing to downstream functions
* benchmarks
* fix tests
* add dqn to tests, fix it
* minor
* turned annotated transformer (pytorch) into a script
* more refactoring
* jax stuff
* cluster
* minor
* copy & paste alec code
* sign error
* add huber, rename some parameters, snapshotting off by default
* remove jax stuff
* minor
* move maze env
* minor
* remove trailing spaces
* remove trailing space
* lint
* fix test breakage due to gym update
* rename function
* move maze back to codegen
* get recurrent ppo working
* enable both lstm and gru
* script to print table of benchmark results
* various
* fix dqn
* add fixup initializer, remove lastrew
* organize logging stats
* fix silly bug
* refactor models
* fix mpi usage
* check sync
* minor
* change vf coef, hps
* clean up slicing in ppo
* minor fixes
* caching transformer
* docstrings
* xf fixes
* get rid of 'B' and 'BT' arguments
* minor
* transformer example
* remove output_kind from base class until we have a better idea how to use it
* add comments, revert maze stuff
* flake8
* codegen lint
* fix codegen tests
* responded to peter's comments
* lint fixes
* minor changes to baselines (#243)
* minor changes to baselines
* fix spaces reference
* remove flake8 disable comments and fix import
* okay maybe don't add spec to vec_env
* Merge branch 'master' of github.com:openai/games
* flake8 complaints in baselines/her
* update dmlab30 env (#258)
* codegen continuous control experiment pr (#256)
* finish cherry-pick td3 test commit
* removed graph simplification error ignore
* merge delayed logger config
* merge updated baselines logger
* lazy_mpi load
* cleanups
* use lazy mpi imports in codegen
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* removed extra printouts from TdLayer op
* silly recursion
* running codegen cc experiment
* wip
* more wip
* use actor as input for critic targets, instead of the action taken
* batch size 100
* tweak update parameters
* tweaking td3 runs
* wip
* use nenvs=2 for contcontrol (to be comparable with ppo_metal)
* wip. Doubts about usefulness of actor in critic target
* delayed actor in ActorLoss
* score is average of last 100
* skip lack of losses or too many action distributions
* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)
* syntax
* microfixes
* minifixes
* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)
* squash-merge master, resolve conflicts
* remove erroneous file
* restore normal MPI imports
* move wrappers around a little bit
* autopep8
* cleanups
* cleanup mpi_eda, autopep8
* make activation function of action distribution customizable
* cleanups; preparation for a pr
* syntax
* merge latest master, resolve conflicts
* wrap MPI import with try/except
* allow import of modules through env id in baselines cmd_util
* flake8 complaints
* only wrap box action spaces with ClipActionsWrapper
* flake8
* fixes to algo_prob according to Oleg's suggestions
* use apply_without_scope flag in ActorLoss
* remove extra line in algo/core.py
* Rl19 metalearning (#261)
* rl19 metalearning and dict obs
* master merge arch fix
* lint fixes
* view fixes
* load vars tweaks
* user config cleanup
* documentation and revisions
* pass train comm to rl19
* cleanup
* Symshapes - gives codegen ability to evaluate same algo on envs with different ob/ac shapes (#262)
* finish cherry-pick td3 test commit
* removed graph simplification error ignore
* merge delayed logger config
* merge updated baselines logger
* lazy_mpi load
* cleanups
* use lazy mpi imports in codegen
* more lazy mpi
* don't pretend that class is a module, just use it as a class
* mass-replace mpi4py imports
* flake8
* fix previous lazy_mpi imports
* removed extra printouts from TdLayer op
* silly recursion
* running codegen cc experiment
* wip
* more wip
* use actor as input for critic targets, instead of the action taken
* batch size 100
* tweak update parameters
* tweaking td3 runs
* wip
* use nenvs=2 for contcontrol (to be comparable with ppo_metal)
* wip. Doubts about usefulness of actor in critic target
* delayed actor in ActorLoss
* score is average of last 100
* skip lack of losses or too many action distributions
* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)
* syntax
* microfixes
* minifixes
* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)
* random physics for mujoco
* random parts sizes with range 0.4
* add notebook with results into x/peterz
* variations of ant
* roboschool use gym.make kwargs
* use float as lowest score after rank transform
* rcall from master
* wip
* re-enable dynamic routing
* wip
* squash-merge master, resolve conflicts
* remove erroneous file
* restore normal MPI imports
* move wrappers around a little bit
* autopep8
* cleanups
* cleanup mpi_eda, autopep8
* make activation function of action distribution customizable
* cleanups; preparation for a pr
* syntax
* merge latest master, resolve conflicts
* wrap MPI import with try/except
* allow import of modules through env id in baselines cmd_util
* flake8 complaints
* only wrap box action spaces with ClipActionsWrapper
* flake8
* fixes to algo_prob according to Oleg's suggestions
* use apply_without_scope flag in ActorLoss
* remove extra line in algo/core.py
* multi-task support
* autopep8
* symbolic suffix-shapes (not B,T yet)
* test_with_mpi -> with_mpi rename
* remove extra blank lines in algo/core
* remove extra blank lines in algo/core
* remove more blank lines
* symbolify shapes in existing algorithms
* minor output changes
* cleaning up merge conflicts
* cleaning up merge conflicts
* cleaning up more merge conflicts
* restore mpi_map.py from master
* remove tensorflow dependency from VecEnv
* make tests use single-threaded session for determinism of KfacOptimizer (#298)
* make tests use single-threaded session for determinism of KfacOptimizer
* updated comment in kfac.py
* remove unused sess_config
* add score calculator wrapper, forward property lookups on vecenv wrap… (#300)
* add score calculator wrapper, forward property lookups on vecenv wrapper, misc cleanup
* tests
* pylint
* fix vec monitor infos
* Workbench (#303)
* begin workbench
* cleanup
* begin procgen config integration
* arg tweaks
* more args
* parameter saving
* begin procgen enjoy
* tweaks
* more workbench
* more args sync/restore
* cleanup
* merge in master
* rework args priority
* more workbench
* more logging
* impala cnn
* impala lstm
* tweak
* tweaks
* rl19 time logging
* misc fixes
* faster pipeline
* update local.py
* sess and log config tweaks
* num processes
* logging tweaks
* difficulty reward wrapper
* logging fixes
* gin tweaks
* tweak
* fix
* task id
* param loading
* more variable loading
* entrypoint
* tweak
* ksync
* restore lstm
* begin rl19 support
* tweak
* rl19 rnn
* more rl19 integration
* fix
* cleanup
* restore rl19 rnn
* cleanup
* cleanup
* wrappers.get_log_info
* cleanup
* cleanup
* directory cleanup
* logging, num_experiments
* fixes
* cleanup
* gin fixes
* fix local max gpu
* resid nx
* num machines and download params
* rename
* cleanup
* create workbench
* more reorg
* fix
* more logging wrappers
* lint fix
* restore train procgen
* restore train procgen
* pylint fix
* better wrapping
* config sweep
* args sweep
* test workers
* mpi_weight
* train test comm and high difficulty fix
* enjoy show returns
* removing gin, procgen_parser
* removing gin
* procgen args
* config fixes
* cleanup
* cleanup
* procgen args fix
* fix
* rcall syncing
* lint
* rename mpi_weight
* use username for sync
* fixes
* microbatch fix
* Grad clipping in MpiAdamOptimizer, transformer changes (#304)
* transformer mnist experiments
* version that only builds one model
* work on inverted mnist
* Add grad clipping to MpiAdamOptimizer
* various
* transformer changes, loading
* get rid of soft labels
* transformer baseline
* minor
* experiments involving all possible training sets
* vary training
* minor
* get ready for fine-tuning expers
* lint
* minor
* Add jrl19 as backend for workbench (#324)
enable jrl in workbench
minor logger changes
* extra functionality in baselines.common.plot_util (#310)
* get plot_util from mt_experiments branch
* add labels
* unit tests for plot_util
* Fixed sequence env minor (#333)
minor changes to FixedSequenceEnv to allow full score
* fix tests (#335)
* Procgen Benchmark Updates (#328)
* directory cleanup
* logging, num_experiments
* fixes
* cleanup
* gin fixes
* fix local max gpu
* resid nx
* tweak
* num machines and download params
* rename
* cleanup
* create workbench
* more reorg
* fix
* more logging wrappers
* lint fix
* restore train procgen
* restore train procgen
* pylint fix
* better wrapping
* whackamole walls
* config sweep
* tweak
* args sweep
* tweak
* test workers
* mpi_weight
* train test comm and high difficulty fix
* enjoy show returns
* better joint training
* tweak
* Add --update to args and add gin-config to requirements.txt
* add username to download_file
* removing gin, procgen_parser
* removing gin
* procgen args
* config fixes
* cleanup
* cleanup
* procgen args fix
* fix
* rcall syncing
* lint
* rename mpi_weight
* begin composable game
* more composable game
* tweak
* background alpha
* use username for sync
* fixes
* microbatch fix
* lure composable game
* merge
* proc trans update
* proc trans update (#307)
* finetuning experiment
* Change is_local to use `use_rcall` and fix error of `enjoy.py` with multiple ends
* graphing help
* add --local
* change args_dict['env_name'] to ENV_NAME
* finetune experiments
* tweak
* tweak
* reorg wrappers, remove is_local
* workdir/local fixes
* move finetune experiments
* default dir and graphing
* more graphing
* fix
* pooled syncing
* tweaks
* dir fix
* tweak
* wrapper mpi fix
* wind and turrets
* composability cleanup
* radius cleanup
* composable reorg
* laser gates
* composable tweaks
* soft walls
* tweak
* begin swamp
* more swamp
* more swamp
* fix
* hidden mines
* use maze layout
* tweak
* laser gate tweaks
* tweaks
* tweaks
* lure/propel updates
* composable midnight
* composable coinmaze
* composability difficulty
* tweak
* add step to save_params
* composable offsets
* composable boxpush
* composable combiner
* tweak
* tweak
* always choose correct number of mechanics
* fix
* rcall local fix
* add steps when dumping and saving params
* loading rank 1,2,3.. error fix
* add experiments.py
* fix loading latest weight with no -rest
* support more complex run_id and add more examples
* fix typo
* move post_run_id into experiments.py
* add hp_search example
* error fix
* joint experiments in progress
* joint hp finished
* typo
* error fix
* edit experiments
* Save experiments set up in code and save weights per step (#319)
* add step to save_params
* add steps when dumping and saving params
* loading rank 1,2,3.. error fix
* add experiments.py
* fix loading latest weight with no -rest
* support more complex run_id and add more examples
* fix typo
* move post_run_id into experiments.py
* add hp_search example
* error fix
* joint experiments in progress
* joint hp finished
* typo
* error fix
* edit experiments
* tweaks
* graph exp WIP
* depth tweaks
* move save_all
* fix
* restore_dir name
* restore depth
* choose max mechanics
* use override mode
* tweak frogger
* lstm default
* fix
* patience is composable
* hunter is composable
* fixed asset seed cleanup
* minesweeper is composable
* eggcatch is composable
* tweak
* applesort is composable
* chaser game
* begin lighter
* lighter game
* tractor game
* boxgather game
* plumber game
* hitcher game
* doorbell game
* lawnmower game
* connecter game
* cannonaim
* outrun game
* encircle game
* spinner game
* tweak
* tweak
* detonator game
* driller
* driller
* mixer
* conveyor
* conveyor game
* joint pcg experiments
* fixes
* pcg sweep experiment
* cannonaim fix
* combiner fix
* store save time
* laseraim fix
* lightup fix
* detonator tweaks
* detonator fixes
* driller fix
* lawnmower calibration
* spinner calibration
* propel fix
* train experiment
* print load time
* system independent hashing
* remove gin configurable
* task ids fix
* test_pcg experiment
* connecter dense reward
* hard_pcg
* num train comms
* mpi splits envs
* tweaks
* tweaks
* graph tweaks
* graph tweaks
* lint fix
* fix tests
* load bugfix
* difficulty timeout tweak
* tweaks
* more graphing
* graph tweaks
* tweak
* download file fix
* pcg train envs list
* cleanup
* tweak
* manually name impala layers
* tweak
* expect fps
* backend arg
* args tweak
* workbench cleanup
* move graph files
* workbench cleanup
* split env name by comma
* workbench cleanup
* ema graph
* remove Dict
* use tf.io.gfile
* comments for auto-killing jobs
* lint fix
* write latest file when not saving all and load it when step=None
* ci/runtests.sh - pass all folders to pytest (#342)
* ci/runtests.sh - pass all folders to pytest
* mpi_optimizer_test precision 1e-4
* fixes to tests
* search for tests in the entire jax folder, also remove unnecessary humor
* delete unnecessary stuff (#338)
* Add initializer for process-level setup in SubprocVecEnv (#276)
* Add initializer for process-level setup in SubprocVecEnv
Use case: run logger.configure() in each subprocess
* Add option to force dummy vec env
* Procgen fixes (#352)
* tweak
* documentation
* rely on log_comm, remove mpi averaging from wrappers
* pass comm for ppo2 initialization
* ppo2 logging
* experiment tweaks
* auto launch tensorboard when using local backend
* graph tweaks
* pass caller to config
* configure logger and tensorboard
* make parent dir if necessary
* parentdir tweak
* JRL PPO test with delayed identity env (#355)
* add a custom delay to identity_env
* min reward 0.8 in delayed identity test
* seed the tests, perfect score on delayed_identity_test
* delay=1 in delayed_identity_test
* flake8 complaints
* increased number of steps in fixed_seq_test
* seed identity tests to ensure reproducibility
* docstrings
* (onp, np) -> (np, jp), switch jax code to use mark_slow decorator (#363)
switch to mark_slow decorator
* fix tests - add matplotlib to setup_requires, put mpi4py import in try-except
* test fixes
* Recognize nightly tf builds
* Use LooseVersion instead of StrictVersion to recognize nightly build numbers
Nightly version numbers are of the form `1.3.0.dev20181215`, which is not a valid version number for `StrictVersion`, while `LooseVersion` still recognizes it.
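Why the strict parser rejects nightly versions can be seen with simplified re-implementations of the two parsing rules. The regular expressions below are modeled on the ones in `distutils.version` (which has since been removed from the standard library in Python 3.12), so treat them as an approximation of `StrictVersion` and `LooseVersion`, not the real classes; `is_strict_version` and `loose_version` are hypothetical helper names.

```python
import re
from typing import List, Union

# Approximation of StrictVersion's pattern: N.N[.N], optionally followed by
# a single "a" or "b" pre-release tag such as "b1".
STRICT_RE = re.compile(r"^(\d+)\.(\d+)(\.(\d+))?([ab](\d+))?$", re.ASCII)

# Approximation of LooseVersion's tokenizer: runs of digits, runs of
# lowercase letters, or literal dots.
LOOSE_COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.)")


def is_strict_version(vstring: str) -> bool:
    return STRICT_RE.match(vstring) is not None


def loose_version(vstring: str) -> List[Union[int, str]]:
    # Split into tokens, drop separators and empty strings, and convert
    # numeric tokens to ints so that comparisons are numeric.
    parts = [p for p in LOOSE_COMPONENT_RE.split(vstring) if p and p != "."]
    return [int(p) if p.isdigit() else p for p in parts]


nightly = "1.3.0.dev20181215"
print(is_strict_version(nightly))                        # False
print(loose_version(nightly))                            # [1, 3, 0, 'dev', 20181215]
print(loose_version(nightly) >= loose_version("1.3.0"))  # True
```

One caveat of the loose scheme under Python 3: comparing a version that has a string token against one with an integer at the same position (for example `[1, 3, 0, 'dev']` against `[1, 3, 0, 1]`) raises `TypeError`.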
* viz docs
* writing visualization docs
* documenting plot_util
* docstrings in plot_util
* autopep8 and flake8
* spelling (using default vim spellchecker and ignoring things like dataframe, docstring, etc.)
* rephrased viz.md a little bit
* more examples of viz code usage in the docs
* make baselines run without mpi wip
* squash-merged latest master
* further removing MPI references where unnecessary
* more MPI removal
* syntax and flake8
* MpiAdam becomes regular Adam if Mpi not present
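Making MPI optional, as in the commits above, usually comes down to guarding the `mpi4py` import and falling back to single-process behavior. A minimal sketch, assuming hypothetical `world_size` and `mean_across_workers` helpers (not baselines functions) in place of the real gradient averaging inside `MpiAdam`:

```python
try:
    from mpi4py import MPI
except ImportError:
    # mpi4py is optional: without it, everything runs in a single process.
    MPI = None


def world_size() -> int:
    """Number of MPI ranks, or 1 when MPI is unavailable."""
    return MPI.COMM_WORLD.Get_size() if MPI is not None else 1


def mean_across_workers(value: float) -> float:
    """Average a scalar over all MPI ranks, or return it unchanged without MPI."""
    if MPI is None:
        return value
    comm = MPI.COMM_WORLD
    # Generic (pickle-based) allreduce; sums the value over all ranks.
    return comm.allreduce(value, op=MPI.SUM) / comm.Get_size()


print(world_size())
print(mean_across_workers(3.0))  # 3.0 when run as a single process
```

In an optimizer, the same guard means the all-reduce step is skipped entirely when `MPI is None`, leaving a plain Adam update on local gradients.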
* autopep8
* add assertion to test in mpi_adam; fix trpo_mpi failure without MPI on cartpole
* mpiless ddpg
* sync internal changes. Make ddpg work with vecenvs
* B -> nenvs for consistency with other algos, small cleanups
* eval_done[d]==True -> eval_done[d]
* flake8 and numpy.random.random_integers deprecation warning
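On the `numpy.random.random_integers` deprecation: that function samples from the closed interval `[low, high]`, while `randint` samples from the half-open interval `[low, high)`, so a drop-in replacement needs `high + 1`. The commit does not say which replacement was used; the sketch below assumes `randint` on a seeded `RandomState`.

```python
import numpy as np

rng = np.random.RandomState(0)

# Deprecated form, values in the closed range [1, 6]:
#   rng.random_integers(1, 6, size=1000)
# Equivalent with randint, whose upper bound is exclusive:
rolls = rng.randint(1, 6 + 1, size=1000)

print(rolls.min(), rolls.max())
```

Forgetting the `+ 1` silently drops the top value from the distribution rather than raising an error, which is why the bound deserves a second look when migrating.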
* Merge branch 'master' of github.com:openai/games into peterz_track_baselines_branch
* add some docstrings
* start making big changes
* state machine redesign
* sampling seems to work
* some reorg
* fixed sampling of real vals
* json conversion
* made it possible to register new commands
got nontrivial version of Pred working
* consolidate command definitions
* add more macro blocks
* revived visualization
* rename Userdata -> CmdInterpreter
make AlgoSmInstance subclass of SmInstance that uses appropriate userdata argument
* replace userdata by ci when appropriate
* minor test fixes
* revamped handmade dir, can run ppo_metal
* seed to avoid random test failure
* implement AlgoAgent
* Autogenerated object that performs all ops and macros
* more CmdRecorder changes
* move files around
* move MatchProb and JtftProb
* remove obsolete
* fix tests involving AlgoAgent (pending the next commit on ppo_metal code)
* ppo_metal: reduce duplication in policy_gen, make sess an attribute of PpoAgent and StochasticPolicy instead of using get_default_session everywhere.
* maze_env reformatting, move algo_search script (but still broken)
* move agent.py
* fix test on handcrafted agents
* tuning/fixing ppo_metal baseline
* minor
* Fix ppo_metal baseline
* Don’t set epcount, tcount unless they’re being used
* get rid of old ppo_metal baseline
* fixes for handmade/run.py tuning
* fix codegen ppo
* fix handmade ppo hps
* fix test, go back to safe_div
* switch to more complex filtering
* make sure all handcrafted algos have finite probability
* train to maximize logprob of provided samples
Trex changes to avoid segfault
* AlgoSm also includes global hyperparams
* don’t duplicate global hyperparam defaults
* create generic_ob_ac_space function
* use sorted list of outkeys
* revive tsne
* todo changes
* determinism test
* todo + test fix
* remove a few deprecated files, rename other tests so they don’t run automatically, fix real test failure
* continuous control with codegen
* continuous control with codegen
* implement continuous action space algodistr
* ppo with trex RUN BENCHMARKS
* wrap trex in a monitor
* dummy commit to RUN BENCHMARKS
* adding monitor to trex env RUN BENCHMARKS
* adding monitor to trex RUN BENCHMARKS
* include monitor into trex env RUN BENCHMARKS
* generate nll and predmean using Distribution node
* dummy commit to RUN BENCHMARKS
* include pybullet into baselines optional dependencies
* dummy commit to RUN BENCHMARKS
* install games for cron rcall user RUN BENCHMARKS
* add --yes flag to install.py in rcall config for cron user RUN BENCHMARKS
* both continuous and discrete versions seem to run
* fixes to monitor to work with vecenv-like info and rewards RUN BENCHMARKS
* dummy commit to RUN BENCHMARKS
* removed shape check from one-hot encoding logic in distributions.CategoricalPd
* reset logger configuration in codegen/handmade/run.py to be in-line with baselines RUN BENCHMARKS
* merged peterz_codegen_benchmarks RUN BENCHMARKS
* skip tests RUN BENCHMARKS
* working on test failures
* save benchmark dicts RUN BENCHMARK
* merged peterz_codegen_benchmark RUN BENCHMARKS
* add get_git_commit_message to the baselines.common.console_util
* dummy commit to RUN BENCHMARKS
* merged fixes from peterz_codegen_benchmark RUN BENCHMARKS
* fixing failure in test_algo_nll WIP
* test_algo_nll passes with both ppo and softq
* re-enabled tests
* run trex on gpus for 100k total (horizon=100k / 16) RUN BENCHMARKS
* merged latest peterz_codegen_benchmarks RUN BENCHMARKS
* fixing codegen test failures (logging-related)
* fixed name collision in run-benchmarks-new.py RUN BENCHMARKS
* fixed name collision in run-benchmarks-new.py RUN BENCHMARKS
* fixed import in node_filters.py
* test_algo_search passes
* some cleanup
* dummy commit to RUN BENCHMARKS
* merge fast fail for subprocvecenv RUN BENCHMARKS
* use SubprocVecEnv in sonic_prob
* added deprecation note to shmem_vec_env
* allow indexing of distributions
* add timeout to pipeline.yaml
* typo in pipeline.yml
* run tests with --forked option
* resolved merge conflict in rl_algs.bench.benchmarks
* re-enable parallel tests
* fix remaining merge conflicts and syntax
* Update trex_prob.py
* fixes to ResultsWriter
* take baselines/run.py from peterz_codegen branch
* actually save stuff to file in VecMonitor RUN BENCHMARKS
* enable parallel tests
* merge stricter flake8
* merge peterz_codegen_benchmark, resolve conflicts
* autopep8
* remove traces of Monitor from trex env, check shapes before encoding in CategoricalPd
* asserts and warnings to make q -> distribution change more explicit
* fixed assert in CategoricalPd
* add header to vec_monitor output file RUN BENCHMARKS
* make VecMonitor write header to the output file
* remove deprecation message from shmem_vec_env RUN BENCHMARKS
* autopep8
* proper shape test in distributions.py
* ResultsWriter can take dict headers
* dummy commit to RUN BENCHMARKS
* replace assert len(qs)==1 with warning RUN BENCHMARKS
* removed pdb from ppo2 RUN BENCHMARKS
* re-setting up travis
* re-setting up travis
* resolved merge conflicts, added missing dependency for codegen
* removed parallel tests (workers are failing for some reason)
* try test baselines only
* added language options - some weirdness in rcall image that requires them?
* added verbosity to tests
* try tests in baselines only
* ci/runtests.sh tests codegen (some failure on baselines specifically on travis, trying to narrow down the problem)
* removed render from codegen test - maybe that's the problem?
* trying even simpler command within the image to figure out the problem
* print out system info in ci/runtests.sh
* print system info outside of docker as well
* trying single test file in codegen
* install graphviz in the docker image
* git subrepo pull baselines
subrepo:
  subdir:   "baselines"
  merged:   "8c2aea2"
upstream:
  origin:   "git@github.com:openai/baselines.git"
  branch:   "master"
  commit:   "8c2aea2"
git-subrepo:
  version:  "0.4.0"
  origin:   "git@github.com:ingydotnet/git-subrepo.git"
  commit:   "74339e8"
* added graphviz to the dockerfile (need both graphviz-dev and graphviz)
* only tests in codegen/algo/test_algo_builder.py
* run baselines tests only. still no clue why collection of codegen tests fails
* update baselines setup to install filelock for tests
* run slow tests
* skip slow tests in baselines
* single test file in baselines
* try reinstalling tensorflow
* running slow tests
* try full baselines and codegen test suite
* in the test Dockerfile, reinstall tensorflow
* using fake display for codegen render tests
* fixed display-related failures by adding a custom entrypoint to the docker image
* set LC_ALL and LANG env variables in docker image
* try sequential tests
* include psutil in requirements; increase relative tolerance in test_low_level_algo_distr
* trying to fix codegen failures on travis
* git subrepo commit (merge) baselines
subrepo:
  subdir:   "baselines"
  merged:   "9ce84da"
upstream:
  origin:   "git@github.com:openai/baselines.git"
  branch:   "master"
  commit:   "b222dd0"
git-subrepo:
  version:  "0.4.0"
  origin:   "git@github.com:ingydotnet/git-subrepo.git"
  commit:   "74339e8"
* syntax in install.py
* changing the order of package installation
* removed supervised-reptile from installation list
* cron uses the full games repo in rcall
* flake8 complaints
* rewrite all extras logic in baselines, install.py always uses [all]
* exported rl-algs
* more stuff from rl-algs
* run slow tests
* re-exported rl_algs
* re-exported rl_algs - fixed problems with serialization test and test_cartpole
* replaced atari_arg_parser with common_arg_parser
* run.py can run algos from both baselines and rl_algs
* added approximate humanoid reward with ppo2 into the README for reference
* dummy commit to RUN BENCHMARKS
* dummy commit to RUN BENCHMARKS
* dummy commit to RUN BENCHMARKS
* dummy commit to RUN BENCHMARKS
* very dummy commit to RUN BENCHMARKS
* serialize variables as a dict, not as a list
* running_mean_std uses tensorflow variables
* fixed import in vec_normalize
* dummy commit to RUN BENCHMARKS
* dummy commit to RUN BENCHMARKS
* flake8 complaints
* save all variables to make sure we save the vec_normalize normalization
* benchmarks on ppo2 only RUN BENCHMARKS
* make_atari_env compatible with mpi
* run ppo_mpi benchmarks only RUN BENCHMARKS
* hardcode names of retro environments
* add defaults
* changed default ppo2 lr schedule to linear RUN BENCHMARKS
* non-tf normalization benchmark RUN BENCHMARKS
* use ncpu=1 for mujoco sessions - gives a bit of a performance speedup
* reverted running_mean_std to use property decorators for mean, var, count
* reverted VecNormalize to use RunningMeanStd (no tf)
* reverted VecNormalize to use RunningMeanStd (no tf)
* profiling wip
* use VecNormalize with regular RunningMeanStd
* added acer runner (missing import)
* flake8 complaints
* added a note in README about TfRunningMeanStd and serialization of VecNormalize
* dummy commit to RUN BENCHMARKS
* merged benchmarks branch