347 Commits

Author SHA1 Message Date
Harry Uglow
ea25b9e8b2 Monitor should close what it inherits (#1076) 2020-01-31 05:06:18 -08:00
pzhokhov
9ee399f5b2 Fix build with latest gym (#1034)
* update to use latest version of gym

* fix imports

* narrow down gym version to 0.15.4 <= gym < 0.16.0
2019-11-10 11:10:01 -08:00
Tomasz Wrona
391811d98c SubprocVecEnv uses CloudpickleWrapper to send specs (#1028) 2019-11-08 15:23:49 -08:00
Yen-Chen Lin
665b888eeb Fix behavior cloning due to API changes (#1014) 2019-10-25 15:44:43 -07:00
Christopher Hesse
f40a477a17 fix tf2 branch name 2019-10-25 15:27:30 -07:00
johannespitz
c6144bdb6a Fix RuntimeError (#910) (#1015)
* Update the commands to install Tensorflow

The current 'tensorflow' package is for Tensorflow 2, which is not supported by the master branch of baselines.

* Update command to install Tensorflow 1.14

* Fix RuntimeError (#910)

 - Removed interfering calls to env.reset() in play mode.
   (Note that the worker in the subprocess is calling env.reset() already)

 - Fixed the printed reward when running multiple envs in play mode.
2019-10-25 15:24:41 -07:00
Peter Zhokhov
adba88b218 add quote marks to tensorflow < 2 to avoid bash logic 2019-10-11 17:13:43 -07:00
Peter Zhokhov
bfbc3bae14 update status, fix the tensorflow version in the build 2019-10-11 15:23:14 -07:00
Haiyang Chen
f703776c91 fix a bug in acer saving and loading model (#990) 2019-09-27 15:39:41 -07:00
pzhokhov
53797293e5 use allreduce instead of Allreduce (send pickled data instead of floats) - probably affects performance somewhat, but avoid element number mismatch. Fixes 998 (#1000) 2019-09-27 14:45:31 -07:00
tanzhenyu
229a772b81 Release notes for Tensorflow 2.0 support. (#997) 2019-08-29 14:25:44 -07:00
Tomasz Wrona
d80b075904 Make SubprocVecEnv works with DummyVecEnv (#908)
* Make SubprocVecEnv works with DummyVecEnv (nested environments for synchronous sampling)

* SubprocVecEnv now supports running environments in series in each process

* Added docstring to the test definition

* Added additional test to check, whether SubprocVecEnv results with the same output when in_series parameter is enabled and not

* Added more test cases for in_series parameter

* Refactored worker function, added docstring for in_series parameter

* Remove check for TF presence in setup.py
2019-08-29 12:16:25 -07:00
NicoBach
0182fe1877 entrypoint variable made public (#970) 2019-08-06 02:03:19 +03:00
Seungjae Ryan Lee
1fb4dfb780 Fix typo in GAIL dataset log (#950) 2019-08-06 02:02:43 +03:00
Timo Kaufmann
7cadef715f Fix typo (#930)
* Fix typo

* Fix train_freq documentation

Seems to be a copy-paste error, train_freq has nothing to do with
printing.

* Fix documentation typo
2019-08-06 02:02:21 +03:00
tanzhenyu
fce4370ba2 Remove duplicate code in adaptive param noise. (#976) 2019-08-06 02:01:54 +03:00
tanzhenyu
c57528573e Remove model def from deepq. (#946) 2019-06-27 10:12:38 -07:00
Marcin Michalski
2bca7901f5 Updating the version to 0.1.6 (#933)
Updating the version in setup.py to avoid conflict with the old (>1 year old) version in pypi.
2019-06-24 10:19:01 -07:00
albert
ba2b017820 add log_path flag to command line utility (#917)
* add log_path flag to command line utility

* Update README with log_path flag

* clarify logg and viz docs
2019-06-07 15:05:52 -07:00
Anton Grigoryev
7c520852d9 Fix converting list of LazyFrames to ndarray (#907) 2019-05-31 16:49:46 -07:00
pzhokhov
1c872ca8fd run test_monitor through pytest; fix the test, add flake8 to bench direectory - like PR 891 (#921) 2019-05-31 15:36:20 -07:00
Jinho Lee
ff8d36a7a7 Starting to reassign waiting_step in shmem_vecenv (#915)
"self.waiting_step" is initialized in __init__ function but it is not reassigned anywhere.
Because it is used in reset function and close_extras function, it should be fixed.
So i fixed it to be similar with subproc_vec_env's one.
2019-05-31 14:31:35 -07:00
Sridhar Thiagarajan
7614b02f7a remove f strings for python back compatibility (#906) 2019-05-31 14:27:11 -07:00
Andy Twigg
f7d5a265e1 suppress excessive messages from unused loggers (#920)
Only print the "logging to [dir]" message when the logger has something to output. Running with the new spawn change and both mpi and subprocvecenv, there are many "logging to [dir]" messages but most are not logging anything.
2019-05-31 14:26:45 -07:00
Joshua Meier
21776e8f57 Support Tuple observation spaces (#911) 2019-05-31 14:06:20 -07:00
pzhokhov
9b68103b73 release Internal changes (#895)
* joshim5 changes (width and height to WarpFrame wrapper)

* match network output with action distribution via a linear layer only if necessary (#167)

* support color vs. grayscale option in WarpFrame wrapper (#166)

* support color vs. grayscale option in WarpFrame wrapper

* Support color in other wrappers

* Updated per Peters suggestions

* fixing test failures

* ppo2 with microbatches (#168)

* pass microbatch_size to the model during construction

* microbatch fixes and test (#169)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* Peterz joshim5 subclass ppo2 model (#170)

* microbatch fixes and test

* tiny cleanup

* added assertions to the test

* vpg-related fix

* subclassing the model to make microbatched version of model WIP

* made microbatched model a subclass of ppo2 Model

* flake8 complaint

* mpi-less ppo2 (resolving merge conflict)

* flake8 and mpi4py imports in ppo2/model.py

* more un-mpying

* merge master

* updates to the benchmark viewer code + autopep8 (#184)

* viz docs and syntactic sugar wip

* update viewer yaml to use persistent volume claims

* move plot_util to baselines.common, update links

* use 1Tb hard drive for results viewer

* small updates to benchmark vizualizer code

* autopep8

* autopep8

* any folder can be a benchmark

* massage games image a little bit

* fixed --preload option in app.py

* remove preload from run_viewer.sh

* remove pdb breakpoints

* update bench-viewer.yaml

* fixed bug (#185)

* fixed bug 

it's wrong to do the else statement, because no other nodes would start.

* changed the fix slightly

* Refactor her phase 1 (#194)

* add monitor to the rollout envs in her RUN BENCHMARKS her

* Slice -> Slide in her benchmarks RUN BENCHMARKS her

* run her benchmark for 200 epochs

* dummy commit to RUN BENCHMARKS her

* her benchmark for 500 epochs RUN BENCHMARKS her

* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her

* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her

* add num_timesteps to her benchmark to be compatible with viewer RUN BENCHMARKS her

* disable saving of policies in her benchmark RUN BENCHMARKS her

* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch

* run fetch benchmarks with ppo2 and ddpg RUN BENCHMARKS Fetch

* launcher refactor wip

* wip

* her works on FetchReach

* her runner refactor RUN BENCHMARKS Fetch1M

* unit test for her

* fixing warnings in mpi_average in her, skip test_fetchreach if mujoco is not present

* pickle-based serialization in her

* remove extra import from subproc_vec_env.py

* investigating differences in rollout.py

* try with old rollout code RUN BENCHMARKS her

* temporarily use DummyVecEnv in cmd_util.py RUN BENCHMARKS her

* dummy commit to RUN BENCHMARKS her

* set info_values in rollout worker in her RUN BENCHMARKS her

* bug in rollout_new.py RUN BENCHMARKS her

* fixed bug in rollout_new.py RUN BENCHMARKS her

* do not use last step because vecenv calls reset and returns obs after reset RUN BENCHMARKS her

* updated buffer sizes RUN BENCHMARKS her

* fixed loading/saving via joblib

* dust off learning from demonstrations in HER, docs, refactor

* add deprecation notice on her play and plot files

* address comments by Matthias

* 1.5 months of codegen changes (#196)

* play with resnet

* feed_dict version

* coinrun prob and more stats

* fixes to get_choices_specs & hp search

* minor prob fixes

* minor fixes

* minor

* alternative version of rl_algo stuff

* pylint fixes

* fix bugs, move node_filters to soup

* changed how get_algo works

* change how get_algo works, probably broke all tests

* continue previous refactor

* get eval_agent running again

* fixing tests

* fix tests

* fix more tests

* clean up cma stuff

* fix experiment

* minor changes to eval_agent to make ppo_metal use gpu

* make dict space work

* modify mac makefile to use conda

* recurrent layers

* play with bn and resnets

* minor hp changes

* minor

* got rid of use_fb argument and jtft (joint-train-fine-tune) functionality
built test phase directly into AlgoProb

* make new rl algos generateable

* pylint; start fixing tests

* fixing tests

* more test fixes

* pylint

* fix search

* work on search

* hack around infinite loop caused by scan

* algo search fixes

* misc changes for search expt

* enable annealing, overriding options of Op

* pylint fixes

* identity op

* achieve use_last_output through masking so it automatically works in other distributions

* fix tests

* minor

* discrete

* use_last_output to be just a preference, not a hard constraint

* pred delay, pruning

* require nontrivial inputs

* aliases for get_sm

* add probname to probs

* fixes

* small fixes

* fix tests

* fix tests

* fix tests

* minor

* test scripts

* dualgru network improvements

* minor

* work on mysterious bugs

* rcall gpu-usage command for kube

* use cache dir that’s not in code folder, so that it doesn’t get removed by rcall code rsync

* add power mode to gpu usage

* make sure train/test actually different

* remove VR for now

* minor fixes

* simplify soln_db

* minor

* big refactor of mpi eda

* improve mpieda for multitask

* - get rid of timelimit hack
- add __del__ to cleanup SubprocVecEnv

* get multitask working better

* fixes

* working on atari, various

* annotate ops with whether they’re parametrized

* minor

* gym version

* rand atari prob

* minor

* SolnDb bugfix and name change

* pyspy script

* switch conv layers

* fix roboschool/bullet3

* nenvs assertion

* fix rand atari

* get rid of blanket exception catching
fix soln_db bug

* fix rand_atari

* dynamic routing as cmdline arg

* slight modifications to test_mpi_map and pyspy-all

* max_tries argument for run_until_successs

* dedup option in train_mle

* simplify soln_db

* increase atari horizon for 1 experiment

* start implementing reward increment

* ent multiplier

* create cc dsl
other misc fixes

* cc ops

* q_func -> qs in rl_algos_cc.py

* fix PredictDistr

* rl_ops_cc fixes, MakeAction op

* augment algo agent to support cc stuff

* work on ddpg experiments

* fix blocking
temporarily change logger

* allow layer scaling

* pylint fixes

* spawn_method

* isolate ddpg hacks

* improve pruning

* use spawn for subproc

* remove use of python -c in rcall

* fix pylint warning

* fix static

* maybe fix local backend

* switch to DummyVecEnv

* making some fixes via pylint

* pylint fixes

* fixing tests

* fix tests

* fix tests

* write scaffolding for SSL in Codegen

* logger fix

* fix error

* add EMA op to sl_ops

* save many changes

* save

* add upsampler

* add sl ops, enhance state machine

* get ssl search working — some gross hacking

* fix session/graph issue

* fix importing

* work on mle

* - scale embeddings in gru model
- better exception handling in sl_prob
- use emas for test/val
- use non-contrib batch_norm layer

* improve logging

* option to average before dumping in logger

* default arguments, etc

* new ddpg and identity test

* concat fix

* minor

* move realistic ssl stuff to third-party (underscore to dash)

* fixes

* remove realistic_ssl_evaluation

* pylint fixes

* use gym master

* try again

* pass around args without gin

* fix tests

* separate line to install gym

* rename failing tests that should be ignored

* add data aug

* ssl improvements

* use fixed time limit

* try to fix baselines tests

* add score_floor, max_walltime, fiddle with lr decay

* realistic_ssl

* autopep8

* various ssl
- enable blocking grad for simplification
- kl
- multiple final prediction

* fix pruning

* misc ssl stuff

* bring back linear schedule, don’t use allgather for collecting stats
(i’ve been getting nondeterministic errors from the old code)

* save/load weights in SSL, big stepsize

* cleanup SslProb

* fix

* get rid of kl coef

* fix simplification, lower lr

* search over hps

* minor fixes

* minor

* static analysis

* move files and rename things for improved consistency.
still broken, and just saving before making nontrivial changes

* various

* make tests pass

* move coinrun_train to codegen since it depends on codegen

* fixes

* pylint fixes

* improve tests
fix some things

* improve tests

* lint

* fix up db_info.py, tests

* mostly restore master version of envs directory, except for makefile changes

* fix tests

* improve printing

* minor fixes

* fix fixmes

* pruning test

* fixes

* lint

* write new test that makes tf graphs of random algos; fix some bugs it caught

* add —delete flag to rcall upload-code command

* lint

* get cifar10 lazily for testing purposes

* disable codegen ci tests for now

* clean up rl_ops

* rename spec classes

* td3 with identity test

* identity tests without gin files

* remove gin.configurable from AlgoAgent

* comments about reduction in rl_ops_cc

* address @pzhokhov comments

* fix tests

* more linting

* better tests

* clean up filtering a bit

* fix concat

* delayed logger configuration (#208)

* delayed logger configuration

* fix typo

* setters and getters for Logger.DEFAULT as well

* do away with fancy property stuff - unable to get it to work with class level methods

* grammar and spaces

* spaces

* use get_current function instead of reading Logger.CURRENT

* autopep8

* disable mpi in subprocesses (#213)

* lazy_mpi load

* cleanups

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* silly recursion

* try os.environ hack

* better prefix test, work with mpich

* restored MPI imports

* removed commented import in test_with_mpi

* restored codegen from master

* remove lazy mpi

* restored changes from rl-algs

* remove extra files

* address Chris' comments

* use spawn for shmem vec env as well (#2) (#219)

* lazy_mpi load

* cleanups

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* silly recursion

* try os.environ hack

* better prefix test, work with mpich

* restored MPI imports

* removed commented import in test_with_mpi

* restored codegen from master

* remove lazy mpi

* restored changes from rl-algs

* remove extra files

* port mpi fix to shmem vec env

* increase the mpi test default timeout

* change humanoid hyperparameters, get rid of clip_Frac annealing, as it's apparently dangerous

* remove clip_frac schedule from ppo2

* more timesteps in humanoid run

* whitespace + RUN BENCHMARKS

* baselines: export vecenvs from folder (#221)

* baselines: export vecenvs from folder

* put missing function back in

* add missing imports

* more imports

* longer mpi timeout?

* make default logger configuration the same as call to logger.configure() (#222)

* Vecenv refactor (#223)

* update karl util

* restore pvi flag

* change rcall auto cpu behavior, move gin.configurable, add os.makedirs

* vecenv refactor

* aux buf index fix

* add num aux obs

* reset level with enter

* restore high difficulty flag

* bugfix

* restore train_coinrun.py

* tweaks

* renaming

* renaming

* better arguments handling

* more options

* options cleanup

* game data refactor

* more options

* args for train_procgen

* add close handler to interactive base class

* use debug build if debug=True, fix range on aux_obs

* add ProcGenEnv to __init__.py, add missing imports to procgen.py

* export RemoveDictWrapper and build, update train_procgen.py, move assets download into env creation and replace init_assets_and_build with just build

* fix formatting issues

* only call global init once

* fix path in setup.py

* revert part of makefile

* ignore IDE files and folders

* vec remove dict

* export VecRemoveDictObs

* remove RemoveDictWrapper

* remove IDE files

* move shared .h and .cpp files to common folder, update build to use those, dedupe env.cpp

* fix missing header

* try unified build function

* remove old scripts dir

* add comment on build

* upload libenv with render fixes

* tell qthreads to die when we unload the library

* pyglet.app.run is garbage

* static fixes

* whoops

* actually vsync is on

* cleanup

* cleanup

* extern C for libenv interface

* parse util rcall arg

* high difficulty fix

* game type enums

* ProcGenEnv subclasses

* game type cleanup

* unrecognized key

* unrecognized game type

* parse util reorg

* args management

* typo fix

* GinParser

* arg tweaks

* tweak

* restore start_level/num_levels setting

* fix create_procgen_env interface

* build fix

* procgen args in init signature

* fix

* build fix

* fix logger usage in ppo_metal/run_retro

* removed unnecessary OrderedDict requirement in subproc_vec_env

* flake8 fix

* allow for non-mpi tests

* mpi test fixes

* flake8; removed special logic for discrete spaces in dummy_vec_env

* remove forked argument in front of tests - does not play nicely with subprocvecenv in spawned processes; analog of forked in ddpg/test_smoke

* Everyrl initial commit & a few minor baselines changes (#226)

* everyrl initial commit

* add keep_buf argument to VecMonitor

* logger changes: set_comm and fix to mpi_mean functionality

* if filename not provided, don't create ResultsWriter

* change variable syncing function to simplify its usage. now you should initialize from all mpi processes

* everyrl coinrun changes

* tf_distr changes, bugfix

* get_one

* bring back get_next to temporarily restore code

* lint fixes

* fix test

* rename profile function

* rename gaussian

* fix coinrun training script

* change random seeding to work with new gym version (#231)

* change random seeding to work with new gym version

* move seeding to seed() method

* fix mnistenv

* actually try some of the tests before pushing

* more deterministic fixed seq

* misc changes to vecenvs and run.py for benchmarks (#236)

* misc changes to vecenvs and run.py for benchmarks

* dont seed global gen

* update more references to assert_venvs_equal

* Rl19 (#232)

* everyrl initial commit

* add keep_buf argument to VecMonitor

* logger changes: set_comm and fix to mpi_mean functionality

* if filename not provided, don't create ResultsWriter

* change variable syncing function to simplify its usage. now you should initialize from all mpi processes

* everyrl coinrun changes

* tf_distr changes, bugfix

* get_one

* bring back get_next to temporarily restore code

* lint fixes

* fix test

* rename profile function

* rename gaussian

* fix coinrun training script

* rl19

* remove everyrl dir which appeared in the merge for some reason

* readme

* fiddle with ddpg

* make ddpg work

* steps_total argument

* gpu count

* clean up hyperparams and shape math

* logging + saving

* configuration stuff

* fixes, smoke tests

* fix stats

* make load_results return dicts -- easier to create the same kind of objects with some other mechanism for passing to downstream functions

* benchmarks

* fix tests

* add dqn to tests, fix it

* minor

* turned annotated transformer (pytorch) into a script

* more refactoring

* jax stuff

* cluster

* minor

* copy & paste alec code

* sign error

* add huber, rename some parameters, snapshotting off by default

* remove jax stuff

* minor

* move maze env

* minor

* remove trailing spaces

* remove trailing space

* lint

* fix test breakage due to gym update

* rename function

* move maze back to codegen

* get recurrent ppo working

* enable both lstm and gru

* script to print table of benchmark results

* various

* fix dqn

* add fixup initializer, remove lastrew

* organize logging stats

* fix silly bug

* refactor models

* fix mpi usage

* check sync

* minor

* change vf coef, hps

* clean up slicing in ppo

* minor fixes

* caching transformer

* docstrings

* xf fixes

* get rid of 'B' and 'BT' arguments

* minor

* transformer example

* remove output_kind from base class until we have a better idea how to use it

* add comments, revert maze stuff

* flake8

* codegen lint

* fix codegen tests

* responded to peter's comments

* lint fixes

* minor changes to baselines (#243)

* minor changes to baselines

* fix spaces reference

* remove flake8 disable comments and fix import

* okay maybe don't add spec to vec_env

* Merge branch 'master' of github.com:openai/games

 the commit.

* flake8 complaints in baselines/her

* update dmlab30 env (#258)

* codegen continuous control experiment pr (#256)

* finish cherry-pick td3 test commit

* removed graph simplification error ingore

* merge delayed logger config

* merge updated baselines logger

* lazy_mpi load

* cleanups

* use lazy mpi imports in codegen

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* removed extra printouts from TdLayer op

* silly recursion

* running codegen cc experiment

* wip

* more wip

* use actor is input for critic targets, instead of the action taken

* batch size 100

* tweak update parameters

* tweaking td3 runs

* wip

* use nenvs=2 for contcontrol (to be comparable with ppo_metal)

* wip. Doubts about usefulness of actor in critic target

* delayed actor in ActorLoss

* score is average of last 100

* skip lack of losses or too many action distributions

* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)

* syntax

* microfixes

* minifixes

* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)

* squash-merge master, resolve conflicts

* remove erroneous file

* restore normal MPI imports

* move wrappers around a little bit

* autopep8

* cleanups

* cleanup mpi_eda, autopep8

* make activation function of action distribution customizable

* cleanups; preparation for a pr

* syntax

* merge latest master, resolve conflicts

* wrap MPI import with try/except

* allow import of modules through env id im baselines cmd_util

* flake8 complaints

* only wrap box action spaces with ClipActionsWrapper

* flake8

* fixes to algo_prob according to Oleg's suggestions

* use apply_without_scope flag in ActorLoss

* remove extra line in algo/core.py

* Rl19 metalearning (#261)

* rl19 metalearning and dict obs

* master merge arch fix

* lint fixes

* view fixes

* load vars tweaks

* user config cleanup

* documentation and revisions

* pass train comm to rl19

* cleanup

* Symshapes - gives codegen ability to evaluate same algo on envs with different ob/ac shapes (#262)

* finish cherry-pick td3 test commit

* removed graph simplification error ingore

* merge delayed logger config

* merge updated baselines logger

* lazy_mpi load

* cleanups

* use lazy mpi imports in codegen

* more lazy mpi

* don't pretend that class is a module, just use it as a class

* mass-replace mpi4py imports

* flake8

* fix previous lazy_mpi imports

* removed extra printouts from TdLayer op

* silly recursion

* running codegen cc experiment

* wip

* more wip

* use actor is input for critic targets, instead of the action taken

* batch size 100

* tweak update parameters

* tweaking td3 runs

* wip

* use nenvs=2 for contcontrol (to be comparable with ppo_metal)

* wip. Doubts about usefulness of actor in critic target

* delayed actor in ActorLoss

* score is average of last 100

* skip lack of losses or too many action distributions

* 16 envs for contcontrol, replay buffer size equal to horizon (no point in making it longer)

* syntax

* microfixes

* minifixes

* run in process logic to bypass tensorflow freezes/failures (per Oleg's suggestion)

* random physics for mujoco

* random parts sizes with range 0.4

* add notebook with results into x/peterz

* variations of ant

* roboschool use gym.make kwargs

* use float as lowest score after rank transform

* rcall from master

* wip

* re-enable dynamic routing

* wip

* squash-merge master, resolve conflicts

* remove erroneous file

* restore normal MPI imports

* move wrappers around a little bit

* autopep8

* cleanups

* cleanup mpi_eda, autopep8

* make activation function of action distribution customizable

* cleanups; preparation for a pr

* syntax

* merge latest master, resolve conflicts

* wrap MPI import with try/except

* allow import of modules through env id im baselines cmd_util

* flake8 complaints

* only wrap box action spaces with ClipActionsWrapper

* flake8

* fixes to algo_prob according to Oleg's suggestions

* use apply_without_scope flag in ActorLoss

* remove extra line in algo/core.py

* multi-task support

* autopep8

* symbolic suffix-shapes (not B,T yet)

* test_with_mpi -> with_mpi rename

* remove extra blank lines in algo/core

* remove extra blank lines in algo/core

* remove more blank lines

* symbolify shapes in existing algorithms

* minor output changes

* cleaning up merge conflicts

* cleaning up merge conflicts

* cleaning up more merge conflicts

* restore mpi_map.py from master

* remove tensorflow dependency from VecEnv

* make tests use single-threaded session for determinism of KfacOptimizer (#298)

* make tests use single-threaded session for determinism of KfacOptimizer

* updated comment in kfac.py

* remove unused sess_config

* add score calculator wrapper, forward property lookups on vecenv wrap… (#300)

* add score calculator wrapper, forward property lookups on vecenv wrapper, misc cleanup

* tests

* pylint

* fix vec monitor infos

* Workbench (#303)

* begin workbench

* cleanup

* begin procgen config integration

* arg tweaks

* more args

* parameter saving

* begin procgen enjoy

* tweaks

* more workbench

* more args sync/restore

* cleanup

* merge in master

* rework args priority

* more workbench

* more loggign

* impala cnn

* impala lstm

* tweak

* tweaks

* rl19 time logging

* misc fixes

* faster pipeline

* update local.py

* sess and log config tweaks

* num processes

* logging tweaks

* difficulty reward wrapper

* logging fixes

* gin tweaks

* tweak

* fix

* task id

* param loading

* more variable loading

* entrypoint

* tweak

* ksync

* restore lstm

* begin rl19 support

* tweak

* rl19 rnn

* more rl19 integration

* fix

* cleanup

* restore rl19 rnn

* cleanup

* cleanup

* wrappers.get_log_info

* cleanup

* cleanup

* directory cleanup

* logging, num_experiments

* fixes

* cleanup

* gin fixes

* fix local max gpu

* resid nx

* num machines and download params

* rename

* cleanup

* create workbench

* more reorg

* fix

* more logging wrappers

* lint fix

* restore train procgen

* restore train procgen

* pylint fix

* better wrapping

* config sweep

* args sweep

* test workers

* mpi_weight

* train test comm and high difficulty fix

* enjoy show returns

* removing gin, procgen_parser

* removing gin

* procgen args

* config fixes

* cleanup

* cleanup

* procgen args fix

* fix

* rcall syncing

* lint

* rename mpi_weight

* use username for sync

* fixes

* microbatch fix

* Grad clipping in MpiAdamOptimizer, transformer changes (#304)

* transformer mnist experiments

* version that only builds one model

* work on inverted mnist

* Add grad clipping to MpiAdamOptimizer

* various

* transformer changes, loading

* get rid of soft labels

* transformer baseline

* minor

* experiments involving all possible training sets

* vary training

* minor

* get ready for fine-tuning expers

* lint

* minor

* Add jrl19 as backend for workbench (#324)

enable jrl in workbench
minor logger changes

* extra functionality in baselines.common.plot_util (#310)

* get plot_util from mt_experiments branch

* add labels

* unit tests for plot_util

* Fixed sequence env minor (#333)

minor changes to FixedSequenceEnv to allow full score

* fix tests (#335)

* Procgen Benchmark Updates (#328)

* directory cleanup

* logging, num_experiments

* fixes

* cleanup

* gin fixes

* fix local max gpu

* resid nx

* tweak

* num machines and download params

* rename

* cleanup

* create workbench

* more reorg

* fix

* more logging wrappers

* lint fix

* restore train procgen

* restore train procgen

* pylint fix

* better wrapping

* whackamole walls

* config sweep

* tweak

* args sweep

* tweak

* test workers

* mpi_weight

* train test comm and high difficulty fix

* enjoy show returns

* better joint training

* tweak

* Add —update to args and add gin-config to requirements.txt

* add username to download_file

* removing gin, procgen_parser

* removing gin

* procgen args

* config fixes

* cleanup

* cleanup

* procgen args fix

* fix

* rcall syncing

* lint

* rename mpi_weight

* begin composable game

* more composable game

* tweak

* background alpha

* use username for sync

* fixes

* microbatch fix

* lure composable game

* merge

* proc trans update

* proc trans update (#307)

* finetuning experiment

* Change is_local to use `use_rcall` and fix error of `enjoy.py` with multiple ends

* graphing help

* add --local

* change args_dict['env_name'] to ENV_NAME

* finetune experiments

* tweak

* tweak

* reorg wrappers, remove is_local

* workdir/local fixes

* move finetune experiments

* default dir and graphing

* more graphing

* fix

* pooled syncing

* tweaks

* dir fix

* tweak

* wrapper mpi fix

* wind and turrets

* composability cleanup

* radius cleanup

* composable reorg

* laser gates

* composable tweaks

* soft walls

* tweak

* begin swamp

* more swamp

* more swamp

* fix

* hidden mines

* use maze layout

* tweak

* laser gate tweaks

* tweaks

* tweaks

* lure/propel updates

* composable midnight

* composable coinmaze

* composability difficulty

* tweak

* add step to save_params

* composable offsets

* composable boxpush

* composable combiner

* tweak

* tweak

* always choose correct number of mechanics

* fix

* rcall local fix

* add steps when dump and save parmas

* loading rank 1,2,3.. error fix

* add experiments.py

* fix loading latest weight with no -rest

* support more complex run_id and add more examples

* fix typo

* move post_run_id into experiments.py

* add hp_search example

* error fix

* joint experiments in progress

* joint hp finished

* typo

* error fix

* edit experiments

* Save experiments set up in code and  save weights per step (#319)

* add step to save_params

* add steps when dump and save parmas

* loading rank 1,2,3.. error fix

* add experiments.py

* fix loading latest weight with no -rest

* support more complex run_id and add more examples

* fix typo

* move post_run_id into experiments.py

* add hp_search example

* error fix

* joint experiments in progress

* joint hp finished

* typo

* error fix

* edit experiments

* tweaks

* graph exp WIP

* depth tweaks

* move save_all

* fix

* restore_dir name

* restore depth

* choose max mechanics

* use override mode

* tweak frogger

* lstm default

* fix

* patience is composable

* hunter is composable

* fixed asset seed cleanup

* minesweeper is composable

* eggcatch is composable

* tweak

* applesort is composable

* chaser game

* begin lighter

* lighter game

* tractor game

* boxgather game

* plumber game

* hitcher game

* doorbell game

* lawnmower game

* connecter game

* cannonaim

* outrun game

* encircle game

* spinner game

* tweak

* tweak

* detonator game

* driller

* driller

* mixer

* conveyor

* conveyor game

* joint pcg experiments

* fixes

* pcg sweep experiment

* cannonaim fix

* combiner fix

* store save time

* laseraim fix

* lightup fix

* detonator tweaks

* detonator fixes

* driller fix

* lawnmower calibration

* spinner calibration

* propel fix

* train experiment

* print load time

* system independent hashing

* remove gin configurable

* task ids fix

* test_pcg experiment

* connecter dense reward

* hard_pcg

* num train comms

* mpi splits envs

* tweaks

* tweaks

* graph tweaks

* graph tweaks

* lint fix

* fix tests

* load bugfix

* difficulty timeout tweak

* tweaks

* more graphing

* graph tweaks

* tweak

* download file fix

* pcg train envs list

* cleanup

* tweak

* manually name impala layers

* tweak

* expect fps

* backend arg

* args tweak

* workbench cleanup

* move graph files

* workbench cleanup

* split env name by comma

* workbench cleanup

* ema graph

* remove Dict

* use tf.io.gfile

* comments for auto-killing jobs

* lint fix

* write latest file when not saving all and load it when step=None

* ci/runtests.sh - pass all folders to pytest (#342)

* ci/runtests.sh - pass all folders to pytest

* mpi_optimizer_test precision 1e-4

* fixes to tests

* search for tests in the entire jax folder, also remove unnecessary humor

* delete unnecessary stuff (#338)

* Add initializer for process-level setup in SubprocVecEnv (#276)

* Add initializer for process-level setup in SubprocVecEnv

Use case: run logger.configure() in each subprocess

* Add option to force dummy vec env

* Procgen fixes (#352)

* tweak

* documentation

* rely on log_comm, remove mpi averaging from wrappers

* pass comm for ppo2 initialization

* ppo2 logging

* experiment tweaks

* auto launch tensorboard when using local backend

* graph tweaks

* pass caller to config

* configure logger and tensorboard

* make parent dir if necessary

* parentdir tweak

* JRL PPO test with delayed identity env (#355)

* add a custom delay to identity_env

* min reward 0.8 in delayed identity test

* seed the tests, perfect score on delayed_identity_test

* delay=1 in delayed_identity_test

* flake8 complaints

* increased number of steps in fixed_seq_test

* seed identity tests to ensure reproducibility

* docstrings

* (onp, np) -> (np, jp), switch jax code to use mark_slow decorator (#363)

switch to mark_slow decorator

* fix tests - add matplotlib to setup_requires, put mpi4py import in try-except

* test fixes
2019-05-08 11:36:10 -07:00
pzhokhov
3301089b48 remove bullet extra, constrain gym version to be >= 0.10.0 (#885)
* remove bullet extra, constrain gym version to be >= 0.10.0

* constrain gym version from above
2019-04-26 16:14:49 -07:00
pzhokhov
a07fad9066 change rms 2 tfrms switch in vec_normalize to be more explicit (#886)
* change rms 2 tfrms switch in vec_normalize to be more explicit

* modify the vec_normalize / use_tf logic a little bit

* typo

* use_tf = False by default
2019-04-26 16:14:21 -07:00
Taeyeong Jeong
5d8041d18e Fix indexing LazyFrames (#875)
Indexing LazyFrames with index i should return the single channel frame
2019-04-19 15:00:09 -07:00
Peter Zhokhov
fa37beb52e fix commit on atari bms page to point to a public commit 2019-04-06 20:03:32 -07:00
Peter Zhokhov
8a97e0df10 fix shuffling bug in ppo1 2019-04-05 15:23:46 -07:00
pzhokhov
fabbf2c611 short-circuit framestack wrapper with size 1 (#871) 2019-04-05 15:18:15 -07:00
Xingdong Zuo
5d285b318f [Update misc_util.py]: clean up unused helper functions (#751)
* Update misc_util.py

* Update misc_util.py
2019-04-05 15:16:26 -07:00
Tim Zaman
49a99c7d23 Add eps to normalization (#797) 2019-04-05 14:46:01 -07:00
Peter Zhokhov
c79b3373bf parse colon-separated env_id's 2019-04-05 14:43:09 -07:00
Sridhar Thiagarajan
6d1c6c78d3 Interface for U.make_session changed (#865) 2019-04-01 16:24:02 -07:00
JongGyun Kim
62a9c76f18 fix the definition of TfInput.make_feed_dict. (#812) 2019-04-01 15:49:25 -07:00
Hao-Chih, Lin
282c9cc91f fix small bug in plot_results() (#864)
Remove the comma behind the last input argument
2019-04-01 15:48:35 -07:00
Peter Zhokhov
096f4d9cf0 neaten up stacking logic in mujoco_dset in gail 2019-04-01 15:47:13 -07:00
Mingfei
16136ddca7 fix bugs: obs_ph normalization in adversary.py (#823)
* fix bugs: obs_ph normalization in adversary.py

* fix bug in reshape obs and acs in Mujobo_Dset
2019-04-01 15:44:31 -07:00
Darío Hereñú
b1644157d6 Fixed typo on #092 (#824) 2019-04-01 15:41:52 -07:00
Yu Feng
58541db226 MPI refer to workers as ranks, not threads. (#833) 2019-04-01 15:38:45 -07:00
zlsh80826
c02b575f01 ppo2: use time.perf_counter() instead of time.time() for time measurement (#847) 2019-04-01 15:37:32 -07:00
Pastafarianist
897fa31548 Avoid using default config while requesting available GPUs (#863) 2019-03-29 13:25:56 -07:00
Brett Daley
d51f8be8f9 Report episode rewards/length in A2C and ACKTR (#856) 2019-03-28 09:21:48 -07:00
Jacob Hilton
3f2f45acef Merge pull request #860 from openai/build-retro-env-framestack-fix
run.py framestack bug fix
2019-03-25 14:33:15 -07:00
Jacob Hilton
b64974eb90 build_env now doesn't apply frame stack to retro games twice 2019-03-24 12:27:14 -07:00
pzhokhov
1b092434fc remove f-strings for python 3.5 compatibility (#854) 2019-03-16 11:54:47 -07:00
Peter Zhokhov
1259f6ab25 check for environment being vectorized in the play logic in run.py 2019-03-11 17:44:03 -07:00
pzhokhov
74101a9f24 fix freeze of ppo2 (#849)
* fix freeze of ppo2

* unit test for freeze, updated docstring

* more docstring update

* set number of threads to 1 in the test
2019-03-11 17:28:51 -07:00