2019-01-15 09:59:27 -08:00
|
|
|
import contextlib
|
1.5 months of codegen changes (#196)
* play with resnet
* feed_dict version
* coinrun prob and more stats
* fixes to get_choices_specs & hp search
* minor prob fixes
* minor fixes
* minor
* alternative version of rl_algo stuff
* pylint fixes
* fix bugs, move node_filters to soup
* changed how get_algo works
* change how get_algo works, probably broke all tests
* continue previous refactor
* get eval_agent running again
* fixing tests
* fix tests
* fix more tests
* clean up cma stuff
* fix experiment
* minor changes to eval_agent to make ppo_metal use gpu
* make dict space work
* modify mac makefile to use conda
* recurrent layers
* play with bn and resnets
* minor hp changes
* minor
* got rid of use_fb argument and jtft (joint-train-fine-tune) functionality
built test phase directly into AlgoProb
* make new rl algos generateable
* pylint; start fixing tests
* fixing tests
* more test fixes
* pylint
* fix search
* work on search
* hack around infinite loop caused by scan
* algo search fixes
* misc changes for search expt
* enable annealing, overriding options of Op
* pylint fixes
* identity op
* achieve use_last_output through masking so it automatically works in other distributions
* fix tests
* minor
* discrete
* use_last_output to be just a preference, not a hard constraint
* pred delay, pruning
* require nontrivial inputs
* aliases for get_sm
* add probname to probs
* fixes
* small fixes
* fix tests
* fix tests
* fix tests
* minor
* test scripts
* dualgru network improvements
* minor
* work on mysterious bugs
* rcall gpu-usage command for kube
* use cache dir that’s not in code folder, so that it doesn’t get removed by rcall code rsync
* add power mode to gpu usage
* make sure train/test actually different
* remove VR for now
* minor fixes
* simplify soln_db
* minor
* big refactor of mpi eda
* improve mpieda for multitask
* - get rid of timelimit hack
- add __del__ to cleanup SubprocVecEnv
* get multitask working better
* fixes
* working on atari, various
* annotate ops with whether they’re parametrized
* minor
* gym version
* rand atari prob
* minor
* SolnDb bugfix and name change
* pyspy script
* switch conv layers
* fix roboschool/bullet3
* nenvs assertion
* fix rand atari
* get rid of blanket exception catching
fix soln_db bug
* fix rand_atari
* dynamic routing as cmdline arg
* slight modifications to test_mpi_map and pyspy-all
* max_tries argument for run_until_successs
* dedup option in train_mle
* simplify soln_db
* increase atari horizon for 1 experiment
* start implementing reward increment
* ent multiplier
* create cc dsl
other misc fixes
* cc ops
* q_func -> qs in rl_algos_cc.py
* fix PredictDistr
* rl_ops_cc fixes, MakeAction op
* augment algo agent to support cc stuff
* work on ddpg experiments
* fix blocking
temporarily change logger
* allow layer scaling
* pylint fixes
* spawn_method
* isolate ddpg hacks
* improve pruning
* use spawn for subproc
* remove use of python -c in rcall
* fix pylint warning
* fix static
* maybe fix local backend
* switch to DummyVecEnv
* making some fixes via pylint
* pylint fixes
* fixing tests
* fix tests
* fix tests
* write scaffolding for SSL in Codegen
* logger fix
* fix error
* add EMA op to sl_ops
* save many changes
* save
* add upsampler
* add sl ops, enhance state machine
* get ssl search working — some gross hacking
* fix session/graph issue
* fix importing
* work on mle
* - scale embeddings in gru model
- better exception handling in sl_prob
- use emas for test/val
- use non-contrib batch_norm layer
* improve logging
* option to average before dumping in logger
* default arguments, etc
* new ddpg and identity test
* concat fix
* minor
* move realistic ssl stuff to third-party (underscore to dash)
* fixes
* remove realistic_ssl_evaluation
* pylint fixes
* use gym master
* try again
* pass around args without gin
* fix tests
* separate line to install gym
* rename failing tests that should be ignored
* add data aug
* ssl improvements
* use fixed time limit
* try to fix baselines tests
* add score_floor, max_walltime, fiddle with lr decay
* realistic_ssl
* autopep8
* various ssl
- enable blocking grad for simplification
- kl
- multiple final prediction
* fix pruning
* misc ssl stuff
* bring back linear schedule, don’t use allgather for collecting stats
(i’ve been getting nondeterministic errors from the old code)
* save/load weights in SSL, big stepsize
* cleanup SslProb
* fix
* get rid of kl coef
* fix simplification, lower lr
* search over hps
* minor fixes
* minor
* static analysis
* move files and rename things for improved consistency.
still broken, and just saving before making nontrivial changes
* various
* make tests pass
* move coinrun_train to codegen since it depends on codegen
* fixes
* pylint fixes
* improve tests
fix some things
* improve tests
* lint
* fix up db_info.py, tests
* mostly restore master version of envs directory, except for makefile changes
* fix tests
* improve printing
* minor fixes
* fix fixmes
* pruning test
* fixes
* lint
* write new test that makes tf graphs of random algos; fix some bugs it caught
* add —delete flag to rcall upload-code command
* lint
* get cifar10 lazily for testing purposes
* disable codegen ci tests for now
* clean up rl_ops
* rename spec classes
* td3 with identity test
* identity tests without gin files
* remove gin.configurable from AlgoAgent
* comments about reduction in rl_ops_cc
* address @pzhokhov comments
* fix tests
* more linting
* better tests
* clean up filtering a bit
* fix concat
2019-01-03 13:23:18 -08:00
|
|
|
import multiprocessing as mp
|
2019-01-15 09:59:27 -08:00
|
|
|
import os
|
|
|
|
|
|
|
|
import numpy as np
|
2018-08-17 09:40:35 -07:00
|
|
|
from . import VecEnv, CloudpickleWrapper
|
2017-10-25 09:21:29 -04:00
|
|
|
|
2019-01-15 09:59:27 -08:00
|
|
|
@contextlib.contextmanager
def clear_mpi_env_vars():
    """
    from mpi4py import MPI will call MPI_Init by default. If the child process has MPI environment variables, MPI will think that the child process is an MPI process just like the parent and do bad things such as hang.

    This context manager is a hacky way to clear those environment variables temporarily such as when we are starting multiprocessing
    Processes.
    """
    removed_environment = {}
    for k, v in list(os.environ.items()):
        for prefix in ['OMPI_', 'PMI_']:
            if k.startswith(prefix):
                removed_environment[k] = v
                del os.environ[k]
    try:
        yield
    finally:
        os.environ.update(removed_environment)
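A minimal sketch of how the context manager above behaves. The function body is repeated so the example runs on its own, and `OMPI_EXAMPLE_VAR` is a hypothetical variable name chosen only because it starts with one of the matched prefixes.

```python
import contextlib
import os


@contextlib.contextmanager
def clear_mpi_env_vars():
    # Same logic as the function above, repeated so this sketch is standalone.
    removed_environment = {}
    for k, v in list(os.environ.items()):
        for prefix in ['OMPI_', 'PMI_']:
            if k.startswith(prefix):
                removed_environment[k] = v
                del os.environ[k]
    try:
        yield
    finally:
        os.environ.update(removed_environment)


# Hypothetical variable, not one that MPI is known to set.
os.environ['OMPI_EXAMPLE_VAR'] = '1'

with clear_mpi_env_vars():
    # Inside the block the variable is gone, so a child process started
    # here would not inherit it.
    hidden_inside = 'OMPI_EXAMPLE_VAR' not in os.environ

# After the block the variable is restored for the parent process.
restored_after = os.environ.get('OMPI_EXAMPLE_VAR')
```

Because the restore happens in a `finally` clause, the variables come back even if starting the child process raises.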
|
|
|
|
|
2019-01-03 13:23:18 -08:00
|
|
|
ctx = mp.get_context('spawn')
|
|
|
|
|
2017-10-25 09:21:29 -04:00
|
|
|
def worker(remote, parent_remote, env_fn_wrapper):
    parent_remote.close()
|
2017-08-18 09:25:39 -07:00
|
|
|
    env = env_fn_wrapper.x()
|
2018-08-13 09:56:44 -07:00
|
|
|
    try:
        while True:
            cmd, data = remote.recv()
            if cmd == 'step':
                ob, reward, done, info = env.step(data)
                if done:
                    ob = env.reset()
                remote.send((ob, reward, done, info))
            elif cmd == 'reset':
|
2017-08-18 09:25:39 -07:00
|
|
|
                ob = env.reset()
|
2018-08-13 09:56:44 -07:00
|
|
|
                remote.send(ob)
            elif cmd == 'render':
                remote.send(env.render(mode='rgb_array'))
            elif cmd == 'close':
                remote.close()
                break
|
2019-01-03 13:23:18 -08:00
|
|
|
            elif cmd == 'get_spaces_spec':
                remote.send((env.observation_space, env.action_space, env.spec))
|
2018-08-13 09:56:44 -07:00
|
|
|
            else:
                raise NotImplementedError
    except KeyboardInterrupt:
        print('SubprocVecEnv worker: got KeyboardInterrupt')
    finally:
        env.close()
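The `(cmd, data)` message loop in `worker` can be tried in isolation. The sketch below is an illustration under stated assumptions, not the module's own code: `CountingEnv` and `toy_worker` are hypothetical names, the loop handles only three of the commands, and it runs in a thread rather than a spawned process so that it needs no `__main__` guard.

```python
import multiprocessing as mp
import threading


class CountingEnv:
    """Hypothetical toy environment: an episode ends after 3 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, float(action), done, {}

    def close(self):
        pass


def toy_worker(remote, env):
    # Reduced version of the worker loop: receive (cmd, data), reply on the same pipe.
    try:
        while True:
            cmd, data = remote.recv()
            if cmd == 'step':
                ob, reward, done, info = env.step(data)
                if done:
                    # Auto-reset: the caller receives the first observation
                    # of the next episode together with done=True.
                    ob = env.reset()
                remote.send((ob, reward, done, info))
            elif cmd == 'reset':
                remote.send(env.reset())
            elif cmd == 'close':
                remote.close()
                break
            else:
                raise NotImplementedError
    finally:
        env.close()


parent, child = mp.Pipe()
t = threading.Thread(target=toy_worker, args=(child, CountingEnv()), daemon=True)
t.start()

parent.send(('reset', None))
first_ob = parent.recv()

transitions = []
for _ in range(3):
    parent.send(('step', 1))
    transitions.append(parent.recv())

parent.send(('close', None))
t.join()
```

Note the auto-reset on `done`: the observation returned with the final step already belongs to the next episode, so a caller that needs the terminal observation would have to carry it some other way, for example in `info`.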
|
2017-10-25 09:21:29 -04:00
|
|
|
|
2018-08-17 09:40:35 -07:00
|
|
|
|
2017-08-18 09:25:39 -07:00
|
|
|
class SubprocVecEnv(VecEnv):
|
2018-09-11 12:40:23 -07:00
|
|
|
    """
|
2018-09-11 13:21:52 -07:00
|
|
|
    VecEnv that runs multiple environments in parallel in subprocesses and communicates with them via pipes.
    Recommended to use when num_envs > 1 and step() can be a bottleneck.
|
2018-09-11 12:40:23 -07:00
|
|
|
    """
|
2018-01-25 18:33:48 -08:00
|
|
|
    def __init__(self, env_fns, spaces=None):
|
2017-08-18 09:25:39 -07:00
|
|
|
        """
|
2018-09-11 12:40:23 -07:00
|
|
|
        Arguments:

        env_fns: iterable of callables - functions that create environments to run in subprocesses. Need to be cloud-pickleable
|
2017-08-18 09:25:39 -07:00
|
|
|
        """
|
2018-01-25 18:33:48 -08:00
|
|
|
        self.waiting = False
|
2018-09-10 11:58:22 -07:00
|
|
|
        self.closed = False
|
2017-08-18 09:25:39 -07:00
|
|
|
        nenvs = len(env_fns)
|
2019-01-03 13:23:18 -08:00
|
|
|
        self.remotes, self.work_remotes = zip(*[ctx.Pipe() for _ in range(nenvs)])
        self.ps = [ctx.Process(target=worker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
|
2018-08-17 09:40:35 -07:00
|
|
|
                   for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
|
2017-08-18 09:25:39 -07:00
|
|
|
        for p in self.ps:
|
2018-08-17 09:40:35 -07:00
|
|
|
            p.daemon = True  # if the main process crashes, we should not cause things to hang
|
2019-01-15 09:59:27 -08:00
|
|
|
            with clear_mpi_env_vars():
                p.start()
|
2017-10-25 09:21:29 -04:00
|
|
|
        for remote in self.work_remotes:
            remote.close()
|
2017-08-18 09:25:39 -07:00
|
|
|
|
2019-01-03 13:23:18 -08:00
|
|
|
        self.remotes[0].send(('get_spaces_spec', None))
        observation_space, action_space, self.spec = self.remotes[0].recv()
|
2018-08-17 09:40:35 -07:00
|
|
|
        self.viewer = None
|
2018-01-25 18:33:48 -08:00
|
|
|
        VecEnv.__init__(self, len(env_fns), observation_space, action_space)
|
2017-08-18 09:25:39 -07:00
|
|
|
|
2018-01-25 18:33:48 -08:00
|
|
|
    def step_async(self, actions):
|
2018-09-10 11:58:22 -07:00
|
|
|
        self._assert_not_closed()
|
2017-08-18 09:25:39 -07:00
|
|
|
        for remote, action in zip(self.remotes, actions):
            remote.send(('step', action))
|
2018-01-25 18:33:48 -08:00
|
|
|
self.waiting = True
|
|
|
|
|
|
|
|
def step_wait(self):
|
2018-09-10 11:58:22 -07:00
|
|
|
self._assert_not_closed()
|
2017-08-18 09:25:39 -07:00
|
|
|
results = [remote.recv() for remote in self.remotes]
|
2018-01-25 18:33:48 -08:00
|
|
|
self.waiting = False
|
2017-08-18 09:25:39 -07:00
|
|
|
obs, rews, dones, infos = zip(*results)
|
2018-12-18 17:37:22 -08:00
|
|
|
return _flatten_obs(obs), np.stack(rews), np.stack(dones), infos
|
2017-08-18 09:25:39 -07:00
|
|
|
|
|
|
|
def reset(self):
|
2018-09-10 11:58:22 -07:00
|
|
|
self._assert_not_closed()
|
2017-08-18 09:25:39 -07:00
|
|
|
for remote in self.remotes:
|
|
|
|
remote.send(('reset', None))
|
2018-12-18 17:37:22 -08:00
|
|
|
return _flatten_obs([remote.recv() for remote in self.remotes])
|
2017-08-18 09:25:39 -07:00
|
|
|
|
2018-08-22 13:54:34 -07:00
|
|
|
def close_extras(self):
|
2018-09-10 11:58:22 -07:00
|
|
|
self.closed = True
|
2018-01-25 18:33:48 -08:00
|
|
|
if self.waiting:
|
2018-08-17 09:40:35 -07:00
|
|
|
for remote in self.remotes:
|
2018-01-25 18:33:48 -08:00
|
|
|
remote.recv()
|
2017-08-18 09:25:39 -07:00
|
|
|
for remote in self.remotes:
|
|
|
|
remote.send(('close', None))
|
|
|
|
for p in self.ps:
|
|
|
|
p.join()
|
2018-06-06 11:39:13 -07:00
|
|
|
|
2018-08-22 13:54:34 -07:00
|
|
|
def get_images(self):
|
2018-09-10 11:58:22 -07:00
|
|
|
self._assert_not_closed()
|
2018-06-06 11:39:13 -07:00
|
|
|
for pipe in self.remotes:
|
|
|
|
pipe.send(('render', None))
|
|
|
|
imgs = [pipe.recv() for pipe in self.remotes]
|
2018-08-22 13:54:34 -07:00
|
|
|
return imgs
|
2018-09-10 11:58:22 -07:00
|
|
|
|
|
|
|
def _assert_not_closed(self):
|
|
|
|
assert not self.closed, "Trying to operate on a SubprocVecEnv after calling close()"
|
2018-12-18 17:37:22 -08:00
|
|
|
|
2019-01-03 13:23:18 -08:00
|
|
|
def __del__(self):
|
|
|
|
if not self.closed:
|
|
|
|
self.close()
|
2018-12-18 17:37:22 -08:00
|
|
|
|
|
|
|
def _flatten_obs(obs):
|
2019-01-03 13:23:18 -08:00
|
|
|
assert isinstance(obs, (list, tuple))
|
2018-12-18 17:37:22 -08:00
|
|
|
assert len(obs) > 0
|
|
|
|
|
|
|
|
if isinstance(obs[0], dict):
|
|
|
|
import collections
|
|
|
|
assert isinstance(obs[0], collections.OrderedDict)  # obs itself is a list/tuple; check its elements
|
|
|
|
keys = obs[0].keys()
|
|
|
|
return {k: np.stack([o[k] for o in obs]) for k in keys}
|
|
|
|
else:
|
|
|
|
return np.stack(obs)
|
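A minimal usage sketch of the observation-flattening helper above, assuming NumPy is available. The function is restated under the hypothetical name `flatten_obs_sketch` so the example runs on its own, and the sample observations are made up for illustration; it is not taken from the repository's tests.

```python
import collections

import numpy as np


def flatten_obs_sketch(obs):
    """Stack per-environment observations into one batch (illustrative copy)."""
    assert isinstance(obs, (list, tuple))
    assert len(obs) > 0
    if isinstance(obs[0], dict):
        # Dict observations: stack each key separately across environments.
        keys = obs[0].keys()
        return {k: np.stack([o[k] for o in obs]) for k in keys}
    # Array observations: stack along a new leading (environment) axis.
    return np.stack(obs)


# Two hypothetical environments, each returning a dict observation.
dict_obs = [
    collections.OrderedDict(image=np.zeros((4, 4)), pos=np.array([0.0, 1.0])),
    collections.OrderedDict(image=np.ones((4, 4)), pos=np.array([2.0, 3.0])),
]
batched = flatten_obs_sketch(dict_obs)
print(batched["image"].shape)  # (2, 4, 4)
print(batched["pos"].shape)    # (2, 2)

# Two hypothetical environments, each returning a flat array observation.
array_obs = [np.zeros(3), np.ones(3)]
print(flatten_obs_sketch(array_obs).shape)  # (2, 3)
```

In both cases the leading axis of the result indexes the environment, which is the layout `step_wait` and `reset` return to the caller.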