Procgen Benchmark Updates (#328)
* directory cleanup
* logging, num_experiments
* fixes
* cleanup
* gin fixes
* fix local max gpu
* resid nx
* tweak
* num machines and download params
* rename
* cleanup
* create workbench
* more reorg
* fix
* more logging wrappers
* lint fix
* restore train procgen
* restore train procgen
* pylint fix
* better wrapping
* whackamole walls
* config sweep
* tweak
* args sweep
* tweak
* test workers
* mpi_weight
* train test comm and high difficulty fix
* enjoy show returns
* better joint training
* tweak
* Add --update to args and add gin-config to requirements.txt
* add username to download_file
* removing gin, procgen_parser
* removing gin
* procgen args
* config fixes
* cleanup
* cleanup
* procgen args fix
* fix
* rcall syncing
* lint
* rename mpi_weight
* begin composable game
* more composable game
* tweak
* background alpha
* use username for sync
* fixes
* microbatch fix
* lure composable game
* merge
* proc trans update
* proc trans update (#307)
* finetuning experiment
* Change is_local to use `use_rcall` and fix error of `enjoy.py` with multiple ends
* graphing help
* add --local
* change args_dict['env_name'] to ENV_NAME
* finetune experiments
* tweak
* tweak
* reorg wrappers, remove is_local
* workdir/local fixes
* move finetune experiments
* default dir and graphing
* more graphing
* fix
* pooled syncing
* tweaks
* dir fix
* tweak
* wrapper mpi fix
* wind and turrets
* composability cleanup
* radius cleanup
* composable reorg
* laser gates
* composable tweaks
* soft walls
* tweak
* begin swamp
* more swamp
* more swamp
* fix
* hidden mines
* use maze layout
* tweak
* laser gate tweaks
* tweaks
* tweaks
* lure/propel updates
* composable midnight
* composable coinmaze
* composability difficulty
* tweak
* add step to save_params
* composable offsets
* composable boxpush
* composable combiner
* tweak
* tweak
* always choose correct number of mechanics
* fix
* rcall local fix
* add steps when dump and save params
* loading rank 1,2,3.. error fix
* add experiments.py
* fix loading latest weight with no -rest
* support more complex run_id and add more examples
* fix typo
* move post_run_id into experiments.py
* add hp_search example
* error fix
* joint experiments in progress
* joint hp finished
* typo
* error fix
* edit experiments
* Save experiments set up in code and save weights per step (#319)
* add step to save_params
* add steps when dump and save params
* loading rank 1,2,3.. error fix
* add experiments.py
* fix loading latest weight with no -rest
* support more complex run_id and add more examples
* fix typo
* move post_run_id into experiments.py
* add hp_search example
* error fix
* joint experiments in progress
* joint hp finished
* typo
* error fix
* edit experiments
* tweaks
* graph exp WIP
* depth tweaks
* move save_all
* fix
* restore_dir name
* restore depth
* choose max mechanics
* use override mode
* tweak frogger
* lstm default
* fix
* patience is composable
* hunter is composable
* fixed asset seed cleanup
* minesweeper is composable
* eggcatch is composable
* tweak
* applesort is composable
* chaser game
* begin lighter
* lighter game
* tractor game
* boxgather game
* plumber game
* hitcher game
* doorbell game
* lawnmower game
* connecter game
* cannonaim
* outrun game
* encircle game
* spinner game
* tweak
* tweak
* detonator game
* driller
* driller
* mixer
* conveyor
* conveyor game
* joint pcg experiments
* fixes
* pcg sweep experiment
* cannonaim fix
* combiner fix
* store save time
* laseraim fix
* lightup fix
* detonator tweaks
* detonator fixes
* driller fix
* lawnmower calibration
* spinner calibration
* propel fix
* train experiment
* print load time
* system independent hashing
* remove gin configurable
* task ids fix
* test_pcg experiment
* connecter dense reward
* hard_pcg
* num train comms
* mpi splits envs
* tweaks
* tweaks
* graph tweaks
* graph tweaks
* lint fix
* fix tests
* load bugfix
* difficulty timeout tweak
* tweaks
* more graphing
* graph tweaks
* tweak
* download file fix
* pcg train envs list
* cleanup
* tweak
* manually name impala layers
* tweak
* expect fps
* backend arg
* args tweak
* workbench cleanup
* move graph files
* workbench cleanup
* split env name by comma
* workbench cleanup
* ema graph
* remove Dict
* use tf.io.gfile
* comments for auto-killing jobs
* lint fix
* write latest file when not saving all and load it when step=None
committed by Peter Zhokhov
parent bc4eef6053
commit ddcab1606d
@@ -30,8 +30,17 @@ def build_impala_cnn(unscaled_images, depths=[16,32,32], **conv_kwargs):
     Model used in the paper "IMPALA: Scalable Distributed Deep-RL with
     Importance Weighted Actor-Learner Architectures" https://arxiv.org/abs/1802.01561
     """
+
+    layer_num = 0
+
+    def get_layer_num_str():
+        nonlocal layer_num
+        num_str = str(layer_num)
+        layer_num += 1
+        return num_str
+
     def conv_layer(out, depth):
-        return tf.layers.conv2d(out, depth, 3, padding='same')
+        return tf.layers.conv2d(out, depth, 3, padding='same', name='layer_' + get_layer_num_str())
 
     def residual_block(inputs):
         depth = inputs.get_shape()[-1].value
@@ -57,7 +66,7 @@ def build_impala_cnn(unscaled_images, depths=[16,32,32], **conv_kwargs):
 
     out = tf.layers.flatten(out)
     out = tf.nn.relu(out)
-    out = tf.layers.dense(out, 256, activation=tf.nn.relu)
+    out = tf.layers.dense(out, 256, activation=tf.nn.relu, name='layer_' + get_layer_num_str())
 
     return out
 
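Note: with the name= arguments above, every convolutional and dense layer in the IMPALA CNN now gets a deterministic, sequentially numbered name (layer_0, layer_1, ...), which is what the "manually name impala layers" item in the commit message refers to. A minimal sketch of inspecting those names, assuming a TF1-style graph/session and that the function lives in baselines.common.models as in upstream baselines; the placeholder shape is illustrative only:

    import tensorflow as tf
    from baselines.common.models import build_impala_cnn

    images = tf.placeholder(tf.uint8, shape=(None, 64, 64, 3))  # illustrative input shape
    features = build_impala_cnn(images)

    # The manually named layers show up with a predictable prefix, making it
    # easy to select or restore individual layers by variable name.
    named = [v.name for v in tf.trainable_variables() if 'layer_' in v.name]
    print(named)  # e.g. ['layer_0/kernel:0', 'layer_0/bias:0', 'layer_1/kernel:0', ...]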
@@ -8,7 +8,7 @@ class MicrobatchedModel(Model):
     on the entire minibatch causes some overflow
     """
     def __init__(self, *, policy, ob_space, ac_space, nbatch_act, nbatch_train,
-                nsteps, ent_coef, vf_coef, max_grad_norm, mpi_rank_weight, microbatch_size):
+                nsteps, ent_coef, vf_coef, max_grad_norm, mpi_rank_weight, comm, microbatch_size):
 
         self.nmicrobatches = nbatch_train // microbatch_size
         self.microbatch_size = microbatch_size
@@ -24,7 +24,8 @@ class MicrobatchedModel(Model):
                 ent_coef=ent_coef,
                 vf_coef=vf_coef,
                 max_grad_norm=max_grad_norm,
-                mpi_rank_weight=mpi_rank_weight)
+                mpi_rank_weight=mpi_rank_weight,
+                comm=comm)
 
         self.grads_ph = [tf.placeholder(dtype=g.dtype, shape=g.shape) for g in self.grads]
         grads_ph_and_vars = list(zip(self.grads_ph, self.var))
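Note: MicrobatchedModel now simply forwards the new comm argument to the base Model; the microbatching arithmetic is unchanged, so nbatch_train must still be divisible by microbatch_size. A small sketch of that arithmetic with illustrative numbers (not values taken from this diff):

    # 8 envs * 256 steps = 2048 transitions per update, split into 4 minibatches
    # of 512; a microbatch size of 128 then yields 512 // 128 = 4 microbatches
    # whose gradients are accumulated before the single optimizer step.
    nenvs, nsteps, nminibatches, microbatch_size = 8, 256, 4, 128
    nbatch = nenvs * nsteps
    nbatch_train = nbatch // nminibatches
    assert nbatch_train % microbatch_size == 0
    nmicrobatches = nbatch_train // microbatch_size
    print(nbatch, nbatch_train, nmicrobatches)  # 2048 512 4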
@@ -25,9 +25,12 @@ class Model(object):
     - Save load the model
     """
     def __init__(self, *, policy, ob_space, ac_space, nbatch_act, nbatch_train,
-                nsteps, ent_coef, vf_coef, max_grad_norm, mpi_rank_weight=1, microbatch_size=None):
+                nsteps, ent_coef, vf_coef, max_grad_norm, mpi_rank_weight=1, comm=None, microbatch_size=None):
         self.sess = sess = get_session()
 
+        if MPI is not None and comm is None:
+            comm = MPI.COMM_WORLD
+
         with tf.variable_scope('ppo2_model', reuse=tf.AUTO_REUSE):
             # CREATE OUR TWO MODELS
             # act_model that is used for sampling
@@ -91,8 +94,8 @@ class Model(object):
         # 1. Get the model parameters
         params = tf.trainable_variables('ppo2_model')
         # 2. Build our trainer
-        if MPI is not None:
-            self.trainer = MpiAdamOptimizer(MPI.COMM_WORLD, learning_rate=LR, mpi_rank_weight=mpi_rank_weight, epsilon=1e-5)
+        if comm is not None and comm.Get_size() > 1:
+            self.trainer = MpiAdamOptimizer(comm, learning_rate=LR, mpi_rank_weight=mpi_rank_weight, epsilon=1e-5)
         else:
             self.trainer = tf.train.AdamOptimizer(learning_rate=LR, epsilon=1e-5)
         # 3. Calculate the gradients
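Note: gradient averaging is now scoped to whatever communicator is handed to Model instead of being hard-coded to MPI.COMM_WORLD, and a single-rank communicator falls back to plain Adam, so the MpiAdamOptimizer overhead is only paid when there is actually something to average across. A hedged sketch of how a caller might build such a sub-communicator with mpi4py; the even/odd split is purely illustrative and not part of this commit:

    from mpi4py import MPI

    world = MPI.COMM_WORLD
    rank = world.Get_rank()

    # Hypothetical split: even ranks train, odd ranks run evaluation envs.
    # Each group gets its own communicator, so gradients are only averaged
    # within the group that is passed down to Model.
    color = 0 if rank % 2 == 0 else 1
    comm = world.Split(color=color, key=rank)
    print(rank, comm.Get_rank(), comm.Get_size())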
@@ -21,7 +21,7 @@ def constfn(val):
 def learn(*, network, env, total_timesteps, eval_env = None, seed=None, nsteps=2048, ent_coef=0.0, lr=3e-4,
             vf_coef=0.5, max_grad_norm=0.5, gamma=0.99, lam=0.95,
             log_interval=10, nminibatches=4, noptepochs=4, cliprange=0.2,
-            save_interval=0, load_path=None, model_fn=None, update_fn=None, init_fn=None, mpi_rank_weight=1, **network_kwargs):
+            save_interval=0, load_path=None, model_fn=None, update_fn=None, init_fn=None, mpi_rank_weight=1, comm=None, **network_kwargs):
     '''
     Learn policy using PPO algorithm (https://arxiv.org/abs/1707.06347)
 
@@ -105,7 +105,7 @@ def learn(*, network, env, total_timesteps, eval_env = None, seed=None, nsteps=2
 
     model = model_fn(policy=policy, ob_space=ob_space, ac_space=ac_space, nbatch_act=nenvs, nbatch_train=nbatch_train,
                      nsteps=nsteps, ent_coef=ent_coef, vf_coef=vf_coef,
-                     max_grad_norm=max_grad_norm, mpi_rank_weight=mpi_rank_weight)
+                     max_grad_norm=max_grad_norm, comm=comm, mpi_rank_weight=mpi_rank_weight)
 
     if load_path is not None:
         model.load(load_path)
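Note: the new comm keyword is threaded from learn through model_fn into Model, so callers can pass an explicit (sub-)communicator. A minimal usage sketch, assuming learn is importable as baselines.ppo2.ppo2.learn and using a toy environment and hyperparameters that are not part of this change:

    import gym
    from mpi4py import MPI
    from baselines.common.vec_env import DummyVecEnv
    from baselines.ppo2.ppo2 import learn

    comm = MPI.COMM_WORLD  # or a sub-communicator from Split, as sketched above

    venv = DummyVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
    model = learn(
        network='mlp',
        env=venv,
        total_timesteps=10000,
        nsteps=128,
        nminibatches=4,
        comm=comm,            # new keyword introduced by this change
        mpi_rank_weight=1,
    )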