Compare commits
31 Commits
games/mast... ... old_acktr_

Commits (SHA1):
0a40206c6c
1937826784
b29c8020d7
4ec308aaa4
3bbf3f3511
e5de29a954
2507d335f9
bdd4d385a6
0961f5dd94
337d913a8f
34af61a132
1ea5ec647c
2fc7a1cbee
14c1d69ef4
c8f6d8bac7
3a006ba50e
c6c0f45cb1
e92a6ad8f4
92b9a37257
cb14da96ca
3900f2a447
20d22a5d79
caf7b08b4d
ca0165cdf5
eb5b605f86
a89bee3c8d
353bb15e90
64c0c0a043
5fee99e771
5edcd6886e
cd375ab209
@@ -10,5 +10,5 @@ install:
  - docker build . -t baselines-test

script:
-  - flake8 --select=F,E999 baselines/common baselines/trpo_mpi baselines/ppo2 baselines/a2c baselines/deepq baselines/acer
-  - docker run baselines-test pytest --runslow
+  - flake8 .
+  - docker run baselines-test pytest -v .
@@ -18,6 +18,7 @@ WORKDIR $CODE_DIR/baselines

# Clean up pycache and pyc files
RUN rm -rf __pycache__ && \
    find . -name "*.pyc" -delete && \
+   pip install tensorflow && \
    pip install -e .[test]
README.md (27 changed lines)
@@ -45,8 +45,8 @@ cd baselines
```
If using virtualenv, create a new virtualenv and activate it
```bash
virtualenv env --python=python3
. env/bin/activate
```
Install baselines package
```bash
@@ -62,29 +62,20 @@ pip install pytest
pytest
```

-## Subpackages
+## Testing the installation
+All unit tests in baselines can be run using pytest runner:
+```
+pip install pytest
+pytest
+```

## Training models
Most of the algorithms in baselines repo are used as follows:
```bash
python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]
```
### Example 1. PPO with MuJoCo Humanoid
-For instance, to train a fully-connected network controlling MuJoCo humanoid using a2c for 20M timesteps
+For instance, to train a fully-connected network controlling MuJoCo humanoid using PPO2 for 20M timesteps
```bash
-python -m baselines.run --alg=a2c --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
+python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
```
Note that for mujoco environments the fully-connected network is the default, so we can omit `--network=mlp`.
The hyperparameters of both the network and the learning algorithm can be controlled via the command line, for instance:
```bash
-python -m baselines.run --alg=a2c --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
+python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
```
will set the entropy coefficient to 0.1, construct a fully connected network with 3 layers of 32 hidden units each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same).

@@ -94,7 +85,7 @@ docstring for [baselines/ppo2/ppo2.py/learn()](ppo2/ppo2.py) for the description
### Example 2. DQN on Atari
DQN with Atari is at this point a classic benchmark. To run the baselines implementation of DQN on Atari Pong:
```
python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
```

## Saving, loading and visualizing models
@@ -102,11 +93,11 @@ The algorithms serialization API is not properly unified yet; however, there is
The `--save_path` and `--load_path` command-line options save the tensorflow state after training and load it from a given path before training, respectively.
Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learned.
```bash
-python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num-timesteps=2e7 --save_path=~/models/pong_20M_ppo2
+python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2
```
This should get the mean reward per episode to about 5k. To load and visualize the model, we'll do the following - load the model, train it for 0 steps, and then visualize:
```bash
-python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num-timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
+python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
```

*NOTE:* At the moment MuJoCo training uses the VecNormalize wrapper for the environment, which is not being saved correctly; so loading models trained on MuJoCo will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). This way, the mean and std of the environment normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way - hence it is not included by default.
@@ -54,7 +54,7 @@ def learn(env, policy, vf, gamma, lam, timesteps_per_batch, num_timesteps,
    stepsize = tf.Variable(initial_value=np.float32(np.array(0.03)), name='stepsize')
    inputs, loss, loss_sampled = policy.update_info
    optim = kfac.KfacOptimizer(learning_rate=stepsize, cold_lr=stepsize*(1-0.9), momentum=0.9, kfac_update=2,\
-               epsilon=1e-2, stats_decay=0.99, async=1, cold_iter=1,
+               epsilon=1e-2, stats_decay=0.99, async_=1, cold_iter=1,
                weight_decay_dict=policy.wd_dict, max_grad_norm=None)
    pi_var_list = []
    for var in tf.trainable_variables():
@@ -58,7 +58,7 @@ class Model(object):
        with tf.device('/gpu:0'):
            self.optim = optim = kfac.KfacOptimizer(learning_rate=PG_LR, clip_kl=kfac_clip,\
                momentum=0.9, kfac_update=1, epsilon=0.01,\
-               stats_decay=0.99, async=1, cold_iter=10, max_grad_norm=max_grad_norm)
+               stats_decay=0.99, async_=1, cold_iter=10, max_grad_norm=max_grad_norm)

        update_stats_op = optim.compute_and_apply_stats(joint_fisher_loss, var_list=params)
        train_op, q_runner = optim.apply_gradients(list(zip(grads,params)))
@@ -97,7 +97,7 @@ def learn(network, env, seed, total_timesteps=int(40e6), gamma=0.99, log_interva
                kfac_clip=0.001, save_interval=None, lrschedule='linear', load_path=None, **network_kwargs):
    set_global_seeds(seed)

+   if network == 'cnn':
+       network_kwargs['one_dim_bias'] = True

@@ -115,7 +115,7 @@ def learn(network, env, seed, total_timesteps=int(40e6), gamma=0.99, log_interva
        with open(osp.join(logger.get_dir(), 'make_model.pkl'), 'wb') as fh:
            fh.write(cloudpickle.dumps(make_model))
    model = make_model()

+   if load_path is not None:
+       model.load(load_path)
@@ -10,14 +10,14 @@ KFAC_DEBUG = False

class KfacOptimizer():

-   def __init__(self, learning_rate=0.01, momentum=0.9, clip_kl=0.01, kfac_update=2, stats_accum_iter=60, full_stats_init=False, cold_iter=100, cold_lr=None, async=False, async_stats=False, epsilon=1e-2, stats_decay=0.95, blockdiag_bias=False, channel_fac=False, factored_damping=False, approxT2=False, use_float64=False, weight_decay_dict={},max_grad_norm=0.5):
+   def __init__(self, learning_rate=0.01, momentum=0.9, clip_kl=0.01, kfac_update=2, stats_accum_iter=60, full_stats_init=False, cold_iter=100, cold_lr=None, async_=False, async_stats=False, epsilon=1e-2, stats_decay=0.95, blockdiag_bias=False, channel_fac=False, factored_damping=False, approxT2=False, use_float64=False, weight_decay_dict={},max_grad_norm=0.5):
        self.max_grad_norm = max_grad_norm
        self._lr = learning_rate
        self._momentum = momentum
        self._clip_kl = clip_kl
        self._channel_fac = channel_fac
        self._kfac_update = kfac_update
-       self._async = async
+       self._async = async_
        self._async_stats = async_stats
        self._epsilon = epsilon
        self._stats_decay = stats_decay
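For context on the rename running through these hunks: `async` became a reserved keyword in Python 3.7, so any signature or call that uses it as an identifier is a SyntaxError there. A minimal sketch of the failure mode and the renamed call style (argument values taken from the hunks above; `kfac` lives under `baselines.acktr`):

```python
# Under Python 3.7+ the old spelling no longer parses:
#     def __init__(self, ..., async=False, ...)   # SyntaxError
#     KfacOptimizer(..., async=1, ...)            # SyntaxError
# Callers now pass the trailing-underscore form instead:
from baselines.acktr import kfac

optim = kfac.KfacOptimizer(learning_rate=0.03, momentum=0.9, kfac_update=2,
                           epsilon=1e-2, stats_decay=0.99, async_=1, cold_iter=1,
                           max_grad_norm=None)
```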
@@ -1,23 +0,0 @@
#!/usr/bin/env python3

from functools import partial

from baselines import logger
from baselines.acktr.acktr_disc import learn
from baselines.common.cmd_util import make_atari_env, atari_arg_parser
from baselines.common.vec_env.vec_frame_stack import VecFrameStack
from baselines.common.policies import cnn

def train(env_id, num_timesteps, seed, num_cpu):
    env = VecFrameStack(make_atari_env(env_id, num_cpu, seed), 4)
    policy_fn = cnn(env=env, one_dim_bias=True)
    learn(policy_fn, env, seed, total_timesteps=int(num_timesteps * 1.1), nprocs=num_cpu)
    env.close()

def main():
    args = atari_arg_parser().parse_args()
    logger.configure()
    train(args.env, num_timesteps=args.num_timesteps, seed=args.seed, num_cpu=32)

if __name__ == '__main__':
    main()
@@ -21,7 +21,7 @@ class NeuralNetValueFunction(object):
        self._predict = U.function([X], vpred_n)
        optim = kfac.KfacOptimizer(learning_rate=0.001, cold_lr=0.001*(1-0.9), momentum=0.9, \
                                clip_kl=0.3, epsilon=0.1, stats_decay=0.95, \
-                               async=1, kfac_update=2, cold_iter=50, \
+                               async_=1, kfac_update=2, cold_iter=50, \
                                weight_decay_dict=wd_dict, max_grad_norm=None)
        vf_var_list = []
        for var in tf.trainable_variables():
@@ -15,22 +15,31 @@ from baselines.bench import Monitor
from baselines.common import set_global_seeds
from baselines.common.atari_wrappers import make_atari, wrap_deepmind
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
+from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
+from baselines.common.retro_wrappers import RewardScaler

-def make_atari_env(env_id, num_env, seed, wrapper_kwargs=None, start_index=0):
+def make_vec_env(env_id, env_type, num_env, seed, wrapper_kwargs=None, start_index=0, reward_scale=1.0):
    """
-   Create a wrapped, monitored SubprocVecEnv for Atari.
+   Create a wrapped, monitored SubprocVecEnv for Atari and MuJoCo.
    """
    if wrapper_kwargs is None: wrapper_kwargs = {}
    mpi_rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
    def make_env(rank): # pylint: disable=C0111
        def _thunk():
-           env = make_atari(env_id)
+           env = make_atari(env_id) if env_type == 'atari' else gym.make(env_id)
            env.seed(seed + 10000*mpi_rank + rank if seed is not None else None)
-           env = Monitor(env, logger.get_dir() and os.path.join(logger.get_dir(), str(mpi_rank) + '.' + str(rank)))
-           return wrap_deepmind(env, **wrapper_kwargs)
+           env = Monitor(env,
+                         logger.get_dir() and os.path.join(logger.get_dir(), str(mpi_rank) + '.' + str(rank)),
+                         allow_early_resets=True)
+
+           if env_type == 'atari': return wrap_deepmind(env, **wrapper_kwargs)
+           elif reward_scale != 1: return RewardScaler(env, reward_scale)
+           else: return env
        return _thunk
    set_global_seeds(seed)
-   return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
+   if num_env > 1: return SubprocVecEnv([make_env(i + start_index) for i in range(num_env)])
+   else: return DummyVecEnv([make_env(start_index)])

def make_mujoco_env(env_id, seed, reward_scale=1.0):
    """
@@ -40,13 +49,12 @@ def make_mujoco_env(env_id, seed, reward_scale=1.0):
    myseed = seed + 1000 * rank if seed is not None else None
    set_global_seeds(myseed)
    env = gym.make(env_id)
-   env = Monitor(env, os.path.join(logger.get_dir(), str(rank)), allow_early_resets=True)
+   logger_path = None if logger.get_dir() is None else os.path.join(logger.get_dir(), str(rank))
+   env = Monitor(env, logger_path, allow_early_resets=True)
    env.seed(seed)

    if reward_scale != 1.0:
        from baselines.common.retro_wrappers import RewardScaler
        env = RewardScaler(env, reward_scale)

    return env

def make_robotics_env(env_id, seed, rank=0):
@@ -88,7 +96,7 @@ def common_arg_parser():
    parser.add_argument('--env', help='environment ID', type=str, default='Reacher-v2')
    parser.add_argument('--seed', help='RNG seed', type=int, default=None)
    parser.add_argument('--alg', help='Algorithm', type=str, default='ppo2')
    parser.add_argument('--num_timesteps', type=float, default=1e6),
    parser.add_argument('--network', help='network type (mlp, cnn, lstm, cnn_lstm, conv_only)', default=None)
    parser.add_argument('--gamestate', help='game state to load (so far only used in retro games)', default=None)
    parser.add_argument('--num_env', help='Number of environment copies being run in parallel. When not specified, set to number of cpus for Atari, and to 1 for Mujoco', default=None, type=int)
@@ -121,6 +129,3 @@ def parse_unknown_args(args):
            retval[key] = value

    return retval
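The new `make_vec_env` unifies Atari and MuJoCo vectorized-environment construction, and picks `SubprocVecEnv` or `DummyVecEnv` based on `num_env`. A usage sketch based on the signature above; the environment ids and counts are illustrative:

```python
from baselines.common.cmd_util import make_vec_env

# four parallel Atari workers -> SubprocVecEnv under the hood
venv = make_vec_env('PongNoFrameskip-v4', 'atari', num_env=4, seed=0)

# a single MuJoCo env -> DummyVecEnv, with reward scaling from the non-atari branch
venv_mj = make_vec_env('Reacher-v2', 'mujoco', num_env=1, seed=0, reward_scale=0.1)
```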
@@ -2,6 +2,8 @@ from __future__ import print_function
from contextlib import contextmanager
import numpy as np
import time
+import shlex
+import subprocess

# ================================================================
# Misc
@@ -37,7 +39,7 @@ color2num = dict(
    crimson=38
)

-def colorize(string, color, bold=False, highlight=False):
+def colorize(string, color='green', bold=False, highlight=False):
    attr = []
    num = color2num[color]
    if highlight: num += 10
@@ -45,6 +47,22 @@ def colorize(string, color, bold=False, highlight=False):
    if bold: attr.append('1')
    return '\x1b[%sm%s\x1b[0m' % (';'.join(attr), string)

+def print_cmd(cmd, dry=False):
+    if isinstance(cmd, str):  # for shell=True
+        pass
+    else:
+        cmd = ' '.join(shlex.quote(arg) for arg in cmd)
+    print(colorize(('CMD: ' if not dry else 'DRY: ') + cmd))
+
+
+def get_git_commit(cwd=None):
+    return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'], cwd=cwd).decode('utf8')
+
+def ccap(cmd, dry=False, env=None, **kwargs):
+    print_cmd(cmd, dry)
+    if not dry:
+        subprocess.check_call(cmd, env=env, **kwargs)

MESSAGE_DEPTH = 0
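`ccap` (presumably "check_call and print") echoes a command via `print_cmd` and runs it unless `dry` is set. A small sketch, assuming the three helpers above are in scope (their module path is not shown in this diff):

```python
ccap(['echo', 'hello world'])                  # prints "CMD: echo 'hello world'", then runs it
ccap(['rm', '-rf', '/tmp/scratch'], dry=True)  # prints "DRY: ..." only; nothing is executed
print(get_git_commit())                        # short hash of HEAD in the current repo
```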
@@ -22,8 +22,7 @@ def nature_cnn(unscaled_images, **conv_kwargs):

def mlp(num_layers=2, num_hidden=64, activation=tf.tanh):
    """
-   Simple fully connected layer policy. Separate stacks of fully-connected layers are used for policy and value function estimation.
-   More customized fully-connected policies can be obtained by using PolicyWithV class directly.
+   Stack of fully-connected layers to be used in a policy / q-function approximator

    Parameters:
    ----------
@@ -37,7 +36,7 @@ def mlp(num_layers=2, num_hidden=64, activation=tf.tanh):
    Returns:
    -------

-   function that builds fully connected network with a given input placeholder
+   function that builds fully connected network with a given input tensor / placeholder
    """
    def network_fn(X):
        h = tf.layers.flatten(X)
@@ -68,6 +67,34 @@ def cnn_small(**conv_kwargs):


+def lstm(nlstm=128, layer_norm=False):
+    """
+    Builds LSTM (Long-Short Term Memory) network to be used in a policy.
+    Note that the resulting function returns not only the output of the LSTM
+    (i.e. hidden state of lstm for each step in the sequence), but also a dictionary
+    with auxiliary tensors to be set as policy attributes.
+
+    Specifically,
+        S is a placeholder to feed current state (LSTM state has to be managed outside policy)
+        M is a placeholder for the mask (used to mask out observations after the end of the episode, but can be used for other purposes too)
+        initial_state is a numpy array containing initial lstm state (usually zeros)
+        state is the output LSTM state (to be fed into S at the next call)
+
+    An example of usage of lstm-based policy can be found here: common/tests/test_doc_examples.py/test_lstm_example
+
+    Parameters:
+    ----------
+    nlstm: int          LSTM hidden state size
+
+    layer_norm: bool    if True, layer-normalized version of LSTM is used
+
+    Returns:
+    -------
+    function that builds LSTM with a given input tensor / placeholder
+    """
+
+    def network_fn(X, nenv=1):
+        nbatch = X.shape[0]
+        nsteps = nbatch // nenv
@@ -72,7 +72,7 @@ class PolicyWithValue(object):

    def step(self, observation, **extra_feed):
        """
-       Compute next action(s) given the observaion(s)
+       Compute next action(s) given the observation(s)

        Parameters:
        ----------
@@ -93,7 +93,7 @@ class PolicyWithValue(object):

    def value(self, ob, *args, **kwargs):
        """
-       Compute value estimate(s) given the observaion(s)
+       Compute value estimate(s) given the observation(s)

        Parameters:
        ----------
@@ -14,7 +14,7 @@ common_kwargs = dict(
learn_kwargs = {
    'a2c' : dict(nsteps=32, value_network='copy', lr=0.05),
    'acktr': dict(nsteps=32, value_network='copy'),
-   'deepq': {},
+   'deepq': dict(total_timesteps=20000),
    'ppo2': dict(value_network='copy'),
    'trpo_mpi': {}
}
@@ -38,3 +38,6 @@ def test_cartpole(alg):
        return env

    reward_per_episode_test(env_fn, learn_fn, 100)
+
+if __name__ == '__main__':
+    test_cartpole('deepq')
baselines/common/tests/test_doc_examples.py (new file, 48 lines)
@@ -0,0 +1,48 @@
import pytest
try:
    import mujoco_py
    _mujoco_present = True
except BaseException:
    mujoco_py = None
    _mujoco_present = False


@pytest.mark.skipif(
    not _mujoco_present,
    reason='error loading mujoco - either mujoco / mujoco key not present, or LD_LIBRARY_PATH is not pointing to mujoco library'
)
def test_lstm_example():
    import tensorflow as tf
    from baselines.common import policies, models, cmd_util
    from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

    # create vectorized environment
    venv = DummyVecEnv([lambda: cmd_util.make_mujoco_env('Reacher-v2', seed=0)])

    with tf.Session() as sess:
        # build policy based on lstm network with 128 units
        policy = policies.build_policy(venv, models.lstm(128))(nbatch=1, nsteps=1)

        # initialize tensorflow variables
        sess.run(tf.global_variables_initializer())

        # prepare environment variables
        ob = venv.reset()
        state = policy.initial_state
        done = [False]
        step_counter = 0

        # run a single episode until the end (i.e. until done)
        while True:
            action, _, state, _ = policy.step(ob, S=state, M=done)
            ob, reward, done, _ = venv.step(action)
            step_counter += 1
            if done:
                break

        assert step_counter > 5
@@ -62,7 +62,7 @@ def make_session(config=None, num_cpu=None, make_default=False, graph=None):
        num_cpu = int(os.getenv('RCALL_NUM_CPU', multiprocessing.cpu_count()))
    if config is None:
        config = tf.ConfigProto(
            allow_soft_placement=True,
            inter_op_parallelism_threads=num_cpu,
            intra_op_parallelism_threads=num_cpu)
        config.gpu_options.allow_growth = True
@@ -328,7 +328,7 @@ def save_state(fname, sess=None):
def save_variables(save_path, variables=None, sess=None):
    sess = sess or get_session()
    variables = variables or tf.trainable_variables()

    ps = sess.run(variables)
    save_dict = {v.name: value for v, value in zip(variables, ps)}
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
@@ -354,10 +354,10 @@ def adjust_shape(placeholder, data):
    If shape is incompatible, AssertionError is thrown

    Parameters:
        placeholder     tensorflow input placeholder

        data            input data to be (potentially) reshaped to be fed into placeholder

    Returns:
        reshaped data
    '''
@@ -366,14 +366,14 @@ def adjust_shape(placeholder, data):
        return data
    if isinstance(data, list):
        data = np.array(data)

    placeholder_shape = [x or -1 for x in placeholder.shape.as_list()]

    assert _check_shape(placeholder_shape, data.shape), \
        'Shape of data {} is not compatible with shape of the placeholder {}'.format(data.shape, placeholder_shape)

    return np.reshape(data, placeholder_shape)

@@ -381,7 +381,7 @@ def _check_shape(placeholder_shape, data_shape):
    ''' check if two shapes are compatible (i.e. differ only by dimensions of size 1, or by the batch dimension)'''
        return True
    squeezed_placeholder_shape = _squeeze_shape(placeholder_shape)
    squeezed_data_shape = _squeeze_shape(data_shape)

    for i, s_data in enumerate(squeezed_data_shape):
        s_placeholder = squeezed_placeholder_shape[i]
        if s_placeholder != -1 and s_data != s_placeholder:
@@ -392,14 +392,26 @@ def _check_shape(placeholder_shape, data_shape):

def _squeeze_shape(shape):
    return [x for x in shape if x != 1]

# ================================================================
# Tensorboard interfacing
# ================================================================

def launch_tensorboard_in_background(log_dir):
-   from tensorboard import main as tb
-   import threading
-   tf.flags.FLAGS.logdir = log_dir
-   t = threading.Thread(target=tb.main, args=([]))
-   t.start()
+   '''
+   To log the Tensorflow graph when using rl-algs
+   algorithms, you can run the following code
+   in your main script:
+       import threading, time
+       def start_tensorboard(session):
+           time.sleep(10) # Wait until graph is setup
+           tb_path = osp.join(logger.get_dir(), 'tb')
+           summary_writer = tf.summary.FileWriter(tb_path, graph=session.graph)
+           summary_op = tf.summary.merge_all()
+           launch_tensorboard_in_background(tb_path)
+       session = tf.get_default_session()
+       t = threading.Thread(target=start_tensorboard, args=([session]))
+       t.start()
+   '''
+   import subprocess
+   subprocess.Popen(['tensorboard', '--logdir', log_dir])
@@ -1,38 +1,45 @@
from abc import ABC, abstractmethod
from baselines import logger
+from baselines.common.tile_images import tile_images

class AlreadySteppingError(Exception):
    """
    Raised when an asynchronous step is running while
    step_async() is called again.
    """

    def __init__(self):
        msg = 'already running an async step'
        Exception.__init__(self, msg)


class NotSteppingError(Exception):
    """
    Raised when an asynchronous step is not running but
    step_wait() is called.
    """

    def __init__(self):
        msg = 'not running an async step'
        Exception.__init__(self, msg)


class VecEnv(ABC):
    """
    An abstract asynchronous, vectorized environment.
    """

    def __init__(self, num_envs, observation_space, action_space):
        self.num_envs = num_envs
        self.observation_space = observation_space
        self.action_space = action_space
+       self.closed = False
+       self.viewer = None  # For rendering

    @abstractmethod
    def reset(self):
        """
        Reset all the environments and return an array of
-       observations, or a tuple of observation arrays.
+       observations, or a dict of observation arrays.

        If step_async is still doing work, that work will
        be cancelled and step_wait() should not be called
@@ -58,7 +65,7 @@ class VecEnv(ABC):
        Wait for the step taken with step_async().

        Returns (obs, rews, dones, infos):
-        - obs: an array of observations, or a tuple of
+        - obs: an array of observations, or a dict of
          arrays of observations.
         - rews: an array of rewards
         - dones: an array of "episode done" booleans
@@ -66,19 +73,45 @@ class VecEnv(ABC):
        """
        pass

    @abstractmethod
-   def close(self):
+   def close_extras(self):
        """
-       Clean up the environments' resources.
+       Clean up the extra resources, beyond what's in this base class.
+       Only runs when not self.closed.
        """
        pass

+   def close(self):
+       if self.closed:
+           return
+       if self.viewer is not None:
+           self.viewer.close()
+       self.close_extras()
+       self.closed = True

    def step(self, actions):
        """
        Step the environments synchronously.

        This is available for backwards compatibility.
        """
        self.step_async(actions)
        return self.step_wait()

    def render(self, mode='human'):
-       logger.warn('Render not defined for %s'%self)
+       imgs = self.get_images()
+       bigimg = tile_images(imgs)
+       if mode == 'human':
+           self.get_viewer().imshow(bigimg)
+       elif mode == 'rgb_array':
+           return bigimg
+       else:
+           raise NotImplementedError

+   def get_images(self):
+       """
+       Return RGB images from each environment
+       """
+       raise NotImplementedError

    @property
    def unwrapped(self):
@@ -87,13 +120,25 @@ class VecEnv(ABC):
        else:
            return self

+   def get_viewer(self):
+       if self.viewer is None:
+           from gym.envs.classic_control import rendering
+           self.viewer = rendering.SimpleImageViewer()
+       return self.viewer


class VecEnvWrapper(VecEnv):
    """
    An environment wrapper that applies to an entire batch
    of environments at once.
    """

    def __init__(self, venv, observation_space=None, action_space=None):
        self.venv = venv
        VecEnv.__init__(self,
                        num_envs=venv.num_envs,
                        observation_space=observation_space or venv.observation_space,
                        action_space=action_space or venv.action_space)

    def step_async(self, actions):
        self.venv.step_async(actions)
@@ -109,18 +154,24 @@ class VecEnvWrapper(VecEnv):
    def close(self):
        return self.venv.close()

-   def render(self):
-       self.venv.render()
+   def render(self, mode='human'):
+       return self.venv.render(mode=mode)

+   def get_images(self):
+       return self.venv.get_images()

class CloudpickleWrapper(object):
    """
    Uses cloudpickle to serialize contents (otherwise multiprocessing tries to use pickle)
    """

    def __init__(self, x):
        self.x = x

    def __getstate__(self):
        import cloudpickle
        return cloudpickle.dumps(self.x)

    def __setstate__(self, ob):
        import pickle
        self.x = pickle.loads(ob)
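The refactored base class now owns the `closed`/`viewer` bookkeeping: `close()` guards against double-close, disposes the shared viewer, and delegates subclass-specific cleanup to `close_extras()`, while `render()` is built on top of `get_images()`. A minimal toy subclass, as a sketch (the class name and its behavior are invented for illustration):

```python
import gym
import numpy as np
from baselines.common.vec_env import VecEnv  # assumed import path for this module


class EchoVecEnv(VecEnv):
    """Toy VecEnv whose observations echo the last actions back."""

    def __init__(self, num_envs):
        space = gym.spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        VecEnv.__init__(self, num_envs, space, space)  # also sets closed / viewer
        self._actions = None

    def reset(self):
        return np.zeros((self.num_envs, 3), dtype=np.float32)

    def step_async(self, actions):
        self._actions = np.asarray(actions, dtype=np.float32)

    def step_wait(self):
        rews = np.zeros(self.num_envs, dtype=np.float32)
        dones = np.zeros(self.num_envs, dtype=bool)
        return self._actions, rews, dones, [{} for _ in range(self.num_envs)]

    def close_extras(self):
        # subclass-specific cleanup only; the base close() already guards
        # against double-close and closes the shared viewer
        self._actions = None
```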
@@ -1,28 +1,16 @@
import numpy as np
-from gym import spaces
-from collections import OrderedDict
from . import VecEnv
+from .util import copy_obs_dict, dict_to_obs, obs_space_info

class DummyVecEnv(VecEnv):
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]
        env = self.envs[0]
        VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)
-       shapes, dtypes = {}, {}
-       self.keys = []
        obs_space = env.observation_space
-
-       if isinstance(obs_space, spaces.Dict):
-           assert isinstance(obs_space.spaces, OrderedDict)
-           subspaces = obs_space.spaces
-       else:
-           subspaces = {None: obs_space}
-
-       for key, box in subspaces.items():
-           shapes[key] = box.shape
-           dtypes[key] = box.dtype
-           self.keys.append(key)
+       self.keys, shapes, dtypes = obs_space_info(obs_space)
        self.buf_obs = { k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys }
        self.buf_dones = np.zeros((self.num_envs,), dtype=np.bool)
        self.buf_rews = np.zeros((self.num_envs,), dtype=np.float32)
@@ -53,7 +41,7 @@ class DummyVecEnv(VecEnv):
            if self.buf_dones[e]:
                obs = self.envs[e].reset()
            self._save_obs(e, obs)
-       return (np.copy(self._obs_from_buf()), np.copy(self.buf_rews), np.copy(self.buf_dones),
+       return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
                self.buf_infos.copy())

    def reset(self):
@@ -65,9 +53,6 @@ class DummyVecEnv(VecEnv):
    def close(self):
        return

-   def render(self, mode='human'):
-       return [e.render(mode=mode) for e in self.envs]
-
    def _save_obs(self, e, obs):
        for k in self.keys:
            if k is None:
@@ -76,7 +61,8 @@ class DummyVecEnv(VecEnv):
                self.buf_obs[k][e] = obs[k]

    def _obs_from_buf(self):
-       if self.keys==[None]:
-           return self.buf_obs[None]
-       else:
-           return self.buf_obs
+       return dict_to_obs(copy_obs_dict(self.buf_obs))

    def get_images(self):
        return [env.render(mode='rgb_array') for env in self.envs]
baselines/common/vec_env/shmem_vec_env.py (new file, 138 lines)
@@ -0,0 +1,138 @@
"""
An interface for asynchronous vectorized environments.
"""

from multiprocessing import Pipe, Array, Process
import numpy as np
from . import VecEnv, CloudpickleWrapper
import ctypes
from baselines import logger

from .util import dict_to_obs, obs_space_info, obs_to_dict

_NP_TO_CT = {np.float32: ctypes.c_float,
             np.int32: ctypes.c_int32,
             np.int8: ctypes.c_int8,
             np.uint8: ctypes.c_char,
             np.bool: ctypes.c_bool}


class ShmemVecEnv(VecEnv):
    """
    An AsyncEnv that uses multiprocessing to run multiple
    environments in parallel.
    """

    def __init__(self, env_fns, spaces=None):
        """
        If you don't specify observation_space, we'll have to create a dummy
        environment to get it.
        """
        if spaces:
            observation_space, action_space = spaces
        else:
            logger.log('Creating dummy env object to get spaces')
            with logger.scoped_configure(format_strs=[]):
                dummy = env_fns[0]()
                observation_space, action_space = dummy.observation_space, dummy.action_space
                dummy.close()
                del dummy
        VecEnv.__init__(self, len(env_fns), observation_space, action_space)
        self.obs_keys, self.obs_shapes, self.obs_dtypes = obs_space_info(observation_space)
        self.obs_bufs = [
            {k: Array(_NP_TO_CT[self.obs_dtypes[k].type], int(np.prod(self.obs_shapes[k]))) for k in self.obs_keys}
            for _ in env_fns]
        self.parent_pipes = []
        self.procs = []
        for env_fn, obs_buf in zip(env_fns, self.obs_bufs):
            wrapped_fn = CloudpickleWrapper(env_fn)
            parent_pipe, child_pipe = Pipe()
            proc = Process(target=_subproc_worker,
                           args=(child_pipe, parent_pipe, wrapped_fn, obs_buf, self.obs_shapes, self.obs_dtypes, self.obs_keys))
            proc.daemon = True
            self.procs.append(proc)
            self.parent_pipes.append(parent_pipe)
            proc.start()
            child_pipe.close()
        self.waiting_step = False
        self.viewer = None

    def reset(self):
        if self.waiting_step:
            logger.warn('Called reset() while waiting for the step to complete')
            self.step_wait()
        for pipe in self.parent_pipes:
            pipe.send(('reset', None))
        return self._decode_obses([pipe.recv() for pipe in self.parent_pipes])

    def step_async(self, actions):
        assert len(actions) == len(self.parent_pipes)
        for pipe, act in zip(self.parent_pipes, actions):
            pipe.send(('step', act))

    def step_wait(self):
        outs = [pipe.recv() for pipe in self.parent_pipes]
        obs, rews, dones, infos = zip(*outs)
        return self._decode_obses(obs), np.array(rews), np.array(dones), infos

    def close_extras(self):
        if self.waiting_step:
            self.step_wait()
        for pipe in self.parent_pipes:
            pipe.send(('close', None))
        for pipe in self.parent_pipes:
            pipe.recv()
            pipe.close()
        for proc in self.procs:
            proc.join()

    def get_images(self, mode='human'):
        for pipe in self.parent_pipes:
            pipe.send(('render', None))
        return [pipe.recv() for pipe in self.parent_pipes]

    def _decode_obses(self, obs):
        result = {}
        for k in self.obs_keys:
            bufs = [b[k] for b in self.obs_bufs]
            o = [np.frombuffer(b.get_obj(), dtype=self.obs_dtypes[k]).reshape(self.obs_shapes[k]) for b in bufs]
            result[k] = np.array(o)
        return dict_to_obs(result)


def _subproc_worker(pipe, parent_pipe, env_fn_wrapper, obs_bufs, obs_shapes, obs_dtypes, keys):
    """
    Control a single environment instance using IPC and
    shared memory.
    """
    def _write_obs(maybe_dict_obs):
        flatdict = obs_to_dict(maybe_dict_obs)
        for k in keys:
            dst = obs_bufs[k].get_obj()
            dst_np = np.frombuffer(dst, dtype=obs_dtypes[k]).reshape(obs_shapes[k])  # pylint: disable=W0212
            np.copyto(dst_np, flatdict[k])

    env = env_fn_wrapper.x()
    parent_pipe.close()
    try:
        while True:
            cmd, data = pipe.recv()
            if cmd == 'reset':
                pipe.send(_write_obs(env.reset()))
            elif cmd == 'step':
                obs, reward, done, info = env.step(data)
                if done:
                    obs = env.reset()
                pipe.send((_write_obs(obs), reward, done, info))
            elif cmd == 'render':
                pipe.send(env.render(mode='rgb_array'))
            elif cmd == 'close':
                pipe.send(None)
                break
            else:
                raise RuntimeError('Got unrecognized cmd %s' % cmd)
    except KeyboardInterrupt:
        print('ShmemVecEnv worker: got KeyboardInterrupt')
    finally:
        env.close()
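A quick usage sketch of the new shared-memory vectorized env; `CartPole-v0` is used purely as an illustration (its observations are a float32 `Box`, which `_NP_TO_CT` supports):

```python
import gym
from baselines.common.vec_env.shmem_vec_env import ShmemVecEnv

venv = ShmemVecEnv([lambda: gym.make('CartPole-v0') for _ in range(4)])
obs = venv.reset()                                   # shape (4, 4): one row per worker
obs, rews, dones, infos = venv.step([venv.action_space.sample()] * 4)
venv.close()                                         # sends 'close', joins the workers
```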
@@ -1,8 +1,6 @@
import numpy as np
from multiprocessing import Process, Pipe
-from baselines.common.vec_env import VecEnv, CloudpickleWrapper
-from baselines.common.tile_images import tile_images
+from . import VecEnv, CloudpickleWrapper

def worker(remote, parent_remote, env_fn_wrapper):
    parent_remote.close()
@@ -32,25 +30,26 @@ def worker(remote, parent_remote, env_fn_wrapper):
    finally:
        env.close()


class SubprocVecEnv(VecEnv):
    def __init__(self, env_fns, spaces=None):
        """
        envs: list of gym environments to run in subprocesses
        """
        self.waiting = False
        self.closed = False
        nenvs = len(env_fns)
        self.remotes, self.work_remotes = zip(*[Pipe() for _ in range(nenvs)])
        self.ps = [Process(target=worker, args=(work_remote, remote, CloudpickleWrapper(env_fn)))
                   for (work_remote, remote, env_fn) in zip(self.work_remotes, self.remotes, env_fns)]
        for p in self.ps:
            p.daemon = True  # if the main process crashes, we should not cause things to hang
            p.start()
        for remote in self.work_remotes:
            remote.close()

        self.remotes[0].send(('get_spaces', None))
        observation_space, action_space = self.remotes[0].recv()
+       self.viewer = None
        VecEnv.__init__(self, len(env_fns), observation_space, action_space)

    def step_async(self, actions):
@@ -69,33 +68,17 @@ class SubprocVecEnv(VecEnv):
            remote.send(('reset', None))
        return np.stack([remote.recv() for remote in self.remotes])

    def reset_task(self):
        for remote in self.remotes:
            remote.send(('reset_task', None))
        return np.stack([remote.recv() for remote in self.remotes])

-   def close(self):
-       if self.closed:
-           return
+   def close_extras(self):
        if self.waiting:
            for remote in self.remotes:
                remote.recv()
        for remote in self.remotes:
            remote.send(('close', None))
        for p in self.ps:
            p.join()
-       self.closed = True

-   def render(self, mode='human'):
+   def get_images(self):
        for pipe in self.remotes:
            pipe.send(('render', None))
        imgs = [pipe.recv() for pipe in self.remotes]
-       bigimg = tile_images(imgs)
-       if mode == 'human':
-           import cv2
-           cv2.imshow('vecenv', bigimg[:,:,::-1])
-           cv2.waitKey(1)
-       elif mode == 'rgb_array':
-           return bigimg
-       else:
-           raise NotImplementedError
+       return imgs
baselines/common/vec_env/test_vec_env.py (new file, 101 lines)
@@ -0,0 +1,101 @@
"""
Tests for asynchronous vectorized environments.
"""

import gym
import numpy as np
import pytest
from .dummy_vec_env import DummyVecEnv
from .shmem_vec_env import ShmemVecEnv
from .subproc_vec_env import SubprocVecEnv


def assert_envs_equal(env1, env2, num_steps):
    """
    Compare two environments over num_steps steps and make sure
    that the observations produced by each are the same when given
    the same actions.
    """
    assert env1.num_envs == env2.num_envs
    assert env1.action_space.shape == env2.action_space.shape
    assert env1.action_space.dtype == env2.action_space.dtype
    joint_shape = (env1.num_envs,) + env1.action_space.shape

    try:
        obs1, obs2 = env1.reset(), env2.reset()
        assert np.array(obs1).shape == np.array(obs2).shape
        assert np.array(obs1).shape == joint_shape
        assert np.allclose(obs1, obs2)
        np.random.seed(1337)
        for _ in range(num_steps):
            actions = np.array(np.random.randint(0, 0x100, size=joint_shape),
                               dtype=env1.action_space.dtype)
            for env in [env1, env2]:
                env.step_async(actions)
            outs1 = env1.step_wait()
            outs2 = env2.step_wait()
            for out1, out2 in zip(outs1[:3], outs2[:3]):
                assert np.array(out1).shape == np.array(out2).shape
                assert np.allclose(out1, out2)
            assert list(outs1[3]) == list(outs2[3])
    finally:
        env1.close()
        env2.close()


@pytest.mark.parametrize('klass', (ShmemVecEnv, SubprocVecEnv))
@pytest.mark.parametrize('dtype', ('uint8', 'float32'))
def test_vec_env(klass, dtype):  # pylint: disable=R0914
    """
    Test that a vectorized environment is equivalent to
    DummyVecEnv, since DummyVecEnv is less likely to be
    error prone.
    """
    num_envs = 3
    num_steps = 100
    shape = (3, 8)

    def make_fn(seed):
        """
        Get an environment constructor with a seed.
        """
        return lambda: SimpleEnv(seed, shape, dtype)
    fns = [make_fn(i) for i in range(num_envs)]
    env1 = DummyVecEnv(fns)
    env2 = klass(fns)
    assert_envs_equal(env1, env2, num_steps=num_steps)


class SimpleEnv(gym.Env):
    """
    An environment with a pre-determined observation space
    and RNG seed.
    """

    def __init__(self, seed, shape, dtype):
        np.random.seed(seed)
        self._dtype = dtype
        self._start_obs = np.array(np.random.randint(0, 0x100, size=shape),
                                   dtype=dtype)
        self._max_steps = seed + 1
        self._cur_obs = None
        self._cur_step = 0
        # this is 0xFF instead of 0x100 because the Box space includes
        # the high end, while randint does not
        self.action_space = gym.spaces.Box(low=0, high=0xFF, shape=shape, dtype=dtype)
        self.observation_space = self.action_space

    def step(self, action):
        self._cur_obs += np.array(action, dtype=self._dtype)
        self._cur_step += 1
        done = self._cur_step >= self._max_steps
        reward = self._cur_step / self._max_steps
        return self._cur_obs, reward, done, {'foo': 'bar' + str(reward)}

    def reset(self):
        self._cur_obs = self._start_obs
        self._cur_step = 0
        return self._cur_obs

    def render(self, mode=None):
        raise NotImplementedError
baselines/common/vec_env/util.py (new file, 59 lines)
@@ -0,0 +1,59 @@
"""
Helpers for dealing with vectorized environments.
"""

from collections import OrderedDict

import gym
import numpy as np


def copy_obs_dict(obs):
    """
    Deep-copy an observation dict.
    """
    return {k: np.copy(v) for k, v in obs.items()}


def dict_to_obs(obs_dict):
    """
    Convert an observation dict into a raw array if the
    original observation space was not a Dict space.
    """
    if set(obs_dict.keys()) == {None}:
        return obs_dict[None]
    return obs_dict


def obs_space_info(obs_space):
    """
    Get dict-structured information about a gym.Space.

    Returns:
      A tuple (keys, shapes, dtypes):
        keys: a list of dict keys.
        shapes: a dict mapping keys to shapes.
        dtypes: a dict mapping keys to dtypes.
    """
    if isinstance(obs_space, gym.spaces.Dict):
        assert isinstance(obs_space.spaces, OrderedDict)
        subspaces = obs_space.spaces
    else:
        subspaces = {None: obs_space}
    keys = []
    shapes = {}
    dtypes = {}
    for key, box in subspaces.items():
        keys.append(key)
        shapes[key] = box.shape
        dtypes[key] = box.dtype
    return keys, shapes, dtypes


def obs_to_dict(obs):
    """
    Convert an observation into a dict.
    """
    if isinstance(obs, dict):
        return obs
    return {None: obs}
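A small sketch of what these helpers return and how they round-trip non-dict observations (the `Box` space is illustrative):

```python
import gym
import numpy as np
from baselines.common.vec_env.util import obs_space_info, obs_to_dict, dict_to_obs

box = gym.spaces.Box(low=0, high=255, shape=(84, 84, 4), dtype=np.uint8)
keys, shapes, dtypes = obs_space_info(box)
# keys == [None]; shapes == {None: (84, 84, 4)}; dtypes == {None: np.dtype('uint8')}

# non-dict observations are keyed under None and unwrapped again on the way out
obs = np.zeros((2,))
assert dict_to_obs(obs_to_dict(obs)) is obs
```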
@@ -1,18 +1,16 @@
-from baselines.common.vec_env import VecEnvWrapper
+from . import VecEnvWrapper
import numpy as np
from gym import spaces


class VecFrameStack(VecEnvWrapper):
    """
    Vectorized environment base class
    """
    def __init__(self, venv, nstack):
        self.venv = venv
        self.nstack = nstack
        wos = venv.observation_space  # wrapped ob space
        low = np.repeat(wos.low, self.nstack, axis=-1)
        high = np.repeat(wos.high, self.nstack, axis=-1)
        self.stackedobs = np.zeros((venv.num_envs,) + low.shape, low.dtype)
        observation_space = spaces.Box(low=low, high=high, dtype=venv.observation_space.dtype)
        VecEnvWrapper.__init__(self, venv, observation_space=observation_space)

@@ -26,9 +24,6 @@ class VecFrameStack(VecEnvWrapper):
        return self.stackedobs, rews, news, infos

    def reset(self):
-       """
-       Reset all environments
-       """
        obs = self.venv.reset()
        self.stackedobs[...] = 0
        self.stackedobs[..., -obs.shape[-1]:] = obs
baselines/common/vec_env/vec_monitor.py (new file, 29 lines)
@@ -0,0 +1,29 @@
from . import VecEnvWrapper
import numpy as np


class VecMonitor(VecEnvWrapper):
    def __init__(self, venv):
        VecEnvWrapper.__init__(self, venv)
        self.eprets = None
        self.eplens = None

    def reset(self):
        obs = self.venv.reset()
        self.eprets = np.zeros(self.num_envs, 'f')
        self.eplens = np.zeros(self.num_envs, 'i')
        return obs

    def step_wait(self):
        obs, rews, dones, infos = self.venv.step_wait()
        self.eprets += rews
        self.eplens += 1
        newinfos = []
        for (i, (done, ret, eplen, info)) in enumerate(zip(dones, self.eprets, self.eplens, infos)):
            info = info.copy()
            if done:
                info['episode'] = {'r': ret, 'l': eplen}
                self.eprets[i] = 0
                self.eplens[i] = 0
            newinfos.append(info)
        return obs, rews, dones, newinfos
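`VecMonitor` accumulates per-env returns and lengths and, on episode end, attaches them to that env's `info` dict. A composition sketch (CartPole chosen for illustration):

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_monitor import VecMonitor

venv = VecMonitor(DummyVecEnv([lambda: gym.make('CartPole-v0')]))
obs = venv.reset()
obs, rews, dones, infos = venv.step([0])
# once an episode finishes, its info gains {'episode': {'r': <return>, 'l': <length>}}
```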
@@ -1,17 +1,18 @@
-from baselines.common.vec_env import VecEnvWrapper
+from . import VecEnvWrapper
from baselines.common.running_mean_std import RunningMeanStd
import numpy as np


class VecNormalize(VecEnvWrapper):
    """
-   Vectorized environment base class
+   A vectorized wrapper that normalizes the observations
+   and returns from an environment.
    """

    def __init__(self, venv, ob=True, ret=True, clipob=10., cliprew=10., gamma=0.99, epsilon=1e-8):
        VecEnvWrapper.__init__(self, venv)
        self.ob_rms = RunningMeanStd(shape=self.observation_space.shape) if ob else None
        self.ret_rms = RunningMeanStd(shape=()) if ret else None
+       #self.ob_rms = TfRunningMeanStd(shape=self.observation_space.shape, scope='observation_running_mean_std') if ob else None
+       #self.ret_rms = TfRunningMeanStd(shape=(), scope='return_running_mean_std') if ret else None
        self.clipob = clipob
        self.cliprew = cliprew
        self.ret = np.zeros(self.num_envs)
@@ -19,12 +20,6 @@ class VecNormalize(VecEnvWrapper):
        self.epsilon = epsilon

    def step_wait(self):
-       """
-       Apply sequence of actions to sequence of environments
-       actions -> (observations, rewards, news)
-
-       where 'news' is a boolean vector indicating whether each element is new.
-       """
        obs, rews, news, infos = self.venv.step_wait()
        self.ret = self.ret * self.gamma + rews
        obs = self._obfilt(obs)
@@ -42,8 +37,5 @@ class VecNormalize(VecEnvWrapper):
        return obs

    def reset(self):
-       """
-       Reset all environments
-       """
        obs = self.venv.reset()
        return self._obfilt(obs)
@@ -11,7 +11,7 @@ Here's a list of commands to run to quickly get a working example:
# Train model and save the results to cartpole_model.pkl
python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5
# Load the model saved in cartpole_model.pkl and visualize the learned policy
-python -m baselines.run --alg=deepq --env=CartPole-v0 --load_apth=./cartpole_model.pkl --num_timesteps=0 --play
+python -m baselines.run --alg=deepq --env=CartPole-v0 --load_path=./cartpole_model.pkl --num_timesteps=0 --play
```

## If you wish to apply DQN to solve a problem.
@@ -163,7 +163,7 @@ def learn(*, network, env, total_timesteps, seed=None, nsteps=2048, ent_coef=0.0
                            specifying the standard network architecture, or a function that takes tensorflow tensor as input and returns
                            tuple (output_tensor, extra_feed) where output tensor is the last network layer output, extra_feed is None for feed-forward
                            neural nets, and extra_feed is a dictionary describing how to feed state into the network for recurrent neural nets.
-                           See baselines.common/policies.py/lstm for more details on using recurrent nets in policies
+                           See common/models.py/lstm for more details on using recurrent nets in policies

    env: baselines.common.vec_env.VecEnv     environment. Needs to be vectorized for parallel environment simulation.
                                             The environments produced by gym.make can be wrapped using baselines.common.vec_env.DummyVecEnv class.
@@ -189,7 +189,8 @@ def learn(*, network, env, total_timesteps, seed=None, nsteps=2048, ent_coef=0.0

    log_interval: int                        number of timesteps between logging events

-   nminibatches: int                        number of training minibatches per update
+   nminibatches: int                        number of training minibatches per update. For recurrent policies,
+                                            should be smaller than or equal to the number of environments run in parallel.

    noptepochs: int                          number of training epochs per update

@@ -226,10 +227,6 @@ def learn(*, network, env, total_timesteps, seed=None, nsteps=2048, ent_coef=0.0
    make_model = lambda: Model(policy=policy, ob_space=ob_space, ac_space=ac_space, nbatch_act=nenvs, nbatch_train=nbatch_train,
                               nsteps=nsteps, ent_coef=ent_coef, vf_coef=vf_coef,
                               max_grad_norm=max_grad_norm)
-   if save_interval and logger.get_dir():
-       import cloudpickle
-       with open(osp.join(logger.get_dir(), 'make_model.pkl'), 'wb') as fh:
-           fh.write(cloudpickle.dumps(make_model))
    model = make_model()
    if load_path is not None:
        model.load(load_path)
@@ -1,20 +1,19 @@
import sys
-import multiprocessing
import os
+import multiprocessing
import os.path as osp
import gym
from collections import defaultdict
import tensorflow as tf
import numpy as np

from baselines.common.vec_env.vec_frame_stack import VecFrameStack
-from baselines.common.cmd_util import common_arg_parser, parse_unknown_args, make_mujoco_env, make_atari_env
-from baselines.common.tf_util import save_state, load_state, get_session
+from baselines.common.cmd_util import common_arg_parser, parse_unknown_args, make_vec_env
+from baselines.common.tf_util import get_session
from baselines import bench, logger
from importlib import import_module

from baselines.common.vec_env.vec_normalize import VecNormalize
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
-from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv
from baselines.common import atari_wrappers, retro_wrappers

try:
@@ -28,10 +27,10 @@ for env in gym.envs.registry.all():
    env_type = env._entry_point.split(':')[0].split('.')[-1]
    _game_envs[env_type].add(env.id)

# reading benchmark names directly from retro requires
# importing retro here, and for some reason that crashes tensorflow
# in ubuntu
-_game_envs['retro'] = set([
+_game_envs['retro'] = {
    'BubbleBobble-Nes',
    'SuperMarioBros-Nes',
    'TwinBee3PokoPokoDaimaou-Nes',
@@ -40,12 +39,12 @@ _game_envs['retro'] = set([
    'Vectorman-Genesis',
    'FinalFight-Snes',
    'SpaceInvaders-Snes',
-])
+}


def train(args, extra_args):
    env_type, env_id = get_env_type(args.env)

    total_timesteps = int(args.num_timesteps)
    seed = args.seed

@@ -60,13 +59,11 @@ def train(args, extra_args):
    else:
        if alg_kwargs.get('network') is None:
            alg_kwargs['network'] = get_default_network(env_type)

    print('Training {} on {}:{} with arguments \n{}'.format(args.alg, env_type, env_id, alg_kwargs))

    model = learn(
        env=env,
        seed=seed,
        total_timesteps=total_timesteps,
        **alg_kwargs
@@ -75,30 +72,30 @@ def train(args, extra_args):
    return model, env


-def build_env(args, render=False):
+def build_env(args):
    ncpu = multiprocessing.cpu_count()
    if sys.platform == 'darwin': ncpu //= 2
-   nenv = args.num_env or ncpu if not render else 1
+   nenv = args.num_env or ncpu
    alg = args.alg
    rank = MPI.COMM_WORLD.Get_rank() if MPI else 0
    seed = args.seed

    env_type, env_id = get_env_type(args.env)
    if env_type == 'mujoco':
        get_session(tf.ConfigProto(allow_soft_placement=True,
                                   intra_op_parallelism_threads=1,
                                   inter_op_parallelism_threads=1))

        if args.num_env:
-           env = SubprocVecEnv([lambda: make_mujoco_env(env_id, seed + i if seed is not None else None, args.reward_scale) for i in range(args.num_env)])
+           env = make_vec_env(env_id, env_type, nenv, seed, reward_scale=args.reward_scale)
        else:
-           env = DummyVecEnv([lambda: make_mujoco_env(env_id, seed, args.reward_scale)])
+           env = make_vec_env(env_id, env_type, 1, seed, reward_scale=args.reward_scale)

        env = VecNormalize(env)

    elif env_type == 'atari':
        if alg == 'acer':
-           env = make_atari_env(env_id, nenv, seed)
+           env = make_vec_env(env_id, env_type, nenv, seed)
        elif alg == 'deepq':
            env = atari_wrappers.make_atari(env_id)
            env.seed(seed)
@@ -113,23 +110,24 @@ def build_env(args):
            env.seed(seed)
        else:
            frame_stack_size = 4
-           env = VecFrameStack(make_atari_env(env_id, nenv, seed), frame_stack_size)
+           env = VecFrameStack(make_vec_env(env_id, env_type, nenv, seed), frame_stack_size)

    elif env_type == 'retro':
        import retro
        gamestate = args.gamestate or 'Level1-1'
-       env = retro_wrappers.make_retro(game=args.env, state=gamestate, max_episode_steps=10000, use_restricted_actions=retro.Actions.DISCRETE)
+       env = retro_wrappers.make_retro(game=args.env, state=gamestate, max_episode_steps=10000,
+                                       use_restricted_actions=retro.Actions.DISCRETE)
        env.seed(args.seed)
        env = bench.Monitor(env, logger.get_dir())
        env = retro_wrappers.wrap_deepmind_retro(env)

    elif env_type == 'classic_control':
        def make_env():
            e = gym.make(env_id)
            e = bench.Monitor(e, logger.get_dir(), allow_early_resets=True)
            e.seed(seed)
            return e

        env = DummyVecEnv([make_env])

    else:
@@ -141,17 +139,18 @@ def build_env(args):
def get_env_type(env_id):
    if env_id in _game_envs.keys():
        env_type = env_id
        env_id = [g for g in _game_envs[env_type]][0]
    else:
        env_type = None
        for g, e in _game_envs.items():
            if env_id in e:
                env_type = g
                break
    assert env_type is not None, 'env_id {} is not recognized in env types'.format(env_id, _game_envs.keys())

    return env_type, env_id


def get_default_network(env_type):
    if env_type == 'mujoco' or env_type == 'classic_control':
        return 'mlp'
@@ -159,7 +158,8 @@ def get_default_network(env_type):
        return 'cnn'

    raise ValueError('Unknown env_type {}'.format(env_type))


def get_alg_module(alg, submodule=None):
    submodule = submodule or alg
    try:
@@ -168,46 +168,47 @@ def get_alg_module(alg, submodule=None):
    except ImportError:
        # then from rl_algs
        alg_module = import_module('.'.join(['rl_' + 'algs', alg, submodule]))

    return alg_module


def get_learn_function(alg):
    return get_alg_module(alg).learn


def get_learn_function_defaults(alg, env_type):
    try:
        alg_defaults = get_alg_module(alg, 'defaults')
        kwargs = getattr(alg_defaults, env_type)()
    except (ImportError, AttributeError):
        kwargs = {}
    return kwargs


def parse(v):
    '''
    convert value of a command-line arg to a python object if possible, otherwise, keep as string
    '''
    assert isinstance(v, str)
    try:
        return eval(v)
    except (NameError, SyntaxError):
        return v


def main():
    # configure logger, disable logging in child MPI processes (with rank > 0)

    arg_parser = common_arg_parser()
    args, unknown_args = arg_parser.parse_known_args()
    extra_args = {k: parse(v) for k, v in parse_unknown_args(unknown_args).items()}

    if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
        rank = 0
        logger.configure()
    else:
        logger.configure(format_strs=[])
        rank = MPI.COMM_WORLD.Get_rank()

    model, _ = train(args, extra_args)
@@ -215,19 +216,19 @@ def main():
    if args.save_path is not None and rank == 0:
        save_path = osp.expanduser(args.save_path)
        model.save(save_path)

    if args.play:
        logger.log("Running trained model")
-       env = build_env(args, render=True)
+       env = build_env(args)
        obs = env.reset()
        while True:
            actions = model.step(obs)[0]
            obs, _, done, _ = env.step(actions)
            env.render()
            done = done.any() if isinstance(done, np.ndarray) else done
            if done:
                obs = env.reset()


if __name__ == '__main__':
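The `parse` helper above applies `eval` to extra command-line values and falls back to the raw string when that fails. A behavior sketch (import path assumed from `python -m baselines.run`):

```python
from baselines.run import parse

parse('0.1')        # -> 0.1 (float, via eval)
parse('[32, 32]')   # -> [32, 32]
parse('mlp')        # -> 'mlp' (eval raises NameError, so the string is kept)
```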
@@ -1,4 +1,4 @@
-from rl_common.models import mlp, cnn_small
+from baselines.common.models import mlp, cnn_small


def atari():
conftest.py (deleted, 19 lines)
@@ -1,19 +0,0 @@
import pytest


def pytest_addoption(parser):
    parser.addoption('--runslow', action='store_true', default=False, help='run slow tests')


def pytest_collection_modifyitems(config, items):
    if config.getoption('--runslow'):
        # --runslow given in cli: do not skip slow tests
        return
    skip_slow = pytest.mark.skip(reason='need --runslow option to run')
    slow_tests = []
    for item in items:
        if 'slow' in item.keywords:
            slow_tests.append(item.name)
            item.add_marker(skip_slow)

    print('skipping slow tests', ' '.join(slow_tests), 'use --runslow to run this')
setup.cfg (new file, 15 lines)
@@ -0,0 +1,15 @@
[flake8]
select = F,E999
exclude =
    .git,
    __pycache__,
    baselines/her,
    baselines/ddpg,
    baselines/ppo1,
    baselines/bench,
    baselines/acktr,
setup.py (31 changed lines)
@@ -6,6 +6,20 @@ if sys.version_info.major != 3:
        'Python {}. The installation will likely fail.'.format(sys.version_info.major))


+extras = {
+    'test': [
+        'filelock',
+        'pytest'
+    ]
+}
+
+all_deps = []
+for group_name in extras:
+    all_deps += extras[group_name]
+
+extras['all'] = all_deps
+
setup(name='baselines',
      packages=[package for package in find_packages()
                if package.startswith('baselines')],
@@ -18,18 +32,21 @@ setup(name='baselines',
          'progressbar2',
          'mpi4py',
          'cloudpickle',
-         'tensorflow>=1.4.0',
          'click',
          'opencv-python'
      ],
-     extras_require={
-         'test': [
-             'filelock',
-             'pytest'
-         ]
-     },
+     extras_require=extras,
      description='OpenAI baselines: high quality implementations of reinforcement learning algorithms',
      author='OpenAI',
      url='https://github.com/openai/baselines',
      author_email='gym@openai.com',
      version='0.1.5')


+# ensure there is some tensorflow build with version above 1.4
+try:
+    from distutils.version import StrictVersion
+    import tensorflow
+    assert StrictVersion(tensorflow.__version__) >= StrictVersion('1.4.0')
+except ImportError:
+    assert False, "TensorFlow needed, of version above 1.4"