Compare commits: peterz_mpi...peterz_viz

23 commits (SHA1; author and date columns were empty in the capture):

6b41b6b984
9705773eab
7bb405c7a7
8b95576a92
db9563ebf6
b8bc0f8791
9d4fb76ef0
664ec6faf0
3917321fbe
6e607efa90
c74ce02b9d
fa199534c5
09b42f8c26
06421877bf
527acf123f
1fc5e137b2
ab59de6922
a071fa7630
637bf55da7
165c622572
6c194a8b15
0d0701f594
be433fdb83
.travis.yml

@@ -11,4 +11,4 @@ install:
 
 script:
   - flake8 . --show-source --statistics
-  - docker run baselines-test pytest -v .
+  - docker run baselines-test pytest -v --forked .
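`--forked` runs each test in its own forked subprocess (the `pytest-forked` plugin, which the setup.py change at the bottom of this diff adds to the `test` extras), presumably so that a test that crashes or wedges the TensorFlow/MPI runtime cannot take the rest of the suite down with it.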
Dockerfile (15 lines changed)

@@ -1,16 +1,9 @@
-FROM ubuntu:16.04
+FROM python:3.6
 
-RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake python-opencv
+RUN apt-get -y update && apt-get -y install ffmpeg
+# RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake python-opencv
 
 ENV CODE_DIR /root/code
-ENV VENV /root/venv
-
-RUN \
-    pip install virtualenv && \
-    virtualenv $VENV --python=python3 && \
-    . $VENV/bin/activate && \
-    pip install --upgrade pip
-
-ENV PATH=$VENV/bin:$PATH
 
 COPY . $CODE_DIR/baselines
 WORKDIR $CODE_DIR/baselines
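The image now starts from `python:3.6`, which makes the hand-rolled virtualenv setup unnecessary, and installs `ffmpeg`, which gym's video recorder shells out to when encoding the `.mp4` files produced by the new `VecVideoRecorder` further down in this diff.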
README.md (12 lines changed)

@@ -109,17 +109,9 @@ python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --
 
 *NOTE:* At the moment Mujoco training uses VecNormalize wrapper for the environment which is not being saved correctly; so loading the models trained on Mujoco will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd by TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). This way, mean and std of environment normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way - hence not including it by default
 
-## Using baselines with TensorBoard
-Baselines logger can save data in the TensorBoard format. To do so, set environment variables `OPENAI_LOG_FORMAT` and `OPENAI_LOGDIR`:
-```bash
-export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' # formats are comma-separated, but for tensorboard you only really need the last one
-export OPENAI_LOGDIR=path/to/tensorboard/data
-```
-And you can now start TensorBoard with:
-```bash
-tensorboard --logdir=$OPENAI_LOGDIR
-```
+## Loading and visualizing learning curves and other training metrics
+See [here](docs/viz/viz.ipynb) for instructions on how to load and display the training data.
+
 ## Subpackages
 
 - [A2C](baselines/a2c)
baselines/common/cmd_util.py

@@ -60,12 +60,14 @@ def make_env(env_id, env_type, subrank=0, seed=None, reward_scale=1.0, gamestate
                        allow_early_resets=True)
 
     if env_type == 'atari':
-        return wrap_deepmind(env, **wrapper_kwargs)
-    elif reward_scale != 1:
-        return retro_wrappers.RewardScaler(env, reward_scale)
-    else:
-        return env
+        env = wrap_deepmind(env, **wrapper_kwargs)
+    elif env_type == 'retro':
+        env = retro_wrappers.wrap_deepmind_retro(env, **wrapper_kwargs)
+
+    if reward_scale != 1:
+        env = retro_wrappers.RewardScaler(env, reward_scale)
+
+    return env
 
 
 def make_mujoco_env(env_id, seed, reward_scale=1.0):
@@ -129,6 +131,8 @@ def common_arg_parser():
     parser.add_argument('--num_env', help='Number of environment copies being run in parallel. When not specified, set to number of cpus for Atari, and to 1 for Mujoco', default=None, type=int)
     parser.add_argument('--reward_scale', help='Reward scale factor. Default: 1.0', default=1.0, type=float)
     parser.add_argument('--save_path', help='Path to save trained model to', default=None, type=str)
+    parser.add_argument('--save_video_interval', help='Save video every x steps (0 = disabled)', default=0, type=int)
+    parser.add_argument('--save_video_length', help='Length of recorded video. Default: 200', default=200, type=int)
     parser.add_argument('--play', default=False, action='store_true')
     return parser
 
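Net effect of the `make_env` change: wrapping no longer returns early, so `--reward_scale` now composes with the Atari (and new retro) wrappers instead of applying only to otherwise-unwrapped environments. The two new parser flags feed the video-recording wiring in baselines/run.py below.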
baselines/common/mpi_adam.py

@@ -1,7 +1,11 @@
-from mpi4py import MPI
 import baselines.common.tf_util as U
 import tensorflow as tf
 import numpy as np
+
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
 
 class MpiAdam(object):
     def __init__(self, var_list, *, beta1=0.9, beta2=0.999, epsilon=1e-08, scale_grad_by_procs=True, comm=None):
@@ -16,16 +20,19 @@ class MpiAdam(object):
         self.t = 0
         self.setfromflat = U.SetFromFlat(var_list)
         self.getflat = U.GetFlat(var_list)
-        self.comm = MPI.COMM_WORLD if comm is None else comm
+        self.comm = MPI.COMM_WORLD if comm is None and MPI is not None else comm
 
     def update(self, localg, stepsize):
         if self.t % 100 == 0:
             self.check_synced()
         localg = localg.astype('float32')
-        globalg = np.zeros_like(localg)
-        self.comm.Allreduce(localg, globalg, op=MPI.SUM)
-        if self.scale_grad_by_procs:
-            globalg /= self.comm.Get_size()
+        if self.comm is not None:
+            globalg = np.zeros_like(localg)
+            self.comm.Allreduce(localg, globalg, op=MPI.SUM)
+            if self.scale_grad_by_procs:
+                globalg /= self.comm.Get_size()
+        else:
+            globalg = np.copy(localg)
 
         self.t += 1
         a = stepsize * np.sqrt(1 - self.beta2**self.t)/(1 - self.beta1**self.t)
@@ -35,11 +42,15 @@ class MpiAdam(object):
         self.setfromflat(self.getflat() + step)
 
     def sync(self):
+        if self.comm is None:
+            return
         theta = self.getflat()
         self.comm.Bcast(theta, root=0)
         self.setfromflat(theta)
 
     def check_synced(self):
+        if self.comm is None:
+            return
         if self.comm.Get_rank() == 0: # this is root
             theta = self.getflat()
             self.comm.Bcast(theta, root=0)
@@ -63,17 +74,30 @@ def test_MpiAdam():
     do_update = U.function([], loss, updates=[update_op])
 
     tf.get_default_session().run(tf.global_variables_initializer())
+    losslist_ref = []
     for i in range(10):
-        print(i,do_update())
+        l = do_update()
+        print(i, l)
+        losslist_ref.append(l)
 
     tf.set_random_seed(0)
     tf.get_default_session().run(tf.global_variables_initializer())
 
     var_list = [a,b]
-    lossandgrad = U.function([], [loss, U.flatgrad(loss, var_list)], updates=[update_op])
+    lossandgrad = U.function([], [loss, U.flatgrad(loss, var_list)])
     adam = MpiAdam(var_list)
 
+    losslist_test = []
     for i in range(10):
         l,g = lossandgrad()
         adam.update(g, stepsize)
         print(i,l)
+        losslist_test.append(l)
+
+    np.testing.assert_allclose(np.array(losslist_ref), np.array(losslist_test), atol=1e-4)
+
+
+if __name__ == '__main__':
+    test_MpiAdam()
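All of the hunks above follow the same recipe: import mpi4py inside try/except, leave `self.comm` as None in the single-process case, and give every collective call a local fallback (`np.copy` for the allreduce, an early `return` for broadcast/sync). A minimal standalone sketch of that recipe, outside of baselines (`allreduce_mean` is an illustrative name, not a library function):

```python
import numpy as np

try:
    from mpi4py import MPI  # optional dependency, exactly as in the hunk above
except ImportError:
    MPI = None

def allreduce_mean(local_grad):
    # With MPI available, average the gradient across all workers;
    # without it, this process is the whole "cluster", so the local
    # gradient already is the mean.
    if MPI is None:
        return np.copy(local_grad)
    buf = np.zeros_like(local_grad)
    MPI.COMM_WORLD.Allreduce(local_grad, buf, op=MPI.SUM)
    return buf / MPI.COMM_WORLD.Get_size()
```

The new `__main__` block also makes the regression test directly runnable (`python -m baselines.common.mpi_adam`), with or without `mpirun`; the assert checks that the MPI and reference update paths produce the same losses.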
baselines/common/mpi_running_mean_std.py

@@ -1,4 +1,8 @@
-from mpi4py import MPI
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
+
 import tensorflow as tf, baselines.common.tf_util as U, numpy as np
 
 class RunningMeanStd(object):
@@ -39,7 +43,8 @@ class RunningMeanStd(object):
         n = int(np.prod(self.shape))
         totalvec = np.zeros(n*2+1, 'float64')
         addvec = np.concatenate([x.sum(axis=0).ravel(), np.square(x).sum(axis=0).ravel(), np.array([len(x)],dtype='float64')])
-        MPI.COMM_WORLD.Allreduce(addvec, totalvec, op=MPI.SUM)
+        if MPI is not None:
+            MPI.COMM_WORLD.Allreduce(addvec, totalvec, op=MPI.SUM)
         self.incfiltparams(totalvec[0:n].reshape(self.shape), totalvec[n:2*n].reshape(self.shape), totalvec[2*n])
 
 @U.in_session
baselines/common/plot_util.py (new file, 401 lines)

@@ -0,0 +1,401 @@
import matplotlib.pyplot as plt
import os.path as osp
import json
import os
import numpy as np
import pandas
from collections import defaultdict, namedtuple
from baselines.bench import monitor
from baselines.logger import read_json, read_csv

def smooth(y, radius, mode='two_sided', valid_only=False):
    '''
    Smooth signal y, where radius determines the size of the window

    mode='two_sided':
        average over the window [max(index - radius, 0), min(index + radius, len(y)-1)]
    mode='causal':
        average over the window [max(index - radius, 0), index]

    valid_only: put nan in entries where the full-sized window is not available
    '''
    assert mode in ('two_sided', 'causal')
    if len(y) < 2*radius+1:
        return np.ones_like(y) * y.mean()
    elif mode == 'two_sided':
        convkernel = np.ones(2 * radius+1)
        out = np.convolve(y, convkernel, mode='same') / np.convolve(np.ones_like(y), convkernel, mode='same')
        if valid_only:
            out[:radius] = out[-radius:] = np.nan
    elif mode == 'causal':
        convkernel = np.ones(radius)
        out = np.convolve(y, convkernel, mode='full') / np.convolve(np.ones_like(y), convkernel, mode='full')
        out = out[:-radius+1]
        if valid_only:
            out[:radius] = np.nan
    return out

def one_sided_ema(xolds, yolds, low=None, high=None, n=512, decay_steps=1., low_counts_threshold=1e-8):
    '''
    perform one-sided (causal) EMA (exponential moving average)
    smoothing and resampling to an even grid with n points.
    Does not do extrapolation, so we assume
    xolds[0] <= low && high <= xolds[-1]

    Arguments:

    xolds: array or list  - x values of data. Needs to be sorted in ascending order
    yolds: array or list  - y values of data. Has to have the same length as xolds

    low: float            - min value of the new x grid. By default equals to xolds[0]
    high: float           - max value of the new x grid. By default equals to xolds[-1]

    n: int                - number of points in new x grid

    decay_steps: float    - EMA decay factor, expressed in new x grid steps.

    low_counts_threshold: float or int
                          - y values with counts less than this value will be set to NaN

    Returns:
        tuple xs, ys, count_ys where
            xs        - array with new x grid
            ys        - array of EMA of y at each point of the new x grid
            count_ys  - array of EMA of y counts at each point of the new x grid
    '''

    low = xolds[0] if low is None else low
    high = xolds[-1] if high is None else high

    assert xolds[0] <= low, 'low = {} < xolds[0] = {} - extrapolation not permitted!'.format(low, xolds[0])
    assert xolds[-1] >= high, 'high = {} > xolds[-1] = {} - extrapolation not permitted!'.format(high, xolds[-1])
    assert len(xolds) == len(yolds), 'length of xolds ({}) and yolds ({}) do not match!'.format(len(xolds), len(yolds))

    xolds = xolds.astype('float64')
    yolds = yolds.astype('float64')

    luoi = 0 # last unused old index
    sum_y = 0.
    count_y = 0.
    xnews = np.linspace(low, high, n)
    decay_period = (high - low) / (n - 1) * decay_steps
    interstep_decay = np.exp(- 1. / decay_steps)
    sum_ys = np.zeros_like(xnews)
    count_ys = np.zeros_like(xnews)
    for i in range(n):
        xnew = xnews[i]
        sum_y *= interstep_decay
        count_y *= interstep_decay
        while True:
            xold = xolds[luoi]
            if xold <= xnew:
                decay = np.exp(- (xnew - xold) / decay_period)
                sum_y += decay * yolds[luoi]
                count_y += decay
                luoi += 1
            else:
                break
            if luoi >= len(xolds):
                break
        sum_ys[i] = sum_y
        count_ys[i] = count_y

    ys = sum_ys / count_ys
    ys[count_ys < low_counts_threshold] = np.nan

    return xnews, ys, count_ys

def symmetric_ema(xolds, yolds, low=None, high=None, n=512, decay_steps=1., low_counts_threshold=1e-8):
    '''
    perform symmetric EMA (exponential moving average)
    smoothing and resampling to an even grid with n points.
    Does not do extrapolation, so we assume
    xolds[0] <= low && high <= xolds[-1]

    Arguments:

    xolds: array or list  - x values of data. Needs to be sorted in ascending order
    yolds: array or list  - y values of data. Has to have the same length as xolds

    low: float            - min value of the new x grid. By default equals to xolds[0]
    high: float           - max value of the new x grid. By default equals to xolds[-1]

    n: int                - number of points in new x grid

    decay_steps: float    - EMA decay factor, expressed in new x grid steps.

    low_counts_threshold: float or int
                          - y values with counts less than this value will be set to NaN

    Returns:
        tuple xs, ys, count_ys where
            xs        - array with new x grid
            ys        - array of EMA of y at each point of the new x grid
            count_ys  - array of EMA of y counts at each point of the new x grid
    '''
    xs, ys1, count_ys1 = one_sided_ema(xolds, yolds, low, high, n, decay_steps, low_counts_threshold=0)
    _, ys2, count_ys2 = one_sided_ema(-xolds[::-1], yolds[::-1], -high, -low, n, decay_steps, low_counts_threshold=0)
    ys2 = ys2[::-1]
    count_ys2 = count_ys2[::-1]
    count_ys = count_ys1 + count_ys2
    ys = (ys1 * count_ys1 + ys2 * count_ys2) / count_ys
    ys[count_ys < low_counts_threshold] = np.nan
    return xs, ys, count_ys

Result = namedtuple('Result', 'monitor progress dirname metadata')
Result.__new__.__defaults__ = (None,) * len(Result._fields)

def load_results(root_dir_or_dirs, enable_progress=True, enable_monitor=True, verbose=False):
    '''
    load summaries of runs from a list of directories (including subdirectories)
    Arguments:

    enable_progress: bool - if True, will attempt to load data from progress.csv files (data saved by logger). Default: True

    enable_monitor: bool - if True, will attempt to load data from monitor.csv files (data saved by Monitor environment wrapper). Default: True

    verbose: bool - if True, will print out list of directories from which the data is loaded. Default: False

    Returns:
    List of Result objects with the following fields:
         - dirname - path to the directory data was loaded from
         - metadata - run metadata (such as command-line arguments and anything else in metadata.json file)
         - monitor - if enable_monitor is True, this field contains pandas dataframe with loaded monitor.csv file (or aggregate of all *.monitor.csv files in the directory)
         - progress - if enable_progress is True, this field contains pandas dataframe with loaded progress.csv file
    '''
    if isinstance(root_dir_or_dirs, str):
        rootdirs = [osp.expanduser(root_dir_or_dirs)]
    else:
        rootdirs = [osp.expanduser(d) for d in root_dir_or_dirs]
    allresults = []
    for rootdir in rootdirs:
        assert osp.exists(rootdir), "%s doesn't exist"%rootdir
        for dirname, dirs, files in os.walk(rootdir):
            if '-proc' in dirname:
                files[:] = []
                continue
            if set(['metadata.json', 'monitor.json', 'monitor.csv', 'progress.json', 'progress.csv']).intersection(files):
                # used to be uncommented, which means do not go deeper than current directory if any of the data files
                # are found
                # dirs[:] = []
                result = {'dirname' : dirname}
                if "metadata.json" in files:
                    with open(osp.join(dirname, "metadata.json"), "r") as fh:
                        result['metadata'] = json.load(fh)
                progjson = osp.join(dirname, "progress.json")
                progcsv = osp.join(dirname, "progress.csv")
                if enable_progress:
                    if osp.exists(progjson):
                        result['progress'] = pandas.DataFrame(read_json(progjson))
                    elif osp.exists(progcsv):
                        try:
                            result['progress'] = read_csv(progcsv)
                        except pandas.errors.EmptyDataError:
                            print('skipping progress file in ', dirname, 'empty data')
                    else:
                        if verbose: print('skipping %s: no progress file'%dirname)

                if enable_monitor:
                    try:
                        result['monitor'] = pandas.DataFrame(monitor.load_results(dirname))
                    except monitor.LoadMonitorResultsError:
                        print('skipping %s: no monitor files'%dirname)
                    except Exception as e:
                        print('exception loading monitor file in %s: %s'%(dirname, e))

                if result.get('monitor') is not None or result.get('progress') is not None:
                    allresults.append(Result(**result))
                    if verbose:
                        print('successfully loaded %s'%dirname)

    if verbose: print('loaded %i results'%len(allresults))
    return allresults

COLORS = ['blue', 'green', 'red', 'cyan', 'magenta', 'yellow', 'black', 'purple', 'pink',
        'brown', 'orange', 'teal', 'lightblue', 'lime', 'lavender', 'turquoise',
        'darkgreen', 'tan', 'salmon', 'gold', 'darkred', 'darkblue']


def default_xy_fn(r):
    x = np.cumsum(r.monitor.l)
    y = smooth(r.monitor.r, radius=10)
    return x,y

def default_split_fn(r):
    import re
    # match name between slash and -<digits> at the end of the string
    # (slash in the beginning or -<digits> in the end or either may be missing)
    match = re.search(r'[^/-]+(?=(-\d+)?\Z)', r.dirname)
    if match:
        return match.group(0)

def plot_results(
    allresults, *,
    xy_fn=default_xy_fn,
    split_fn=default_split_fn,
    group_fn=default_split_fn,
    average_group=False,
    shaded_std=True,
    shaded_err=True,
    figsize=None,
    legend_outside=False,
    resample=0,
    smooth_step=1.0,
):
    '''
    Plot multiple Results objects

    xy_fn: function Result -> x,y - function that converts results objects into tuple of x and y values.
        By default, x is cumsum of episode lengths, and y is episode rewards

    split_fn: function Result -> hashable - function that converts results objects into keys to split curves into sub-panels by.
        That is, the results r for which split_fn(r) is different will be put on different sub-panels.
        By default, the portion of r.dirname between last / and -<digits> is returned. The sub-panels are
        stacked vertically in the figure.

    group_fn: function Result -> hashable - function that converts results objects into keys to group curves by.
        That is, the results r for which group_fn(r) is the same will be put into the same group.
        Curves in the same group have the same color (if average_group is False), or averaged over
        (if average_group is True). The default value is the same as default value for split_fn

    average_group: bool - if True, will average the curves in the same group and plot the mean. Enables resampling
        (if resample = 0, will use 512 steps)

    shaded_std: bool - if True (default), the shaded region corresponding to standard deviation of the group of curves will be
        shown (only applicable if average_group = True)

    shaded_err: bool - if True (default), the shaded region corresponding to error in mean estimate of the group of curves
        (that is, standard deviation divided by square root of number of curves) will be
        shown (only applicable if average_group = True)

    figsize: tuple or None - size of the resulting figure (including sub-panels). By default, width is 6 and height is 6 times number of
        sub-panels.

    legend_outside: bool - if True, will place the legend outside of the sub-panels.

    resample: int - if not zero, size of the uniform grid in x direction to resample onto. Resampling is performed via symmetric
        EMA smoothing (see the docstring for symmetric_ema).
        Default is zero (no resampling). Note that if average_group is True, resampling is necessary; in that case, default
        value is 512.

    smooth_step: float - when resampling (i.e. when resample > 0 or average_group is True), use this EMA decay parameter (in units of the new grid step).
        See docstrings for decay_steps in symmetric_ema or one_sided_ema functions.
    '''

    if split_fn is None: split_fn = lambda _ : ''
    if group_fn is None: group_fn = lambda _ : ''
    sk2r = defaultdict(list) # splitkey2results
    for result in allresults:
        splitkey = split_fn(result)
        sk2r[splitkey].append(result)
    assert len(sk2r) > 0
    assert isinstance(resample, int), "0: don't resample. <integer>: that many samples"
    nrows = len(sk2r)
    ncols = 1
    figsize = figsize or (6, 6 * nrows)
    f, axarr = plt.subplots(nrows, ncols, sharex=False, squeeze=False, figsize=figsize)

    groups = list(set(group_fn(result) for result in allresults))

    default_samples = 512
    if average_group:
        resample = resample or default_samples

    for (isplit, sk) in enumerate(sorted(sk2r.keys())):
        g2l = {}
        g2c = defaultdict(int)
        sresults = sk2r[sk]
        gresults = defaultdict(list)
        ax = axarr[isplit][0]
        for result in sresults:
            group = group_fn(result)
            g2c[group] += 1
            x, y = xy_fn(result)
            if x is None: x = np.arange(len(y))
            x, y = map(np.asarray, (x, y))
            if average_group:
                gresults[group].append((x,y))
            else:
                if resample:
                    x, y, counts = symmetric_ema(x, y, x[0], x[-1], resample, decay_steps=smooth_step)
                l, = ax.plot(x, y, color=COLORS[groups.index(group) % len(COLORS)])
                g2l[group] = l
        if average_group:
            for group in sorted(groups):
                xys = gresults[group]
                if not any(xys):
                    continue
                color = COLORS[groups.index(group) % len(COLORS)]
                origxs = [xy[0] for xy in xys]
                minxlen = min(map(len, origxs))
                def allequal(qs):
                    return all((q==qs[0]).all() for q in qs[1:])
                if resample:
                    low = max(x[0] for x in origxs)
                    high = min(x[-1] for x in origxs)
                    usex = np.linspace(low, high, resample)
                    ys = []
                    for (x, y) in xys:
                        ys.append(symmetric_ema(x, y, low, high, resample, decay_steps=smooth_step)[1])
                else:
                    assert allequal([x[:minxlen] for x in origxs]),\
                        'If you want to average unevenly sampled data, set resample=<number of samples you want>'
                    usex = origxs[0]
                    ys = [xy[1][:minxlen] for xy in xys]
                ymean = np.mean(ys, axis=0)
                ystd = np.std(ys, axis=0)
                ystderr = ystd / np.sqrt(len(ys))
                l, = axarr[isplit][0].plot(usex, ymean, color=color)
                g2l[group] = l
                if shaded_err:
                    ax.fill_between(usex, ymean - ystderr, ymean + ystderr, color=color, alpha=.4)
                if shaded_std:
                    ax.fill_between(usex, ymean - ystd, ymean + ystd, color=color, alpha=.2)

        # https://matplotlib.org/users/legend_guide.html
        plt.tight_layout()
        if any(g2l.keys()):
            ax.legend(
                g2l.values(),
                ['%s (%i)'%(g, g2c[g]) for g in g2l] if average_group else g2l.keys(),
                loc=2 if legend_outside else None,
                bbox_to_anchor=(1,1) if legend_outside else None)
        ax.set_title(sk)
    return f, axarr

def regression_analysis(df):
    xcols = list(df.columns.copy())
    xcols.remove('score')
    ycols = ['score']
    import statsmodels.api as sm
    mod = sm.OLS(df[ycols], sm.add_constant(df[xcols]), hasconst=False)
    res = mod.fit()
    print(res.summary())

def test_smooth():
    norig = 100
    nup = 300
    ndown = 30
    xs = np.cumsum(np.random.rand(norig) * 10 / norig)
    yclean = np.sin(xs)
    ys = yclean + .1 * np.random.randn(yclean.size)
    xup, yup, _ = symmetric_ema(xs, ys, xs.min(), xs.max(), nup, decay_steps=nup/ndown)
    xdown, ydown, _ = symmetric_ema(xs, ys, xs.min(), xs.max(), ndown, decay_steps=ndown/ndown)
    xsame, ysame, _ = symmetric_ema(xs, ys, xs.min(), xs.max(), norig, decay_steps=norig/ndown)
    plt.plot(xs, ys, label='orig', marker='x')
    plt.plot(xup, yup, label='up', marker='x')
    plt.plot(xdown, ydown, label='down', marker='x')
    plt.plot(xsame, ysame, label='same', marker='x')
    plt.plot(xs, yclean, label='clean', marker='x')
    plt.legend()
    plt.show()
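A hedged usage sketch of the new module (the log directory `~/logs/cartpole` is a placeholder for wherever your runs wrote their monitor.csv/progress.csv files):

```python
from baselines.common import plot_util as pu

# load_results walks the directory tree and returns a list of Result
# namedtuples (dirname, metadata, monitor, progress).
results = pu.load_results('~/logs/cartpole')

# One sub-panel per split key, one color per group; with average_group=True
# the curves in each group are EMA-resampled onto a common 512-point grid
# and averaged, with the standard deviation shown as a shaded band.
f, axarr = pu.plot_results(results, average_group=True, shaded_std=True)
f.savefig('cartpole.png')
```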
baselines/common/vec_env/__init__.py

@@ -32,6 +32,11 @@ class VecEnv(ABC):
     """
     closed = False
     viewer = None
+
+    metadata = {
+        'render.modes': ['human', 'rgb_array']
+    }
+
     def __init__(self, num_envs, observation_space, action_space):
         self.num_envs = num_envs
         self.observation_space = observation_space
baselines/common/vec_env/dummy_vec_env.py

@@ -20,9 +20,6 @@ class DummyVecEnv(VecEnv):
         env = self.envs[0]
         VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)
         obs_space = env.observation_space
-        if isinstance(obs_space, spaces.MultiDiscrete):
-            obs_space.shape = obs_space.shape[0]
-
         self.keys, shapes, dtypes = obs_space_info(obs_space)
 
         self.buf_obs = { k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys }
@@ -79,6 +76,6 @@ class DummyVecEnv(VecEnv):
 
     def render(self, mode='human'):
         if self.num_envs == 1:
-            self.envs[0].render(mode=mode)
+            return self.envs[0].render(mode=mode)
         else:
-            super().render(mode=mode)
+            return super().render(mode=mode)
baselines/common/vec_env/test_video_recorder.py (new file, 49 lines)

@@ -0,0 +1,49 @@
"""
Tests for the VecVideoRecorder wrapper.
"""

import gym
import pytest
import os
import glob
import tempfile

from .dummy_vec_env import DummyVecEnv
from .shmem_vec_env import ShmemVecEnv
from .subproc_vec_env import SubprocVecEnv
from .vec_video_recorder import VecVideoRecorder


@pytest.mark.parametrize('klass', (DummyVecEnv, ShmemVecEnv, SubprocVecEnv))
@pytest.mark.parametrize('num_envs', (1, 4))
@pytest.mark.parametrize('video_length', (10, 100))
@pytest.mark.parametrize('video_interval', (1, 50))
def test_video_recorder(klass, num_envs, video_length, video_interval):
    """
    Wrap an existing VecEnv with VecVideoRecorder,
    make (video_interval + video_length + 1) steps,
    then check that the file is present
    """

    def make_fn():
        env = gym.make('PongNoFrameskip-v4')
        return env
    fns = [make_fn for _ in range(num_envs)]
    env = klass(fns)

    with tempfile.TemporaryDirectory() as video_path:
        env = VecVideoRecorder(env, video_path, record_video_trigger=lambda x: x % video_interval == 0, video_length=video_length)

        env.reset()
        for _ in range(video_interval + video_length + 1):
            env.step([0] * num_envs)
        env.close()

        recorded_video = glob.glob(os.path.join(video_path, "*.mp4"))

        # one video from the first trigger, one from the second
        assert len(recorded_video) == 2
        # Files are not empty
        assert all(os.stat(p).st_size != 0 for p in recorded_video)
baselines/common/vec_env/vec_video_recorder.py (new file, 89 lines)

@@ -0,0 +1,89 @@
import os
from baselines import logger
from baselines.common.vec_env import VecEnvWrapper
from gym.wrappers.monitoring import video_recorder


class VecVideoRecorder(VecEnvWrapper):
    """
    Wrap VecEnv to record rendered images as an mp4 video.
    """

    def __init__(self, venv, directory, record_video_trigger, video_length=200):
        """
        # Arguments
            venv: VecEnv to wrap
            directory: Where to save videos
            record_video_trigger:
                Function that defines when to start recording.
                The function takes the current step number,
                and returns whether we should start recording or not.
            video_length: Length of recorded video
        """

        VecEnvWrapper.__init__(self, venv)
        self.record_video_trigger = record_video_trigger
        self.video_recorder = None

        self.directory = os.path.abspath(directory)
        if not os.path.exists(self.directory): os.mkdir(self.directory)

        self.file_prefix = "vecenv"
        self.file_infix = '{}'.format(os.getpid())
        self.step_id = 0
        self.video_length = video_length

        self.recording = False
        self.recorded_frames = 0

    def reset(self):
        obs = self.venv.reset()

        self.start_video_recorder()

        return obs

    def start_video_recorder(self):
        self.close_video_recorder()

        base_path = os.path.join(self.directory, '{}.video.{}.video{:06}'.format(self.file_prefix, self.file_infix, self.step_id))
        self.video_recorder = video_recorder.VideoRecorder(
                env=self.venv,
                base_path=base_path,
                metadata={'step_id': self.step_id}
                )

        self.video_recorder.capture_frame()
        self.recorded_frames = 1
        self.recording = True

    def _video_enabled(self):
        return self.record_video_trigger(self.step_id)

    def step_wait(self):
        obs, rews, dones, infos = self.venv.step_wait()

        self.step_id += 1
        if self.recording:
            self.video_recorder.capture_frame()
            self.recorded_frames += 1
            if self.recorded_frames > self.video_length:
                logger.info("Saving video to ", self.video_recorder.path)
                self.close_video_recorder()
        elif self._video_enabled():
            self.start_video_recorder()

        return obs, rews, dones, infos

    def close_video_recorder(self):
        if self.recording:
            self.video_recorder.close()
        self.recording = False
        self.recorded_frames = 0

    def close(self):
        VecEnvWrapper.close(self)
        self.close_video_recorder()

    def __del__(self):
        self.close()
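A minimal usage sketch of the wrapper on its own (environment id and output directory are arbitrary examples; `baselines/run.py` below wires the same thing up behind the new `--save_video_interval` flag):

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_video_recorder import VecVideoRecorder

env = DummyVecEnv([lambda: gym.make('PongNoFrameskip-v4')])
# Begin a new 200-frame recording every 2000 steps.
env = VecVideoRecorder(env, '/tmp/videos',
                       record_video_trigger=lambda step: step % 2000 == 0,
                       video_length=200)
env.reset()
for _ in range(4200):
    env.step([env.action_space.sample()])  # one action per sub-env
env.close()  # also closes any recorder still running; .mp4 files land in /tmp/videos
```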
baselines/ddpg/ddpg.py

@@ -12,8 +12,11 @@ import baselines.common.tf_util as U
 
 from baselines import logger
 import numpy as np
-from mpi4py import MPI
 
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
 
 def learn(network, env,
           seed=None,
@@ -49,7 +52,11 @@ def learn(network, env,
     else:
         nb_epochs = 500
 
-    rank = MPI.COMM_WORLD.Get_rank()
+    if MPI is not None:
+        rank = MPI.COMM_WORLD.Get_rank()
+    else:
+        rank = 0
+
     nb_actions = env.action_space.shape[-1]
     assert (np.abs(env.action_space.low) == env.action_space.high).all()  # we assume symmetric actions.
 
@@ -59,7 +66,6 @@ def learn(network, env,
 
     action_noise = None
    param_noise = None
-    nb_actions = env.action_space.shape[-1]
     if noise_type is not None:
         for current_noise_type in noise_type.split(','):
             current_noise_type = current_noise_type.strip()
@@ -200,7 +206,11 @@ def learn(network, env,
             eval_episode_rewards_history.append(eval_episode_reward[d])
             eval_episode_reward[d] = 0.0
 
-    mpi_size = MPI.COMM_WORLD.Get_size()
+    if MPI is not None:
+        mpi_size = MPI.COMM_WORLD.Get_size()
+    else:
+        mpi_size = 1
+
     # Log stats.
     # XXX shouldn't call np.mean on variable length lists
     duration = time.time() - start_time
@@ -234,7 +244,10 @@ def learn(network, env,
         else:
             raise ValueError('expected scalar, got %s'%x)
 
-    combined_stats_sums = MPI.COMM_WORLD.allreduce(np.array([ np.array(x).flatten()[0] for x in combined_stats.values()]))
+    combined_stats_sums = np.array([ np.array(x).flatten()[0] for x in combined_stats.values()])
+    if MPI is not None:
+        combined_stats_sums = MPI.COMM_WORLD.allreduce(combined_stats_sums)
+
     combined_stats = {k : v / mpi_size for (k,v) in zip(combined_stats.keys(), combined_stats_sums)}
 
     # Total statistics.
baselines/ddpg/ddpg_learner.py

@@ -9,7 +9,10 @@ from baselines import logger
 from baselines.common.mpi_adam import MpiAdam
 import baselines.common.tf_util as U
 from baselines.common.mpi_running_mean_std import RunningMeanStd
-from mpi4py import MPI
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
 
 def normalize(x, stats):
     if stats is None:
@@ -268,7 +271,7 @@ class DDPG(object):
 
         if self.action_noise is not None and apply_noise:
             noise = self.action_noise()
-            assert noise.shape == action.shape
+            assert noise.shape == action[0].shape
             action += noise
         action = np.clip(action, self.action_range[0], self.action_range[1])
 
@@ -358,6 +361,11 @@ class DDPG(object):
         return stats
 
     def adapt_param_noise(self):
+        try:
+            from mpi4py import MPI
+        except ImportError:
+            MPI = None
+
         if self.param_noise is None:
             return 0.
 
@@ -371,7 +379,16 @@ class DDPG(object):
             self.param_noise_stddev: self.param_noise.current_stddev,
         })
 
-        mean_distance = MPI.COMM_WORLD.allreduce(distance, op=MPI.SUM) / MPI.COMM_WORLD.Get_size()
+        if MPI is not None:
+            mean_distance = MPI.COMM_WORLD.allreduce(distance, op=MPI.SUM) / MPI.COMM_WORLD.Get_size()
+        else:
+            mean_distance = distance
+
+        if MPI is not None:
+            mean_distance = MPI.COMM_WORLD.allreduce(distance, op=MPI.SUM) / MPI.COMM_WORLD.Get_size()
+        else:
+            mean_distance = distance
+
         self.param_noise.adapt(mean_distance)
         return mean_distance
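Note: the guarded allreduce block genuinely appears twice in this hunk. Both copies run on every rank and compute the same `mean_distance`, so the duplication is redundant but not incorrect.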
baselines/ppo2/defaults.py

@@ -20,3 +20,6 @@ def atari():
         lr=lambda f : f * 2.5e-4,
         cliprange=lambda f : f * 0.1,
     )
+
+def retro():
+    return atari()
baselines/ppo2/ppo2.py

@@ -10,11 +10,15 @@ from baselines.common import explained_variance, set_global_seeds
 from baselines.common.policies import build_policy
 from baselines.common.runners import AbstractEnvRunner
 from baselines.common.tf_util import get_session, save_variables, load_variables
-from baselines.common.mpi_adam_optimizer import MpiAdamOptimizer
 
-from mpi4py import MPI
+try:
+    from baselines.common.mpi_adam_optimizer import MpiAdamOptimizer
+    from mpi4py import MPI
+    from baselines.common.mpi_util import sync_from_root
+except ImportError:
+    MPI = None
+
 from baselines.common.tf_util import initialize
-from baselines.common.mpi_util import sync_from_root
 
 class Model(object):
     """
@@ -93,7 +97,10 @@ class Model(object):
         # 1. Get the model parameters
         params = tf.trainable_variables('ppo2_model')
         # 2. Build our trainer
-        trainer = MpiAdamOptimizer(MPI.COMM_WORLD, learning_rate=LR, epsilon=1e-5)
+        if MPI is not None:
+            trainer = MpiAdamOptimizer(MPI.COMM_WORLD, learning_rate=LR, epsilon=1e-5)
+        else:
+            trainer = tf.train.AdamOptimizer(learning_rate=LR, epsilon=1e-5)
         # 3. Calculate the gradients
         grads_and_var = trainer.compute_gradients(loss, params)
         grads, var = zip(*grads_and_var)
@@ -136,10 +143,12 @@ class Model(object):
         self.save = functools.partial(save_variables, sess=sess)
         self.load = functools.partial(load_variables, sess=sess)
 
-        if MPI.COMM_WORLD.Get_rank() == 0:
+        if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
             initialize()
         global_variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="")
-        sync_from_root(sess, global_variables) #pylint: disable=E1101
+
+        if MPI is not None:
+            sync_from_root(sess, global_variables) #pylint: disable=E1101
 
 class Runner(AbstractEnvRunner):
     """
@@ -392,9 +401,9 @@ def learn(*, network, env, total_timesteps, eval_env = None, seed=None, nsteps=2
             logger.logkv('time_elapsed', tnow - tfirststart)
             for (lossval, lossname) in zip(lossvals, model.loss_names):
                 logger.logkv(lossname, lossval)
-            if MPI.COMM_WORLD.Get_rank() == 0:
+            if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
                 logger.dumpkvs()
-        if save_interval and (update % save_interval == 0 or update == 1) and logger.get_dir() and MPI.COMM_WORLD.Get_rank() == 0:
+        if save_interval and (update % save_interval == 0 or update == 1) and logger.get_dir() and (MPI is None or MPI.COMM_WORLD.Get_rank() == 0):
             checkdir = osp.join(logger.get_dir(), 'checkpoints')
             os.makedirs(checkdir, exist_ok=True)
             savepath = osp.join(checkdir, '%.5i'%update)
baselines/run.py

@@ -6,6 +6,7 @@ from collections import defaultdict
 import tensorflow as tf
 import numpy as np
 
+from baselines.common.vec_env.vec_video_recorder import VecVideoRecorder
 from baselines.common.vec_env.vec_frame_stack import VecFrameStack
 from baselines.common.cmd_util import common_arg_parser, parse_unknown_args, make_vec_env, make_env
 from baselines.common.tf_util import get_session
@@ -62,6 +63,8 @@ def train(args, extra_args):
     alg_kwargs.update(extra_args)
 
     env = build_env(args)
+    if args.save_video_interval != 0:
+        env = VecVideoRecorder(env, osp.join(logger.Logger.CURRENT.dir, "videos"), record_video_trigger=lambda x: x % args.save_video_interval == 0, video_length=args.save_video_length)
 
     if args.network:
         alg_kwargs['network'] = args.network
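Together with the new parser flags in cmd_util.py, this makes recording available straight from the command line, e.g. `python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --save_video_interval=10000 --save_video_length=200`; the videos are written to a `videos/` subdirectory of the logger's output directory.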
baselines/trpo_mpi/trpo_mpi.py

@@ -4,7 +4,6 @@ import baselines.common.tf_util as U
 import tensorflow as tf, numpy as np
 import time
 from baselines.common import colorize
-from mpi4py import MPI
 from collections import deque
 from baselines.common import set_global_seeds
 from baselines.common.mpi_adam import MpiAdam
@@ -13,6 +12,11 @@ from baselines.common.input import observation_placeholder
 from baselines.common.policies import build_policy
 from contextlib import contextmanager
 
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
+
 def traj_segment_generator(pi, env, horizon, stochastic):
     # Initialize state variables
     t = 0
@@ -146,9 +150,12 @@ def learn(*,
 
     '''
 
-    nworkers = MPI.COMM_WORLD.Get_size()
-    rank = MPI.COMM_WORLD.Get_rank()
+    if MPI is not None:
+        nworkers = MPI.COMM_WORLD.Get_size()
+        rank = MPI.COMM_WORLD.Get_rank()
+    else:
+        nworkers = 1
+        rank = 0
+
     cpus_per_worker = 1
     U.get_session(config=tf.ConfigProto(
@@ -237,9 +244,13 @@ def learn(*,
 
     def allmean(x):
         assert isinstance(x, np.ndarray)
-        out = np.empty_like(x)
-        MPI.COMM_WORLD.Allreduce(x, out, op=MPI.SUM)
-        out /= nworkers
+        if MPI is not None:
+            out = np.empty_like(x)
+            MPI.COMM_WORLD.Allreduce(x, out, op=MPI.SUM)
+            out /= nworkers
+        else:
+            out = np.copy(x)
+
         return out
 
     U.initialize()
@@ -247,7 +258,9 @@ def learn(*,
         pi.load(load_path)
 
     th_init = get_flat()
-    MPI.COMM_WORLD.Bcast(th_init, root=0)
+    if MPI is not None:
+        MPI.COMM_WORLD.Bcast(th_init, root=0)
+
     set_from_flat(th_init)
     vfadam.sync()
     print("Init param sum", th_init.sum(), flush=True)
@@ -353,7 +366,11 @@ def learn(*,
         logger.record_tabular("ev_tdlam_before", explained_variance(vpredbefore, tdlamret))
 
         lrlocal = (seg["ep_lens"], seg["ep_rets"]) # local values
-        listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of tuples
+        if MPI is not None:
+            listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of tuples
+        else:
+            listoflrpairs = [lrlocal]
+
         lens, rews = map(flatten_lists, zip(*listoflrpairs))
         lenbuffer.extend(lens)
         rewbuffer.extend(rews)
docs/viz/viz.ipynb (new file, 808 lines)
File diff suppressed because one or more lines are too long
setup.py (5 lines changed)

@@ -11,10 +11,14 @@ extras = {
     'test': [
         'filelock',
         'pytest',
+        'pytest-forked',
         'atari-py'
     ],
     'bullet': [
         'pybullet',
+    ],
+    'mpi': [
+        'mpi4py'
     ]
 }
 
@@ -34,7 +38,6 @@ setup(name='baselines',
         'joblib',
         'dill',
         'progressbar2',
-        'mpi4py',
         'cloudpickle',
         'click',
         'opencv-python'
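With this split, `mpi4py` moves from a hard requirement to the opt-in `mpi` extra: a plain pip install of baselines now works on machines without an MPI toolchain, and a source checkout can pull MPI support back in with something like `pip install -e .[mpi]` (or `.[test,mpi]` together with the test tools).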