Compare commits: peterz_mpi...peterz_viz (23 commits)

Commit SHA1s:
6b41b6b984, 9705773eab, 7bb405c7a7, 8b95576a92, db9563ebf6, b8bc0f8791,
9d4fb76ef0, 664ec6faf0, 3917321fbe, 6e607efa90, c74ce02b9d, fa199534c5,
09b42f8c26, 06421877bf, 527acf123f, 1fc5e137b2, ab59de6922, a071fa7630,
637bf55da7, 165c622572, 6c194a8b15, 0d0701f594, be433fdb83
CI config (likely .travis.yml)
@@ -11,4 +11,4 @@ install:

 script:
     - flake8 . --show-source --statistics
-    - docker run baselines-test pytest -v .
+    - docker run baselines-test pytest -v --forked .

Dockerfile (15 lines changed)
@@ -1,16 +1,9 @@
-FROM ubuntu:16.04
+FROM python:3.6

-RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake python-opencv
+RUN apt-get -y update && apt-get -y install ffmpeg
+# RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake python-opencv
 ENV CODE_DIR /root/code
-ENV VENV /root/venv
-
-RUN \
-    pip install virtualenv && \
-    virtualenv $VENV --python=python3 && \
-    . $VENV/bin/activate && \
-    pip install --upgrade pip
-
-ENV PATH=$VENV/bin:$PATH

 COPY . $CODE_DIR/baselines
 WORKDIR $CODE_DIR/baselines

README.md (12 lines changed)
@@ -109,17 +109,9 @@ python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --

 *NOTE:* At the moment Mujoco training uses the VecNormalize wrapper for the environment, which is not saved correctly; loading models trained on Mujoco will therefore not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). That way, the mean and std of the environment-normalizing wrapper are saved in tensorflow variables and included in the model file; however, training is slower that way - hence it is not enabled by default.

-## Using baselines with TensorBoard
-Baselines logger can save data in the TensorBoard format. To do so, set environment variables `OPENAI_LOG_FORMAT` and `OPENAI_LOGDIR`:
-```bash
-export OPENAI_LOG_FORMAT='stdout,log,csv,tensorboard' # formats are comma-separated, but for tensorboard you only really need the last one
-export OPENAI_LOGDIR=path/to/tensorboard/data
-```
-And you can now start TensorBoard with:
-```bash
-tensorboard --logdir=$OPENAI_LOGDIR
-```
+## Loading and visualizing learning curves and other training metrics
+See [here](docs/viz/viz.ipynb) for instructions on how to load and display the training data.
+
 ## Subpackages

 - [A2C](baselines/a2c)

baselines/common/cmd_util.py
@@ -60,12 +60,14 @@ def make_env(env_id, env_type, subrank=0, seed=None, reward_scale=1.0, gamestate
                     allow_early_resets=True)

     if env_type == 'atari':
-        return wrap_deepmind(env, **wrapper_kwargs)
-    elif reward_scale != 1:
-        return retro_wrappers.RewardScaler(env, reward_scale)
-    else:
-        return env
+        env = wrap_deepmind(env, **wrapper_kwargs)
+    elif env_type == 'retro':
+        env = retro_wrappers.wrap_deepmind_retro(env, **wrapper_kwargs)
+
+    if reward_scale != 1:
+        env = retro_wrappers.RewardScaler(env, reward_scale)
+
+    return env


 def make_mujoco_env(env_id, seed, reward_scale=1.0):
@@ -129,6 +131,8 @@ def common_arg_parser():
     parser.add_argument('--num_env', help='Number of environment copies being run in parallel. When not specified, set to number of cpus for Atari, and to 1 for Mujoco', default=None, type=int)
     parser.add_argument('--reward_scale', help='Reward scale factor. Default: 1.0', default=1.0, type=float)
     parser.add_argument('--save_path', help='Path to save trained model to', default=None, type=str)
+    parser.add_argument('--save_video_interval', help='Save video every x steps (0 = disabled)', default=0, type=int)
+    parser.add_argument('--save_video_length', help='Length of recorded video. Default: 200', default=200, type=int)
     parser.add_argument('--play', default=False, action='store_true')
     return parser

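The make_env restructuring above changes the control flow so that reward scaling is applied on top of the atari/retro preprocessing instead of being unreachable whenever env_type == 'atari'. A sketch of the resulting wrapper chain (environment id and scale factor are illustrative; assumes the baselines wrappers are importable):

```python
import gym
from baselines.common.atari_wrappers import wrap_deepmind
from baselines.common.retro_wrappers import RewardScaler

env = gym.make('PongNoFrameskip-v4')
env = wrap_deepmind(env)        # atari preprocessing, as before
env = RewardScaler(env, 0.1)    # with the new code this also applies to atari envs
```
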
baselines/common/mpi_adam.py
@@ -1,7 +1,11 @@
-from mpi4py import MPI
 import baselines.common.tf_util as U
 import tensorflow as tf
 import numpy as np
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None


 class MpiAdam(object):
     def __init__(self, var_list, *, beta1=0.9, beta2=0.999, epsilon=1e-08, scale_grad_by_procs=True, comm=None):
@@ -16,16 +20,19 @@ class MpiAdam(object):
         self.t = 0
         self.setfromflat = U.SetFromFlat(var_list)
         self.getflat = U.GetFlat(var_list)
-        self.comm = MPI.COMM_WORLD if comm is None else comm
+        self.comm = MPI.COMM_WORLD if comm is None and MPI is not None else comm

     def update(self, localg, stepsize):
         if self.t % 100 == 0:
             self.check_synced()
         localg = localg.astype('float32')
-        globalg = np.zeros_like(localg)
-        self.comm.Allreduce(localg, globalg, op=MPI.SUM)
-        if self.scale_grad_by_procs:
-            globalg /= self.comm.Get_size()
+        if self.comm is not None:
+            globalg = np.zeros_like(localg)
+            self.comm.Allreduce(localg, globalg, op=MPI.SUM)
+            if self.scale_grad_by_procs:
+                globalg /= self.comm.Get_size()
+        else:
+            globalg = np.copy(localg)

         self.t += 1
         a = stepsize * np.sqrt(1 - self.beta2**self.t)/(1 - self.beta1**self.t)
@@ -35,11 +42,15 @@ class MpiAdam(object):
         self.setfromflat(self.getflat() + step)

     def sync(self):
+        if self.comm is None:
+            return
         theta = self.getflat()
         self.comm.Bcast(theta, root=0)
         self.setfromflat(theta)

     def check_synced(self):
+        if self.comm is None:
+            return
         if self.comm.Get_rank() == 0: # this is root
             theta = self.getflat()
             self.comm.Bcast(theta, root=0)
@@ -63,17 +74,30 @@ def test_MpiAdam():
     do_update = U.function([], loss, updates=[update_op])

     tf.get_default_session().run(tf.global_variables_initializer())
+    losslist_ref = []
     for i in range(10):
-        print(i, do_update())
+        l = do_update()
+        print(i, l)
+        losslist_ref.append(l)

     tf.set_random_seed(0)
     tf.get_default_session().run(tf.global_variables_initializer())

     var_list = [a,b]
-    lossandgrad = U.function([], [loss, U.flatgrad(loss, var_list)], updates=[update_op])
+    lossandgrad = U.function([], [loss, U.flatgrad(loss, var_list)])
     adam = MpiAdam(var_list)

+    losslist_test = []
     for i in range(10):
         l,g = lossandgrad()
         adam.update(g, stepsize)
         print(i,l)
+        losslist_test.append(l)
+
+    np.testing.assert_allclose(np.array(losslist_ref), np.array(losslist_test), atol=1e-4)


 if __name__ == '__main__':
     test_MpiAdam()

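With these guards, MpiAdam falls back to plain single-process Adam when mpi4py is missing: update() copies the local gradient instead of averaging it with Allreduce, and sync()/check_synced() become no-ops. A minimal usage sketch (assumes a TF1-era baselines checkout; the toy loss is made up):

```python
import numpy as np
import tensorflow as tf
import baselines.common.tf_util as U
from baselines.common.mpi_adam import MpiAdam

with U.single_threaded_session():
    x = tf.Variable(np.zeros(5, dtype=np.float32))
    loss = tf.reduce_sum(tf.square(x - 1.0))
    lossandgrad = U.function([], [loss, U.flatgrad(loss, [x])])
    adam = MpiAdam([x])  # self.comm stays None when MPI is unavailable
    tf.get_default_session().run(tf.global_variables_initializer())
    for _ in range(100):
        l, g = lossandgrad()
        adam.update(g, 1e-1)  # averages across workers only if a comm exists
```
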
baselines/common/mpi_running_mean_std.py
@@ -1,4 +1,8 @@
-from mpi4py import MPI
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
+
 import tensorflow as tf, baselines.common.tf_util as U, numpy as np

 class RunningMeanStd(object):
@@ -39,6 +43,7 @@ class RunningMeanStd(object):
         n = int(np.prod(self.shape))
         totalvec = np.zeros(n*2+1, 'float64')
         addvec = np.concatenate([x.sum(axis=0).ravel(), np.square(x).sum(axis=0).ravel(), np.array([len(x)],dtype='float64')])
-        MPI.COMM_WORLD.Allreduce(addvec, totalvec, op=MPI.SUM)
+        if MPI is not None:
+            MPI.COMM_WORLD.Allreduce(addvec, totalvec, op=MPI.SUM)
         self.incfiltparams(totalvec[0:n].reshape(self.shape), totalvec[n:2*n].reshape(self.shape), totalvec[2*n])

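The Allreduce above sums per-worker (sum, sum of squares, count) triples, from which the global mean and std follow without sharing raw samples. A pure-numpy sketch of that identity (the two arrays stand in for two workers' batches; values are illustrative):

```python
import numpy as np

a = np.random.randn(100)        # worker 1's batch
b = np.random.randn(50) + 1.0   # worker 2's batch

# what Allreduce(op=SUM) would produce across the two workers
total_sum = a.sum() + b.sum()
total_sumsq = np.square(a).sum() + np.square(b).sum()
total_count = len(a) + len(b)

mean = total_sum / total_count
std = np.sqrt(total_sumsq / total_count - mean**2)
assert np.isclose(mean, np.concatenate([a, b]).mean())
assert np.isclose(std, np.concatenate([a, b]).std())
```
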
baselines/common/plot_util.py (new file, 401 lines)
@@ -0,0 +1,401 @@
import matplotlib.pyplot as plt
import os.path as osp
import json
import os
import numpy as np
import pandas
from collections import defaultdict, namedtuple
from baselines.bench import monitor
from baselines.logger import read_json, read_csv

def smooth(y, radius, mode='two_sided', valid_only=False):
    '''
    Smooth signal y, where radius determines the size of the window.

    mode='two_sided':
        average over the window [max(index - radius, 0), min(index + radius, len(y)-1)]
    mode='causal':
        average over the window [max(index - radius, 0), index]

    valid_only: put nan in entries where the full-sized window is not available
    '''
    assert mode in ('two_sided', 'causal')
    if len(y) < 2*radius+1:
        return np.ones_like(y) * y.mean()
    elif mode == 'two_sided':
        convkernel = np.ones(2 * radius+1)
        out = np.convolve(y, convkernel, mode='same') / np.convolve(np.ones_like(y), convkernel, mode='same')
        if valid_only:
            out[:radius] = out[-radius:] = np.nan
    elif mode == 'causal':
        convkernel = np.ones(radius)
        out = np.convolve(y, convkernel, mode='full') / np.convolve(np.ones_like(y), convkernel, mode='full')
        out = out[:-radius+1]
        if valid_only:
            out[:radius] = np.nan
    return out

def one_sided_ema(xolds, yolds, low=None, high=None, n=512, decay_steps=1., low_counts_threshold=1e-8):
    '''
    perform one-sided (causal) EMA (exponential moving average)
    smoothing and resampling to an even grid with n points.
    Does not do extrapolation, so we assume
    xolds[0] <= low && high <= xolds[-1]

    Arguments:

    xolds: array or list  - x values of data. Needs to be sorted in ascending order
    yolds: array or list  - y values of data. Has to have the same length as xolds

    low: float            - min value of the new x grid. By default equals xolds[0]
    high: float           - max value of the new x grid. By default equals xolds[-1]

    n: int                - number of points in new x grid

    decay_steps: float    - EMA decay factor, expressed in new x grid steps.

    low_counts_threshold: float or int
                          - y values with counts less than this value will be set to NaN

    Returns:
        tuple xs, ys, count_ys where
            xs       - array with new x grid
            ys       - array of EMA of y at each point of the new x grid
            count_ys - array of EMA of y counts at each point of the new x grid
    '''

    low = xolds[0] if low is None else low
    high = xolds[-1] if high is None else high

    assert xolds[0] <= low, 'low = {} < xolds[0] = {} - extrapolation not permitted!'.format(low, xolds[0])
    assert xolds[-1] >= high, 'high = {} > xolds[-1] = {} - extrapolation not permitted!'.format(high, xolds[-1])
    assert len(xolds) == len(yolds), 'length of xolds ({}) and yolds ({}) do not match!'.format(len(xolds), len(yolds))

    xolds = xolds.astype('float64')
    yolds = yolds.astype('float64')

    luoi = 0 # last unused old index
    sum_y = 0.
    count_y = 0.
    xnews = np.linspace(low, high, n)
    decay_period = (high - low) / (n - 1) * decay_steps
    interstep_decay = np.exp(- 1. / decay_steps)
    sum_ys = np.zeros_like(xnews)
    count_ys = np.zeros_like(xnews)
    for i in range(n):
        xnew = xnews[i]
        sum_y *= interstep_decay
        count_y *= interstep_decay
        while True:
            if luoi >= len(xolds):
                break
            xold = xolds[luoi]
            if xold <= xnew:
                decay = np.exp(- (xnew - xold) / decay_period)
                sum_y += decay * yolds[luoi]
                count_y += decay
                luoi += 1
            else:
                break
        sum_ys[i] = sum_y
        count_ys[i] = count_y

    ys = sum_ys / count_ys
    ys[count_ys < low_counts_threshold] = np.nan

    return xnews, ys, count_ys

def symmetric_ema(xolds, yolds, low=None, high=None, n=512, decay_steps=1., low_counts_threshold=1e-8):
    '''
    perform symmetric EMA (exponential moving average)
    smoothing and resampling to an even grid with n points.
    Does not do extrapolation, so we assume
    xolds[0] <= low && high <= xolds[-1]

    Arguments:

    xolds: array or list  - x values of data. Needs to be sorted in ascending order
    yolds: array or list  - y values of data. Has to have the same length as xolds

    low: float            - min value of the new x grid. By default equals xolds[0]
    high: float           - max value of the new x grid. By default equals xolds[-1]

    n: int                - number of points in new x grid

    decay_steps: float    - EMA decay factor, expressed in new x grid steps.

    low_counts_threshold: float or int
                          - y values with counts less than this value will be set to NaN

    Returns:
        tuple xs, ys, count_ys where
            xs       - array with new x grid
            ys       - array of EMA of y at each point of the new x grid
            count_ys - array of EMA of y counts at each point of the new x grid
    '''
    xs, ys1, count_ys1 = one_sided_ema(xolds, yolds, low, high, n, decay_steps, low_counts_threshold=0)
    _, ys2, count_ys2 = one_sided_ema(-xolds[::-1], yolds[::-1], -high, -low, n, decay_steps, low_counts_threshold=0)
    ys2 = ys2[::-1]
    count_ys2 = count_ys2[::-1]
    count_ys = count_ys1 + count_ys2
    ys = (ys1 * count_ys1 + ys2 * count_ys2) / count_ys
    ys[count_ys < low_counts_threshold] = np.nan
    return xs, ys, count_ys

Result = namedtuple('Result', 'monitor progress dirname metadata')
Result.__new__.__defaults__ = (None,) * len(Result._fields)

def load_results(root_dir_or_dirs, enable_progress=True, enable_monitor=True, verbose=False):
    '''
    load summaries of runs from a list of directories (including subdirectories)

    Arguments:

    root_dir_or_dirs: str or list of str - directory or list of directories to load runs from

    enable_progress: bool - if True, will attempt to load data from progress.csv files (data saved by logger). Default: True

    enable_monitor: bool - if True, will attempt to load data from monitor.csv files (data saved by Monitor environment wrapper). Default: True

    verbose: bool - if True, will print out the list of directories from which the data is loaded. Default: False

    Returns:
    List of Result objects with the following fields:
         - dirname - path to the directory data was loaded from
         - metadata - run metadata (such as command-line arguments and anything else in the metadata.json file)
         - monitor - if enable_monitor is True, this field contains a pandas dataframe with the loaded monitor.csv file (or an aggregate of all *.monitor.csv files in the directory)
         - progress - if enable_progress is True, this field contains a pandas dataframe with the loaded progress.csv file
    '''
    if isinstance(root_dir_or_dirs, str):
        rootdirs = [osp.expanduser(root_dir_or_dirs)]
    else:
        rootdirs = [osp.expanduser(d) for d in root_dir_or_dirs]
    allresults = []
    for rootdir in rootdirs:
        assert osp.exists(rootdir), "%s doesn't exist"%rootdir
        for dirname, dirs, files in os.walk(rootdir):
            if '-proc' in dirname:
                files[:] = []
                continue
            if set(['metadata.json', 'monitor.json', 'monitor.csv', 'progress.json', 'progress.csv']).intersection(files):
                # used to be uncommented, which means do not go deeper than current directory if any of the data files
                # are found
                # dirs[:] = []
                result = {'dirname' : dirname}
                if "metadata.json" in files:
                    with open(osp.join(dirname, "metadata.json"), "r") as fh:
                        result['metadata'] = json.load(fh)
                progjson = osp.join(dirname, "progress.json")
                progcsv = osp.join(dirname, "progress.csv")
                if enable_progress:
                    if osp.exists(progjson):
                        result['progress'] = pandas.DataFrame(read_json(progjson))
                    elif osp.exists(progcsv):
                        try:
                            result['progress'] = read_csv(progcsv)
                        except pandas.errors.EmptyDataError:
                            print('skipping progress file in ', dirname, 'empty data')
                    else:
                        if verbose: print('skipping %s: no progress file'%dirname)

                if enable_monitor:
                    try:
                        result['monitor'] = pandas.DataFrame(monitor.load_results(dirname))
                    except monitor.LoadMonitorResultsError:
                        print('skipping %s: no monitor files'%dirname)
                    except Exception as e:
                        print('exception loading monitor file in %s: %s'%(dirname, e))

                if result.get('monitor') is not None or result.get('progress') is not None:
                    allresults.append(Result(**result))
                    if verbose:
                        print('successfully loaded %s'%dirname)

    if verbose: print('loaded %i results'%len(allresults))
    return allresults

COLORS = ['blue', 'green', 'red', 'cyan', 'magenta', 'yellow', 'black', 'purple', 'pink',
          'brown', 'orange', 'teal', 'lightblue', 'lime', 'lavender', 'turquoise',
          'darkgreen', 'tan', 'salmon', 'gold', 'darkred', 'darkblue']


def default_xy_fn(r):
    x = np.cumsum(r.monitor.l)
    y = smooth(r.monitor.r, radius=10)
    return x,y

def default_split_fn(r):
    import re
    # match name between slash and -<digits> at the end of the string
    # (slash in the beginning or -<digits> in the end or either may be missing)
    match = re.search(r'[^/-]+(?=(-\d+)?\Z)', r.dirname)
    if match:
        return match.group(0)

def plot_results(
    allresults, *,
    xy_fn=default_xy_fn,
    split_fn=default_split_fn,
    group_fn=default_split_fn,
    average_group=False,
    shaded_std=True,
    shaded_err=True,
    figsize=None,
    legend_outside=False,
    resample=0,
    smooth_step=1.0,
):
    '''
    Plot multiple Result objects

    xy_fn: function Result -> x,y         - function that converts result objects into a tuple of x and y values.
                      By default, x is the cumsum of episode lengths, and y is episode rewards

    split_fn: function Result -> hashable - function that converts result objects into keys to split curves into sub-panels by.
                      That is, results r for which split_fn(r) differs will be put on different sub-panels.
                      By default, the portion of r.dirname between the last / and -<digits> is returned. The sub-panels are
                      stacked vertically in the figure.

    group_fn: function Result -> hashable - function that converts result objects into keys to group curves by.
                      That is, results r for which group_fn(r) is the same will be put into the same group.
                      Curves in the same group have the same color (if average_group is False), or are averaged over
                      (if average_group is True). The default value is the same as the default value for split_fn.

    average_group: bool                   - if True, will average the curves in the same group and plot the mean. Enables resampling
                      (if resample = 0, will use 512 steps)

    shaded_std: bool                      - if True (default), the shaded region corresponding to the standard deviation of the group
                      of curves will be shown (only applicable if average_group = True)

    shaded_err: bool                      - if True (default), the shaded region corresponding to the error in the mean estimate of the
                      group of curves (that is, standard deviation divided by the square root of the number of curves) will be
                      shown (only applicable if average_group = True)

    figsize: tuple or None                - size of the resulting figure (including sub-panels). By default, width is 6 and height
                      is 6 times the number of sub-panels.

    legend_outside: bool                  - if True, will place the legend outside of the sub-panels.

    resample: int                         - if not zero, size of the uniform grid in x direction to resample onto. Resampling is
                      performed via symmetric EMA smoothing (see the docstring for symmetric_ema).
                      Default is zero (no resampling). Note that if average_group is True, resampling is necessary; in that case,
                      the default value is 512.

    smooth_step: float                    - when resampling (i.e. when resample > 0 or average_group is True), use this EMA decay
                      parameter (in units of the new grid step). See docstrings for decay_steps in symmetric_ema or one_sided_ema.
    '''

    if split_fn is None: split_fn = lambda _ : ''
    if group_fn is None: group_fn = lambda _ : ''
    sk2r = defaultdict(list) # splitkey2results
    for result in allresults:
        splitkey = split_fn(result)
        sk2r[splitkey].append(result)
    assert len(sk2r) > 0
    assert isinstance(resample, int), "0: don't resample. <integer>: that many samples"
    nrows = len(sk2r)
    ncols = 1
    figsize = figsize or (6, 6 * nrows)
    f, axarr = plt.subplots(nrows, ncols, sharex=False, squeeze=False, figsize=figsize)

    groups = list(set(group_fn(result) for result in allresults))

    default_samples = 512
    if average_group:
        resample = resample or default_samples

    for (isplit, sk) in enumerate(sorted(sk2r.keys())):
        g2l = {}
        g2c = defaultdict(int)
        sresults = sk2r[sk]
        gresults = defaultdict(list)
        ax = axarr[isplit][0]
        for result in sresults:
            group = group_fn(result)
            g2c[group] += 1
            x, y = xy_fn(result)
            if x is None: x = np.arange(len(y))
            x, y = map(np.asarray, (x, y))
            if average_group:
                gresults[group].append((x,y))
            else:
                if resample:
                    x, y, counts = symmetric_ema(x, y, x[0], x[-1], resample, decay_steps=smooth_step)
                l, = ax.plot(x, y, color=COLORS[groups.index(group) % len(COLORS)])
                g2l[group] = l
        if average_group:
            for group in sorted(groups):
                xys = gresults[group]
                if not any(xys):
                    continue
                color = COLORS[groups.index(group) % len(COLORS)]
                origxs = [xy[0] for xy in xys]
                minxlen = min(map(len, origxs))
                def allequal(qs):
                    return all((q==qs[0]).all() for q in qs[1:])
                if resample:
                    low = max(x[0] for x in origxs)
                    high = min(x[-1] for x in origxs)
                    usex = np.linspace(low, high, resample)
                    ys = []
                    for (x, y) in xys:
                        ys.append(symmetric_ema(x, y, low, high, resample, decay_steps=smooth_step)[1])
                else:
                    assert allequal([x[:minxlen] for x in origxs]),\
                        'If you want to average unevenly sampled data, set resample=<number of samples you want>'
                    usex = origxs[0]
                    ys = [xy[1][:minxlen] for xy in xys]
                ymean = np.mean(ys, axis=0)
                ystd = np.std(ys, axis=0)
                ystderr = ystd / np.sqrt(len(ys))
                l, = axarr[isplit][0].plot(usex, ymean, color=color)
                g2l[group] = l
                if shaded_err:
                    ax.fill_between(usex, ymean - ystderr, ymean + ystderr, color=color, alpha=.4)
                if shaded_std:
                    ax.fill_between(usex, ymean - ystd, ymean + ystd, color=color, alpha=.2)

        # https://matplotlib.org/users/legend_guide.html
        plt.tight_layout()
        if any(g2l.keys()):
            ax.legend(
                g2l.values(),
                ['%s (%i)'%(g, g2c[g]) for g in g2l] if average_group else g2l.keys(),
                loc=2 if legend_outside else None,
                bbox_to_anchor=(1,1) if legend_outside else None)
        ax.set_title(sk)
    return f, axarr

def regression_analysis(df):
    xcols = list(df.columns.copy())
    xcols.remove('score')
    ycols = ['score']
    import statsmodels.api as sm
    mod = sm.OLS(df[ycols], sm.add_constant(df[xcols]), hasconst=False)
    res = mod.fit()
    print(res.summary())

def test_smooth():
    norig = 100
    nup = 300
    ndown = 30
    xs = np.cumsum(np.random.rand(norig) * 10 / norig)
    yclean = np.sin(xs)
    ys = yclean + .1 * np.random.randn(yclean.size)
    xup, yup, _ = symmetric_ema(xs, ys, xs.min(), xs.max(), nup, decay_steps=nup/ndown)
    xdown, ydown, _ = symmetric_ema(xs, ys, xs.min(), xs.max(), ndown, decay_steps=ndown/ndown)
    xsame, ysame, _ = symmetric_ema(xs, ys, xs.min(), xs.max(), norig, decay_steps=norig/ndown)
    plt.plot(xs, ys, label='orig', marker='x')
    plt.plot(xup, yup, label='up', marker='x')
    plt.plot(xdown, ydown, label='down', marker='x')
    plt.plot(xsame, ysame, label='same', marker='x')
    plt.plot(xs, yclean, label='clean', marker='x')
    plt.legend()
    plt.show()

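A hypothetical end-to-end use of the new module (the log directory name is made up): load_results walks the directory tree for monitor/progress files, and plot_results draws one sub-panel per split key with one averaged curve per group.

```python
import matplotlib.pyplot as plt
from baselines.common import plot_util as pu

results = pu.load_results('~/logs/pong-ppo2', verbose=True)
f, axarr = pu.plot_results(results, average_group=True, shaded_std=True,
                           shaded_err=True, resample=512)
plt.show()
```
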
baselines/common/vec_env/vec_env.py
@@ -32,6 +32,11 @@ class VecEnv(ABC):
     """
     closed = False
     viewer = None

+    metadata = {
+        'render.modes': ['human', 'rgb_array']
+    }
+
     def __init__(self, num_envs, observation_space, action_space):
         self.num_envs = num_envs
         self.observation_space = observation_space

baselines/common/vec_env/dummy_vec_env.py
@@ -20,9 +20,6 @@ class DummyVecEnv(VecEnv):
         env = self.envs[0]
         VecEnv.__init__(self, len(env_fns), env.observation_space, env.action_space)
         obs_space = env.observation_space
-        if isinstance(obs_space, spaces.MultiDiscrete):
-            obs_space.shape = obs_space.shape[0]
-
         self.keys, shapes, dtypes = obs_space_info(obs_space)

         self.buf_obs = { k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys }
@@ -79,6 +76,6 @@ class DummyVecEnv(VecEnv):

     def render(self, mode='human'):
         if self.num_envs == 1:
-            self.envs[0].render(mode=mode)
+            return self.envs[0].render(mode=mode)
         else:
-            super().render(mode=mode)
+            return super().render(mode=mode)

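With the render fix, a single-env DummyVecEnv now returns the frame instead of discarding it. A small sketch (environment id is illustrative):

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('PongNoFrameskip-v4')])
env.reset()
frame = env.render(mode='rgb_array')  # previously rendered but returned None
env.close()
```
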
baselines/common/vec_env/test_video_recorder.py (new file, 49 lines)
@@ -0,0 +1,49 @@
"""
Tests for the vectorized video recorder wrapper.
"""

import gym
import pytest
import os
import glob
import tempfile

from .dummy_vec_env import DummyVecEnv
from .shmem_vec_env import ShmemVecEnv
from .subproc_vec_env import SubprocVecEnv
from .vec_video_recorder import VecVideoRecorder


@pytest.mark.parametrize('klass', (DummyVecEnv, ShmemVecEnv, SubprocVecEnv))
@pytest.mark.parametrize('num_envs', (1, 4))
@pytest.mark.parametrize('video_length', (10, 100))
@pytest.mark.parametrize('video_interval', (1, 50))
def test_video_recorder(klass, num_envs, video_length, video_interval):
    """
    Wrap an existing VecEnv with VecVideoRecorder,
    make (video_interval + video_length + 1) steps,
    then check that the files are present
    """

    def make_fn():
        env = gym.make('PongNoFrameskip-v4')
        return env
    fns = [make_fn for _ in range(num_envs)]
    env = klass(fns)

    with tempfile.TemporaryDirectory() as video_path:
        env = VecVideoRecorder(env, video_path, record_video_trigger=lambda x: x % video_interval == 0, video_length=video_length)

        env.reset()
        for _ in range(video_interval + video_length + 1):
            env.step([0] * num_envs)
        env.close()

        recorded_video = glob.glob(os.path.join(video_path, "*.mp4"))

        # one video per trigger: the recorder fires at step 0 and again at step video_interval
        assert len(recorded_video) == 2
        # files are not empty
        assert all(os.stat(p).st_size != 0 for p in recorded_video)

baselines/common/vec_env/vec_video_recorder.py (new file, 89 lines)
@@ -0,0 +1,89 @@
import os
from baselines import logger
from baselines.common.vec_env import VecEnvWrapper
from gym.wrappers.monitoring import video_recorder


class VecVideoRecorder(VecEnvWrapper):
    """
    Wrap VecEnv to record rendered images as an mp4 video.
    """

    def __init__(self, venv, directory, record_video_trigger, video_length=200):
        """
        # Arguments
            venv: VecEnv to wrap
            directory: Where to save videos
            record_video_trigger:
                Function that defines when to start recording.
                The function takes the current number of steps,
                and returns whether we should start recording or not.
            video_length: Length of recorded video
        """

        VecEnvWrapper.__init__(self, venv)
        self.record_video_trigger = record_video_trigger
        self.video_recorder = None

        self.directory = os.path.abspath(directory)
        if not os.path.exists(self.directory): os.mkdir(self.directory)

        self.file_prefix = "vecenv"
        self.file_infix = '{}'.format(os.getpid())
        self.step_id = 0
        self.video_length = video_length

        self.recording = False
        self.recorded_frames = 0

    def reset(self):
        obs = self.venv.reset()

        self.start_video_recorder()

        return obs

    def start_video_recorder(self):
        self.close_video_recorder()

        base_path = os.path.join(self.directory, '{}.video.{}.video{:06}'.format(self.file_prefix, self.file_infix, self.step_id))
        self.video_recorder = video_recorder.VideoRecorder(
                env=self.venv,
                base_path=base_path,
                metadata={'step_id': self.step_id}
                )

        self.video_recorder.capture_frame()
        self.recorded_frames = 1
        self.recording = True

    def _video_enabled(self):
        return self.record_video_trigger(self.step_id)

    def step_wait(self):
        obs, rews, dones, infos = self.venv.step_wait()

        self.step_id += 1
        if self.recording:
            self.video_recorder.capture_frame()
            self.recorded_frames += 1
            if self.recorded_frames > self.video_length:
                logger.info("Saving video to ", self.video_recorder.path)
                self.close_video_recorder()
        elif self._video_enabled():
            self.start_video_recorder()

        return obs, rews, dones, infos

    def close_video_recorder(self):
        if self.recording:
            self.video_recorder.close()
        self.recording = False
        self.recorded_frames = 0

    def close(self):
        VecEnvWrapper.close(self)
        self.close_video_recorder()

    def __del__(self):
        self.close()

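A minimal usage sketch of the new wrapper (directory, trigger interval, and length are illustrative): a recording starts whenever record_video_trigger fires on the step counter and is cut after video_length frames.

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_video_recorder import VecVideoRecorder

venv = DummyVecEnv([lambda: gym.make('PongNoFrameskip-v4')])
venv = VecVideoRecorder(venv, '/tmp/videos',
                        record_video_trigger=lambda step: step % 2000 == 0,
                        video_length=200)
venv.reset()  # starts the first recording immediately
for _ in range(3000):
    venv.step([venv.action_space.sample()])
venv.close()
```
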
baselines/ddpg/ddpg.py
@@ -12,8 +12,11 @@ import baselines.common.tf_util as U

 from baselines import logger
 import numpy as np
-from mpi4py import MPI

+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None

 def learn(network, env,
           seed=None,
@@ -49,7 +52,11 @@ def learn(network, env,
     else:
         nb_epochs = 500

-    rank = MPI.COMM_WORLD.Get_rank()
+    if MPI is not None:
+        rank = MPI.COMM_WORLD.Get_rank()
+    else:
+        rank = 0

     nb_actions = env.action_space.shape[-1]
     assert (np.abs(env.action_space.low) == env.action_space.high).all() # we assume symmetric actions.
@@ -59,7 +66,6 @@ def learn(network, env,

     action_noise = None
     param_noise = None
-    nb_actions = env.action_space.shape[-1]
     if noise_type is not None:
         for current_noise_type in noise_type.split(','):
             current_noise_type = current_noise_type.strip()
@@ -200,7 +206,11 @@ def learn(network, env,
                 eval_episode_rewards_history.append(eval_episode_reward[d])
                 eval_episode_reward[d] = 0.0

-    mpi_size = MPI.COMM_WORLD.Get_size()
+    if MPI is not None:
+        mpi_size = MPI.COMM_WORLD.Get_size()
+    else:
+        mpi_size = 1

     # Log stats.
     # XXX shouldn't call np.mean on variable length lists
     duration = time.time() - start_time
@@ -234,7 +244,10 @@ def learn(network, env,
         else:
             raise ValueError('expected scalar, got %s'%x)

-    combined_stats_sums = MPI.COMM_WORLD.allreduce(np.array([ np.array(x).flatten()[0] for x in combined_stats.values()]))
+    combined_stats_sums = np.array([ np.array(x).flatten()[0] for x in combined_stats.values()])
+    if MPI is not None:
+        combined_stats_sums = MPI.COMM_WORLD.allreduce(combined_stats_sums)
+
     combined_stats = {k : v / mpi_size for (k,v) in zip(combined_stats.keys(), combined_stats_sums)}

     # Total statistics.

baselines/ddpg/ddpg_learner.py
@@ -9,7 +9,10 @@ from baselines import logger
 from baselines.common.mpi_adam import MpiAdam
 import baselines.common.tf_util as U
 from baselines.common.mpi_running_mean_std import RunningMeanStd
-from mpi4py import MPI
+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None

 def normalize(x, stats):
     if stats is None:
@@ -268,7 +271,7 @@ class DDPG(object):

         if self.action_noise is not None and apply_noise:
             noise = self.action_noise()
-            assert noise.shape == action.shape
+            assert noise.shape == action[0].shape
             action += noise
         action = np.clip(action, self.action_range[0], self.action_range[1])
@@ -358,6 +361,11 @@ class DDPG(object):
         return stats

     def adapt_param_noise(self):
+        try:
+            from mpi4py import MPI
+        except ImportError:
+            MPI = None
+
         if self.param_noise is None:
             return 0.
@@ -371,7 +379,16 @@ class DDPG(object):
             self.param_noise_stddev: self.param_noise.current_stddev,
         })

-        mean_distance = MPI.COMM_WORLD.allreduce(distance, op=MPI.SUM) / MPI.COMM_WORLD.Get_size()
+        if MPI is not None:
+            mean_distance = MPI.COMM_WORLD.allreduce(distance, op=MPI.SUM) / MPI.COMM_WORLD.Get_size()
+        else:
+            mean_distance = distance
+
+        if MPI is not None:
+            mean_distance = MPI.COMM_WORLD.allreduce(distance, op=MPI.SUM) / MPI.COMM_WORLD.Get_size()
+        else:
+            mean_distance = distance
+
         self.param_noise.adapt(mean_distance)
         return mean_distance

baselines/ppo2/defaults.py
@@ -20,3 +20,6 @@ def atari():
         lr=lambda f : f * 2.5e-4,
         cliprange=lambda f : f * 0.1,
     )
+
+def retro():
+    return atari()

baselines/ppo2/ppo2.py
@@ -10,11 +10,15 @@ from baselines.common import explained_variance, set_global_seeds
 from baselines.common.policies import build_policy
 from baselines.common.runners import AbstractEnvRunner
 from baselines.common.tf_util import get_session, save_variables, load_variables
-from baselines.common.mpi_adam_optimizer import MpiAdamOptimizer

-from mpi4py import MPI
-from baselines.common.tf_util import initialize
-from baselines.common.mpi_util import sync_from_root
+try:
+    from baselines.common.mpi_adam_optimizer import MpiAdamOptimizer
+    from mpi4py import MPI
+    from baselines.common.mpi_util import sync_from_root
+except ImportError:
+    MPI = None
+
+from baselines.common.tf_util import initialize

 class Model(object):
     """
@@ -93,7 +97,10 @@ class Model(object):
         # 1. Get the model parameters
         params = tf.trainable_variables('ppo2_model')
         # 2. Build our trainer
-        trainer = MpiAdamOptimizer(MPI.COMM_WORLD, learning_rate=LR, epsilon=1e-5)
+        if MPI is not None:
+            trainer = MpiAdamOptimizer(MPI.COMM_WORLD, learning_rate=LR, epsilon=1e-5)
+        else:
+            trainer = tf.train.AdamOptimizer(learning_rate=LR, epsilon=1e-5)
         # 3. Calculate the gradients
         grads_and_var = trainer.compute_gradients(loss, params)
         grads, var = zip(*grads_and_var)
@@ -136,9 +143,11 @@ class Model(object):
         self.save = functools.partial(save_variables, sess=sess)
         self.load = functools.partial(load_variables, sess=sess)

-        if MPI.COMM_WORLD.Get_rank() == 0:
+        if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
             initialize()
         global_variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="")
-        sync_from_root(sess, global_variables) #pylint: disable=E1101
+
+        if MPI is not None:
+            sync_from_root(sess, global_variables) #pylint: disable=E1101

 class Runner(AbstractEnvRunner):
@@ -392,9 +401,9 @@ def learn(*, network, env, total_timesteps, eval_env = None, seed=None, nsteps=2
             logger.logkv('time_elapsed', tnow - tfirststart)
             for (lossval, lossname) in zip(lossvals, model.loss_names):
                 logger.logkv(lossname, lossval)
-            if MPI.COMM_WORLD.Get_rank() == 0:
+            if MPI is None or MPI.COMM_WORLD.Get_rank() == 0:
                 logger.dumpkvs()
-        if save_interval and (update % save_interval == 0 or update == 1) and logger.get_dir() and MPI.COMM_WORLD.Get_rank() == 0:
+        if save_interval and (update % save_interval == 0 or update == 1) and logger.get_dir() and (MPI is None or MPI.COMM_WORLD.Get_rank() == 0):
             checkdir = osp.join(logger.get_dir(), 'checkpoints')
             os.makedirs(checkdir, exist_ok=True)
             savepath = osp.join(checkdir, '%.5i'%update)

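With MPI optional, ppo2 now imports and trains in a single process when mpi4py is absent, falling back to tf.train.AdamOptimizer. A smoke-test sketch (assumes gym and a TF1-era baselines checkout; the environment and step count are illustrative):

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

env = DummyVecEnv([lambda: gym.make('CartPole-v0')])
model = ppo2.learn(network='mlp', env=env, total_timesteps=10000)
```
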
baselines/run.py
@@ -6,6 +6,7 @@ from collections import defaultdict
 import tensorflow as tf
 import numpy as np

+from baselines.common.vec_env.vec_video_recorder import VecVideoRecorder
 from baselines.common.vec_env.vec_frame_stack import VecFrameStack
 from baselines.common.cmd_util import common_arg_parser, parse_unknown_args, make_vec_env, make_env
 from baselines.common.tf_util import get_session
@@ -62,6 +63,8 @@ def train(args, extra_args):
     alg_kwargs.update(extra_args)

     env = build_env(args)
+    if args.save_video_interval != 0:
+        env = VecVideoRecorder(env, osp.join(logger.Logger.CURRENT.dir, "videos"), record_video_trigger=lambda x: x % args.save_video_interval == 0, video_length=args.save_video_length)

     if args.network:
         alg_kwargs['network'] = args.network

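Together with the --save_video_interval/--save_video_length flags added in cmd_util above, periodic videos can be recorded straight from the command line, e.g. (a hypothetical invocation):

```bash
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=1e6 \
    --save_video_interval=10000 --save_video_length=200
```
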
baselines/trpo_mpi/trpo_mpi.py
@@ -4,7 +4,6 @@ import baselines.common.tf_util as U
 import tensorflow as tf, numpy as np
 import time
 from baselines.common import colorize
-from mpi4py import MPI
 from collections import deque
 from baselines.common import set_global_seeds
 from baselines.common.mpi_adam import MpiAdam
@@ -13,6 +12,11 @@ from baselines.common.input import observation_placeholder
 from baselines.common.policies import build_policy
 from contextlib import contextmanager

+try:
+    from mpi4py import MPI
+except ImportError:
+    MPI = None
+
 def traj_segment_generator(pi, env, horizon, stochastic):
     # Initialize state variables
     t = 0
@@ -146,9 +150,12 @@ def learn(*,
     '''

-    nworkers = MPI.COMM_WORLD.Get_size()
-    rank = MPI.COMM_WORLD.Get_rank()
+    if MPI is not None:
+        nworkers = MPI.COMM_WORLD.Get_size()
+        rank = MPI.COMM_WORLD.Get_rank()
+    else:
+        nworkers = 1
+        rank = 0

     cpus_per_worker = 1
     U.get_session(config=tf.ConfigProto(
@@ -237,9 +244,13 @@ def learn(*,

     def allmean(x):
         assert isinstance(x, np.ndarray)
-        out = np.empty_like(x)
-        MPI.COMM_WORLD.Allreduce(x, out, op=MPI.SUM)
-        out /= nworkers
+        if MPI is not None:
+            out = np.empty_like(x)
+            MPI.COMM_WORLD.Allreduce(x, out, op=MPI.SUM)
+            out /= nworkers
+        else:
+            out = np.copy(x)
+
         return out

     U.initialize()
@@ -247,7 +258,9 @@ def learn(*,
         pi.load(load_path)

     th_init = get_flat()
-    MPI.COMM_WORLD.Bcast(th_init, root=0)
+    if MPI is not None:
+        MPI.COMM_WORLD.Bcast(th_init, root=0)
+
     set_from_flat(th_init)
     vfadam.sync()
     print("Init param sum", th_init.sum(), flush=True)
@@ -353,7 +366,11 @@ def learn(*,
         logger.record_tabular("ev_tdlam_before", explained_variance(vpredbefore, tdlamret))

         lrlocal = (seg["ep_lens"], seg["ep_rets"]) # local values
-        listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of tuples
+        if MPI is not None:
+            listoflrpairs = MPI.COMM_WORLD.allgather(lrlocal) # list of tuples
+        else:
+            listoflrpairs = [lrlocal]

         lens, rews = map(flatten_lists, zip(*listoflrpairs))
         lenbuffer.extend(lens)
         rewbuffer.extend(rews)

docs/viz/viz.ipynb (new file, 808 lines)
File diff suppressed because one or more lines are too long.