Compare commits

..

42 Commits

Author SHA1 Message Date
Peter Zhokhov
841da92f4d add code coverage report 2018-08-13 10:44:49 -07:00
Peter Zhokhov
624231827c merged benchmarks branch 2018-08-13 09:28:10 -07:00
Peter Zhokhov
1e40ec22be dummy commit to RUN BENCHMARKS 2018-08-08 10:45:18 -07:00
Peter Zhokhov
701a36cdfa added a note in README about TfRunningMeanStd and serialization of VecNormalize 2018-08-08 10:44:58 -07:00
Peter Zhokhov
5a7f9847d8 flake8 complaints 2018-08-03 13:59:58 -07:00
Peter Zhokhov
b63134e5c5 added acer runner (missing import) 2018-08-03 13:31:37 -07:00
Peter Zhokhov
db314cdeda Merge branch 'peterz_profile_vec_normalize' into peterz_migrate_rlalgs 2018-08-03 11:47:36 -07:00
Peter Zhokhov
b08c083d91 use VecNormalize with regular RunningMeanStd 2018-08-03 11:44:12 -07:00
Peter Zhokhov
bfbbe66d9e profiling wip 2018-08-02 11:23:12 -07:00
Peter Zhokhov
1c5c6563b7 reverted VecNormalize to use RunningMeanStd (no tf) 2018-08-02 10:55:09 -07:00
Peter Zhokhov
1fa8c58da5 reverted VecNormalize to use RunningMeanStd (no tf) 2018-08-02 10:54:07 -07:00
Peter Zhokhov
f6d1115ead reverted running_mean_std to user property decorators for mean, var, count 2018-08-02 10:32:22 -07:00
Peter Zhokhov
f6d5a47bed use ncpu=1 for mujoco sessions - gives a bit of a performance speedup 2018-08-02 10:24:21 -07:00
Peter Zhokhov
c2df27bee4 non-tf normalization benchmark RUN BENCHMARKS 2018-08-02 09:41:41 -07:00
Peter Zhokhov
974c15756e changed default ppo2 lr schedule to linear RUN BENCHMARKS 2018-08-01 16:24:44 -07:00
Peter Zhokhov
ad43fd9a35 add defaults 2018-08-01 16:15:59 -07:00
Peter Zhokhov
72c357c638 hardcode names of retro environments 2018-08-01 15:18:59 -07:00
Peter Zhokhov
e00e5ca016 run ppo_mpi benchmarks only RUN BENCHMARKS 2018-08-01 14:56:08 -07:00
Peter Zhokhov
705797f2f0 Merge branch 'peterz_migrate_rlalgs' into peterz_benchmarks 2018-08-01 14:46:40 -07:00
Peter Zhokhov
fcd84aa831 make_atari_env compatible with mpi 2018-08-01 14:46:18 -07:00
Peter Zhokhov
390b51597a benchmarks on ppo2 only RUN BENCHMARKS 2018-08-01 11:01:50 -07:00
Peter Zhokhov
95104a3592 Merge branch 'peterz_migrate_rlalgs' into peterz_benchmarks 2018-08-01 10:50:29 -07:00
Peter Zhokhov
3528f7b992 save all variables to make sure we save the vec_normalize normalization 2018-08-01 10:12:19 -07:00
Peter Zhokhov
151e48009e flake8 complaints 2018-07-31 16:25:12 -07:00
Peter Zhokhov
92f33335e9 dummy commit to RUN BENCHMARKS 2018-07-31 15:53:18 -07:00
Peter Zhokhov
af729cff15 dummy commit to RUN BENCHMARKS 2018-07-31 15:37:00 -07:00
Peter Zhokhov
10f815fe1d fixed import in vec_normalize 2018-07-31 15:19:43 -07:00
Peter Zhokhov
8c4adac898 running_mean_std uses tensorflow variables 2018-07-31 14:45:55 -07:00
Peter Zhokhov
2a93ea8782 serialize variables as a dict, not as a list 2018-07-31 11:13:31 -07:00
Peter Zhokhov
9c48f9fad5 very dummy commit to RUN BENCHMARKS 2018-07-31 10:23:43 -07:00
Peter Zhokhov
348cbb4b71 dummy commit to RUN BENCHMARKS 2018-07-31 09:42:23 -07:00
Peter Zhokhov
a1602ab15f dummy commit to RUN BENCHMARKS 2018-07-30 17:51:16 -07:00
Peter Zhokhov
e63e69bb14 dummy commit to RUN BENCHMARKS 2018-07-30 17:39:22 -07:00
Peter Zhokhov
385e7e5c0d dummy commit to RUN BENCHMARKS 2018-07-30 17:21:05 -07:00
Peter Zhokhov
d112a2e49f added approximate humanoid reward with ppo2 into the README for reference 2018-07-30 16:58:31 -07:00
Peter Zhokhov
e662dd6409 run.py can run algos from both baselines and rl_algs 2018-07-30 16:09:48 -07:00
Peter Zhokhov
efc6bffce3 replaced atari_arg_parser with common_arg_parser 2018-07-30 15:58:56 -07:00
Peter Zhokhov
872181d4c3 re-exported rl_algs - fixed problems with serialization test and test_cartpole 2018-07-30 15:49:48 -07:00
Peter Zhokhov
628ddecf6a re-exported rl_algs 2018-07-30 12:15:46 -07:00
peter
83a4a4be65 run slow tests 2018-07-26 14:39:25 -07:00
peter
7edac38c73 more stuff from rl-algs 2018-07-26 14:26:57 -07:00
peter
a6dca44115 exported rl-algs 2018-07-26 14:02:04 -07:00
15 changed files with 46 additions and 18039 deletions

.gitignore
View File

@@ -5,6 +5,7 @@
.pytest_cache
.DS_Store
.idea
.coverage
# Setuptools distribution and build folders.
/dist/

View File

@@ -112,6 +112,10 @@ This should get to the mean reward per episode about 5k. To load and visualize t
*NOTE:* At the moment Mujoco training uses the VecNormalize wrapper for the environment, which is not saved correctly; loading models trained on Mujoco therefore will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). That way, the mean and std of the environment-normalizing wrapper are saved in tensorflow variables and included in the model file; however, training is slower with this approach, which is why it is not enabled by default.
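As a rough sketch of what that swap looks like (the import path and constructor arguments for `TfRunningMeanStd` here are assumptions, not taken from this diff, so check the actual modules before applying it):
```python
# Hypothetical sketch of the workaround: inside baselines/common/vec_env/vec_normalize.py,
# build the running statistics with TfRunningMeanStd so that mean/std live in tf variables.
from baselines.common.vec_env import VecEnvWrapper
from baselines.common.running_mean_std import TfRunningMeanStd  # assumed location; replaces RunningMeanStd

class VecNormalize(VecEnvWrapper):
    """VecNormalize whose normalization statistics are stored in tensorflow variables
    and therefore end up inside the saved model file."""
    def __init__(self, venv, ob=True, ret=True, clipob=10., cliprew=10., gamma=0.99, epsilon=1e-8):
        VecEnvWrapper.__init__(self, venv)
        self.ob_rms = TfRunningMeanStd(shape=self.observation_space.shape, scope='ob_rms') if ob else None
        self.ret_rms = TfRunningMeanStd(shape=(), scope='ret_rms') if ret else None
        # ... the rest of VecNormalize (observation/reward clipping, return discounting) is unchanged
```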
## Subpackages
- [A2C](baselines/a2c)
@@ -121,19 +125,10 @@ This should get to the mean reward per episode about 5k. To load and visualize t
- [DQN](baselines/deepq)
- [GAIL](baselines/gail)
- [HER](baselines/her)
- [PPO1](baselines/ppo1) (obsolete version, left here temporarily)
- [PPO2](baselines/ppo2)
- [PPO1](baselines/ppo1) (Multi-CPU using MPI)
- [PPO2](baselines/ppo2) (Optimized for GPU)
- [TRPO](baselines/trpo_mpi)
## Benchmarks
Results of benchmarks on Mujoco (1M timesteps) and Atari (10M timesteps) are available
[here for Mujoco](https://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm)
and
[here for Atari](https://htmlpreview.github.io/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm)
respectively. Note that these results may not be from the latest version of the code; the particular commit hash with which the results were obtained is specified on the benchmarks page.
To cite this repository in publications:
@misc{baselines,

View File

@@ -2,5 +2,4 @@
- Original paper: https://arxiv.org/abs/1602.01783
- Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
- `python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options
- also refer to the repo-wide [README.md](../../README.md#training-models)
- `python -m baselines.a2c.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.

View File

@@ -1,6 +1,4 @@
# ACER
- Original paper: https://arxiv.org/abs/1611.01224
- `python -m baselines.run --alg=acer --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
- also refer to the repo-wide [README.md](../../README.md#training-models)
- `python -m baselines.acer.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.

View File

@@ -2,7 +2,4 @@
- Original paper: https://arxiv.org/abs/1708.05144
- Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
- `python -m baselines.run --alg=acktr --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
- also refer to the repo-wide [README.md](../../README.md#training-models)
- `python -m baselines.acktr.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.

View File

@@ -9,29 +9,44 @@ Here's a list of commands to run to quickly get a working example:
```bash
# Train model and save the results to cartpole_model.pkl
python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5
python -m baselines.deepq.experiments.train_cartpole
# Load the model saved in cartpole_model.pkl and visualize the learned policy
python -m baselines.run --alg=deepq --env=CartPole-v0 --load_path=./cartpole_model.pkl --num_timesteps=0 --play
python -m baselines.deepq.experiments.enjoy_cartpole
```
Be sure to check out the source code of [both](experiments/train_cartpole.py) [files](experiments/enjoy_cartpole.py)!
## If you wish to apply DQN to solve a problem.
Check out our simple agent trained with the one-stop-shop `deepq.learn` function.
- [baselines/deepq/experiments/train_cartpole.py](experiments/train_cartpole.py) - train a Cartpole agent.
- [baselines/deepq/experiments/train_pong.py](experiments/train_pong.py) - train a Pong agent using convolutional neural networks.
In particular, notice that once `deepq.learn` finishes training it returns an `act` function which can be used to select actions in the environment. Once trained, you can easily save it and load it at a later time. The complementary file `enjoy_cartpole.py` loads and visualizes the learned policy.
In particular, notice that once `deepq.learn` finishes training it returns an `act` function which can be used to select actions in the environment. Once trained, you can easily save it and load it at a later time. For both of the files listed above there are complementary files, `enjoy_cartpole.py` and `enjoy_pong.py` respectively, that load and visualize the learned policy.
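As a rough illustration of that flow, here is a minimal sketch (not taken from these files; it assumes the new-style `network='mlp'` API that appears elsewhere in this compare):
```python
# Sketch: train briefly, save the policy, then use the returned act function to pick actions.
import gym
from baselines import deepq

env = gym.make("CartPole-v0")
act = deepq.learn(env, network='mlp', total_timesteps=10000)
act.save("cartpole_model.pkl")  # the returned wrapper can be saved and loaded later

obs, done = env.reset(), False
while not done:
    env.render()
    action = act(obs[None])[0]  # act expects a batch of observations
    obs, reward, done, _ = env.step(action)
```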
## If you wish to experiment with the algorithm
##### Check out the examples
- [baselines/deepq/experiments/custom_cartpole.py](experiments/custom_cartpole.py) - Cartpole training with more fine-grained control over the internals of the DQN algorithm.
- [baselines/deepq/defaults.py](defaults.py) - settings for training on Atari. Run
- [baselines/deepq/experiments/atari/train.py](experiments/atari/train.py) - more robust setup for training at scale.
##### Download a pretrained Atari agent
For some research projects it is sometimes useful to have an already trained agent handy. There's a variety of models to choose from. You can list them all by running:
```bash
python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4
python -m baselines.deepq.experiments.atari.download_model
```
to train on Atari Pong (see more in repo-wide [README.md](../../README.md#training-models))
Once you pick a model, you can download it and visualize the learned policy. Be sure to pass the `--dueling` flag to the visualization script when using dueling models.
```bash
python -m baselines.deepq.experiments.atari.download_model --blob model-atari-duel-pong-1 --model-dir /tmp/models
python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
```

View File

@@ -309,7 +309,7 @@ def build_act_with_param_noise(make_obs_ph, q_func, num_actions, scope="deepq",
outputs=output_actions,
givens={update_eps_ph: -1.0, stochastic_ph: True, reset_ph: False, update_param_noise_threshold_ph: False, update_param_noise_scale_ph: False},
updates=updates)
def act(ob, reset=False, update_param_noise_threshold=False, update_param_noise_scale=False, stochastic=True, update_eps=-1):
def act(ob, reset, update_param_noise_threshold, update_param_noise_scale, stochastic=True, update_eps=-1):
return _act(ob, stochastic, update_eps, reset, update_param_noise_threshold, update_param_noise_scale)
return act

View File

@@ -27,7 +27,7 @@ class ActWrapper(object):
self.initial_state = None
@staticmethod
def load_act(path):
def load_act(self, path):
with open(path, "rb") as f:
model_data, act_params = cloudpickle.load(f)
act = deepq.build_act(**act_params)
@@ -70,7 +70,6 @@ class ActWrapper(object):
def save(self, path):
save_state(path)
self.save_act(path+".pickle")
def load_act(path):
@@ -195,9 +194,8 @@ def learn(env,
# capture the shape outside the closure so that the env object is not serialized
# by cloudpickle when serializing make_obs_ph
observation_space = env.observation_space
def make_obs_ph(name):
return ObservationInput(observation_space, name=name)
return ObservationInput(env.observation_space, name=name)
act, train, update_target, debug = deepq.build_train(
make_obs_ph=make_obs_ph,

View File

@@ -11,11 +11,12 @@ def callback(lcl, _glb):
def main():
env = gym.make("CartPole-v0")
model = deepq.models.mlp([64])
act = deepq.learn(
env,
network='mlp',
q_func=model,
lr=1e-3,
total_timesteps=100000,
max_timesteps=100000,
buffer_size=50000,
exploration_fraction=0.1,
exploration_final_eps=0.02,
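Since the +/- markers were stripped from the hunk above, the new-style call appears to reassemble as follows (a sketch; the `callback` argument from the original script is omitted here):
```python
# Migration shown by the hunk: q_func=deepq.models.mlp([64]) becomes network='mlp',
# and max_timesteps is renamed to total_timesteps.
import gym
from baselines import deepq

env = gym.make("CartPole-v0")
act = deepq.learn(
    env,
    network='mlp',           # replaces model = deepq.models.mlp([64]) passed as q_func=model
    lr=1e-3,
    total_timesteps=100000,  # renamed from max_timesteps
    buffer_size=50000,
    exploration_fraction=0.1,
    exploration_final_eps=0.02,
)
```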

View File

@@ -2,7 +2,5 @@
- Original paper: https://arxiv.org/abs/1707.06347
- Baselines blog post: https://blog.openai.com/openai-baselines-ppo/
- `python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
- `python -m baselines.run --alg=ppo2 --env=Ant-v2 --num_timesteps=1e6` runs the algorithm for 1M frames on a Mujoco Ant environment.
- also refer to the repo-wide [README.md](../../README.md#training-models)
- `python -m baselines.ppo2.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
- `python -m baselines.ppo2.run_mujoco` runs the algorithm for 1M frames on a Mujoco environment.

View File

@@ -123,18 +123,14 @@ def build_env(args, render=False):
env = bench.Monitor(env, logger.get_dir())
env = retro_wrappers.wrap_deepmind_retro(env)
elif env_type == 'classic_control':
elif env_type == 'classic':
def make_env():
e = gym.make(env_id)
e = bench.Monitor(e, logger.get_dir(), allow_early_resets=True)
e.seed(seed)
return e
env = DummyVecEnv([make_env])
else:
raise ValueError('Unknown env_type {}'.format(env_type))
return env
@@ -153,7 +149,7 @@ def get_env_type(env_id):
return env_type, env_id
def get_default_network(env_type):
if env_type == 'mujoco' or env_type == 'classic_control':
if env_type == 'mujoco' or env_type=='classic':
return 'mlp'
if env_type == 'atari':
return 'cnn'

View File

@@ -2,6 +2,5 @@
- Original paper: https://arxiv.org/abs/1502.05477
- Baselines blog post https://blog.openai.com/openai-baselines-ppo/
- `mpirun -np 16 python -m baselines.run --alg=trpo_mpi --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
- `python -m baselines.run --alg=trpo_mpi --env=Ant-v2 --num_timesteps=1e6` runs the algorithm for 1M timesteps on a Mujoco Ant environment.
- also refer to the repo-wide [README.md](../../README.md#training-models)
- `mpirun -np 16 python -m baselines.trpo_mpi.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
- `python -m baselines.trpo_mpi.run_mujoco` runs the algorithm for 1M timesteps on a Mujoco environment.

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -25,7 +25,8 @@ setup(name='baselines',
extras_require={
'test': [
'filelock',
'pytest'
'pytest',
'pytest-cov',
]
},
description='OpenAI baselines: high quality implementations of reinforcement learning algorithms',