From 0f8d64055453318eee1d875617c365cc43cb96d8 Mon Sep 17 00:00:00 2001
From: Peter Zhokhov
Date: Thu, 16 Aug 2018 13:15:51 -0700
Subject: [PATCH] updated README files and deepq.train_cartpole example

---
 baselines/a2c/README.md                       |  2 +-
 baselines/acer/README.md                      |  2 ++
 baselines/acktr/README.md                     |  3 +++
 baselines/deepq/README.md                     | 27 ++++++---------------------
 baselines/deepq/experiments/train_cartpole.py |  5 ++---
 baselines/ppo2/README.md                      |  1 +
 baselines/trpo_mpi/README.md                  |  1 +
 7 files changed, 16 insertions(+), 25 deletions(-)

diff --git a/baselines/a2c/README.md b/baselines/a2c/README.md
index b35f675..915852b 100644
--- a/baselines/a2c/README.md
+++ b/baselines/a2c/README.md
@@ -3,4 +3,4 @@
 - Original paper: https://arxiv.org/abs/1602.01783
 - Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
 - `python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options
-- also refer to the repo-wide [README.md](../../README.md#training_models)
+- also refer to the repo-wide [README.md](../../README.md#training-models)
diff --git a/baselines/acer/README.md b/baselines/acer/README.md
index 33e24ff..d1ef98c 100644
--- a/baselines/acer/README.md
+++ b/baselines/acer/README.md
@@ -2,3 +2,5 @@
 
 - Original paper: https://arxiv.org/abs/1611.01224
 - `python -m baselines.run --alg=acer --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
+- also refer to the repo-wide [README.md](../../README.md#training-models)
+
diff --git a/baselines/acktr/README.md b/baselines/acktr/README.md
index 0458c5a..93692e8 100644
--- a/baselines/acktr/README.md
+++ b/baselines/acktr/README.md
@@ -3,3 +3,6 @@
 - Original paper: https://arxiv.org/abs/1708.05144
 - Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
 - `python -m baselines.run --alg=acktr --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
+- also refer to the repo-wide [README.md](../../README.md#training-models)
+
+
diff --git a/baselines/deepq/README.md b/baselines/deepq/README.md
index 8fa19ab..7b18c90 100644
--- a/baselines/deepq/README.md
+++ b/baselines/deepq/README.md
@@ -9,44 +9,29 @@
 Here's a list of commands to run to quickly get a working example:
 
 ```bash
 # Train model and save the results to cartpole_model.pkl
-python -m baselines.deepq.experiments.train_cartpole
+python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5
 # Load the model saved in cartpole_model.pkl and visualize the learned policy
-python -m baselines.deepq.experiments.enjoy_cartpole
+python -m baselines.run --alg=deepq --env=CartPole-v0 --load_path=./cartpole_model.pkl --num_timesteps=0 --play
 ```
-
-Be sure to check out the source code of [both](experiments/train_cartpole.py) [files](experiments/enjoy_cartpole.py)!
-
 ## If you wish to apply DQN to solve a problem.
 
 Check out our simple agent trained with one stop shop `deepq.learn` function.
 
 - [baselines/deepq/experiments/train_cartpole.py](experiments/train_cartpole.py) - train a Cartpole agent.
-- [baselines/deepq/experiments/train_pong.py](experiments/train_pong.py) - train a Pong agent using convolutional neural networks.
 
-In particular notice that once `deepq.learn` finishes training it returns `act` function which can be used to select actions in the environment. Once trained you can easily save it and load at later time. For both of the files listed above there are complimentary files `enjoy_cartpole.py` and `enjoy_pong.py` respectively, that load and visualize the learned policy.
+In particular notice that once `deepq.learn` finishes training it returns an `act` function which can be used to select actions in the environment. Once trained you can easily save it and load it at a later time. The complementary file `enjoy_cartpole.py` loads and visualizes the learned policy.
 
 ## If you wish to experiment with the algorithm
 
 ##### Check out the examples
 
-
 - [baselines/deepq/experiments/custom_cartpole.py](experiments/custom_cartpole.py) - Cartpole training with more fine grained control over the internals of DQN algorithm.
-- [baselines/deepq/experiments/run_atari.py](experiments/run_atari.py) - more robust setup for training at scale.
-
-
-##### Download a pretrained Atari agent
-
-For some research projects it is sometimes useful to have an already trained agent handy. There's a variety of models to choose from. You can list them all by running:
+- [baselines/deepq/defaults.py](defaults.py) - settings for training on Atari. Run
 
 ```bash
-python -m baselines.deepq.experiments.atari.download_model
+python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4
 ```
+to train on Atari Pong (see more in the repo-wide [README.md](../../README.md#training-models))
 
-Once you pick a model, you can download it and visualize the learned policy. Be sure to pass `--dueling` flag to visualization script when using dueling models.
 
-```bash
-python -m baselines.deepq.experiments.atari.download_model --blob model-atari-duel-pong-1 --model-dir /tmp/models
-python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
-
-```
diff --git a/baselines/deepq/experiments/train_cartpole.py b/baselines/deepq/experiments/train_cartpole.py
index a50c242..cfbbdc9 100644
--- a/baselines/deepq/experiments/train_cartpole.py
+++ b/baselines/deepq/experiments/train_cartpole.py
@@ -11,12 +11,11 @@ def callback(lcl, _glb):
 
 def main():
     env = gym.make("CartPole-v0")
-    model = deepq.models.mlp([64])
     act = deepq.learn(
         env,
-        q_func=model,
+        network='mlp',
         lr=1e-3,
-        max_timesteps=100000,
+        total_timesteps=100000,
         buffer_size=50000,
         exploration_fraction=0.1,
         exploration_final_eps=0.02,
diff --git a/baselines/ppo2/README.md b/baselines/ppo2/README.md
index fd2c139..4d431bc 100644
--- a/baselines/ppo2/README.md
+++ b/baselines/ppo2/README.md
@@ -5,3 +5,4 @@
 
 - `python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
 - `python -m baselines.run --alg=ppo2 --env=Ant-v2 --num_timesteps=1e6` runs the algorithm for 1M frames on a Mujoco Ant environment.
+- also refer to the repo-wide [README.md](../../README.md#training-models)
diff --git a/baselines/trpo_mpi/README.md b/baselines/trpo_mpi/README.md
index 75cf841..4cdbb5a 100644
--- a/baselines/trpo_mpi/README.md
+++ b/baselines/trpo_mpi/README.md
@@ -4,3 +4,4 @@
 - Baselines blog post https://blog.openai.com/openai-baselines-ppo/
 - `mpirun -np 16 python -m baselines.run --alg=trpo_mpi --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on an Atari Pong. See help (`-h`) for more options.
 - `python -m baselines.run --alg=trpo_mpi --env=Ant-v2 --num_timesteps=1e6` runs the algorithm for 1M timesteps on a Mujoco Ant environment.
+- also refer to the repo-wide [README.md](../../README.md#training-models)
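
For readers following the train_cartpole.py hunk above, here is a minimal sketch of what the updated example amounts to after the change: the Q-network is now selected by name (`network='mlp'`) rather than by passing an explicit `q_func`, and `max_timesteps` becomes `total_timesteps`. Only the arguments visible in the hunk are used; the `act.save(...)` call and the omitted early-stopping `callback` are assumptions carried over from the surrounding file rather than lines shown in this diff.

```python
import gym

from baselines import deepq


def main():
    env = gym.make("CartPole-v0")
    # New-style call: the network is chosen by name instead of building
    # deepq.models.mlp([64]) by hand, and total_timesteps replaces max_timesteps.
    act = deepq.learn(
        env,
        network='mlp',
        lr=1e-3,
        total_timesteps=100000,
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
    )
    # Assumption: the returned act object still exposes save(), as in the original example.
    act.save("cartpole_model.pkl")


if __name__ == '__main__':
    main()
```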