Compare commits

...

4 Commits

| Author | SHA1 | Message | Date |
| ------ | ---- | ------- | ---- |
| Peter Zhokhov | 0f8d640554 | updated README files and deepq.train_cartpole example | 2018-08-16 13:15:51 -07:00 |
| Peter Zhokhov | 44b91f3454 | Merge branch 'master' of github.com:openai/baselines into peterz_update_READMEs | 2018-08-16 12:26:51 -07:00 |
| Peter Zhokhov | 0c2a6936c4 | adding a link to repo-wide README | 2018-08-16 12:23:06 -07:00 |
| Peter Zhokhov | 2614f0f65a | update per-algorithm READMEs to reflect new way of running algorithms | 2018-08-16 12:18:06 -07:00 |
7 changed files with 24 additions and 31 deletions

View File

@@ -2,4 +2,5 @@
 - Original paper: https://arxiv.org/abs/1602.01783
 - Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
-- `python -m baselines.a2c.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
+- `python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options.
+- also refer to the repo-wide [README.md](../../README.md#training-models)

View File

@@ -1,4 +1,6 @@
 # ACER
 - Original paper: https://arxiv.org/abs/1611.01224
-- `python -m baselines.acer.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
+- `python -m baselines.run --alg=acer --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options.
+- also refer to the repo-wide [README.md](../../README.md#training-models)

View File

@@ -2,4 +2,7 @@
 - Original paper: https://arxiv.org/abs/1708.05144
 - Baselines blog post: https://blog.openai.com/baselines-acktr-a2c/
-- `python -m baselines.acktr.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
+- `python -m baselines.run --alg=acktr --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options.
+- also refer to the repo-wide [README.md](../../README.md#training-models)

View File

@@ -9,44 +9,29 @@ Here's a list of commands to run to quickly get a working example:
 ```bash
 # Train model and save the results to cartpole_model.pkl
-python -m baselines.deepq.experiments.train_cartpole
+python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5
 # Load the model saved in cartpole_model.pkl and visualize the learned policy
-python -m baselines.deepq.experiments.enjoy_cartpole
+python -m baselines.run --alg=deepq --env=CartPole-v0 --load_path=./cartpole_model.pkl --num_timesteps=0 --play
 ```
-Be sure to check out the source code of [both](experiments/train_cartpole.py) [files](experiments/enjoy_cartpole.py)!
 ## If you wish to apply DQN to solve a problem.
 Check out our simple agent trained with one stop shop `deepq.learn` function.
 - [baselines/deepq/experiments/train_cartpole.py](experiments/train_cartpole.py) - train a Cartpole agent.
-- [baselines/deepq/experiments/train_pong.py](experiments/train_pong.py) - train a Pong agent using convolutional neural networks.
-In particular notice that once `deepq.learn` finishes training it returns `act` function which can be used to select actions in the environment. Once trained you can easily save it and load at later time. For both of the files listed above there are complimentary files `enjoy_cartpole.py` and `enjoy_pong.py` respectively, that load and visualize the learned policy.
+In particular notice that once `deepq.learn` finishes training it returns `act` function which can be used to select actions in the environment. Once trained you can easily save it and load at later time. Complementary file `enjoy_cartpole.py` loads and visualizes the learned policy.
 ## If you wish to experiment with the algorithm
 ##### Check out the examples
 - [baselines/deepq/experiments/custom_cartpole.py](experiments/custom_cartpole.py) - Cartpole training with more fine grained control over the internals of DQN algorithm.
-- [baselines/deepq/experiments/run_atari.py](experiments/run_atari.py) - more robust setup for training at scale.
+- [baselines/deepq/defaults.py](defaults.py) - settings for training on atari. Run
-##### Download a pretrained Atari agent
-For some research projects it is sometimes useful to have an already trained agent handy. There's a variety of models to choose from. You can list them all by running:
 ```bash
-python -m baselines.deepq.experiments.atari.download_model
+python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4
 ```
+to train on Atari Pong (see more in repo-wide [README.md](../../README.md#training-models))
-Once you pick a model, you can download it and visualize the learned policy. Be sure to pass `--dueling` flag to visualization script when using dueling models.
-```bash
-python -m baselines.deepq.experiments.atari.download_model --blob model-atari-duel-pong-1 --model-dir /tmp/models
-python -m baselines.deepq.experiments.atari.enjoy --model-dir /tmp/models/model-atari-duel-pong-1 --env Pong --dueling
-```
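The updated README relies on `deepq.learn` returning a callable `act` object that can be saved, reloaded, and used to pick actions. As a rough illustration (not part of this diff), a load-and-visualize script in the spirit of `enjoy_cartpole.py` might look like the sketch below, assuming `deepq.learn` accepts a `load_path` argument and that `act` takes a batch of observations:

```python
import gym

from baselines import deepq


def main():
    env = gym.make("CartPole-v0")
    # Build the q-network and load previously saved weights; total_timesteps=0 means no training.
    act = deepq.learn(env, network='mlp', total_timesteps=0, load_path="cartpole_model.pkl")

    while True:
        obs, done = env.reset(), False
        episode_rew = 0
        while not done:
            env.render()
            # act expects a batch of observations, hence obs[None]; take the first (and only) action.
            obs, rew, done, _ = env.step(act(obs[None])[0])
            episode_rew += rew
        print("Episode reward", episode_rew)


if __name__ == '__main__':
    main()
```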

View File

@@ -11,12 +11,11 @@ def callback(lcl, _glb):
 def main():
     env = gym.make("CartPole-v0")
-    model = deepq.models.mlp([64])
     act = deepq.learn(
         env,
-        q_func=model,
+        network='mlp',
         lr=1e-3,
-        max_timesteps=100000,
+        total_timesteps=100000,
         buffer_size=50000,
         exploration_fraction=0.1,
         exploration_final_eps=0.02,
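For context, here is a sketch of how the whole updated `train_cartpole.py` plausibly reads after this change; the callback body and the final save call are assumptions based on the parts of the file not shown in this hunk:

```python
import gym

from baselines import deepq


def callback(lcl, _glb):
    # Stop training once the mean reward over the last 100 episodes reaches ~199 (CartPole-v0 solved).
    is_solved = lcl['t'] > 100 and sum(lcl['episode_rewards'][-101:-1]) / 100 >= 199
    return is_solved


def main():
    env = gym.make("CartPole-v0")
    act = deepq.learn(
        env,
        network='mlp',           # replaces the explicit deepq.models.mlp([64]) q_func
        lr=1e-3,
        total_timesteps=100000,  # renamed from max_timesteps
        buffer_size=50000,
        exploration_fraction=0.1,
        exploration_final_eps=0.02,
        print_freq=10,
        callback=callback,
    )
    print("Saving model to cartpole_model.pkl")
    act.save("cartpole_model.pkl")


if __name__ == '__main__':
    main()
```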

View File

@@ -2,5 +2,7 @@
 - Original paper: https://arxiv.org/abs/1707.06347
 - Baselines blog post: https://blog.openai.com/openai-baselines-ppo/
-- `python -m baselines.ppo2.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
-- `python -m baselines.ppo2.run_mujoco` runs the algorithm for 1M frames on a Mujoco environment.
+- `python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options.
+- `python -m baselines.run --alg=ppo2 --env=Ant-v2 --num_timesteps=1e6` runs the algorithm for 1M frames on the Mujoco Ant environment.
+- also refer to the repo-wide [README.md](../../README.md#training-models)
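The command-line entry point is not the only way to use the refactored code; the per-algorithm `learn` functions can also be called from Python. A minimal sketch (not part of this diff, and assuming the refactored `ppo2.learn` accepts `network`, `env`, and `total_timesteps` keyword arguments and a vectorized environment):

```python
import gym

from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2


def main():
    # ppo2 operates on vectorized environments; wrap a single gym env in DummyVecEnv.
    env = DummyVecEnv([lambda: gym.make('CartPole-v0')])
    model = ppo2.learn(network='mlp', env=env, total_timesteps=100000)
    # The returned model can then be used to step the environment or be saved for later use.
    env.close()


if __name__ == '__main__':
    main()
```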

View File

@@ -2,5 +2,6 @@
 - Original paper: https://arxiv.org/abs/1502.05477
 - Baselines blog post https://blog.openai.com/openai-baselines-ppo/
-- `mpirun -np 16 python -m baselines.trpo_mpi.run_atari` runs the algorithm for 40M frames = 10M timesteps on an Atari game. See help (`-h`) for more options.
+- `mpirun -np 16 python -m baselines.run --alg=trpo_mpi --env=PongNoFrameskip-v4` runs the algorithm for 40M frames = 10M timesteps on Atari Pong. See help (`-h`) for more options.
-- `python -m baselines.trpo_mpi.run_mujoco` runs the algorithm for 1M timesteps on a Mujoco environment.
+- `python -m baselines.run --alg=trpo_mpi --env=Ant-v2 --num_timesteps=1e6` runs the algorithm for 1M timesteps on the Mujoco Ant environment.
+- also refer to the repo-wide [README.md](../../README.md#training-models)