Updated example commands to run ppo2 (#534)

The headline mentions PPO, but the command was for A2C
Author: HelgeS
Date: 2018-08-24 00:58:27 +02:00
Committed by: pzhokhov
Parent: cb14da96ca
Commit: 92b9a37257


@@ -77,14 +77,14 @@ Most of the algorithms in baselines repo are used as follows:
 python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]
 ```
 ### Example 1. PPO with MuJoCo Humanoid
-For instance, to train a fully-connected network controlling MuJoCo humanoid using a2c for 20M timesteps
+For instance, to train a fully-connected network controlling MuJoCo humanoid using PPO2 for 20M timesteps
 ```bash
-python -m baselines.run --alg=a2c --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
+python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
 ```
 Note that for mujoco environments fully-connected network is default, so we can omit `--network=mlp`
 The hyperparameters for both network and the learning algorithm can be controlled via the command line, for instance:
 ```bash
-python -m baselines.run --alg=a2c --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
+python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
 ```
 will set entropy coefficient to 0.1, and construct fully connected network with 3 layers with 32 hidden units in each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same)
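
For context, the extra CLI flags in the second command are not consumed by `baselines.run` itself; they are forwarded as keyword arguments to the algorithm's `learn` function (and network kwargs such as `num_hidden` on to the policy builder). Below is a minimal sketch, not part of this commit, of the programmatic equivalent of the updated command, assuming `gym` with MuJoCo support and `baselines` as of this commit are installed:

```python
# A minimal sketch (not part of this commit): programmatic equivalent of
#   python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --num_timesteps=2e7 \
#       --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
# Assumes gym (with MuJoCo) and baselines are installed.
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

# ppo2.learn expects a vectorized environment; wrap a single Humanoid-v2.
env = DummyVecEnv([lambda: gym.make('Humanoid-v2')])

model = ppo2.learn(
    network='mlp',             # fully-connected network, the MuJoCo default
    env=env,
    total_timesteps=int(2e7),
    ent_coef=0.1,              # entropy coefficient, as in the CLI example
    # The remaining kwargs are forwarded to the policy/network builder:
    num_hidden=32,             # 32 hidden units per layer
    num_layers=3,              # 3 fully-connected layers
    value_network='copy',      # separate value network with the same structure
)
```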