added approximate humanoid reward with ppo2 into the README for reference

Author: Peter Zhokhov
Date: 2018-07-30 16:58:31 -07:00
parent e662dd6409
commit d112a2e49f


@@ -79,7 +79,7 @@ Most of the algorithms in baselines repo are used as follows:
### Example 1. PPO with MuJoCo Humanoid
For instance, to train a fully-connected network controlling the MuJoCo humanoid using a2c for 20M timesteps:
```bash
-python -m baselines.run --alg=a2c --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --reward_scale=0.1
+python -m baselines.run --alg=a2c --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
```
Note that for MuJoCo environments the fully-connected network is the default, so we can omit `--network=mlp`.
The hyperparameters for both the network and the learning algorithm can be controlled via the command line, for instance:
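The concrete override example that follows this sentence in the README falls outside this hunk; as a rough sketch (assuming the `mlp` network accepts `num_hidden`/`num_layers` keyword arguments and ppo2 accepts `value_network=copy`, both passed through as extra command-line arguments), such a call might look like:
```bash
# sketch, not part of this diff: override the network size and use a separate copy of the network for the value function
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --num_hidden=32 --num_layers=3 --value_network=copy
```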
@@ -104,7 +104,7 @@ Let's imagine you'd like to train ppo2 on MuJoCo humanoid, save the model and th
```bash
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --num_timesteps=2e7 --save_path=~/models/humanoid_20M_ppo2
```
-To load and visualize the model, we'll do the following - load the model, train it for 0 steps, and then visualize:
+This should get the mean reward per episode to about 5k. To load and visualize the model, we'll do the following: load the model, train it for 0 steps, and then visualize:
```bash
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --num_timesteps=0 --load_path=~/models/humanoid_20M_ppo2 --play
```
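Along the same lines (a sketch, not part of this commit; it assumes `--load_path` can be combined with a nonzero `--num_timesteps` and a new `--save_path`, and the 30M path name is hypothetical), the saved snapshot can also serve as a starting point for further training:
```bash
# sketch: resume from the saved 20M-step model, train for another 10M steps, and save under a new (hypothetical) path
python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --num_timesteps=1e7 --load_path=~/models/humanoid_20M_ppo2 --save_path=~/models/humanoid_30M_ppo2
```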