From e92a6ad8f4e11ee7c5128f0c9f12c2102bbe9b06 Mon Sep 17 00:00:00 2001
From: wangjksjtu
Date: Tue, 28 Aug 2018 03:35:48 +0800
Subject: [PATCH] Update README.md (#537)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

1. Delete repetitive section
2. Align the commands
---
 README.md | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 396d918..a285912 100644
--- a/README.md
+++ b/README.md
@@ -45,8 +45,8 @@ cd baselines
 ```
 If using virtualenv, create a new virtualenv and activate it
 ```bash
- virtualenv env --python=python3
- . env/bin/activate
+virtualenv env --python=python3
+. env/bin/activate
 ```
 Install baselines package
 ```bash
@@ -62,29 +62,20 @@ pip install pytest
 pytest
 ```
 
-## Subpackages
-
-## Testing the installation
-All unit tests in baselines can be run using pytest runner:
-```
-pip install pytest
-pytest
-```
-
 ## Training models
 Most of the algorithms in the baselines repo are used as follows:
 ```bash
- python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]
+python -m baselines.run --alg=<name of the algorithm> --env=<environment_id> [additional arguments]
 ```
 ### Example 1. PPO with MuJoCo Humanoid
 For instance, to train a fully-connected network controlling MuJoCo humanoid using PPO2 for 20M timesteps
 ```bash
- python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
+python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7
 ```
 Note that for MuJoCo environments the fully-connected network is the default, so we can omit `--network=mlp`.
 The hyperparameters for both the network and the learning algorithm can be controlled via the command line, for instance:
 ```bash
- python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
+python -m baselines.run --alg=ppo2 --env=Humanoid-v2 --network=mlp --num_timesteps=2e7 --ent_coef=0.1 --num_hidden=32 --num_layers=3 --value_network=copy
 ```
 will set the entropy coefficient to 0.1, construct a fully connected network with 3 layers of 32 hidden units each, and create a separate network for value function estimation (so that its parameters are not shared with the policy network, but the structure is the same)
 
@@ -94,7 +85,7 @@ docstring for [baselines/ppo2/ppo2.py/learn()](ppo2/ppo2.py) for the description
 ### Example 2. DQN on Atari
 DQN with Atari is at this point a classic benchmark. To run the baselines implementation of DQN on Atari Pong:
 ```
- python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
+python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4 --num_timesteps=1e6
 ```
 
 ## Saving, loading and visualizing models
@@ -102,11 +93,11 @@ The algorithms serialization API is not properly unified yet; however, there is
 The `--save_path` and `--load_path` command-line options save the tensorflow state after training and load it from a given path before training, respectively.
 Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learnt.
 ```bash
- python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2
+python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=2e7 --save_path=~/models/pong_20M_ppo2
 ```
 This should get the mean reward per episode to about 5k.
 To load and visualize the model, we'll do the following - load the model, train it for 0 steps, and then visualize:
 ```bash
- python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
+python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num_timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
 ```
 *NOTE:* At the moment MuJoCo training uses the VecNormalize wrapper for the environment, which is not being saved correctly; so loading models trained on MuJoCo will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). This way, the mean and std of the environment-normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way, hence it is not included by default.
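
For readers who want to apply the workaround described in the *NOTE* above, the substitution in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12) might look roughly like the sketch below. It is illustrative only and is not part of the patch above: the surrounding source lines, the `TfRunningMeanStd` constructor arguments, and the `scope` names are assumptions to verify against the installed version of baselines.

```diff
--- a/baselines/common/vec_env/vec_normalize.py
+++ b/baselines/common/vec_env/vec_normalize.py
-from baselines.common.running_mean_std import RunningMeanStd
+from baselines.common.running_mean_std import TfRunningMeanStd

-        self.ob_rms = RunningMeanStd(shape=self.observation_space.shape) if ob else None
-        self.ret_rms = RunningMeanStd(shape=()) if ret else None
+        # Sketch: TfRunningMeanStd keeps the running mean/std in tensorflow
+        # variables, so they end up in the checkpoint written by --save_path
+        # (constructor arguments and scope names are assumed, not confirmed).
+        self.ob_rms = TfRunningMeanStd(shape=self.observation_space.shape, scope='ob_rms') if ob else None
+        self.ret_rms = TfRunningMeanStd(shape=(), scope='ret_rms') if ret else None
```

As the note says, this trades some training speed for checkpoints whose normalization statistics restore cleanly when the environment is recreated.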