added a note in README about TfRunningMeanStd and serialization of VecNormalize
@@ -100,15 +100,17 @@ DQN with Atari is at this point a classics of benchmarks. To run the baselines i

## Saving, loading and visualizing models

The algorithms serialization API is not properly unified yet; however, there is a simple method to save / restore trained models.

The `--load_path` and `--save_path` command-line options load the tensorflow state from a given path before training and save it after training, respectively.

Let's imagine you'd like to train ppo2 on Atari Pong, save the model, and then later visualize what it has learnt.

```bash
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num-timesteps=2e7 --save_path=~/models/pong_20M_ppo2
```

This should get to a mean reward per episode of about 20. To load and visualize the model, we'll do the following: load the model, train it for 0 steps, and then visualize:

```bash
python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4 --num-timesteps=0 --load_path=~/models/pong_20M_ppo2 --play
```

*NOTE:* At the moment, Mujoco training uses the VecNormalize wrapper for the environment, which is not saved correctly; loading models trained on Mujoco will therefore not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). That way, the mean and std of the environment-normalizing wrapper will be stored in tensorflow variables and included in the model file; however, training is slower that way, hence it is not enabled by default.
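
The statistics that get lost are the running mean and variance that VecNormalize maintains for observations and returns. As a rough illustration of what such a running estimator does, here is a plain-numpy sketch in the style of the parallel-variance update used by RunningMeanStd; this is an illustrative reimplementation, not the actual baselines code:

```python
import numpy as np

class RunningMeanStd:
    """Illustrative running mean/std tracker (not the baselines class):
    combines per-batch moments into global ones with a parallel update."""

    def __init__(self, shape=()):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 1e-4  # tiny pseudo-count avoids division by zero

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count

        # Merge old and new moments (Chan et al.'s parallel variance formula)
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.var = m2 / total
        self.count = total

rms = RunningMeanStd(shape=(1,))
for chunk in np.split(np.arange(100.0).reshape(-1, 1), 10):
    rms.update(chunk)
# After streaming 0..99 in chunks: mean ≈ 49.5, var ≈ 833.25
```

Because these statistics live in plain numpy state rather than the tensorflow graph, they are not captured by `--save_path`; the TfRunningMeanStd workaround above moves them into tensorflow variables so they travel with the model file.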