From 1870685071b9bc2f76036c4c1ef194ca10306a3e Mon Sep 17 00:00:00 2001
From: pzhokhov
Date: Mon, 13 Aug 2018 15:59:43 -0700
Subject: [PATCH] Publish benchmark results (#502)

* updated benchmark pages with final rewards
* use htmlpreview to render pages
* use htmlpreview to render pages
* use htmlpreview to render pages
* updated README to reflect ppo1 being obsolete
* removed navbars from published benchmark pages
* fixed link in README
---
 README.md               |    17 +-
 benchmarks_atari10M.htm | 12351 ++++++++++++++++++++++++++++++++++++++
 benchmarks_mujoco1M.htm |  5640 +++++++++++++++++
 3 files changed, 18002 insertions(+), 6 deletions(-)
 create mode 100644 benchmarks_atari10M.htm
 create mode 100644 benchmarks_mujoco1M.htm

diff --git a/README.md b/README.md
index 92b5a5a..ae6e10f 100644
--- a/README.md
+++ b/README.md
@@ -112,10 +112,6 @@ This should get to the mean reward per episode about 5k. To load and visualize t
 
 *NOTE:* At the moment Mujoco training uses VecNormalize wrapper for the environment which is not being saved correctly; so loading the models trained on Mujoco will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd by TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). This way, mean and std of environment normalizing wrapper will be saved in tensorflow variables and included in the model file; however, training is slower that way - hence not including it by default
 
-
-
-
-
 ## Subpackages
 
 - [A2C](baselines/a2c)
@@ -125,10 +121,19 @@ This should get to the mean reward per episode about 5k. To load and visualize t
 - [DQN](baselines/deepq)
 - [GAIL](baselines/gail)
 - [HER](baselines/her)
-- [PPO1](baselines/ppo1) (Multi-CPU using MPI)
-- [PPO2](baselines/ppo2) (Optimized for GPU)
+- [PPO1](baselines/ppo1) (obsolete version, left here temporarily)
+- [PPO2](baselines/ppo2)
 - [TRPO](baselines/trpo_mpi)
 
+
+
+## Benchmarks
+Results of benchmarks on Mujoco (1M timesteps) and Atari (10M timesteps) are available
+[here for Mujoco](https://htmlpreview.github.com/?https://github.com/openai/baselines/blob/peterz_publish_benchmark_results/benchmarks_mujoco1M.htm)
+and
+[here for Atari](https://htmlpreview.github.com/?https://github.com/openai/baselines/blob/peterz_publish_benchmark_results/benchmarks_atari10M.htm)
+respectively. Note that these results may be not on the latest version of the code, particular commit hash with which results were obtained is specified on the benchmarks page.
+
 To cite this repository in publications:
 
     @misc{baselines,
diff --git a/benchmarks_atari10M.htm b/benchmarks_atari10M.htm
new file mode 100644
index 0000000..2d18869
--- /dev/null
+++ b/benchmarks_atari10M.htm
@@ -0,0 +1,12351 @@
+bench viewer
+Atari10M Comparison
+| bm | run | user | mean | Breakout | Seaquest | Enduro | SpaceInvaders | Qbert | Pong | BeamRider | commit |
+|---|---|---|---|---|---|---|---|---|---|---|---|
+| ppo2 | | cron | 2782.3 | 236.9 | 1505.4 | 686.19 | 959.5 | 14234.75 | 20.39 | 1832.95 | ea68f3b |
+| acer | | cron | 3550.11 | 439.33 | 1733.13 | 0.0 | 1382.53 | 16234.75 | 20.03 | 5040.98 | ea68f3b |
+| a2c | | cron | 1386.22 | 289.9 | 1737.2 | 0.0 | 727.32 | 4461.29 | 18.65 | 2469.15 | ea68f3b |
+| trpo_mpi | | cron | 786.79 | 18.0 | 834.47 | 37.3 | 548.83 | 3285.62 | 16.95 | 766.38 | ea68f3b |
+| deepq | | cron | 745.27 | 1.93 | 1139.2 | 22.2 | 483.35 | 1010.79 | -7.21 | 2566.6 | ea68f3b |
+Learning Curves
+X-axis: timesteps
+Y-axis: Reward (avg. 6 seeds)
+[Figure: learning-curve panels for BeamRider, Breakout, Enduro, Pong, Qbert, Seaquest, and SpaceInvaders; x-axis: timesteps (0 to 1e7); y-axis: reward]
diff --git a/benchmarks_mujoco1M.htm b/benchmarks_mujoco1M.htm
new file mode 100644
index 0000000..2c33267
--- /dev/null
+++ b/benchmarks_mujoco1M.htm
@@ -0,0 +1,5640 @@
+bench viewer
+Mujoco1M Comparison
+| bm | run | user | mean | HalfCheetah | Hopper | InvertedPendulum | Swimmer | InvertedDoublePendulum | Reacher | Walker2d | commit |
+|---|---|---|---|---|---|---|---|---|---|---|---|
+| trpo_mpi | | cron | 1896.01 | 1289.7 | 1912.9 | 905.1 | 94.96 | 6731.63 | -4.82 | 2342.63 | ea68f3b |
+| ppo2 | | cron | 2203.79 | 1668.58 | 2316.16 | 809.43 | 111.19 | 7102.91 | -6.71 | 3424.95 | ea68f3b |
+Learning Curves
+X-axis: timesteps
+Y-axis: Reward (avg. 6 seeds)
+[Figure: learning-curve panels for HalfCheetah, Hopper, InvertedDoublePendulum, InvertedPendulum, Reacher, Swimmer, and Walker2d; x-axis: timesteps (0 to 1000000); y-axis: reward]
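The `mean` column of the Mujoco1M comparison table appears to be the unweighted average of the seven per-environment scores in each row. The patch does not state this, so treat it as an assumption. A minimal Python sketch of that check follows. The row values are as read from the table, and the names `ROWS`, `row_mean`, and `check_rows` are hypothetical helpers, not part of baselines.

```python
# Rows as read from the Mujoco1M comparison table: the reported mean, then the
# seven per-environment scores in column order (HalfCheetah, Hopper,
# InvertedPendulum, Swimmer, InvertedDoublePendulum, Reacher, Walker2d).
ROWS = {
    "trpo_mpi": (1896.01, [1289.7, 1912.9, 905.1, 94.96, 6731.63, -4.82, 2342.63]),
    "ppo2": (2203.79, [1668.58, 2316.16, 809.43, 111.19, 7102.91, -6.71, 3424.95]),
}


def row_mean(scores):
    """Unweighted average of the per-environment scores."""
    return sum(scores) / len(scores)


def check_rows(rows, tol=0.01):
    """Map each algorithm to True when the recomputed mean is within tol of the reported one.

    The tolerance of 0.01 is an assumption that allows for the table's
    two-decimal rounding.
    """
    return {
        name: abs(row_mean(scores) - reported) <= tol
        for name, (reported, scores) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        print(name, "consistent" if ok else "inconsistent")
```

Because the average is unweighted, environments with large reward scales (here InvertedDoublePendulum) dominate the `mean` column, so it is better read as a rough summary than as a normalized score.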