Atari10M Comparison

algo      user  mean     Enduro  SpaceInvaders  Qbert    Seaquest  Pong   BeamRider  Breakout  commit
ppo2      cron  1509.38  350.22  557.28         7012.06  1218.87   13.68  1299.25    114.26    7bfbcf1
deepq     cron  1012.69  479.75  459.86         3254.83  1164.08   16.49  1582.34    131.46    7bfbcf1
acktr     cron  1211.71  0.0     557.19         4429.3   1201.16   9.56   2171.19    113.58    7bfbcf1
acer      cron  1457.36  0.0     656.91         6433.38  1065.98   3.11   1959.22    82.94     7bfbcf1
ppo2_mpi  cron  1417.92  207.47  459.89         7184.73  1383.38   13.9   594.45     81.61     7bfbcf1
a2c       cron  717.73   0.0     463.06         2047.07  1150.66   1.0    1302.61    59.72     7bfbcf1
trpo_mpi  cron  625.67   24.83   457.7          2486.18  710.07    2.82   683.11     14.98     7bfbcf1
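
The page does not say how the mean column is derived. A minimal sketch, assuming it is the unweighted arithmetic mean of the seven per-game scores in each row; the names `scores`, `reported_mean`, and `mean_score` are illustrative and not part of the benchmark code:

```python
# Per-game scores copied from the table above, in column order:
# Enduro, SpaceInvaders, Qbert, Seaquest, Pong, BeamRider, Breakout.
scores = {
    "ppo2":     [350.22, 557.28, 7012.06, 1218.87, 13.68, 1299.25, 114.26],
    "deepq":    [479.75, 459.86, 3254.83, 1164.08, 16.49, 1582.34, 131.46],
    "acktr":    [0.0, 557.19, 4429.3, 1201.16, 9.56, 2171.19, 113.58],
    "acer":     [0.0, 656.91, 6433.38, 1065.98, 3.11, 1959.22, 82.94],
    "ppo2_mpi": [207.47, 459.89, 7184.73, 1383.38, 13.9, 594.45, 81.61],
    "a2c":      [0.0, 463.06, 2047.07, 1150.66, 1.0, 1302.61, 59.72],
    "trpo_mpi": [24.83, 457.7, 2486.18, 710.07, 2.82, 683.11, 14.98],
}

# Values of the "mean" column as reported in the table.
reported_mean = {
    "ppo2": 1509.38, "deepq": 1012.69, "acktr": 1211.71, "acer": 1457.36,
    "ppo2_mpi": 1417.92, "a2c": 717.73, "trpo_mpi": 625.67,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed definition of the mean column)."""
    return sum(values) / len(values)


for algo, values in scores.items():
    computed = mean_score(values)
    print(f"{algo:9s} computed={computed:8.2f} reported={reported_mean[algo]:8.2f}")
```

Under this assumption the recomputed values match the reported column to within rounding of the displayed scores. Because the mean is unweighted over raw scores, games with large score ranges such as Qbert dominate it, so the per-game columns are the more informative comparison.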

Learning Curves

X-axis: timesteps (0 to 1e7), Y-axis: reward (avg. over 6 runs). One panel per game: BeamRider, Breakout, Enduro, Pong, Qbert, Seaquest, SpaceInvaders.