Atari10M Comparison

algo      user  mean     Enduro  SpaceInvaders  Qbert    Seaquest  Pong   BeamRider  Breakout  commit
ppo2      cron  1509.38  350.22  557.28         7012.06  1218.87   13.68  1299.25    114.26    cbd21ef
deepq     cron  1012.69  479.75  459.86         3254.83  1164.08   16.49  1582.34    131.46    cbd21ef
acktr     cron  1211.71  0.0     557.19         4429.3   1201.16   9.56   2171.19    113.58    cbd21ef
acer      cron  1457.36  0.0     656.91         6433.38  1065.98   3.11   1959.22    82.94     cbd21ef
ppo2_mpi  cron  1417.92  207.47  459.89         7184.73  1383.38   13.9   594.45     81.61     cbd21ef
a2c       cron  717.73   0.0     463.06         2047.07  1150.66   1.0    1302.61    59.72     cbd21ef
trpo_mpi  cron  625.67   24.83   457.7          2486.18  710.07    2.82   683.11     14.98     cbd21ef
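
The page does not say how the mean column is derived. It appears to be the unweighted arithmetic mean of the seven per-game scores in the same row. The Python sketch below checks that assumption against the ppo2 and deepq rows; the helper name game_mean and the dictionary layout are illustrative only and not part of any benchmark tooling.

```python
# Per-game scores copied from the ppo2 and deepq rows, in the order
# Enduro, SpaceInvaders, Qbert, Seaquest, Pong, BeamRider, Breakout.
scores = {
    "ppo2": [350.22, 557.28, 7012.06, 1218.87, 13.68, 1299.25, 114.26],
    "deepq": [479.75, 459.86, 3254.83, 1164.08, 16.49, 1582.34, 131.46],
}

# Values from the "mean" column of the same rows.
reported_mean = {"ppo2": 1509.38, "deepq": 1012.69}


def game_mean(values):
    """Unweighted arithmetic mean of raw per-game scores (assumed definition)."""
    return sum(values) / len(values)


for algo, values in scores.items():
    computed = game_mean(values)
    # Small differences are expected because the per-game scores are rounded.
    print(f"{algo}: computed {computed:.2f}, reported {reported_mean[algo]:.2f}")
```

If this reading is right, the scores are averaged without per-game normalization, so high-scoring games such as Qbert dominate the mean.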

Learning Curves

X-axis: timesteps (0 to 1e7), Y-axis: reward (avg. over 6 runs)
[Figure: one learning-curve panel per game: BeamRider, Breakout, Enduro, Pong, Qbert, Seaquest, SpaceInvaders]