* update per-algorithm READMEs to reflect the new way of running algorithms
* add a link to the repo-wide README
* update README files and the deepq.train_cartpole example
python -m baselines.run --alg=acktr --env=PongNoFrameskip-v4 runs the algorithm for 40M frames (10M timesteps) on Atari Pong. See help (-h) for more options.
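
The relationship between frames and timesteps, and the shape of the command line, can be sketched in Python. This is only an illustration: the helper names `build_command` and `timesteps_to_frames` are hypothetical, and the factor of 4 is an assumption inferred from the stated ratio of 40M frames to 10M timesteps. Only the `--alg` and `--env` flags shown above are used.

```python
import sys

# Assumption: each timestep covers 4 emulator frames, inferred from
# "40M frames = 10M timesteps" in the usage line above.
FRAME_SKIP = 4


def build_command(alg, env):
    """Return the argument list for launching baselines.run (hypothetical helper)."""
    return [sys.executable, "-m", "baselines.run", f"--alg={alg}", f"--env={env}"]


def timesteps_to_frames(timesteps, frame_skip=FRAME_SKIP):
    """Convert agent timesteps to emulator frames under the assumed frame skip."""
    return timesteps * frame_skip


cmd = build_command("acktr", "PongNoFrameskip-v4")
frames = timesteps_to_frames(10_000_000)

# Show the command without the interpreter path, plus the frame count.
print(" ".join(cmd[1:]))
print(frames)
```

With baselines installed, the list could be passed to `subprocess.run(cmd, check=True)` to start training; the sketch deliberately stops short of launching anything.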