* implement `pdfromlatent` in `BernoulliPdType` (see the sketch after this list)
* remove `env.close()` at the end of the algorithms, so `learn` returns a still-usable environment
* add a test case checking that the environment works after `learn` returns (see the test sketch after this list)
* close the env in `run.py` instead
* fixes for `acktr` and `trpo_mpi`
* use `make_session` with a fresh graph on every call in `test_env_after_learn`
* remove extra prints from `test_env_after_learn`
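
On the first item: `pdfromlatent` on the other `PdType` subclasses maps a latent feature vector to flat distribution parameters and wraps them in the matching `Pd`. A minimal sketch of the `BernoulliPdType` version, assuming the `fc` fully-connected helper from `baselines.a2c.utils` and the inherited `pdfromflat`; the committed code may differ in details:

```python
from baselines.a2c.utils import fc
from baselines.common.distributions import PdType, BernoulliPd

class BernoulliPdType(PdType):
    def __init__(self, size):
        self.size = size

    def pdclass(self):
        return BernoulliPd

    def pdfromlatent(self, latent_vector, init_scale=1.0, init_bias=0.0):
        # project the latent features to one logit per Bernoulli dimension
        pdparam = fc(latent_vector, 'pi', self.size,
                     init_scale=init_scale, init_bias=init_bias)
        # pdfromflat (from the PdType base class) wraps the flat logits
        # in a BernoulliPd, mirroring the other PdType subclasses
        return self.pdfromflat(pdparam), pdparam
```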
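
On the test items: a rough sketch of what `test_env_after_learn` could look like, with a fresh graph per parametrized call and the env closed by the caller. `get_learn_function`, `make_session`, and `SubprocVecEnv` are assumed to live at their usual baselines paths, and the real test's details (algorithm list, learner kwargs) may differ:

```python
import gym
import pytest
import tensorflow as tf

from baselines.run import get_learn_function
from baselines.common.tf_util import make_session
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

@pytest.mark.parametrize('algo', ['a2c', 'acer', 'acktr', 'deepq', 'ppo2', 'trpo_mpi'])
def test_env_after_learn(algo):
    # fresh graph and session per call, so repeated parametrized runs
    # do not collide on TF variable names
    make_session(make_default=True, graph=tf.Graph())
    env = SubprocVecEnv([lambda: gym.make('PongNoFrameskip-v4')])

    learn = get_learn_function(algo)
    learn(network='mlp', env=env, total_timesteps=0)

    # the env must still be usable after learn() returns;
    # closing it is now the caller's (run.py's) job
    env.reset()
    env.close()
```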
`python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4` runs the a2c algorithm for 40M frames = 10M timesteps on Atari Pong. See the help (`-h`) for more options.
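
For reference, the same invocation as a shell snippet; `--num_timesteps` is the existing `baselines.run` flag for overriding the run length (the value below is only illustrative):

```bash
# default run: 10M timesteps (40M frames with the standard Atari frameskip of 4)
python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4

# shorter run, overriding the default length
python -m baselines.run --alg=a2c --env=PongNoFrameskip-v4 --num_timesteps=1e6
```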