## If you are curious.

Train a Cartpole agent and watch it play once it converges!

Here's a list of commands to run to quickly get a working example:

```bash
# Train model and save the results to cartpole_model.pkl
python -m baselines.run --alg=deepq --env=CartPole-v0 --save_path=./cartpole_model.pkl --num_timesteps=1e5
# Load the model saved in cartpole_model.pkl and visualize the learned policy
python -m baselines.run --alg=deepq --env=CartPole-v0 --load_path=./cartpole_model.pkl --num_timesteps=0 --play
```
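During training, deepq explores with an epsilon-greedy policy whose epsilon is annealed over the first fraction of `--num_timesteps`. Here is a minimal, self-contained sketch of that idea; the schedule parameters below mirror common defaults but are illustrative, not a copy of the baselines implementation:

```python
import random

def linear_epsilon(t, total_steps, exploration_fraction=0.1,
                   final_eps=0.02, initial_eps=1.0):
    """Linearly anneal epsilon from initial_eps to final_eps over the
    first exploration_fraction of training, then hold it constant."""
    anneal_steps = exploration_fraction * total_steps
    frac = min(t / anneal_steps, 1.0)
    return initial_eps + frac * (final_eps - initial_eps)

def epsilon_greedy(q_values, eps, rng=random):
    """With probability eps take a random action, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

For example, with `total_steps=100000` the schedule starts at 1.0 and reaches 0.02 by step 10000, after which the agent acts greedily 98% of the time.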

## If you wish to apply DQN to solve a problem.

Check out our simple agent trained with the one-stop-shop `deepq.learn` function.

In particular, notice that once `deepq.learn` finishes training, it returns an `act` function, which can be used to select actions in the environment. Once trained, you can easily save it and load it at a later time. The complementary file enjoy_cartpole.py loads and visualizes the learned policy.
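The train → save → load flow can be sketched with a toy stand-in for the returned `act` object. The `TabularAct` class below is hypothetical and exists only to illustrate the pattern; the real `act` function wraps a trained network and provides its own `save`/`load` handling:

```python
import os
import pickle
import tempfile

class TabularAct:
    """Toy stand-in for the act function returned by deepq.learn:
    maps an observation to the action with the highest stored value."""
    def __init__(self, q_table):
        self.q_table = q_table  # obs -> list of per-action values

    def __call__(self, obs):
        values = self.q_table[obs]
        return max(range(len(values)), key=values.__getitem__)

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.q_table, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            return cls(pickle.load(f))

# "Train" (here: hand-written values), save, then reload and act.
act = TabularAct({0: [0.1, 0.9], 1: [0.8, 0.2]})
path = os.path.join(tempfile.mkdtemp(), "cartpole_model.pkl")
act.save(path)
restored = TabularAct.load(path)
```

A script like enjoy_cartpole.py follows the same shape: load the saved policy, then call it on each observation inside the environment loop.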

## If you wish to experiment with the algorithm

Check out the examples. Run

```bash
python -m baselines.run --alg=deepq --env=PongNoFrameskip-v4
```

to train on Atari Pong (see more in the repo-wide README.md).
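If you want to tinker with the algorithm itself, a good starting point is experience replay, the core data structure behind DQN. Below is a minimal, self-contained sketch (not the baselines `ReplayBuffer`, which also supports prioritized sampling):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO experience replay: stores transitions and
    samples uniformly at random for training batches."""
    def __init__(self, capacity):
        # deque with maxlen drops the oldest transition when full
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling with replacement keeps the sketch simple
        return [random.choice(self.storage) for _ in range(batch_size)]

    def __len__(self):
        return len(self.storage)
```

Swapping in a different eviction or sampling strategy here is one of the easiest ways to experiment with DQN variants.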