Compare commits: peterz_ubu...peterz_ben (49 commits)
SHA1:

ea68f3b7e6
ca721a4be6
72f3572a10
b9cd941471
0899b71ede
cc8c9541fb
cb32522394
1e40ec22be
701a36cdfa
5a7f9847d8
b63134e5c5
db314cdeda
b08c083d91
bfbbe66d9e
1c5c6563b7
1fa8c58da5
f6d1115ead
f6d5a47bed
c2df27bee4
974c15756e
ad43fd9a35
72c357c638
e00e5ca016
705797f2f0
fcd84aa831
390b51597a
95104a3592
3528f7b992
151e48009e
92f33335e9
af729cff15
10f815fe1d
8c4adac898
2a93ea8782
9c48f9fad5
348cbb4b71
a1602ab15f
e63e69bb14
385e7e5c0d
d112a2e49f
e662dd6409
efc6bffce3
872181d4c3
628ddecf6a
83a4a4be65
7edac38c73
a6dca44115
622915c473
a1d3c18ec0
Dockerfile

@@ -1,4 +1,4 @@
-FROM ubuntu:18.04
+FROM ubuntu:16.04
 
 RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake python-opencv
 ENV CODE_DIR /root/code
README.md (17 changed lines)
@@ -112,6 +112,10 @@ This should get to the mean reward per episode about 5k. To load and visualize t
 
+*NOTE:* At the moment Mujoco training uses the VecNormalize wrapper for the environment, which is not being saved correctly; so loading models trained on Mujoco will not work well if the environment is recreated. If necessary, you can work around that by replacing RunningMeanStd with TfRunningMeanStd in [baselines/common/vec_env/vec_normalize.py](baselines/common/vec_env/vec_normalize.py#L12). That way, the mean and std of the environment-normalizing wrapper are saved in tensorflow variables and included in the model file; however, training is slower that way, hence it is not included by default.
+
 ## Subpackages
 
 - [A2C](baselines/a2c)
@@ -121,19 +125,10 @@ This should get to the mean reward per episode about 5k. To load and visualize t
 - [DQN](baselines/deepq)
 - [GAIL](baselines/gail)
 - [HER](baselines/her)
-- [PPO1](baselines/ppo1) (obsolete version, left here temporarily)
-- [PPO2](baselines/ppo2)
+- [PPO1](baselines/ppo1) (Multi-CPU using MPI)
+- [PPO2](baselines/ppo2) (Optimized for GPU)
 - [TRPO](baselines/trpo_mpi)
 
-## Benchmarks
-Results of benchmarks on Mujoco (1M timesteps) and Atari (10M timesteps) are available
-[here for Mujoco](https://htmlpreview.github.com/?https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm)
-and
-[here for Atari](https://htmlpreview.github.com/?https://github.com/openai/baselines/blob/master/benchmarks_atari10M.htm)
-respectively. Note that these results may not be on the latest version of the code; the particular commit hash with which the results were obtained is specified on the benchmarks page.
-
-To cite this repository in publications:
-
-    @misc{baselines,
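The NOTE added in the README hunk above describes a manual workaround rather than an API call, so a short sketch may help. Below is a minimal, hypothetical illustration, assuming the `TfRunningMeanStd` class from `baselines/common/running_mean_std.py` with an `(epsilon, shape, scope)` constructor and the `ob_rms`/`ret_rms` attributes of `VecNormalize`; exact names and signatures vary between baselines versions.

```python
# Hypothetical sketch of the NOTE's workaround: keep VecNormalize's running
# statistics in tensorflow variables (TfRunningMeanStd) instead of the
# default numpy-backed RunningMeanStd, so they are saved with the model.
# The env id, constructor arguments, and attribute names are assumptions.
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.common.vec_env.vec_normalize import VecNormalize
from baselines.common.running_mean_std import TfRunningMeanStd

venv = VecNormalize(DummyVecEnv([lambda: gym.make('Hopper-v2')]))

# The same substitution the NOTE suggests making inside vec_normalize.py:
venv.ob_rms = TfRunningMeanStd(shape=venv.observation_space.shape, scope='ob_rms')
venv.ret_rms = TfRunningMeanStd(shape=(), scope='ret_rms')
```

Since the statistics then live in tensorflow variables, they are written into the model file along with the network weights; the NOTE's caveat about slower training applies.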
baselines/deepq/README.md

@@ -32,7 +32,7 @@ In particular notice that once `deepq.learn` finishes training it returns `act`
 
 - [baselines/deepq/experiments/custom_cartpole.py](experiments/custom_cartpole.py) - Cartpole training with more fine-grained control over the internals of the DQN algorithm.
-- [baselines/deepq/experiments/run_atari.py](experiments/run_atari.py) - more robust setup for training at scale.
+- [baselines/deepq/experiments/atari/train.py](experiments/atari/train.py) - more robust setup for training at scale.
 
 ##### Download a pretrained Atari agent
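The hunk header above notes that `deepq.learn` returns an `act` function once training finishes. A minimal usage sketch along the lines of the cartpole examples of this era follows; the exact signature and hyperparameter values are assumptions and differ between baselines versions.

```python
# Minimal sketch: deepq.learn returns an `act` callable that maps
# observations to actions and can be persisted with act.save().
# The q_func construction and hyperparameters are assumptions for illustration.
import gym
from baselines import deepq

env = gym.make("CartPole-v0")
model = deepq.models.mlp([64])        # small fully connected Q-network
act = deepq.learn(
    env,
    q_func=model,
    lr=1e-3,
    max_timesteps=100000,
    buffer_size=50000,
    exploration_fraction=0.1,
    exploration_final_eps=0.02,
)
act.save("cartpole_model.pkl")        # persist the returned act function
```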
benchmarks_atari10M.htm (12351 changed lines): file diff suppressed because it is too large.