Commit Graph

64 Commits

Author SHA1 Message Date
andrew
00573cf5e9 add x, y axis name 2017-12-12 18:54:03 -08:00
andrew
11604f7cc9 add download link to readme and add description to python file 2017-12-07 12:08:20 -08:00
Andrew
000033973b Update gail-result.md 2017-12-03 15:50:24 -08:00
andrew
6090ee8292 add comparison for expert/BC/gail 2017-12-03 15:46:52 -08:00
andrew
7954327c5f add behavior cloning learn/eval code 2017-12-03 13:55:44 -08:00
andrew
8495890534 add gail, file_writer for tf.summary, and allow specifying var_list for tf.train.Saver 2017-12-03 01:49:42 -08:00
John Schulman
b05be68c55 add missing files, fix Issue #209 2017-11-16 22:14:30 -08:00
John Schulman
2dd7d307d7 Add ACER, PPO2, and results_plotter.py 2017-11-16 10:02:32 -08:00
John Schulman
6a3cbb4bc5 switch append mode to write mode 2017-10-25 22:20:30 -04:00
John Schulman
bb40378118 change atari preprocessing to use faster opencv
some logger changes
2017-10-25 09:21:29 -04:00
John Schulman
4993286230 Merge pull request #160 from mkarutz/fixFrameStackingA2C
Fixes frame stacking in A2C and ACKTR for multi-channel observations
2017-10-09 14:12:28 -07:00
Malcolm Karutz
cc8818f49e Fixes frame stacking in A2C and ACKTR for multi-channel observation spaces. 2017-10-09 13:08:41 +11:00
John Schulman
3eb71a0ece Merge pull request #151 from emansim/master
Fixes the NaN issues in ACKTR + bug in run_mujoco.py
2017-09-30 14:51:56 -07:00
Elman Mansimov
f8663eaf11 fixes acktr_cont issues 2017-09-30 17:21:04 -04:00
John Schulman
699919f1cf Merge pull request #64 from jhumplik/master
Use standardized advantages in trpo.
2017-09-07 01:57:04 -07:00
John Schulman
498b4cfead Merge pull request #128 from louiehelm/louiehelm-patch-1
Fix command lines
2017-09-06 01:04:47 -07:00
Louie Helm
589387403b fix ppo command in readme 2017-09-05 06:06:19 -07:00
Louie Helm
3d3ea6cb16 fix trpo command in readme 2017-09-05 06:04:37 -07:00
John Schulman
902ffcb767 Merge pull request #120 from hamzamerzic/tensorflow_global_variable
Deprecated VARIABLES -> GLOBAL_VARIABLES.
2017-08-28 21:27:23 -07:00
Hamza Merzic
a7320b80c0 Deprecated VARIABLES -> GLOBAL_VARIABLES. 2017-08-28 16:51:48 +02:00
John Schulman
4e2a570eb4 Merge pull request #104 from stevenschmatz/patch-1
Fix relative links in README.md
2017-08-27 22:54:52 -07:00
John Schulman
6f39148452 fix gym req 2017-08-27 22:49:50 -07:00
John Schulman
2f30833043 Merge branch 'master' of github.com:openai/baselines 2017-08-27 22:36:44 -07:00
John Schulman
00cdeff35e add __init__.py 2017-08-27 22:36:24 -07:00
John Schulman
410ef38898 Merge pull request #103 from learnercys/master
Adding links to source files
2017-08-27 22:31:46 -07:00
John Schulman
aa6e58bdf1 fix readmes 2017-08-27 22:22:14 -07:00
John Schulman
d9f194f797 Fix atari wrapper (affecting a2c perf) and pposgd mujoco performance
- removed vf clipping in pposgd - that was severely degrading performance on mujoco because it didn’t account for scale of returns
- switched adam epsilon in pposgd_simple
- brought back no-ops in atari wrapper (oops)
- added readmes
- revamped run_X_benchmark scripts to have standard form
- cleaned up DDPG a little, removed deprecated SimpleMonitor and non-idiomatic usage of logger
2017-08-27 22:14:59 -07:00
Steven Schmatz
06b071c105 Fix relative links in README.md 2017-08-18 13:35:22 -04:00
John Schulman
3f676f7d1e ACKTR + A2C 2017-08-18 09:25:39 -07:00
Carlos Hernandez
b7966b31a5 Adding links to source files 2017-08-18 01:16:00 -06:00
Matthias Plappert
882251878f Parameter space noise for DQN and DDPG (#75)
* Export param noise
* Update documentation
* Final finishing touches
2017-07-27 08:10:59 -07:00
Jan Humplik
4862140cea Use standardized advantages in trpo. 2017-07-23 22:42:55 +02:00
Peter Welinder
df82a15fd3 Fix broken links in DQN readme 2017-07-23 09:58:10 -07:00
Jonas Schneider
5dc00628fe readme fiddling 2017-07-20 09:00:24 -07:00
John Schulman
79b4a8a88e Merge pull request #60 from openai/ppo-trpo
ppo and trpo
2017-07-20 08:55:43 -07:00
John Schulman
da99706046 ppo and trpo 2017-07-20 08:52:35 -07:00
Szymon Sidor
80f94f8ec5 bump version 2017-07-12 14:48:05 -07:00
Szymon Sidor
2b1b437908 Update simple.py 2017-07-12 23:42:36 +02:00
Szymon Sidor
04cd0dcf64 Merge pull request #52 from farbeiza/patch-1
Effectively apply weights from the replay buffer
2017-07-12 23:37:28 +02:00
Szymon Sidor
248aad1c3b Merge pull request #39 from mirceamironenco/master
Fix TF graph variables deprecation
2017-07-12 23:32:24 +02:00
Fernando Arbeiza
d76cd1297a Effectively apply weights from the replay buffer
It seems that the weights retrieved from the replay buffer are not applied when training the model. Is there any reason for that or am I missing something?
In any case, I have added a parameter in order for them to be used; just in case it is useful.
2017-07-11 11:09:51 +02:00
MironencoMircea
91b10857d8 Fixed TF graph variables deprecation 2017-06-28 15:48:45 +02:00
Szymon Sidor
0778e9f10f Merge pull request #28 from zach-nervana/patch-1
remove unnecessary initialization of variable resized_screen
2017-06-23 17:05:25 -07:00
Szymon Sidor
59c7887e6b Merge pull request #26 from LinZichuan/master
Update setup.py
2017-06-23 17:02:05 -07:00
Szymon Sidor
3d235ae7b8 Merge pull request #33 from cxxgtxy/master
Fix README since BreakOut pretrained model doesn't match the correct …
2017-06-23 16:59:55 -07:00
cxx
5e73387494 Fix README since BreakOut pretrained model doesn't match the correct tensor shape. Therefore, Pong is used instead. 2017-06-16 15:38:42 +08:00
Zach Dwiel
ec38bf460e remove unnecessary initialization of variable resized_screen 2017-06-09 08:53:10 -04:00
Zichuan Lin
ef1a2402fc Update setup.py 2017-06-07 17:29:38 +08:00
Szymon Sidor
184440ffd3 Merge pull request #22 from ngc92/doc_fixes
docstring and comment fixes
2017-06-04 00:41:34 -07:00
Szymon Sidor
fba0ac30ca Merge pull request #15 from tiagosgc/patch-1
Update README.md
2017-06-04 00:40:58 -07:00