Commit Graph

56 Commits

Author SHA1 Message Date
Jonathan Raiman
a4fba209c4 remove ref to simple_bench + rank variable not used [mpi gone] 2017-10-25 14:08:01 -07:00
John Schulman
bb40378118 change atari preprocessing to use faster opencv
some logger changes
2017-10-25 09:21:29 -04:00
John Schulman
4993286230 Merge pull request #160 from mkarutz/fixFrameStackingA2C
Fixes frame stacking in A2C and ACKTR for multi-channel observations
2017-10-09 14:12:28 -07:00
Malcolm Karutz
cc8818f49e Fixes frame stacking in A2C and ACKTR for multi-channel observation spaces. 2017-10-09 13:08:41 +11:00
John Schulman
3eb71a0ece Merge pull request #151 from emansim/master
Fixes the NaN issues in ACKTR + bug in run_mujoco.py
2017-09-30 14:51:56 -07:00
Elman Mansimov
f8663eaf11 fixes acktr_cont issues 2017-09-30 17:21:04 -04:00
John Schulman
699919f1cf Merge pull request #64 from jhumplik/master
Use standardized advantages in trpo.
2017-09-07 01:57:04 -07:00
John Schulman
498b4cfead Merge pull request #128 from louiehelm/louiehelm-patch-1
Fix command lines
2017-09-06 01:04:47 -07:00
Louie Helm
589387403b fix ppo command in readme 2017-09-05 06:06:19 -07:00
Louie Helm
3d3ea6cb16 fix trpo command in readme 2017-09-05 06:04:37 -07:00
John Schulman
902ffcb767 Merge pull request #120 from hamzamerzic/tensorflow_global_variable
Deprecated VARIABLES -> GLOBAL_VARIABLES.
2017-08-28 21:27:23 -07:00
Hamza Merzic
a7320b80c0 Deprecated VARIABLES -> GLOBAL_VARIABLES. 2017-08-28 16:51:48 +02:00
John Schulman
4e2a570eb4 Merge pull request #104 from stevenschmatz/patch-1
Fix relative links in README.md
2017-08-27 22:54:52 -07:00
John Schulman
6f39148452 fix gym req 2017-08-27 22:49:50 -07:00
John Schulman
2f30833043 Merge branch 'master' of github.com:openai/baselines 2017-08-27 22:36:44 -07:00
John Schulman
00cdeff35e add __init__.py 2017-08-27 22:36:24 -07:00
John Schulman
410ef38898 Merge pull request #103 from learnercys/master
Adding links to source files
2017-08-27 22:31:46 -07:00
John Schulman
aa6e58bdf1 fix readmes 2017-08-27 22:22:14 -07:00
John Schulman
d9f194f797 Fix atari wrapper (affecting a2c perf) and pposgd mujoco performance
- removed vf clipping in pposgd - that was severely degrading performance on mujoco because it didn’t account for scale of returns
- switched adam epsilon in pposgd_simple
- brought back no-ops in atari wrapper (oops)
- added readmes
- revamped run_X_benchmark scripts to have standard form
- cleaned up DDPG a little, removed deprecated SimpleMonitor and non-idiomatic usage of logger
2017-08-27 22:14:59 -07:00
Steven Schmatz
06b071c105 Fix relative links in README.md 2017-08-18 13:35:22 -04:00
John Schulman
3f676f7d1e ACKTR + A2C 2017-08-18 09:25:39 -07:00
Carlos Hernandez
b7966b31a5 Adding links to source files 2017-08-18 01:16:00 -06:00
Matthias Plappert
882251878f Parameter space noise for DQN and DDPG (#75)
* Export param noise

* Update documentation

* Final finishing touches
2017-07-27 08:10:59 -07:00
Jan Humplik
4862140cea Use standardized advantages in trpo. 2017-07-23 22:42:55 +02:00
Peter Welinder
df82a15fd3 Fix broken links in DQN readme 2017-07-23 09:58:10 -07:00
Jonas Schneider
5dc00628fe readme fiddling 2017-07-20 09:00:24 -07:00
John Schulman
79b4a8a88e Merge pull request #60 from openai/ppo-trpo
ppo and trpo
2017-07-20 08:55:43 -07:00
John Schulman
da99706046 ppo and trpo 2017-07-20 08:52:35 -07:00
Szymon Sidor
80f94f8ec5 bump version 2017-07-12 14:48:05 -07:00
Szymon Sidor
2b1b437908 Update simple.py 2017-07-12 23:42:36 +02:00
Szymon Sidor
04cd0dcf64 Merge pull request #52 from farbeiza/patch-1
Effectively apply weights from the replay buffer
2017-07-12 23:37:28 +02:00
Szymon Sidor
248aad1c3b Merge pull request #39 from mirceamironenco/master
Fix TF graph variables deprecation
2017-07-12 23:32:24 +02:00
Fernando Arbeiza
d76cd1297a Effectively apply weights from the replay buffer
It seems that the weights retrieved from the replay buffer are not applied when training the model. Is there any reason for that or am I missing something?

In any case, I have added a parameter in order for them to be used; just in case it is useful.
2017-07-11 11:09:51 +02:00
MironencoMircea
91b10857d8 Fixed TF graph variables deprecation 2017-06-28 15:48:45 +02:00
Szymon Sidor
0778e9f10f Merge pull request #28 from zach-nervana/patch-1
remove unnecessary initialization of variable resized_screen
2017-06-23 17:05:25 -07:00
Szymon Sidor
59c7887e6b Merge pull request #26 from LinZichuan/master
Update setup.py
2017-06-23 17:02:05 -07:00
Szymon Sidor
3d235ae7b8 Merge pull request #33 from cxxgtxy/master
Fix README since BreakOut pretrained model doesn't match the correct …
2017-06-23 16:59:55 -07:00
cxx
5e73387494 Fix README since BreakOut pretrained model doesn't match the correct tensor shape. Therefore, Pong is used instead. 2017-06-16 15:38:42 +08:00
Zach Dwiel
ec38bf460e remove unnecessary initialization of variable resized_screen 2017-06-09 08:53:10 -04:00
Zichuan Lin
ef1a2402fc Update setup.py 2017-06-07 17:29:38 +08:00
Szymon Sidor
184440ffd3 Merge pull request #22 from ngc92/doc_fixes
docstring and comment fixes
2017-06-04 00:41:34 -07:00
Szymon Sidor
fba0ac30ca Merge pull request #15 from tiagosgc/patch-1
Update README.md
2017-06-04 00:40:58 -07:00
Szymon Sidor
584261a94a Merge pull request #14 from quanvuong/master
Consistent initial type (float) for episode_rewards
2017-06-04 00:40:42 -07:00
Szymon Sidor
9c10c2fc27 Merge pull request #13 from ppwwyyxx/patch-1
Update setup.py
2017-06-04 00:40:31 -07:00
ngc92
02919483f2 docstring and comment fixes 2017-06-02 01:43:51 +02:00
Tiago Carvalho
1f3c3e33e7 Update README.md 2017-05-31 12:14:28 +01:00
Quan Vuong
86054f7a98 Consistent initial type (float) for episode_rewards 2017-05-30 11:49:25 +08:00
Yuxin Wu
709c327c40 Update setup.py
`PongNoFrameskip-v4` seems to require `gym>=0.9.1`
2017-05-29 19:39:25 -07:00
Szymon Sidor
fc2bbed4da Merge pull request #11 from yenchenlin/fix-typo
Fix typos
2017-05-28 12:56:46 -07:00
YenChenLin
4fd1d21845 Fix typo 2017-05-28 13:13:47 -04:00
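
Note on commit d76cd1297a: its message says the weights retrieved from the replay buffer were not applied when training the model, and that a parameter was added so they can be used. A minimal sketch of that idea in plain Python follows: each sample's squared TD error is scaled by its weight before averaging. The name `weighted_td_loss`, its arguments, and the choice of a squared-error loss are assumptions made for illustration; this is not the code from that commit.

```python
def weighted_td_loss(td_errors, weights=None):
    """Mean of squared TD errors, each scaled by a per-sample weight.

    Hypothetical helper for illustration; not taken from the repository.
    With weights=None every sample counts equally (plain mean squared error).
    """
    td_errors = list(td_errors)
    if not td_errors:
        raise ValueError("td_errors must not be empty")
    if weights is None:
        # No weights supplied: fall back to the unweighted loss.
        weights = [1.0] * len(td_errors)
    else:
        weights = list(weights)
    if len(weights) != len(td_errors):
        raise ValueError("weights and td_errors must have the same length")
    # Scale each squared error by its weight, then average over the batch.
    return sum(w * e * e for w, e in zip(weights, td_errors)) / len(td_errors)
```

Passing the weights sampled alongside a batch makes down-weighted transitions contribute less to the loss; omitting them gives the unweighted behaviour the commit message describes.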