24 Commits

Author SHA1 Message Date
Jie Tang
77568accd7 Thread episode lengths through when scoring, add tests 2017-02-13 12:29:11 -08:00
Jie Tang
5ca80a3141 Fix format of solves for TotalReward 2016-12-21 20:52:38 -08:00
Jie Tang
b33fc9fb85 Fix a bug when the benchmark result is empty 2016-12-13 22:05:29 -08:00
Jie Tang
54ead345dc Bunch of refactoring since TotalReward and RewardPerTime scoring are quite similar 2016-12-13 22:01:50 -08:00
Jie Tang
f63bb2e1aa Add RewardPerTime scoring function and tests 2016-12-13 22:01:01 -08:00
Jie Tang
5dba36c68d Fix benchmark score compute when a monitor file is empty 2016-11-30 22:38:11 -08:00
Jie Tang
c3283adda0 Fix broken benchmark scoring when handling eval episodes, add a test 2016-10-28 11:48:49 -07:00
Jie Tang
9255e2264c Bring back total environment wall time 2016-10-27 22:49:16 -07:00
Jie Tang
1cc33eb081 Fix bug shadowing initial timesteps, update tests 2016-10-27 22:25:54 -07:00
Jie Tang
271ef783c6 Fix fencepost error in scoring, make unit test actually catch this 2016-10-27 21:35:23 -07:00
Jie Tang
9244bd5001 Properly compute total time 2016-10-27 20:22:49 -07:00
Jie Tang
44ce715dfa Add total reward scoring, tests, propagate solved 2016-10-27 20:22:26 -07:00
Jie Tang
6037456a14 Comment scoring rule 2016-10-27 20:22:22 -07:00
Jie Tang
71af1191e0 Fix some bugs with new partial benchmark scoring 2016-10-27 12:09:49 -07:00
Jie Tang
f7a45f6953 py2 numerical compatibility 2016-10-26 16:57:26 -07:00
Jie Tang
3c341c279d Move / rename benchmark scoring function 2016-10-25 21:55:54 -07:00
Jie Tang
53cde23ece Fix bug in max_seconds scoring. Refactor null_score, add tests for it all 2016-10-25 21:55:54 -07:00
Jie Tang
859144868f Implement benchmark scoring on gym side 2016-10-25 21:55:50 -07:00
Jie Tang
bee6be5632 Typo in source indexes 2016-10-20 22:57:33 -07:00
Jie Tang
2dba05ac0a Minor bug computing sources 2016-10-20 22:50:13 -07:00
Greg Brockman
88f94587a2 Update benchmark spec (#385)
* Update benchmark spec

* Update format of benchmark again

* Add support for max_seconds to benchmark

* Bump version
2016-10-20 17:25:29 -07:00
Greg Brockman
45038020ae Assign floor for any missing episodes 2016-09-23 02:08:11 -07:00
Greg Brockman
2b3f965faa Fix scoring when fewer episodes are provided 2016-09-23 01:47:42 -07:00
Greg Brockman
934b2acbb7 Add benchmark support (#338)
* Warn if seed doesn't return a list

* Add preliminary BenchmarkRun support

* Add experimental benchmark registration

* Flesh out interface

* Add preliminary BenchmarkRun support

* Warn if seed doesn't return a list

* Add experimental benchmark registration

* Flesh out interface

* Make benchmarkrun upload recursive

* Add evaluation episodes

* Add benchmark scoring

* Tweak reward locations

* Tweak scoring

* Clear default metadata in Wrapper

* Improve scoring

* Expose registry; fix test

* Add initial_reset_timestamp

* Add back algorithm; fix tests
2016-09-23 01:04:26 -07:00