146 Commits

Author SHA1 Message Date
Philippe Tillet
7b75b68edc dirty but working warp-splitting 2019-08-06 21:07:13 -07:00
Philippe Tillet
494bfa7671 didn't break correctness of existing HMMA 2019-08-06 17:34:00 -07:00
Philippe Tillet
cf256a636c fixup 2019-08-06 16:44:16 -07:00
Philippe Tillet
5efdb7978e more improvements and regressions 2019-08-06 16:21:20 -07:00
Philippe Tillet
26c9849462 [ir][instructions] added permutations option for trans 2019-08-05 21:19:13 -07:00
Philippe Tillet
d62e581ab3 basic split-k across warps working for GEMM 2019-08-05 19:33:28 -07:00
Philippe Tillet
d869d9a924 [codegen][selection] more flexible instruction selection for reduce_inst 2019-08-04 16:34:36 -07:00
Philippe Tillet
6be532c6a2 [codegen][selection] adding support for reduction along arbitrary axis 2019-08-02 21:29:36 -07:00
Philippe Tillet
d9945692a9 [dnn] better specification of recompilation key 2019-08-02 17:42:48 -07:00
Philippe Tillet
3b92ddf7e6 [codegen/reassociation] now recursively takes pointer arguments into account as well 2019-07-31 18:41:56 -07:00
Philippe Tillet
f7bd976fc7 [dnn/blocksparse] added heuristics for block-sparse dot 2019-07-31 17:12:36 -07:00
Philippe Tillet
bb32ac56c9 [codegen/optimize_dce.cpp] fixed bugs whereby barriers were removed by DCE 2019-07-31 15:11:10 -07:00
Philippe Tillet
080bf1af88 [dnn/blocksparse/dot]: BlocksparseDx also working 2019-07-30 11:42:31 -07:00
Philippe Tillet
dc11f70fad [dnn/blocksparse] FPROP test passes! 2019-07-29 17:06:20 -07:00
Philippe Tillet
17cb2db356 [dnn/blocksparse/dot] prototype version seems to pass basic test 2019-07-27 21:21:36 -07:00
Philippe Tillet
2a377bc8b1 [ir] deleted mask/merge instructions; will be replaced by masked_load/store and select 2019-07-25 15:06:15 -07:00
Philippe Tillet
6ce82dfcdb FINALLY 2019-07-23 22:19:57 -07:00
Philippe Tillet
b7fadb9986 more stuff 2019-07-23 21:22:47 -07:00
Philippe Tillet
397d76156b progress on re-association 2019-07-23 17:21:24 -07:00
Philippe Tillet
38b3771c26 some reassociation 2019-07-23 14:43:18 -07:00
Philippe Tillet
c448876178 better benchmarking 2019-07-22 19:26:12 -07:00
Philippe Tillet
b1d81a5802 more work on heuristics 2019-07-21 18:11:54 -07:00
Philippe Tillet
484e3871cf [dnn/shift] added base pointer for a, b 2019-07-20 23:00:27 -07:00
Philippe Tillet
d159455f7b [codegen/alignment_info] better alignment information 2019-07-20 21:44:18 -07:00
Philippe Tillet
28c250216c [dnn/gemm] added some bounds checking 2019-07-19 21:32:55 -07:00
Philippe Tillet
5215fb0424 [codegen] some more optimizations 2019-07-19 20:29:03 -07:00
Philippe Tillet
71594da66f [dnn/gemm]: fixed leading dimension in transposed variants 2019-07-18 16:35:48 -07:00
Philippe Tillet
f0d8306437 [codegen/alignment_info] better handling of constants 2019-07-18 16:12:06 -07:00
Philippe Tillet
86f70f8224 [codegen/selection] performance fix-up when A is transposed for hmma 2019-07-17 21:46:23 -07:00
Philippe Tillet
2f0817b2cd [codegen/selection] tensor cores now used for transposed layouts 2019-07-17 17:20:38 -07:00
Philippe Tillet
bfa39b8992 preparing the field for tensor cores transposes 2019-07-17 13:20:33 -07:00
Philippe Tillet
791c91ee63 [dnn/shift] bugfix in static shape division 2019-07-17 11:39:17 -07:00
Philippe Tillet
a55b098e88 [dnn/shift] now using constant divisions 2019-07-16 21:05:21 -07:00
Philippe Tillet
07c964919c [dnn/shift] now strictly only shifting the interior 2019-07-16 20:18:48 -07:00
Philippe Tillet
ec24e1e7df trying to remove interior logic 2019-07-16 18:47:50 -07:00
Philippe Tillet
5f6dd23fc2 [dnn/dot] reverted back to peak tensorcores performance 2019-07-16 16:14:58 -07:00
Philippe Tillet
28959fe165 [runtime/jit] made auto-tuning silent 2019-07-16 14:41:38 -07:00
Philippe Tillet
7d1797cd32 ugh 2019-07-16 12:59:27 -07:00
Philippe Tillet
3e7a3ed67a [dnn/shift]: added support for fp16 2019-07-13 21:05:34 -07:00
Philippe Tillet
f74dcb7e30 [dnn/batchnorm]: added some more code in Triton-C batchnorm implementations 2019-07-08 20:18:20 -07:00
Philippe Tillet
fa3270dcf2 [codegen/selection] bugfix in code generation for reduction instructions 2019-07-08 18:53:37 -07:00
Philippe Tillet
f9db0449b7 [dnn] Adding batchnorm 2019-07-08 18:44:37 -07:00
Philippe Tillet
8fc253946c [codegen] shift: added sketch for shift-convolution backpropagation 2019-07-02 16:39:07 -07:00
Philippe Tillet
6cfb575d29 [lang] fixup in cast type 2019-06-30 17:43:18 -07:00
Philippe Tillet
c172bd518b more stuff 2019-06-30 16:55:02 -07:00
Philippe Tillet
9a86bc51e1 [language] added alignment metadata for variables 2019-06-29 13:58:46 -07:00
Philippe Tillet
d8c3d58593 more optimization 2019-06-28 20:22:52 -07:00
Philippe Tillet
f4dedb522c fixup 2019-06-27 17:05:48 -07:00
Philippe Tillet
6300ec5080 [examples] added conv2d op in tensorflow 2019-06-26 18:50:53 -07:00
Philippe Tillet
25e9a10917 changed auto-tuner parameter ranges 2019-06-25 19:27:49 -07:00