Commit Graph

95 Commits

Author SHA1 Message Date
Philippe Tillet
d9945692a9 [dnn] better specification of recompilation key 2019-08-02 17:42:48 -07:00
Philippe Tillet
bb32ac56c9 [codegen/optimize_dce.cpp] fixed bugs whereby barriers were removed by DCE 2019-07-31 15:11:10 -07:00
Philippe Tillet
dc11f70fad [dnn/blocksparse] FPROP test passes! 2019-07-29 17:06:20 -07:00
Philippe Tillet
17cb2db356 [dnn/blocksparse/dot] prototype version seems to pass basic test 2019-07-27 21:21:36 -07:00
Philippe Tillet
2a377bc8b1 [ir] deleted mask/merge instructions; will be replaced by masked_load/store and select 2019-07-25 15:06:15 -07:00
Philippe Tillet
397d76156b progress on re-association 2019-07-23 17:21:24 -07:00
Philippe Tillet
38b3771c26 some reassociation 2019-07-23 14:43:18 -07:00
Philippe Tillet
c448876178 better benchmarking 2019-07-22 19:26:12 -07:00
Philippe Tillet
b1d81a5802 more work on heuristics 2019-07-21 18:11:54 -07:00
Philippe Tillet
484e3871cf [dnn/shift] added base pointer for a, b 2019-07-20 23:00:27 -07:00
Philippe Tillet
5215fb0424 [codegen] some more optimizations 2019-07-19 20:29:03 -07:00
Philippe Tillet
71594da66f [dnn/gemm]: fixed leading dimension in transposed variants 2019-07-18 16:35:48 -07:00
Philippe Tillet
86f70f8224 [codegen/selection] performance fix-up when A is transposed for hmma 2019-07-17 21:46:23 -07:00
Philippe Tillet
2f0817b2cd [codegen/selection] tensor cores now used for transposed layouts 2019-07-17 17:20:38 -07:00
Philippe Tillet
7d1797cd32 ugh 2019-07-16 12:59:27 -07:00
Philippe Tillet
3e7a3ed67a [dnn/shift]: added support for fp16 2019-07-13 21:05:34 -07:00
Philippe Tillet
f74dcb7e30 [dnn/batchnorm]: added some more code in Triton-C batchnorm implementations 2019-07-08 20:18:20 -07:00
Philippe Tillet
fa3270dcf2 [codegen/selection] bugfix in code generation for reduction instructions 2019-07-08 18:53:37 -07:00
Philippe Tillet
f9db0449b7 [dnn] Adding batchnorm 2019-07-08 18:44:37 -07:00
Philippe Tillet
8fc253946c [codegen] shift: added sketch for shift-convolution backpropagation 2019-07-02 16:39:07 -07:00
Philippe Tillet
d8c3d58593 more optimization 2019-06-28 20:22:52 -07:00
Philippe Tillet
6300ec5080 [examples] added conv2d op in tensorflow 2019-06-26 18:50:53 -07:00
Philippe Tillet
64513fb407 [codegen] added fallback when tensor cores cannot be used 2019-06-25 15:49:58 -07:00
Philippe Tillet
62000738f0 [codegen] renamed axis_info -> alignment_info 2019-06-25 15:10:47 -07:00
Philippe Tillet
d52abc9379 [codegen] bugfix in alignment inference 2019-06-25 15:06:15 -07:00
Philippe Tillet
edc31cabb0 [codegen] rough template for axis_info pass 2019-06-24 18:57:32 -07:00
Philippe Tillet
72867d17d4 more cleaning 2019-06-24 12:37:13 -07:00
Philippe Tillet
f7dcea1187 Now doing double-buffering 2019-06-13 19:48:02 -07:00
Philippe Tillet
36e3667a9a removed shared conflicts for 8x32x4 and 32x8x4 configurations 2019-06-13 17:51:54 -07:00
Philippe Tillet
21a9b92c87 disabling interleaving 2019-06-13 17:16:00 -07:00
Philippe Tillet
d487cf31ce trying 128 bits loads 2019-06-12 21:07:01 -07:00
Philippe Tillet
1c6372711b added interleaving 2019-06-12 20:30:28 -07:00
Philippe Tillet
a6b580ec05 interleaving fails with B 2019-06-12 19:46:43 -07:00
Philippe Tillet
1b5a742a88 [triton/codegen] added shared memory padding for HMMA arguments and vectorized loads 2019-06-11 19:51:08 -07:00
Philippe Tillet
7d50b87681 [selection/codegen] bugfix in distributed tile indices initialization 2019-06-11 10:45:19 -07:00
Philippe Tillet
06b5992509 [feature] added basic tensor core support 2019-06-11 10:24:49 -07:00
Philippe Tillet
d074a166e2 [feature] basic tensor core utilization works 2019-06-08 14:39:45 -07:00
Philippe Tillet
5f3d48c1d0 [tensor cores] added basic codegen template for using wmma 2019-06-07 21:19:47 -07:00
Philippe Tillet
ec4c6aaaaa Added inline PTX for mma.sync 2019-06-07 19:39:33 -07:00
Philippe Tillet
6fce9f28ae added fragmented axis 2019-06-07 10:32:56 -07:00
Philippe Tillet
cdf5a0d011 [codegen/tune]: added fragmentation types 2019-06-06 16:48:32 -07:00
Philippe Tillet
f6fe9492e4 [dnn/conv] added triton-c code for wgrad 2019-05-11 18:09:23 -04:00
Philippe Tillet
f80441017c [codegen] added leading dimension padding for transposition in shared memory 2019-05-06 11:53:35 -04:00
Philippe Tillet
4813bb007c [codegen] bugfix in builder insert point for predicated instructions 2019-05-04 12:09:27 -04:00
Philippe Tillet
af58b8bd81 [triton-c] predicate in assignment statement now propagates to rhs computations 2019-04-27 14:00:15 -04:00
Philippe Tillet
4b77b764ba [triton-c] added support for while loops 2019-04-26 15:08:02 -04:00
Philippe Tillet
3413aad582 [general] major overhaul of triton-c/triton-ir/triton-jit:
- Added alloc const
- Added atomics
- Pruning tuning space
- Added example for dot/conv/shift
- Bugfixes
2019-04-25 16:18:15 -04:00
Philippe Tillet
bc2a257d5c [code generation] more flexibility in backend selection 2019-03-27 11:29:42 -07:00
Philippe Tillet
e04253c0dd [code generation] basic CPU backend 2019-03-27 11:13:36 -07:00
Philippe Tillet
9d6fc1c051 [code generation] bugfix in single buffering 2019-03-26 15:55:48 -07:00