Philippe Tillet
|
26c9849462
|
[ir][instructions] added permutations option for trans
|
2019-08-05 21:19:13 -07:00 |
|
Philippe Tillet
|
d62e581ab3
|
basic split-k across warps working for GEMM
|
2019-08-05 19:33:28 -07:00 |
|
Philippe Tillet
|
d869d9a924
|
[codegen][selection] more flexible instruction selection for reduce_inst
|
2019-08-04 16:34:36 -07:00 |
|
Philippe Tillet
|
6be532c6a2
|
[codegen][selection] adding support for reduction along arbitrary axis
|
2019-08-02 21:29:36 -07:00 |
|
Philippe Tillet
|
d9945692a9
|
[dnn] better specification of recompilation key
|
2019-08-02 17:42:48 -07:00 |
|
Philippe Tillet
|
3b92ddf7e6
|
[codegen/reassociation] now recursively takes pointer arguments into account as well
|
2019-07-31 18:41:56 -07:00 |
|
Philippe Tillet
|
f7bd976fc7
|
[dnn/blocksparse] added heuristics for block-sparse dot
|
2019-07-31 17:12:36 -07:00 |
|
Philippe Tillet
|
bb32ac56c9
|
[codegen/optimize_dce.cpp] fixed bugs whereby barriers were removed by DCE
|
2019-07-31 15:11:10 -07:00 |
|
Philippe Tillet
|
080bf1af88
|
[dnn/blocksparse/dot]: BlocksparseDx also working
|
2019-07-30 11:42:31 -07:00 |
|
Philippe Tillet
|
17cb2db356
|
[dnn/blocksparse/dot] prototype version seems to pass basic test
|
2019-07-27 21:21:36 -07:00 |
|
Philippe Tillet
|
2a377bc8b1
|
[ir] deleted mask/merge instructions; will be replaced by masked_load/store and select
|
2019-07-25 15:06:15 -07:00 |
|
Philippe Tillet
|
397d76156b
|
progress on re-association
|
2019-07-23 17:21:24 -07:00 |
|
Philippe Tillet
|
c448876178
|
better benchmarking
|
2019-07-22 19:26:12 -07:00 |
|
Philippe Tillet
|
b1d81a5802
|
more work on heuristics
|
2019-07-21 18:11:54 -07:00 |
|
Philippe Tillet
|
d159455f7b
|
[codegen/alignment_info] better alignment information
|
2019-07-20 21:44:18 -07:00 |
|
Philippe Tillet
|
28c250216c
|
[dnn/gemm] added some bounds checking
|
2019-07-19 21:32:55 -07:00 |
|
Philippe Tillet
|
5215fb0424
|
[codegen] some more optimizations
|
2019-07-19 20:29:03 -07:00 |
|
Philippe Tillet
|
f0d8306437
|
[codegen/alignment_info] better handling of constants
|
2019-07-18 16:12:06 -07:00 |
|
Philippe Tillet
|
86f70f8224
|
[codegen/selection] performance fix-up when A is transposed for hmma
|
2019-07-17 21:46:23 -07:00 |
|
Philippe Tillet
|
2f0817b2cd
|
[codegen/selection] tensor cores now used for transposed layouts
|
2019-07-17 17:20:38 -07:00 |
|
Philippe Tillet
|
bfa39b8992
|
preparing the field for tensor cores transposes
|
2019-07-17 13:20:33 -07:00 |
|
Philippe Tillet
|
791c91ee63
|
[dnn/shift] bugfix in static shape division
|
2019-07-17 11:39:17 -07:00 |
|
Philippe Tillet
|
a55b098e88
|
[dnn/shift] now using constant divisions
|
2019-07-16 21:05:21 -07:00 |
|
Philippe Tillet
|
07c964919c
|
[dnn/shift] now strictly only shifting the interior
|
2019-07-16 20:18:48 -07:00 |
|
Philippe Tillet
|
ec24e1e7df
|
trying to remove interior logic
|
2019-07-16 18:47:50 -07:00 |
|
Philippe Tillet
|
5f6dd23fc2
|
[dnn/dot] reverted back to peak tensorcores performance
|
2019-07-16 16:14:58 -07:00 |
|
Philippe Tillet
|
28959fe165
|
[runtime/jit] made auto-tuning silent
|
2019-07-16 14:41:38 -07:00 |
|
Philippe Tillet
|
f9db0449b7
|
[dnn] Adding batchnorm
|
2019-07-08 18:44:37 -07:00 |
|
Philippe Tillet
|
8fc253946c
|
[codegen] shift: added sketch for shift-convolution backpropagation
|
2019-07-02 16:39:07 -07:00 |
|
Philippe Tillet
|
6cfb575d29
|
[lang] fixup in cast type
|
2019-06-30 17:43:18 -07:00 |
|
Philippe Tillet
|
c172bd518b
|
more stuff
|
2019-06-30 16:55:02 -07:00 |
|
Philippe Tillet
|
d8c3d58593
|
more optimization
|
2019-06-28 20:22:52 -07:00 |
|
Philippe Tillet
|
f4dedb522c
|
fixup
|
2019-06-27 17:05:48 -07:00 |
|
Philippe Tillet
|
6300ec5080
|
[examples] added conv2d op in tensorflow
|
2019-06-26 18:50:53 -07:00 |
|
Philippe Tillet
|
25e9a10917
|
changed auto-tuner parameter ranges
|
2019-06-25 19:27:49 -07:00 |
|
Philippe Tillet
|
d945ce5e1b
|
Now showing valid parameter for NN
|
2019-06-25 19:18:43 -07:00 |
|
Philippe Tillet
|
64513fb407
|
[codegen] added fallback when tensor cores cannot be used
|
2019-06-25 15:49:58 -07:00 |
|
Philippe Tillet
|
06b5992509
|
[feature] added basic tensor core support
|
2019-06-11 10:24:49 -07:00 |
|
Philippe Tillet
|
6045209d5b
|
Now find correct tuning configuration
|
2019-06-06 20:13:26 -07:00 |
|
Philippe Tillet
|
0a0b48e9a2
|
adding hmma tuning parameters
|
2019-06-06 19:51:02 -07:00 |
|
Philippe Tillet
|
81eba3e1ec
|
ugh
|
2019-06-06 19:36:41 -07:00 |
|
Philippe Tillet
|
cdf5a0d011
|
[codegen/tune]: added fragmentation types
|
2019-06-06 16:48:32 -07:00 |
|
Philippe Tillet
|
30833c18f1
|
[codegen/tune] bugfix in heuristics for nano-tile sizes
|
2019-05-04 01:32:34 -04:00 |
|
Philippe Tillet
|
3413aad582
|
[general] major overhaul of triton-c/triton-ir/triton-jit:
- Added alloc const
- Added atomics
- Pruning tuning space
- Added example for dot/conv/shift
- Bugfixes
|
2019-04-25 16:18:15 -04:00 |
|
Philippe Tillet
|
0c607c9392
|
[examples] normalize benchmark by max_clock / current_clock
|
2019-03-28 07:58:37 -04:00 |
|
Philippe Tillet
|
2c3ae0675e
|
[JIT] re-added nvidia compatibility
|
2019-03-27 21:12:01 -04:00 |
|
Philippe Tillet
|
fdf8559806
|
[general] added missing files
|
2019-03-27 20:01:35 -04:00 |
|
Philippe Tillet
|
bc2a257d5c
|
[code generation] more flexibility in backend selection
|
2019-03-27 11:29:42 -07:00 |
|
Philippe Tillet
|
e04253c0dd
|
[code generation] basic CPU backend
|
2019-03-27 11:13:36 -07:00 |
|
Philippe Tillet
|
8d35c98920
|
[code generation] search space pruning
|
2019-03-25 14:10:24 -07:00 |
|