Commit Graph

451 Commits

Author              SHA1        Message / Date
Phil Tillet         c98c889d7f  .  2023-01-09 19:08:51 -08:00
Phil Tillet         fc1007278d  .  2023-01-09 18:45:44 -08:00
Phil Tillet         0c101e0c33  .  2023-01-09 16:30:28 -08:00
Phil Tillet         3fefcd78d4  .  2023-01-09 16:29:45 -08:00
Phil Tillet         137e866bd2  more work  2023-01-09 16:20:10 -08:00
Phil Tillet         8ebb593bbb  more work  2023-01-09 15:45:06 -08:00
Phil Tillet         6c750b6856  Added verifier for trans  2023-01-08 14:29:17 -08:00
Phil Tillet         42421fabc5  .  2023-01-06 20:35:57 -08:00
Phil Tillet         600bcefb12  more optimizations  2023-01-06 20:27:49 -08:00
Philippe Tillet     18c7a72973  more pass template  2023-01-06 14:26:06 -08:00
Phil Tillet         a81345f7c1  SinkConversionsFromShared template  2023-01-06 13:01:08 -08:00
Philippe Tillet     874ee11ab5  More optimizations  2023-01-06 11:04:20 -08:00
Philippe Tillet     e6f1a9ad34  commenting dq but not load/store  2023-01-05 23:25:41 -08:00
Philippe Tillet     6f997f4ecb  dq now mma  2023-01-05 21:14:55 -08:00
Phil Tillet         520b69fe70  more reassociation  2023-01-05 16:05:11 -08:00
Phil Tillet         764134ee34  trying to decrease register pressure  2023-01-05 13:02:38 -08:00
Phil Tillet         1bde80b1e8  Added ptx code  2023-01-04 17:23:16 -08:00
Phil Tillet         268d2cd18d  better convert + write-back  2023-01-04 17:12:35 -08:00
Phil Tillet         29a1e20b58  tweak convert + trans  2023-01-04 17:12:28 -08:00
Phil Tillet         36da342893  .  2023-01-04 11:25:03 -08:00
Phil Tillet         e70e1e76b4  swizzling  2023-01-04 11:21:19 -08:00
Phil Tillet         e3c3d9fc65  16 spills  2023-01-04 00:01:22 -08:00
Phil Tillet         ee86ea9c90  100 spills  2023-01-03 20:52:00 -08:00
Phil Tillet         645fa5c1cd  .  2023-01-03 18:34:05 -08:00
Phil Tillet         8df1fa5e5b  Merge remote-tracking branch 'origin/master' into phil/fused-attention-perf-fixup  2023-01-03 18:31:34 -08:00
Keren Zhou          8460ea3df1  [Frontend] Fix import for libdevice (#1028)  2023-01-03 15:48:05 -08:00
    This is a hotfix for issue 1 in https://github.com/openai/triton/issues/1017
Phil Tillet         737e43a627  more tests  2023-01-03 09:48:08 -08:00
Phil Tillet         5c01c567b9  .  2023-01-02 23:13:12 -08:00
Phil Tillet         05920e0b8b  reduced some spilling  2023-01-02 19:28:54 -08:00
Phil Tillet         c11fe351e1  .  2023-01-02 19:16:06 -08:00
Phil Tillet         b246d85fad  trying to figure out spilling root cause  2022-12-30 15:21:00 -08:00
Phil Tillet         4dce8dd709  Merge remote-tracking branch 'origin/master' into phil/fused-attention-perf-fixup  2022-12-30 11:53:49 -08:00
Phil Tillet         7388fb1de9  manual ttgir in bwd pass  2022-12-29 15:53:38 -08:00
fdrocha             194ba103b1  [BUILD] Fixed error when compiling on systems with multiple versions of Python installed (#1019)  2022-12-29 15:10:34 -08:00
Phil Tillet         71e3143eaf  .  2022-12-29 14:40:27 -08:00
Phil Tillet         54ae3e8d6e  cleanup  2022-12-28 13:42:43 -08:00
Phil Tillet         7aba2a60d6  trying out another change  2022-12-27 21:51:51 -08:00
Phil Tillet         eefc9d1274  Added TTGIR kernel  2022-12-27 21:49:28 -08:00
Phil Tillet         0d6e6cf578  trying more things  2022-12-27 20:58:31 -08:00
Philippe Tillet     4182e90862  less math  2022-12-24 00:31:05 -08:00
Keren Zhou          fd2da4aff6  [BACKEND] Support splat constant on the DotOperandLayout (#1008)  2022-12-22 00:48:46 -08:00
Sharad Vikram       925d3d7f98  [FRONTEND] Export broadcast and broadcast_to in triton.language (#1007)  2022-12-22 01:57:33 +00:00
Philippe Tillet     033e82060d  .  2022-12-21 14:02:10 -08:00
Phil Tillet         88e572e54d  .  2022-12-21 13:54:30 -08:00
Keren Zhou          b5aafb0dab  [FRONTEND] Fix 3d indexing (#1006)  2022-12-21 12:52:32 -08:00
Philippe Tillet     20100a7254  Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)  2022-12-21 01:30:50 -08:00
    This PR merges the `triton-mlir` branch, in which we have been quietly
    rewriting the Triton backend from scratch to increase maintainability,
    stability, and ultimately performance. Changes to the runtime are
    minimal, and this new version aims to remain backward-compatible with
    the previous commit. The legacy backend is now officially deprecated,
    but can still be accessed via the `legacy-backend` tag.

    Co-authored-by: Keren Zhou <kerenzhou@openai.com>
    Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
    Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
    Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
    Co-authored-by: Yan Da <dyanab@connect.ust.hk>
    Co-authored-by: Jun Yang <yangjunpro@gmail.com>
    Co-authored-by: Ian Bearman <ianb@microsoft.com>
    Co-authored-by: Jason Ansel <jansel@jansel.net>
    Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
    Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
    Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
    Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
    Co-authored-by: dongdongl <dongdongl@nvidia.com>
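The merge note above says the pre-rewrite backend remains reachable through the `legacy-backend` tag. A minimal sketch of that workflow, demonstrated on a throwaway repository (so it runs without cloning openai/triton; the tag and commit messages here are stand-ins):

```shell
# Sketch: keeping a deprecated codebase reachable via a git tag, as the
# merge note describes for `legacy-backend`. Uses a scratch repository.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "legacy backend"
git tag legacy-backend                 # pin the last pre-rewrite commit
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "triton-mlir rewrite"
git checkout -q legacy-backend         # detached HEAD at the tagged commit
git log --format=%s -1                 # prints: legacy backend
```

In an actual Triton checkout the corresponding pair of commands would be `git fetch --tags` followed by `git checkout legacy-backend`.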
Yang Hau            8650b4d1cb  [DRIVER] Fix typos (#939)  2022-12-02 11:13:46 -08:00
Crutcher Dunnavant  44f577984d  Fix format double substitution bug: {i} => {{i}} (#886)  2022-11-20 11:44:42 -08:00
    The previous `{i}` was silently expanding to the `i` from the
    enumeration loop over `regular_args` (when it wasn't empty).
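The escaping fix above is the classic `str.format` brace-escaping issue: a literal `{i}` in a template is treated as a format field unless doubled to `{{i}}`. A self-contained sketch of the bug class (the name `regular_args` follows the commit message; the template and values are hypothetical):

```python
# Sketch of the double-substitution bug fixed in 44f577984d.
# A template meant to emit a literal "{i}" instead picks up the loop
# index when it passes through str.format inside an enumeration loop.
template = "arg_{i}"  # intended as literal text, not a format field

regular_args = ["x", "y"]
buggy, fixed = [], []
for i, arg in enumerate(regular_args):
    # Buggy: "{i}" is a format field, silently replaced by the loop index.
    buggy.append(template.format(i=i, arg=arg))
    # Fixed: double the braces ("{{i}}") so format emits a literal "{i}".
    fixed.append(template.replace("{i}", "{{i}}").format(i=i, arg=arg))

print(buggy)  # ['arg_0', 'arg_1']      -- silent substitution
print(fixed)  # ['arg_{i}', 'arg_{i}']  -- literal preserved
```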
Crutcher Dunnavant  0e4691e6dd  [FRONTEND] Fix ExternLibrary(format=) bug; type-annotate build_extern.py (#883)  2022-11-17 18:45:30 +01:00
    Ran mypy over `build_extern.py` and cleaned up type annotations.
    Found and fixed a bug where `ExternLibrary(format=)` was being ignored.
Natalia Gimelshein  0d7e753227  [TESTING] use torch.int for autotuning cache (#840)  2022-11-04 18:05:16 -07:00
    For stupid reasons, ops on int8 are 3 times slower than ops on int,
    and for another set of stupid reasons we are not using cudaMemset for
    `zero_`, so using an `int8` buffer in `do_bench` makes it slow.

    Co-authored-by: Philippe Tillet <phil@openai.com>