Commit Graph

25 Commits

Phil Tillet
520b69fe70 more reassociation 2023-01-05 16:05:11 -08:00
Phil Tillet
764134ee34 trying to decrease register pressure 2023-01-05 13:02:38 -08:00
Phil Tillet
268d2cd18d better convert + write-back 2023-01-04 17:12:35 -08:00
Phil Tillet
36da342893 . 2023-01-04 11:25:03 -08:00
Phil Tillet
645fa5c1cd . 2023-01-03 18:34:05 -08:00
Phil Tillet
5c01c567b9 . 2023-01-02 23:13:12 -08:00
Phil Tillet
05920e0b8b reduced some spilling 2023-01-02 19:28:54 -08:00
Phil Tillet
c11fe351e1 . 2023-01-02 19:16:06 -08:00
Phil Tillet
b246d85fad trying to figure out spilling root cause 2022-12-30 15:21:00 -08:00
Phil Tillet
7388fb1de9 manual ttgir in bwd pass 2022-12-29 15:53:38 -08:00
Phil Tillet
71e3143eaf . 2022-12-29 14:40:27 -08:00
Phil Tillet
54ae3e8d6e cleanup 2022-12-28 13:42:43 -08:00
Phil Tillet
eefc9d1274 Added TTGIR kernel 2022-12-27 21:49:28 -08:00
Phil Tillet
0d6e6cf578 trying more things 2022-12-27 20:58:31 -08:00
Philippe Tillet
4182e90862 less math 2022-12-24 00:31:05 -08:00
Philippe Tillet
033e82060d . 2022-12-21 14:02:10 -08:00
Phil Tillet
88e572e54d . 2022-12-21 13:54:30 -08:00
Philippe Tillet
20100a7254 Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability, and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Phil Tillet
b244db06da [TUTORIALS] Attention tutorial fixup 2022-09-30 19:31:43 -07:00
Phil Wang
7394d732ad [DOCS] support for variable head dimensions in flash attention triton tutorial (#623) 2022-08-15 19:16:49 -07:00
Philippe Tillet
86cab58d89 [CI] Changed dev wheel date to UTC time to match CRON schedule (#587) 2022-07-18 14:54:13 -07:00
Phil Tillet
5b04331dd2 [TUTORIALS] Added more credits in fused attention tutorial 2022-07-13 23:48:58 -07:00
Phil Tillet
971f5782b4 [tutorials] Added flash attention credits in tutorial 2022-07-11 18:56:48 -07:00
Philippe Tillet
d5eb9bc230 [tutorial] Added bwd in fused attention example (#579)
Doesn't work on V100
2022-07-11 15:43:46 -07:00
Philippe Tillet
5b4c8f221e [BACKEND] Compiler improvements (#557)
This PR adds several optimization capabilities to the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use cache-eviction hints such as `evict_last`
- For A100, the mma layout can be converted directly to shared memory
- For A100, an additional "transpose" argument in `dot` allows a tensor to be loaded once and used as both row- and column-major
- Fixed a broken liveness analysis
- The mma layout can now be loaded/stored directly without conversion, which is useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop
- `tl.dot` can now take its LHS input from registers when it comes from a previous `tl.dot` instruction, which is useful for e.g. fused attention
2022-06-27 11:49:19 -07:00
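The chained-`tl.dot` pattern mentioned in the last bullet above is the core of the fused-attention kernel this history iterates on: the output of the first dot (the score matrix) feeds the second dot without a round trip through memory. The following is a minimal NumPy sketch of that math only, not the Triton/TTGIR code from these commits; the function name `fused_attention_block` is hypothetical.

```python
import numpy as np

def fused_attention_block(q, k, v):
    # First dot: attention scores. In the fused Triton kernel this result
    # stays in registers and is consumed by the second dot directly.
    qk = q @ k.T
    # Numerically stable softmax over the key dimension.
    m = qk.max(axis=-1, keepdims=True)
    p = np.exp(qk - m)
    p /= p.sum(axis=-1, keepdims=True)
    # Second dot: weighted sum of values, consuming the first dot's output.
    return p @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = fused_attention_block(q, k, v)
```

In the actual kernel this runs blockwise over the sequence with a running softmax rescaling, which is what makes keeping the intermediate in registers (rather than spilling it, the recurring concern in the commit messages above) matter for performance.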