Philippe Tillet
66fa2f2975
.
2023-01-09 23:11:51 -08:00
Philippe Tillet
b162c44d59
.
y
2023-01-09 22:58:40 -08:00
Phil Tillet
2fa0dfbce9
.
2023-01-09 22:50:38 -08:00
Philippe Tillet
ff04a5e9b6
.
2023-01-09 22:11:00 -08:00
Phil Tillet
d88353a5a4
.
2023-01-09 20:14:06 -08:00
Phil Tillet
fc1007278d
.
2023-01-09 18:45:44 -08:00
Phil Tillet
3fefcd78d4
.
2023-01-09 16:29:45 -08:00
Phil Tillet
137e866bd2
more work
2023-01-09 16:20:10 -08:00
Phil Tillet
8ebb593bbb
more work
2023-01-09 15:45:06 -08:00
Phil Tillet
6c750b6856
Added verifier for trans
2023-01-08 14:29:17 -08:00
Phil Tillet
600bcefb12
more optimizations
2023-01-06 20:27:49 -08:00
Philippe Tillet
18c7a72973
more pass template
2023-01-06 14:26:06 -08:00
Phil Tillet
a81345f7c1
SinkConversionsFromShared template
2023-01-06 13:01:08 -08:00
Phil Tillet
520b69fe70
more reassociation
2023-01-05 16:05:11 -08:00
Phil Tillet
764134ee34
trying to decrease register pressure
2023-01-05 13:02:38 -08:00
Phil Tillet
268d2cd18d
better convert + write-back
2023-01-04 17:12:35 -08:00
Phil Tillet
36da342893
.
2023-01-04 11:25:03 -08:00
Phil Tillet
645fa5c1cd
.
2023-01-03 18:34:05 -08:00
Phil Tillet
5c01c567b9
.
2023-01-02 23:13:12 -08:00
Phil Tillet
05920e0b8b
reduced some spilling
2023-01-02 19:28:54 -08:00
Phil Tillet
c11fe351e1
.
2023-01-02 19:16:06 -08:00
Phil Tillet
b246d85fad
trying to figure out spilling root cause
2022-12-30 15:21:00 -08:00
Phil Tillet
7388fb1de9
manual ttgir in bwd pass
2022-12-29 15:53:38 -08:00
Phil Tillet
71e3143eaf
.
2022-12-29 14:40:27 -08:00
Phil Tillet
54ae3e8d6e
cleanup
2022-12-28 13:42:43 -08:00
Phil Tillet
eefc9d1274
Added TTGIR kernel
2022-12-27 21:49:28 -08:00
Phil Tillet
0d6e6cf578
trying more things
2022-12-27 20:58:31 -08:00
Philippe Tillet
4182e90862
less math
2022-12-24 00:31:05 -08:00
Philippe Tillet
033e82060d
.
2022-12-21 14:02:10 -08:00
Phil Tillet
88e572e54d
.
2022-12-21 13:54:30 -08:00
Philippe Tillet
20100a7254
Merge `triton-mlir` branch - Complete rewrite of the backend from scratch ( #1004 )
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Chenggang Zhao
f16138d447
[Frontend] Interface fixes for libdevice ( #830 )
- Unifying several interfaces with different types to a single one, e.g.
`fsub_ru` and `dsub_ru` -> `sub_ru`;
- Minor bug fix: `fast_pow` is incorrectly classified into the `pow`
interface, of which arguments are the same as `powf`;
- Explicit interfaces for casting functions, e.g. decoupling
`ll2float_ru` into `ll2float_ru` and `ull2float_ru`;
- Removing interfaces that are not in NVIDIA's official documents, e.g.
`fmaf_ieee_rn`, which is confusing together with `fmaf_rn`.
Note that this PR for the master branch is different from #829 , which is
for the MLIR branch.
2022-11-01 10:51:58 -07:00
Chris
9a11a567ce
[DOCS] Fixed typos in 01-vector-add.py ( #751 )
2022-10-09 18:12:46 -07:00
Phil Tillet
b244db06da
[TUTORIALS] Attention tutorial fixup
2022-09-30 19:31:43 -07:00
Shintaro Iwasaki
c668d6596e
[DOCS] Fix spelling ( #664 )
This PR applies minor spelling fixes in comments and string literals to
`master`. It shouldn't hurt anything.
2022-09-16 12:26:40 -07:00
Yunxing Dai
59a8e25f43
[DOCS] Fix typo ( #650 )
2022-09-14 12:17:05 -07:00
Phil Wang
7394d732ad
[DOCS] support for variable head dimensions in flash attention triton tutorial ( #623 )
2022-08-15 19:16:49 -07:00
Keren Zhou
af85f5fa46
[FRONTEND] Refresh cache when the source code of outlined functions is changed ( #590 )
2022-07-20 17:34:07 -07:00
Philippe Tillet
86cab58d89
[CI] Changed dev wheel date to UTC time to match CRON schedule ( #587 )
2022-07-18 14:54:13 -07:00
Phil Tillet
5b04331dd2
[TUTORIALS] Added more credits in fused attention tutorial
2022-07-13 23:48:58 -07:00
Keren Zhou
4912916c11
[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) ( #562 )
2022-07-13 15:52:21 -07:00
Phil Tillet
971f5782b4
[tutorials] Added flash attention credits in tutorial
2022-07-11 18:56:48 -07:00
Philippe Tillet
d5eb9bc230
[tutorial] Added bwd in fused attention example ( #579 )
Doesn't work on V100
2022-07-11 15:43:46 -07:00
Natalia Gimelshein
1bbb2430d9
[TUTORIALS] adjust heuristics for dwdb kernel ( #565 )
2022-06-29 17:00:22 -07:00
Philippe Tillet
5b4c8f221e
[BACKEND] Compiler improvements ( #557 )
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use cache eviction policies such as `evict_last`.
- For A100, mma layout can be directly converted to shared memory.
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used in both row- and column-major order.
- Fixed liveness analysis, which was broken.
- Can now load/store mma layout directly without converting. Useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop.
- `tl.dot` can now take LHS inputs in registers when they come from a previous `tl.dot` instruction. Useful for e.g. fused attention.
2022-06-27 11:49:19 -07:00
Philippe Tillet
751e325d2e
[TUTORIALS] Fixed typo
2022-06-05 13:33:21 -07:00
Philippe Tillet
801c8a4c92
[TUTORIALS] Fixed typo
2022-06-05 12:32:07 -07:00
Philippe Tillet
8876e53206
[BACKEND] Restored reduction bugfixes
2022-06-03 11:38:52 -07:00
Philippe Tillet
a60374a597
Revert "[BACKEND] Various bug fixes; making reductions faster ( #533 )".
This is a more stable commit that produces bitwise identical code to earlier
versions. Using commits after this one may lead to slightly different numerics.
2022-06-03 11:36:06 -07:00
Philippe Tillet
3e7500dfe6
[BACKEND] Various bug fixes; making reductions faster ( #533 )
2022-05-31 17:14:44 -07:00