Jason Ansel
e647402fd3
Fix warning in generated C code ( #667 )
2022-09-18 12:57:32 -07:00
Philippe Tillet
4a77dfb042
[FRONTEND] Complete rewrite of the runtime ( #644 )
This PR completely rewrites the Triton runtime to be leaner and to clearly
separate the compilation step from the just-in-time caching logic.
This should substantially reduce launch overhead.
2022-09-18 08:51:48 -07:00
Ian Bearman
889d9e34a1
[REPO] update gitignore ( #666 )
Update `.gitignore` to include `.vs` and `.vscode`
2022-09-17 14:25:28 -07:00
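The corresponding `.gitignore` additions would look something like:

```
# editor/IDE artifacts
.vs/
.vscode/
```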
Shintaro Iwasaki
c668d6596e
[DOCS] Fix spelling ( #664 )
This PR applies minor spelling fixes in comments and string literals to
`master`. It shouldn't hurt anything.
2022-09-16 12:26:40 -07:00
Sophia Wisdom
4580a04710
[FRONTEND] Improve error message for CPU tensors ( #654 )
Redo of #651 against master. Fixes #525 by catching the CUDA error raised when
we check a PyTorch tensor's size and rethrowing a more informative error that
explains why the check failed.
2022-09-14 14:26:42 -07:00
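The catch-and-rethrow pattern that PR describes can be sketched in plain Python. This is a hypothetical helper, not Triton's actual code, and it does not assume PyTorch is available:

```python
def checked_data_ptr(tensor, arg_name):
    """Return tensor.data_ptr(), rethrowing a more informative error on failure."""
    try:
        return tensor.data_ptr()
    except Exception as e:
        # surface *why* kernel-argument inspection failed, instead of a bare CUDA error
        raise ValueError(
            f"Could not get a device pointer for argument {arg_name!r}: {e}. "
            "Is it a CPU tensor? Triton kernels require CUDA tensors."
        ) from e
```

Chaining with `from e` preserves the original CUDA error in the traceback while the user sees the actionable message first.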
Philippe Tillet
cfbbc7b43a
[CI] Added V100 tag to disambiguate self-hosted runners ( #653 )
2022-09-14 13:47:50 -07:00
Yunxing Dai
59a8e25f43
[DOCS] Fix typo ( #650 )
2022-09-14 12:17:05 -07:00
Da Yan
437ced38c2
fp8 <> bf16 conversion ( #637 )
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-08-30 14:20:12 -07:00
Da Yan
210a296699
[BACKEND] bf16 flash-attention ( #636 )
2022-08-26 20:40:55 -07:00
Daniil Fukalov
fe0c29b9ec
Fix inconsistent use of struct in a declaration instead of class. ( #632 )
Looks like a typo.
2022-08-26 16:20:21 -07:00
Phil Wang
7394d732ad
[DOCS] Support for variable head dimensions in the flash attention Triton tutorial ( #623 )
2022-08-15 19:16:49 -07:00
Da Yan
3e2953f357
Allow multiple_of and max_contiguous to accept n-d values ( #617 )
2022-08-10 09:59:32 -07:00
Daniil Fukalov
cc79376222
Fix deprecation warning on CreateGEP(Value *, ArrayRef<Value *>, const Twine &) ( #608 )
This variant of CreateGEP() has already been removed in LLVM 14.
2022-08-07 17:10:18 -07:00
Daniil Fukalov
7b91c7befd
Fix "warning: control reaches end of non-void function". ( #607 )
2022-08-02 16:12:48 -07:00
Sharad Vikram
968f59027e
Expose module.print in pybind ( #604 )
2022-07-29 21:36:08 -07:00
Anton Kostin
923d468187
Update LICENSE ( #602 )
2022-07-25 09:30:03 -07:00
Jason Ansel
027321cdcf
[FRONTEND] Make tl.rand() 1-exclusive ( #601 )
2022-07-24 17:47:23 -07:00
Jason Ansel
e02e56dc63
[FRONTEND] Add missing rfloordiv ( #598 )
* [FRONTEND] Add missing rfloordiv
* fix tests
2022-07-23 21:54:12 -07:00
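For context, `__rfloordiv__` is the reflected operator Python falls back to when the left operand doesn't know how to floor-divide by the right one. A minimal sketch with a hypothetical wrapper class (not Triton's actual tensor type):

```python
class Scalar:
    def __init__(self, value):
        self.value = value

    def __floordiv__(self, other):
        # handles Scalar(7) // 2
        return Scalar(self.value // other)

    def __rfloordiv__(self, other):
        # handles 7 // Scalar(2); without this, int defers and Python raises TypeError
        return Scalar(other // self.value)
```

Missing the reflected variant is easy to overlook because `obj // x` works while `x // obj` fails, which is exactly the asymmetry this commit fixes.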
Philippe Tillet
ab56d310dd
[BACKEND][IR] Fixed up internal dtype size for booleans (1bit -> 8bit) ( #600 )
2022-07-23 20:08:03 -07:00
Da Yan
f28caddbf8
[FRONTEND] Allow tl.where to select pointers ( #595 )
2022-07-21 09:54:27 -07:00
Keren Zhou
af85f5fa46
[FRONTEND] Refresh cache when the source code of outlined functions is changed ( #590 )
2022-07-20 17:34:07 -07:00
daadaada
9b2bc88d11
[BACKEND] Better bf16 support ( #588 )
2022-07-19 21:22:37 -07:00
Philippe Tillet
86cab58d89
[CI] Changed dev wheel date to UTC time to match CRON schedule ( #587 )
2022-07-18 14:54:13 -07:00
Phil Tillet
5b04331dd2
[TUTORIALS] Added more credits in fused attention tutorial
2022-07-13 23:48:58 -07:00
Jason Ansel
0a3f3d5f25
[PACKAGING] Include triton/language/libdevice.10.bc in package data ( #582 )
2022-07-13 23:45:27 -07:00
Keren Zhou
4912916c11
[FRONTEND] Added support for element-wise functions defined in external LLVM bitcode (e.g., libdevice) ( #562 )
2022-07-13 15:52:21 -07:00
Phil Tillet
971f5782b4
[tutorials] Added flash attention credits in tutorial
2022-07-11 18:56:48 -07:00
Philippe Tillet
d5eb9bc230
[tutorial] Added bwd in fused attention example ( #579 )
Doesn't work on V100
2022-07-11 15:43:46 -07:00
Jason Ansel
c9a2b9c7d4
[FRONTEND] Add missing args to get_simd_tflops() ( #578 )
2022-07-11 14:37:59 -07:00
Philippe Tillet
4a399a7e40
[BACKEND] Fix some bugs (atomics, a segfault...) ( #577 )
This should fix #558, #573 and #574
2022-07-06 20:03:04 -07:00
vesuppi
22105bc33b
[FRONTEND] Added type check in semantic arange ( #572 )
2022-07-03 15:25:37 -07:00
Keren Zhou
4bf509889b
[BUILD] Change the default build type to Release ( #571 )
2022-07-01 12:17:22 -07:00
Keren Zhou
a74cce375f
[FRONTEND] Raise broadcast error ( #555 )
2022-06-30 17:32:07 -07:00
Philippe Tillet
f733327ba4
[BACKEND][CODEGEN] Disabling L2 residency control by default ( #570 )
2022-06-29 17:05:13 -07:00
Natalia Gimelshein
1bbb2430d9
[TUTORIALS] adjust heuristics for dwdb kernel ( #565 )
2022-06-29 17:00:22 -07:00
Kashif Rasul
1895ceaa2d
[TUTORIAL] Fix f-string for older python ( #569 )
Fixes issue #568.
2022-06-29 09:39:10 -07:00
Philippe Tillet
feb7a2a0dc
[FRONTEND] Hotfix for store argument order ( #567 )
2022-06-28 00:24:02 -07:00
Philippe Tillet
5b4c8f221e
[BACKEND] Compiler improvements ( #557 )
This PR adds several optimization capabilities to the compiler backend:
- `tl.store` now uses inline PTX, making it possible to use things like `evict_last`
- For A100, the mma layout can be converted directly to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used as both row- and column-major
- Fixed broken liveness analysis
- The mma layout can now be loaded/stored directly without conversion, which is useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop
- `tl.dot` can now take its LHS input in registers when it comes from a previous `tl.dot` instruction, which is useful for e.g. fused attention
2022-06-27 11:49:19 -07:00
Keren Zhou
87413bc925
[BACKEND] Fix layout convert for non-contiguous input ( #564 )
2022-06-25 23:12:03 -07:00
Keren Zhou
d345ddf837
[DOCS] Separate atomic cas from other atomic operations since its operands are very different ( #559 )
2022-06-22 17:51:17 -07:00
Keren Zhou
b02bac41ba
[CI] Change cache dir ( #561 )
2022-06-22 11:44:35 -07:00
Keren Zhou
a428cf0bb2
[FRONTEND] Fix pytorch warning. ( #560 )
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc').
2022-06-20 20:12:09 -07:00
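The distinction behind that warning: Python's `//` (and `rounding_mode='floor'`) rounds toward negative infinity, while the deprecated tensor `__floordiv__` truncated toward zero, so the two disagree for negative values. Illustrated in plain Python, with no PyTorch dependency:

```python
import math

a, b = -7, 2
floor_result = a // b             # floor division: rounds toward -inf -> -4
trunc_result = math.trunc(a / b)  # truncation: rounds toward zero -> -3
```

For non-negative operands the two conventions coincide, which is why the discrepancy is easy to miss in tests.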
Keren Zhou
b5e728cb14
Add argmin argmax ( #552 )
2022-06-15 13:55:20 -07:00
Jason Ansel
6b9756532f
[BACKEND] Remove print in coalesce.cc ( #551 )
2022-06-15 13:13:20 -07:00
Madeleine Thompson
8ce2c12e33
[PYTHON] move ephemeral files to homedir ( #549 )
This prevents potential conflicts with other users on shared machines.
2022-06-13 19:37:52 -07:00
Keren Zhou
93209c07e0
[BACKEND][CODEGEN] Fix reduce uint ( #547 )
2022-06-13 16:43:57 -07:00
Philippe Tillet
58c8889235
[FRONTEND] Fix scanline layout ( #548 )
2022-06-13 16:21:10 -07:00
Natalia Gimelshein
7094657aa9
[FRONTEND] fix bool conversion of floating types ( #545 )
2022-06-13 15:52:37 -07:00
Keren Zhou
38573d1261
[FRONTEND] Return allocated registers and spilled registers for users ( #541 )
2022-06-07 18:37:12 -07:00
Mengchi Zhang
2cdc6d35c4
[FRONTEND] Give col_per_thread an initial value to make the compiler happy ( #535 )
Signed-off-by: Mengchi Zhang <mengchi@fb.com>
2022-06-06 12:48:23 -07:00