Philippe Tillet
bb0f9235d1
[OPTIMIZER] Made layout simplification pass efficient for fused attention kernels ( #790 )
2022-10-21 16:52:15 -07:00
Shintaro Iwasaki
0d22d2bc03
[TritonMLIR] Disallow 0D tensor ( #788 )
2022-10-19 10:34:32 -07:00
Yan Chunwei
4464646efb
[Triton-MLIR][BACKEND] Fix masked load store op vector size ( #785 )
...
Correct the Load/Store op's vector size, with the mask's alignment
properly taken into account.
Some cases:
```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
%out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32 {tt.divisibility = 16 : i32}) {
// mask = make_range(128) < n_elements
}
```
This should produce vec=2 `ld`/`st` instructions. The following example, in contrast,
```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
%out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32) {
// mask = make_range(128) < n_elements
}
```
should produce vec=1 `ld`/`st` instructions.
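The rule behind this fix can be sketched with a hypothetical helper (this is an illustration of the idea, not Triton's actual implementation): the achievable vector width is bounded by the pointer alignment in elements, by how far the mask is known to stay uniform (derived from the `tt.divisibility` of `n_elements`), and by the number of contiguous elements each thread owns (here 128 elements / 64 threads = 2).

```cpp
#include <algorithm>
#include <numeric>

// Hypothetical sketch of the vector-width rule, not Triton's actual code.
// A masked load/store can be vectorized only up to the smallest of:
//   - the pointer alignment expressed in elements,
//   - the alignment of the mask boundary (divisibility of n_elements),
//   - the number of contiguous elements owned by each thread.
int vectorWidth(int ptrDivisibilityBytes, int maskDivisibilityElems,
                int elemsPerThread, int dtypeBytes) {
  int ptrAlignElems = ptrDivisibilityBytes / dtypeBytes;
  return std::min(std::gcd(ptrAlignElems, maskDivisibilityElems),
                  elemsPerThread);
}
```

With `tt.divisibility = 16` on both the pointers and `n_elements` (f32, 2 elements per thread) this gives 2; dropping the divisibility hint on `n_elements` collapses it to 1, matching the two examples above.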
2022-10-18 11:43:50 +08:00
Shintaro Iwasaki
5898352f97
[Triton-IR] Fix LoadOp definition ( #771 ) ( #777 )
2022-10-13 18:53:00 -07:00
Philippe Tillet
623c99609f
[Triton-IR] Added type inference and verifier for Triton-IR operations ( #767 )
2022-10-11 18:16:41 -07:00
goostavz
e843257295
[Backend] Fix a bug in emitIndicesForBlocked ( #740 )
2022-10-04 21:29:59 -07:00
Keren Zhou
289ff293cc
[Triton-MLIR] Generate LLVM/PTX code for async ops ( #735 )
2022-10-04 09:37:00 -07:00
goostavz
f9d7f2f126
[Triton-MLIR][Backend] Support ConvertLayout blocked->shared and a few fixes related to mma ( #716 )
2022-10-03 19:33:25 +08:00
goostavz
61b61755e5
[Triton-MLIR][Backend] Support layout conversion between mmaLayout and blockedLayout ( #693 )
2022-09-27 03:58:47 +00:00
Keren Zhou
ecd1bc33df
[Triton-MLIR] Keren/code gen for extract slice and alloc tensor ( #692 )
...
Co-authored-by: gzhu <goostavz@outlook.com>
2022-09-23 19:38:14 +00:00
Yan Chunwei
922155f1d2
[BACKEND] add dot conversion (mma version=2) ( #672 )
...
LLVM conversion for the Dot op.
Due to the lack of `convert_layout`, the dot currently supports only
the following combination of operands:
- `$a` in shared layout
- `$b` in shared layout
- `$c` in MMA layout (only splat-like; the generic cases are left to
`convert_layout`)
This PR focuses on the `mma.16816`-related logic, leaving the other
cases to a follow-up PR.
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-09-22 20:43:54 -07:00
goostavz
15bfd0cb79
[BACKEND] Support of ConvertLayoutOp from blocked to blocked and SliceLayout with blocked parent ( #658 )
2022-09-17 14:58:42 -07:00
Shintaro Iwasaki
43be75ad42
[FRONTEND] Add scalar type support for some ops ( #661 )
...
This PR adds basic support for scalar-type inputs to some ops (cast and pointer arithmetic) for Triton-MLIR. It also renames getelementptr -> addptr.
2022-09-15 16:12:52 -07:00
Yan Chunwei
2a852044d9
[BACKEND] Add C++ tests for PTXFormat and some tiny refinement ( #109 )
...
This PR does the following:
1. Add some C++ tests for `PTXFormat`
2. Enhance the functionality of `PTXFormat` so that a `PTXInstr` instance
can be called multiple times, similar to a C function.
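The "callable like a C function" idea can be illustrated with a small standalone sketch (hypothetical type and members, not the real `PTXFormat`/`PTXInstr` API):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch, not Triton's actual PTXInstr API: an instruction
// object that can be invoked repeatedly with different operands, emitting
// one formatted PTX line per call, much like calling a C function.
struct InstrSketch {
  std::string opcode;
  std::string operator()(const std::vector<std::string> &operands) const {
    std::ostringstream os;
    os << opcode << " ";
    for (size_t i = 0; i < operands.size(); ++i)
      os << (i ? ", " : "") << operands[i];
    os << ";";
    return os.str();
  }
};
```

For example, `InstrSketch mov{"mov.u32"};` can be applied to `{"%r1", "%r2"}` to emit `mov.u32 %r1, %r2;`, and the same instance can be reused with other operands.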
2022-09-09 09:15:07 -07:00
Yan Chunwei
a9464f4993
[Backend] Vectorize Load/Store Ops ( #86 )
...
This PR does the following things:
- Refactor the Load and Store op codegen to share the same logic and much of the code
- Support vectorized loads/stores
2022-09-06 12:28:09 -07:00
Shintaro Iwasaki
d01353de07
[CI] add assert-enabled MLIR option ( #78 )
...
This deprecates the use of the release-build LLVM hosted by the LLVM project, since release builds make debugging harder for developers.
This PR implements the following solution:
1. Create LLVM release tarballs with assert enabled on our own (using Docker)
2. Host them in our own GitHub repositories
3. Use our LLVM for CI and/or development if `TRITON_USE_ASSERT_ENABLED_LLVM=1` is set.
2022-08-31 18:55:32 -07:00
Shintaro Iwasaki
0ebef11c77
[TritonIR] Make mask operand optional ( #74 )
2022-08-22 22:00:17 -07:00
goostavz
de2dd04c8a
[BACKEND] Two minor bugfixes on StoreOpLowering and kernel launch & support optional other in LoadOpLowering ( #69 )
...
* Clean code
Co-authored-by: goostavz <gzhu@nvidia.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-08-22 21:47:09 -07:00
Yan Chunwei
10ba51c3bb
[FRONTEND] add python e2e launch empty kernel test ( #68 )
2022-08-19 10:46:01 -07:00
Shintaro Iwasaki
9aa00249a6
[TritonIR] make other optional and remove isOtherUnspecified ( #67 )
...
2022-08-18 18:19:55 -07:00
Shintaro Iwasaki
d69ce77b19
[FRONTEND] add an attr for masked load without explicit other ( #55 )
2022-08-18 09:51:37 -07:00
goostavz
fc58250a06
[BACKEND] Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp & GEPOp and bugfix for SplatOp, StoreOp, FuncOp ( #60 )
...
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-18 20:46:45 +08:00
Yan Chunwei
95bbac41e7
[BACKEND] Add LLVM-translation for store and splat ops ( #47 )
2022-08-15 00:46:37 -07:00
goostavz
993ba7035a
[BACKEND] Codegen bringup, index calculation of blocked_layout & support of LoadOp, BroadcastOp, ViewOp & MakeRangeOp ( #38 )
...
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-14 19:58:59 -07:00
Yan Chunwei
83ef74f248
[BACKEND] Extracting numWarps from tritonGPU module ( #39 )
2022-08-08 09:40:20 -07:00
Yan Chunwei
b988bae813
Init TritonGPU to LLVM dialect conversion ( #32 )
...
* add toLLVM pass
* update num-warps setting in mlir
2022-08-04 10:15:45 +08:00
Phil Tillet
65237f6117
[PACKAGING] Added FileCheck
2022-07-07 16:53:19 -07:00
Yan Da
49d1821149
conversion test
2022-06-08 16:19:15 +08:00