Philippe Tillet
91a9773b38
[OPTIMIZER] Minor bugfixes that affected matmul codegen performance ( #834 )
2022-11-02 22:58:09 -07:00
Ian Bearman
f2106d0aa2
[BUILD] Fix Warnings and Enable Warnings as Errors ( #794 )
2022-10-28 12:36:09 -07:00
goostavz
c4726333bf
[Triton-MLIR] Minor fixes related to scf/swizzling support ( #791 )
1. Disable static loop unrolling in the frontend by default.
2. A minor fix in axisAnalysis to support scf.
3. A minor fix in TritonGPUToLLVM to support swizzling.
2022-10-21 11:46:28 +08:00
Yan Chunwei
4464646efb
[Triton-MLIR][BACKEND] Fix masked load store op vector size ( #785 )
Correct the vector size of the Load/Store ops so that the mask's
alignment is properly taken into account.
Some cases:
```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
%out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32 {tt.divisibility = 16 : i32}) {
// mask = make_range(128) < n_elements
}
```
This should get the vec=2 `ld`/`st` instructions.
By contrast, in the following example `%n_elements` carries no `tt.divisibility` attribute:
```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
%out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32) {
// mask = make_range(128) < n_elements
}
```
This one should get only the vec=1 `ld`/`st` instructions.
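The two cases can be reproduced with a simplified model of the vector-width choice. This is only a sketch, not the actual TritonGPUToLLVM code: the function `vector_width` and all of its parameter names are hypothetical, and it assumes 32 threads per warp, a 128-bit maximum access width, and that the mask's alignment equals the divisibility of `%n_elements` (1 when no hint is given).

```python
def vector_width(block_size, num_warps, elem_bytes,
                 ptr_divisibility_bytes, mask_alignment,
                 threads_per_warp=32, max_vec_bits=128):
    """Hypothetical model: pick the widest ld/st vector that every
    constraint allows. Not taken from the Triton source."""
    # Elements each thread is responsible for.
    elems_per_thread = max(1, block_size // (num_warps * threads_per_warp))
    # Elements covered by the pointer's guaranteed byte alignment.
    ptr_vec = max(1, ptr_divisibility_bytes // elem_bytes)
    # Elements that fit in the widest access assumed here (128 bits).
    hw_vec = max(1, max_vec_bits // (elem_bytes * 8))
    # The mask must be uniform across the vector, so its alignment
    # is an additional upper bound.
    return max(1, min(elems_per_thread, ptr_vec, hw_vec, mask_alignment))


# First example: n_elements has tt.divisibility = 16.
vec_with_hint = vector_width(128, 2, 4, 16, mask_alignment=16)
# Second example: no divisibility hint on n_elements (assumed alignment 1).
vec_without_hint = vector_width(128, 2, 4, 16, mask_alignment=1)
print(vec_with_hint, vec_without_hint)
```

Under these assumptions each thread handles two `f32` elements, so the mask alignment is what decides between vec=2 and vec=1.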
2022-10-18 11:43:50 +08:00
Yan Chunwei
3a84278530
[Triton-MLIR][BACKEND] Refine dot conversion ( #710 )
This PR:
1. Refines the dot conversion
2. Makes some other minor code refinements
2022-09-27 14:38:34 +08:00
goostavz
15bfd0cb79
[BACKEND] Support of ConvertLayoutOp from blocked to blocked and SliceLayout with blocked parent ( #658 )
2022-09-17 14:58:42 -07:00
Shintaro Iwasaki
43be75ad42
[FRONTEND] Add scalar type support for some ops ( #661 )
This PR adds basic support for scalar-type inputs to some ops (cast and pointer arithmetic) for Triton-MLIR. It also renames `getelementptr` to `addptr`.
2022-09-15 16:12:52 -07:00
Yan Chunwei
a9464f4993
[Backend] Vectorize Load/Store Ops ( #86 )
This PR does the following things:
- Refactor the Load and Store op codegen: both are rewritten with the same logic
and now share much of their code
- Support vectorized load/store
2022-09-06 12:28:09 -07:00
Philippe Tillet
78ebbe24c7
[FRONTEND] Added ExpandDimsOp primitive ( #36 )
2022-08-04 18:41:06 -07:00
Philippe Tillet
3236642e8f
[OPTIMIZER] Added memory coalescing pass ( #31 )
2022-07-31 20:59:31 -07:00
Philippe Tillet
d1593e6ca8
[TritonGPU] Improved documentation and semantics of layout encodings ( #30 )
2022-07-31 13:59:44 -07:00
Philippe Tillet
6d62d88d4f
[CI] run clang-format ( #24 )
2022-07-26 17:25:03 -07:00
Philippe Tillet
a633d2b403
[Analysis] Added Axis Info Analysis ( #8 )
2022-07-19 13:38:48 -07:00