Yan Chunwei
877844de4f
[Triton-MLIR][BACKEND] add convert_layout[shared->dot_op] conversion to adapt DotOperand layout ( #786 )
...
This PR:
1. Adapts the existing DotOp conversion to the design of the new
DotOperand layout,
2. Makes the DotOp conversion work with both shared-layout and
DotOperand-layout inputs, in preparation for the upstream switch.
2022-10-24 11:40:13 +08:00
Philippe Tillet
bb0f9235d1
[OPTIMIZER] Made layout simplification pass efficient for fused attention kernels ( #790 )
2022-10-21 16:52:15 -07:00
Philippe Tillet
623c99609f
[Triton-IR] Added type inference and verifier for Triton-IR operations ( #767 )
2022-10-11 18:16:41 -07:00
Keren Zhou
289ff293cc
[Triton-MLIR] Generate LLVM/PTX code for async ops ( #735 )
2022-10-04 09:37:00 -07:00
goostavz
f9d7f2f126
[Triton-MLIR][Backend] Support ConvertLayout blocked->shared and a few fixes related to mma ( #716 )
2022-10-03 19:33:25 +08:00
Yan Chunwei
3a84278530
[Triton-MLIR][BACKEND] Refine dot conversion ( #710 )
...
This PR:
1. Refines the dot conversion
2. Makes some other minor code refinements
2022-09-27 14:38:34 +08:00
goostavz
61b61755e5
[Triton-MLIR][Backend] Support layout conversion between mmaLayout and blockedLayout ( #693 )
2022-09-27 03:58:47 +00:00
Keren Zhou
ecd1bc33df
[Triton-MLIR] Keren/code gen for extract slice and alloc tensor ( #692 )
...
Co-authored-by: gzhu <goostavz@outlook.com>
2022-09-23 19:38:14 +00:00
Yan Chunwei
922155f1d2
[BACKEND] add dot conversion (mma version=2) ( #672 )
...
LLVM conversion for the Dot op.
Due to the lack of `convert_layout`, the dot currently supports only
the following combination of operands:
- `$a` in shared layout
- `$b` in shared layout
- `$c` in MMA layout (but only splat-like, leaving the generic cases to
`convert_layout`)
This PR focuses on `mma.16816`-related logic, leaving the other
cases to follow-up PRs.
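For context, `mma.16816` refers to the PTX `mma.sync` shape m16n8k16: each instruction multiplies a 16x16 `$a` tile by a 16x8 `$b` tile and accumulates into a 16x8 `$c` tile. A minimal plain-Python sketch of that tile-level arithmetic (illustrative only; register fragments, shared memory, and the actual PTX lowering are not modeled):

```python
# Illustrative sketch of the m16n8k16 ("mma.16816") tile semantics:
# D = A @ B + C, with A: 16x16, B: 16x8, C and D: 16x8.
# Plain Python lists stand in for register fragments; this is not the
# actual lowering code, just the arithmetic each instruction performs.

def mma_16816(A, B, C):
    M, N, K = 16, 8, 16
    return [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(K))
             for j in range(N)] for i in range(M)]

A = [[1.0] * 16 for _ in range(16)]   # $a tile (loaded from shared layout)
B = [[1.0] * 8 for _ in range(16)]    # $b tile (loaded from shared layout)
C = [[0.0] * 8 for _ in range(16)]    # $c accumulator (splat of 0.0)
D = mma_16816(A, B, C)                # each entry sums 16 products
```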
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-09-22 20:43:54 -07:00
goostavz
15bfd0cb79
[BACKEND] Support of ConvertLayoutOp from blocked to blocked and SliceLayout with blocked parent ( #658 )
2022-09-17 14:58:42 -07:00
Keren Zhou
16aed94ff5
[Analysis/Allocation] Allocation passes now assume that slices always alias ( #108 )
...
The code in this branch assumes that the `src` operand of
`insert_slice_async` always aliases the result. This does not hold in
the general case, but it is a workaround to make the pipeline pass work.
I'm also working on the complete analysis in another
[branch](https://github.com/openai/triton-mlir/tree/keren/analyze-slice).
2022-09-09 12:03:41 -07:00
Keren Zhou
328b87aec6
Keren/tensor slice insert alloc ( #94 )
...
This branch defines three new triton_gpu operations to partially solve #87 . Below is an overview:
```
%tensor = triton_gpu.alloc_tensor : tensor<2x16x16xf16, #A>
%b = triton_gpu.insert_slice_async %a_ptr, %tensor, %offset {axis = 0 : i32, cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<16x16x!tt.ptr<f16>, #AL> -> tensor<2x16x16xf16, #A>
%c = triton_gpu.extract_slice %b, %offset {axis = 0 : i32} : tensor<2x16x16xf16, #A> -> tensor<16x16xf16, #A>
```
We plan to fully replace `copy_async` with `insert_slice_async`. **This hasn't been done yet.**
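A plain-Python sketch of the buffer semantics the three ops above provide, modeled along axis 0 of the `<2x16x16>` allocation (illustrative only; the real `insert_slice_async` is asynchronous and operates on shared memory, none of which is modeled here):

```python
# Toy model of the triton_gpu buffer ops shown above. Function names
# mirror the op names for readability; this is not the dialect itself.

def alloc_tensor(depth, rows, cols):
    # triton_gpu.alloc_tensor: a multi-buffered allocation
    # (modeled here as zero-initialized for simplicity)
    return [[[0.0] * cols for _ in range(rows)] for _ in range(depth)]

def insert_slice(buffer, slice2d, offset):
    # triton_gpu.insert_slice_async: write a <16x16> slice into
    # stage `offset` along axis 0, yielding the updated buffer
    new = list(buffer)
    new[offset] = slice2d
    return new

def extract_slice(buffer, offset):
    # triton_gpu.extract_slice: read back stage `offset` as <16x16>
    return buffer[offset]

buf = alloc_tensor(2, 16, 16)
tile = [[1.0] * 16 for _ in range(16)]
buf = insert_slice(buf, tile, 0)
out = extract_slice(buf, 0)
```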
2022-09-01 12:37:17 -07:00
Shintaro Iwasaki
84aa7d025a
[TritonIR] simplify Load/StoreOps when mask is true/false ( #79 )
...
* [TritonIR] fix Load/Store/CopyAsyncOp's parsers
* [TritonIR] simplify Load/StoreOps when mask is true/false
* [TEST] adds tests to check load/store simplification
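The simplification is a constant-folding rewrite: a load whose mask is constant `true` becomes an unmasked load, a load whose mask is constant `false` folds to its `other` value, and a store with a constant-`false` mask is dead code. A hedged plain-Python sketch of these rules over a dict-based toy IR (names are illustrative, not the actual C++ rewrite patterns):

```python
# Toy rewrite rules for constant-mask load/store simplification.
# Ops are dicts; a mask of None means "no constant mask known".

def simplify_load(op):
    if op.get("mask") is True:
        return {"op": "load"}                 # mask always true: drop it
    if op.get("mask") is False:
        return {"op": "splat", "value": op.get("other")}  # never loads
    return op                                 # non-constant mask: keep

def simplify_store(op):
    if op.get("mask") is False:
        return None                           # store never executes: erase
    if op.get("mask") is True:
        return {"op": "store"}                # drop the redundant mask
    return op
```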
2022-08-24 12:55:49 -07:00
Philippe Tillet
192be76b3c
[OPTIMIZER] Rewrite patterns for layout conversions ( #64 )
2022-08-18 12:49:37 -07:00
Shintaro Iwasaki
2ba9a83465
[BUILD] fix minor issues with MLIR assert enabled ( #46 )
2022-08-11 21:20:47 -07:00
Philippe Tillet
d1593e6ca8
[TritonGPU] Improved documentation and semantics of layout encodings ( #30 )
2022-07-31 13:59:44 -07:00
Philippe Tillet
432c3df265
[BUILD] MacOS can now build compiler and run MLIR tests ( #25 )
2022-07-27 01:32:10 -07:00
Philippe Tillet
6d62d88d4f
[CI] run clang-format ( #24 )
2022-07-26 17:25:03 -07:00
Keren Zhou
96cc6fb563
[TritonGPU] Pretty printer for layouts ( #21 )
2022-07-26 10:50:11 -07:00
Yan Da
63e6a85901
Fix blocked layout parser
2022-07-15 15:19:11 +08:00
Yan Da
9d1b5e3f79
special encoding for broadcast
2022-06-18 21:16:45 +08:00
Yan Da
7b09b5f9e9
the pipeline pass now generates and accepts valid IR
2022-06-07 19:34:59 +08:00
Yan Da
366dddc3bc
update mma encoding & triton-opt
2022-06-06 21:03:58 +08:00
Yan Da
7807f64ef3
rename sharded_layout => blocked_layout
2022-06-05 16:14:59 +08:00
Yan Da
d5eca56cf3
more TritonGPU unit tests
2022-06-05 14:25:09 +08:00
Da Yan
e36a54eb86
more progress on the definition of layouts
2022-05-31 11:43:21 +00:00
Yan Da
441fd7c3cc
assembly format
2022-05-25 17:53:24 +08:00
Yan Da
9b670cfb9f
Add ReduceOp
2022-05-25 14:15:36 +08:00
Yan Da
a2c9f919a8
TritonGPU verifier
2022-05-24 19:48:56 +08:00
Yan Da
96876a46d1
More progress on Triton=>TritonGPU conversion (works for matmul)
2022-05-09 21:19:53 +08:00
Yan Da
3ad7bee35e
More conversion patterns
2022-05-04 12:50:02 +08:00
Yan Da
75d32e2442
More on TritonGPU conversion
2022-05-02 21:51:00 +08:00
Yan Da
1428185c9c
More progress on TritonGPUTypeConverter & TritonGPUConversionTarget
2022-05-01 22:06:54 +08:00
Yan Da
2239ac1998
more progress on TritonGPU
2022-04-28 18:51:31 +08:00