triton

Author	SHA1	Message	Date
Keren Zhou	fdd59900f7	[Triton-MLIR] Replace triton.extract_slice with tensor.extract_slice and support more general tensor slicing (#837 ) ## Features - Allow slicing a block out of a tensor, as long as each dimension is contiguous (unit stride). - Fix some problems in `insert_slice_async`'s semantics. - More general verification for ops that return shared layout encoding. ## Known Limitations - `insert_slice_async` still uses the old semantics. A later PR may give it semantics similar to `tensor.extract_slice`. - No encoding verification for `tensor.extract_slice`. - 3d tensor ops are broken. - Strided accesses are not allowed. - May cause a slight performance slowdown since strides are passed as values rather than constants (e.g., int). Passing strides as attributes would be difficult in the presence of control flow, because a block argument may accept tensors with different strides.	2022-11-06 22:59:03 -08:00
Philippe Tillet	12d60cb4a3	[BACKEND] Added support for 1D conversion blocked -> slice (#831 )	2022-11-01 13:19:58 -07:00
Yan Chunwei	031c2ae77b	[Triton-MLIR][BACKEND] Port the mma<v1> conversion (#815 ) This PR does - port the mma<v1> related code, and support dot conversion and convert_layout[shared->dot_op<mma<v1>>] - add a lit test for dot v1	2022-11-01 09:42:14 +08:00
Ian Bearman	f2106d0aa2	[BUILD] Fix Warnings and Enable Warnings as Errors (#794 )	2022-10-28 12:36:09 -07:00
Philippe Tillet	ac0f6793cc	[BACKEND] Added support for scalars in LoadOp / StoreOp / ElementwiseOp (#814 ) Also fixed various errors that showed up in `test_core.py`, and added more TODOs for open (hopefully relatively minor) issues	2022-10-28 16:17:55 +08:00
Qingyi Liu	42db3538e4	[Triton-MLIR][Backend] Add ReduceOpConversion into TritonGPUToLLVM conversion (#774 ) What is done in this PR: - [x] Add `ConvertLayout`, `getSizePerThread` and `getShapePerCTA` implementation for `SliceEncodingAttr` - [x] Split `emitIndices` into two phases: `emitBaseIndexForBlockedLayout` and `emitOffsetForBlockedLayout` - [x] Add `ReduceOpConversion::matchAndRewriteBasic` implementation - [x] Add `ReduceOpConversion::matchAndRewriteFast` implementation with ptx instruction `shfl.sync` - [x] Add support for scalar value in `StoreOpConversion` - [x] Add Reduce1d and Reduce2d unit tests and pass all unit tests Co-authored-by: Qingyi Liu <liuqingyi1993@gmail.com>	2022-10-28 11:07:45 +08:00
Yan Chunwei	877844de4f	[Triton-MLIR][BACKEND] add convert_layout[shared->dot_op] conversion to adapt DotOperand layout (#786 ) This PR helps to 1. Adapt the existing DotOp conversion to the design of the new DotOperand layout, 2. Make the DotOp conversion work with both shared-layout inputs and dotoperand-layout inputs, for a later upstream switch.	2022-10-24 11:40:13 +08:00
Philippe Tillet	bb0f9235d1	[OPTIMIZER] Made layout simplification pass efficient for fused attention kernels (#790 )	2022-10-21 16:52:15 -07:00
Philippe Tillet	623c99609f	[Triton-IR] Added type inference and verifier for Triton-IR operations (#767 )	2022-10-11 18:16:41 -07:00
Keren Zhou	289ff293cc	[Triton-MLIR] Generate LLVM/PTX code for async ops (#735 )	2022-10-04 09:37:00 -07:00
goostavz	f9d7f2f126	[Triton-MLIR][Backend] Support ConvertLayout blocked->shared and a few fixes related to mma (#716 )	2022-10-03 19:33:25 +08:00
Yan Chunwei	3a84278530	[Triton-MLIR][BACKEND] Refine dot conversion (#710 ) This PR does 1. Refine the dot conversion 2. some other tiny code refinement	2022-09-27 14:38:34 +08:00
goostavz	61b61755e5	[Triton-MLIR][Backend] Support layout conversion between mmaLayout and blockedLayout (#693 )	2022-09-27 03:58:47 +00:00
Keren Zhou	ecd1bc33df	[Triton-MLIR] Keren/code gen for extract slice and alloc tensor (#692 ) Co-authored-by: gzhu <goostavz@outlook.com>	2022-09-23 19:38:14 +00:00
Yan Chunwei	922155f1d2	[BACKEND] add dot conversion (mma version=2) (#672 ) LLVM Conversion for Dot op. Due to the lack of `convert_layout`, the dot currently supports only the following combination of operands - `$a` in shared layout - `$b` in shared layout - `$c` in MMA layout (but only Splat-like, leaving the generic cases to `convert_layout`) This PR focuses on `mma.16816` related logic support, leaving the other cases to a following PR. Co-authored-by: Philippe Tillet <phil@openai.com>	2022-09-22 20:43:54 -07:00
goostavz	15bfd0cb79	[BACKEND] Support of ConvertLayoutOp from blocked to blocked and SliceLayout with blocked parent (#658 )	2022-09-17 14:58:42 -07:00
Keren Zhou	16aed94ff5	[Analysis/Allocation] Allocation passes now assume that slices always alias (#108 ) The code in this branch assumes that the `src` operand in `insert_slice_async` always aliases the result, which does not hold in general but serves as a workaround to make the pipeline pass work. I'm also working on the complete analysis in another [branch](https://github.com/openai/triton-mlir/tree/keren/analyze-slice).	2022-09-09 12:03:41 -07:00
Keren Zhou	328b87aec6	Keren/tensor slice insert alloc (#94 ) This branch defines three new triton_gpu operations to partially solve #87. Below is an overview: ``` %tensor = triton_gpu.alloc_tensor : tensor<2x16x16xf16, #A> %b = triton_gpu.insert_slice_async %a_ptr, %tensor, %offset {axis = 0 : i32, cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<16x16x!tt.ptr<f16>, #AL> -> tensor<2x16x16xf16, #A> %c = triton_gpu.extract_slice %b, %offset {axis = 0 : i32} : tensor<2x16x16xf16, #A> -> tensor<16x16xf16, #A> ``` We plan to fully replace `copy_async` with `insert_slice_async`. This hasn't been done yet.	2022-09-01 12:37:17 -07:00
Shintaro Iwasaki	84aa7d025a	[TritonIR] simplify Load/StoreOps when mask is true/false (#79 ) * [TritonIR] fix Load/Store/CopyAsyncOp's parsers * [TritonIR] simplify Load/StoreOps when mask is true/false * [TEST] adds tests to check load/store simplification	2022-08-24 12:55:49 -07:00
Philippe Tillet	192be76b3c	[OPTIMIZER] Rewrite patterns for layout conversions (#64 )	2022-08-18 12:49:37 -07:00
Shintaro Iwasaki	2ba9a83465	[BUILD] fix minor issues with MLIR assert enabled (#46 )	2022-08-11 21:20:47 -07:00
Philippe Tillet	d1593e6ca8	[TritonGPU] Improved documentation and semantics of layout encodings (#30 )	2022-07-31 13:59:44 -07:00
Philippe Tillet	432c3df265	[BUILD] MacOS can now build compiler and run MLIR tests (#25 )	2022-07-27 01:32:10 -07:00
Philippe Tillet	6d62d88d4f	[CI] run clang-format (#24 )	2022-07-26 17:25:03 -07:00
Keren Zhou	96cc6fb563	[TritonGPU] Pretty printer for layouts (#21 )	2022-07-26 10:50:11 -07:00
Yan Da	63e6a85901	Fix blocked layout parser	2022-07-15 15:19:11 +08:00
Yan Da	9d1b5e3f79	special encoding for broadcast	2022-06-18 21:16:45 +08:00
Yan Da	7b09b5f9e9	the pipeline pass now generates and accepts valid IR	2022-06-07 19:34:59 +08:00
Yan Da	366dddc3bc	update mma encoding & triton-opt	2022-06-06 21:03:58 +08:00
Yan Da	7807f64ef3	rename sharded_layout => blocked_layout	2022-06-05 16:14:59 +08:00
Yan Da	d5eca56cf3	more TritonGPU unit tests	2022-06-05 14:25:09 +08:00
Da Yan	e36a54eb86	more progress on the definition of layouts	2022-05-31 11:43:21 +00:00
Yan Da	441fd7c3cc	assembly format	2022-05-25 17:53:24 +08:00
Yan Da	9b670cfb9f	Add ReduceOp	2022-05-25 14:15:36 +08:00
Yan Da	a2c9f919a8	TritonGPU verifier	2022-05-24 19:48:56 +08:00
Yan Da	96876a46d1	More progress on Triton=>TritonGPU conversion (works for matmul)	2022-05-09 21:19:53 +08:00
Yan Da	3ad7bee35e	More conversion patterns	2022-05-04 12:50:02 +08:00
Yan Da	75d32e2442	More on TritonGPU conversion	2022-05-02 21:51:00 +08:00
Yan Da	1428185c9c	More progress on TritonGPUTypeConverter & TritonGPUConversionTarget	2022-05-01 22:06:54 +08:00
Yan Da	2239ac1998	more progress on TritonGPU	2022-04-28 18:51:31 +08:00

40 Commits