triton

Author	SHA1	Message	Date
Philippe Tillet	a4ff0c362c	[FRONTEND] Fix issues with atomics (#849 )	2022-11-06 20:52:11 -08:00
Philippe Tillet	dc0588a898	[OPTIMIZER] Improved layout simplification pass so it handles swizzled layouts better (#789 ) Note: commented out `test_gemm`, since the backend has an issue with swizzling. It will be uncommented in a subsequent PR.	2022-10-20 19:03:37 -07:00
Yan Chunwei	4464646efb	[Triton-MLIR][BACKEND] Fix masked load store op vector size (#785 ) Correct the Load/Store op's vector size so that the mask's alignment is taken into account. Some cases: ```mlir // num_warp = 2 // block_size = 128 func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32 {tt.divisibility = 16 : i32}) { // mask = make_range(128) < n_elements } ``` This should produce vec=2 `ld`/`st` instructions, while the following example ```mlir // num_warp = 2 // block_size = 128 func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32) { // mask = make_range(128) < n_elements } ``` should produce vec=1 `ld`/`st` instructions, because `%n_elements` carries no divisibility hint.	2022-10-18 11:43:50 +08:00
goostavz	e948a618b3	[Triton-MLIR] fix a tiny bug in coalesce pass (#782 )	2022-10-16 20:29:55 -07:00
Keren Zhou	16aed94ff5	[Analysis/Allocation] Allocation passes now assume that slices always alias (#108 ) The code in this branch assumes the `src` operand in `insert_slice_async` always aliases the result. This does not hold in general; it is only a workaround to make the pipeline pass work. I'm also working on the complete analysis in another [branch](https://github.com/openai/triton-mlir/tree/keren/analyze-slice).	2022-09-09 12:03:41 -07:00
Yan Chunwei	a9464f4993	[Backend] Vectorize Load/Store Ops (#86 ) This PR does the following: - Refactors the Load and Store op codegen so that both follow the same logic and share most of their code - Adds support for vectorized load/store	2022-09-06 12:28:09 -07:00
Philippe Tillet	a0bab9748e	[OPTIMIZER] Coalesce pass no longer takes a `num-warps` argument (#99 ) Improved the design to avoid inconsistent `num-warps` values between the pass and the parent module of the operation it processes.	2022-09-05 18:09:02 -07:00
Philippe Tillet	192be76b3c	[OPTIMIZER] Rewrite patterns for layout conversions (#64 )	2022-08-18 12:49:37 -07:00
Philippe Tillet	3236642e8f	[OPTIMIZER] Added memory coalescing pass (#31 )	2022-07-31 20:59:31 -07:00

9 Commits