[Triton-MLIR][BACKEND] Support $c from mma layout in dot (#798)
This PR does the following:

1. Supports the case where `$c` holds an mma layout. This is useful in a for-loop over the k-axis in GEMM.
2. Fixes the `unrealized_conversion_cast` in ConvertLayout[shared->dot_op].

Known issues:

1. There is an IO conflict in GEMM with a k-axis for-loop. It is temporarily worked around by [adding a barrier](https://github.com/openai/triton/pull/798/files#diff-8a9a5a7f4a025fb1299af29d190d5626bd9000406d3ea47c49679272d3d6abe9R3028) in the dot conversion. We are still working on it, and a more generic fix will come in a following PR.
2. The pipeline pass (`add_tritongpu_pipeline_pass`) produces an instruction with a buggy result type:

```mlir
%1049 = llvm.inline_asm has_side_effects asm_dialect = att operand_attrs = [] "cp.async.commit_group ;", "" : () -> !llvm.void
%1050 = builtin.unrealized_conversion_cast %1049 : !llvm.void to !llvm.ptr<f16, 3>
```

So we temporarily disable it.
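For context on why `$c` can arrive in an mma layout, the k-axis loop can be pictured with the plain-Python model below. This is only an illustrative sketch, not code from this PR: `dot` and `gemm_k_loop` are hypothetical helpers, and the model assumes the dot computes `a @ b + c`. The point is that from the second iteration on, the accumulator passed as `c` is itself the result of the previous dot.

```python
# Illustrative only: a plain-Python model of a GEMM with a loop over the k-axis.
# `dot` and `gemm_k_loop` are hypothetical helpers, not Triton APIs.

def dot(a, b, c):
    """Return a @ b + c for small list-of-lists tiles."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def gemm_k_loop(A, B, block_k):
    """Compute A @ B by accumulating partial products over k in chunks of block_k."""
    M, K, N = len(A), len(B), len(B[0])
    acc = [[0.0] * N for _ in range(M)]  # first $c: a fresh zero accumulator
    for k0 in range(0, K, block_k):
        a_tile = [row[k0:k0 + block_k] for row in A]
        b_tile = B[k0:k0 + block_k]
        # From the second iteration on, `acc` is the output of the previous dot,
        # so in the real backend it would already carry the dot result layout.
        acc = dot(a_tile, b_tile, acc)
    return acc

A = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
result = gemm_k_loop(A, B, block_k=2)
```

Splitting k into two chunks or processing it in one chunk gives the same product; only the number of times the accumulator is fed back as `c` changes.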
@@ -872,7 +872,9 @@ def make_tritongpu_ir(mod, num_warps):
 def optimize_tritongpu_ir(mod, num_stages):
     pm = _triton.ir.pass_manager(mod.context)
     pm.enable_debug()
-    pm.add_tritongpu_pipeline_pass(num_stages)
+    # Get error in backend due to wrong conversion in expanding async-related instruction.
+    # TODO[Superjomn]: Open it when fixed.
+    # pm.add_tritongpu_pipeline_pass(num_stages)
     pm.add_canonicalizer_pass()
     pm.add_cse_pass()
     pm.add_coalesce_pass()