[Triton-MLIR][BACKEND] Support $c from mma layout in dot (#798)

This PR does the following:

1. Supports the case where $c holds an mma layout, which is useful for a for-loop over the k-axis in GEMM.
2. Fixes the `unrealized_conversion_cast` in ConvertLayout[shared->dot_op].

Known issues:

1. GEMM with a k-axis for-loop has an IO conflict. It is temporarily worked around by [adding a barrier](https://github.com/openai/triton/pull/798/files#diff-8a9a5a7f4a025fb1299af29d190d5626bd9000406d3ea47c49679272d3d6abe9R3028) in the dot conversion. We are still working on it and will provide a more generic fix in a following PR.
2. The parallel pass produces an instruction with a buggy result type:

```mlir
%1049 = llvm.inline_asm has_side_effects asm_dialect = att operand_attrs = [] "cp.async.commit_group ;", "" : () -> !llvm.void
%1050 = builtin.unrealized_conversion_cast %1049 : !llvm.void to !llvm.ptr<f16, 3>
```

So we temporarily disable the pass.
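For context on the first item: in a k-axis loop, the result of one dot is fed back as the accumulator of the next, so $c carries whatever the previous dot produced. The following is a minimal plain-Python sketch of that data flow only. The names `dot` and `gemm_k_loop` are hypothetical and are not Triton APIs, and layouts are not modeled at all.

```python
# Illustrative only: a plain-Python model of a tiled GEMM k-loop.
# `dot(a, b, c)` is a hypothetical stand-in for the dot op: it returns a @ b + c.


def dot(a, b, c):
    """Return a @ b + c for row-major nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def gemm_k_loop(a, b, block_k):
    """Accumulate a @ b by stepping over the k axis in slices of block_k."""
    m, k_total, n = len(a), len(b), len(b[0])
    acc = [[0.0] * n for _ in range(m)]  # plays the role of $c
    for k0 in range(0, k_total, block_k):
        a_tile = [row[k0:k0 + block_k] for row in a]  # m x block_k slice of a
        b_tile = b[k0:k0 + block_k]                   # block_k x n slice of b
        # The previous result is passed back in as $c of the next dot.
        acc = dot(a_tile, b_tile, acc)
    return acc
```

Because `acc` is both the output of one iteration and the $c input of the next, the dot conversion has to accept $c in the form its own result takes, which is the situation the first item describes.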
2022-10-26 10:33:04 +08:00
parent a2cbe7af91
commit 4dc2396ca0
3 changed files with 226 additions and 64 deletions
--- a/python/triton/compiler.py
+++ b/python/triton/compiler.py
@@ -872,7 +872,9 @@ def make_tritongpu_ir(mod, num_warps):
 def optimize_tritongpu_ir(mod, num_stages):
    pm = _triton.ir.pass_manager(mod.context)
    pm.enable_debug()
-    pm.add_tritongpu_pipeline_pass(num_stages)
+    # The backend fails because of a wrong conversion when expanding async-related instructions.
+    # TODO[Superjomn]: Re-enable this pass once that is fixed.
+    # pm.add_tritongpu_pipeline_pass(num_stages)
    pm.add_canonicalizer_pass()
    pm.add_cse_pass()
    pm.add_coalesce_pass()