This PR does 1. Support the case where $c holding a mma layout, this should be useful in forloop in k-axis in GEMM 2. Fix the `unrealized_conversion_cast` in ConvertLayout[shared->dot_op] Known issue 1. There is some IO conflict in GEMM with a k-forloop, it is temporarily solved by [adding a barrier](https://github.com/openai/triton/pull/798/files#diff-8a9a5a7f4a025fb1299af29d190d5626bd9000406d3ea47c49679272d3d6abe9R3028) in dot conversion, but we are still working on it, will get a more generic fix for it in the following PR. 2. The parallel pass will result in a buggy instruction result type ```mlir %1049 = llvm.inline_asm has_side_effects asm_dialect = att operand_attrs = [] "cp.async.commit_group ;", "" : () -> !llvm.void %1050 = builtin.unrealized_conversion_cast %1049 : !llvm.void to !llvm.ptr<f16, 3> ``` So we temporarily disable it.
The file is empty.