[Triton-MLIR][Backend] Fix the order in linear/delinear and a few bugs in reduce conversion (#851)

1, fix the order in linearize/delinearize, which fix the error of order in emitIndices; 2, fix the selecting of fast implementation in reduce codegen; 3, fix the redundant barrier in reduce codegen; 4, fix the index mapping of the second round of warp_shuffle in shuffle version of reduce codegen. Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2022-11-09 02:10:09 +08:00
parent 303790da88
commit 080b4addf8
4 changed files with 282 additions and 246 deletions
--- a/lib/Analysis/Allocation.cpp
+++ b/lib/Analysis/Allocation.cpp
@@ -77,7 +77,7 @@ SmallVector<unsigned> getScratchConfigForReduce(triton::ReduceOp op) {
  auto srcShape = srcTy.getShape();
  auto axis = op.axis();

-  bool fastReduce = axis == 1; // FIXME(Qingyi): The fastest-changing dimension
+  bool fastReduce = axis == srcLayout.getOrder()[0];

  SmallVector<unsigned> smemShape;
  for (auto d : srcShape)