[Triton-MLIR][Backend] Fix the order in linear/delinear and a few bugs in reduce conversion (#851)

1. Fix the dimension order used in linearize/delinearize, which in turn
fixes the ordering error in emitIndices;
2. Fix the selection of the fast implementation in reduce codegen;
3. Remove the redundant barrier in reduce codegen;
4. Fix the index mapping of the second round of warp_shuffle in the
shuffle version of reduce codegen.
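
To illustrate items 1 and 2, here is a minimal sketch of order-aware linearize/delinearize. It assumes that `order[0]` names the fastest-changing dimension, as the `srcLayout.getOrder()[0]` check in the diff suggests. The helper names `linearizeWithOrder`, `delinearizeWithOrder`, and `isFastReduce` are hypothetical and are not the functions touched by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Triton's API): flatten a multi-dimensional index.
// Assumption: order[0] is the fastest-changing dimension, order.back() the slowest.
unsigned linearizeWithOrder(const std::vector<unsigned> &multiDim,
                            const std::vector<unsigned> &shape,
                            const std::vector<unsigned> &order) {
  assert(multiDim.size() == shape.size() && shape.size() == order.size());
  unsigned linear = 0;
  // Walk from the slowest dimension down to the fastest one.
  for (std::size_t i = order.size(); i-- > 0;) {
    unsigned dim = order[i];
    linear = linear * shape[dim] + multiDim[dim];
  }
  return linear;
}

// Hypothetical inverse of linearizeWithOrder under the same assumption.
std::vector<unsigned> delinearizeWithOrder(unsigned linear,
                                           const std::vector<unsigned> &shape,
                                           const std::vector<unsigned> &order) {
  assert(shape.size() == order.size());
  std::vector<unsigned> multiDim(shape.size());
  // Peel off the fastest dimension first.
  for (unsigned dim : order) {
    multiDim[dim] = linear % shape[dim];
    linear /= shape[dim];
  }
  return multiDim;
}

// Hypothetical predicate mirroring the changed line in the diff:
// the fast path applies when reducing along the fastest-changing dimension.
bool isFastReduce(unsigned axis, const std::vector<unsigned> &order) {
  assert(!order.empty());
  return axis == order[0];
}
```

With shape `{4, 8}` and index `{2, 3}`, order `{1, 0}` gives linear index 19, while order `{0, 1}` gives 14, so using the wrong order silently permutes the indices. Likewise, a hard-coded `axis == 1` only picks the fast path correctly when dimension 1 happens to be the fastest-changing one.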

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
goostavz
2022-11-09 02:10:09 +08:00
committed by GitHub
parent 303790da88
commit 080b4addf8
4 changed files with 282 additions and 246 deletions


@@ -77,7 +77,7 @@ SmallVector<unsigned> getScratchConfigForReduce(triton::ReduceOp op) {
   auto srcShape = srcTy.getShape();
   auto axis = op.axis();
-  bool fastReduce = axis == 1; // FIXME(Qingyi): The fastest-changing dimension
+  bool fastReduce = axis == srcLayout.getOrder()[0];
   SmallVector<unsigned> smemShape;
   for (auto d : srcShape)