[Triton-MLIR][Backend] Fix the order in linear/delinear and a few bugs in reduce conversion (#851)
1. Fix the dimension order in linearize/delinearize, which fixes the ordering error in emitIndices.
2. Fix the selection of the fast implementation in reduce codegen.
3. Remove the redundant barrier in reduce codegen.
4. Fix the index mapping of the second round of warp_shuffle in the shuffle version of reduce codegen.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
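
To illustrate item 1, here is a minimal sketch of order-aware linearize/delinearize. It is not the code from this commit: the helper signatures and the use of `std::vector` are assumptions made for the example, as is the convention that `order[0]` names the fastest-changing dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only, not the Triton implementation.
// Assumption: order[0] is the fastest-changing dimension and order.back()
// is the slowest.
unsigned linearize(const std::vector<unsigned> &multiDim,
                   const std::vector<unsigned> &shape,
                   const std::vector<unsigned> &order) {
  unsigned linear = 0;
  // Accumulate from the slowest dimension down to the fastest.
  for (std::size_t i = order.size(); i-- > 0;) {
    unsigned dim = order[i];
    linear = linear * shape[dim] + multiDim[dim];
  }
  return linear;
}

std::vector<unsigned> delinearize(unsigned linear,
                                  const std::vector<unsigned> &shape,
                                  const std::vector<unsigned> &order) {
  std::vector<unsigned> multiDim(shape.size());
  // Peel off the fastest dimension first, then the next fastest, and so on.
  for (unsigned dim : order) {
    multiDim[dim] = linear % shape[dim];
    linear /= shape[dim];
  }
  return multiDim;
}
```

The two functions are inverses only when both walk the dimensions consistently with `order`; iterating in plain index order instead would silently assume one fixed layout.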
@@ -77,7 +77,7 @@ SmallVector<unsigned> getScratchConfigForReduce(triton::ReduceOp op) {
   auto srcShape = srcTy.getShape();
   auto axis = op.axis();
 
-  bool fastReduce = axis == 1; // FIXME(Qingyi): The fastest-changing dimension
+  bool fastReduce = axis == srcLayout.getOrder()[0];
 
   SmallVector<unsigned> smemShape;
   for (auto d : srcShape)
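
The change in this hunk replaces a hard-coded axis with a lookup in the layout's order. A minimal sketch of that check follows, assuming `order[0]` is the fastest-changing dimension; `isFastReduce` is a hypothetical name for this example and does not appear in the commit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for `axis == srcLayout.getOrder()[0]`.
// Assumption: order[0] names the fastest-changing dimension.
bool isFastReduce(unsigned axis, const std::vector<unsigned> &order) {
  assert(!order.empty() && "layout order must name at least one dimension");
  return axis == order[0];
}
```

With order `{1, 0}` the result matches the old `axis == 1` test, which is why the hard-coded version worked for that layout; with order `{0, 1}` the fast path applies to axis 0 instead.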