Commit Graph

6 Commits

Author SHA1 Message Date
Qingyi Liu
9d31998a9d [Triton-MLIR][BACKEND] Add argmin / argmax implementation for ReduceOp (#918) 2022-11-27 22:59:27 -08:00
Philippe Tillet
2aa538ec2e [BACKEND] Added support for mma layouts in reductions (#863)
Validated hackily by manually modifying the reduction .ttgir in my local
cache. There will be a follow-up PR adding some better testing
infrastructure to test out conversions and reductions on arbitrary
layouts.
2022-11-10 09:58:07 -08:00
goostavz
080b4addf8 [Triton-MLIR][Backend] Fix the order in linear/delinear and a few bugs in reduce conversion (#851)
1, fix the order in linearize/delinearize, which fix the error of order
in emitIndices;
2, fix the selecting of fast implementation in reduce codegen;
3, fix the redundant barrier in reduce codegen;
4, fix the index mapping of the second round of warp_shuffle in shuffle
version of reduce codegen.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
2022-11-08 10:10:09 -08:00
Philippe Tillet
976cf12af1 [OPTIMIZER] Fixed memory coalescing (#847) 2022-11-07 06:22:18 -08:00
Qingyi Liu
cdc0ec5077 [Triton-MLIR][Backend] Fix reduce conversion and unit tests for int dtypes (#826) 2022-11-01 17:42:59 +08:00
Qingyi Liu
42db3538e4 [Triton-MLIR][Backend] Add ReduceOpConversion into TritonGPUToLLVM conversion (#774)
What is done in this PR:
- [x] Add `ConvertLayout`, `getSizePerThread` and `getShapePerCTA`
implementation for `SliceEncodingAttr`
- [x] Split `emitIndices` into two phases:
`emitBaseIndexForBlockedLayout` and `emitOffsetForBlockedLayout`
- [x] Add `ReduceOpConversion::matchAndRewriteBasic` implementation
- [x] Add `ReduceOpConversion::matchAndRewriteFast` implementation with
ptx instruction `shfl.sync`
- [x] Add support for scalar value in `StoreOpConversion`
- [x] Add Reduce1d and Reduce2d unit tests and pass all unit tests

Co-authored-by: Qingyi Liu <liuqingyi1993@gmail.com>
2022-10-28 11:07:45 +08:00