[Triton-MLIR][Backend] Fix many problems to get the pipeline working (#809)
1. Rewrite code generation of insert_slice_async. 2. Correct the wrong index passed to extract_slice in pipeline. 3. Add a prologue in pipeline to wait for dangling cp.asyncs. 4. Move scf to cf conversion inside TritonGPUToLLVM because we need to perform membar before scf to cf. It shouldn't be a technical limitation and could be improved by a more general membar analysis. 5. Use an attribute to memoize the shared memory size and support dynamic shared memory. 6. Prevent the combine pass to reorder insert_slice and extract_slice across async_wait Co-authored-by: Superjomn <yanchunwei@outlook.com>
This commit is contained in:
@@ -292,13 +292,11 @@ struct PTXCpAsyncWaitGroupInstr : public PTXCpAsyncInstrBase {
|
||||
|
||||
struct PTXCpAsyncLoadInstr : public PTXCpAsyncInstrBase {
|
||||
explicit PTXCpAsyncLoadInstr(PTXBuilder *builder,
|
||||
triton::CacheModifier modifier,
|
||||
triton::EvictionPolicy policy)
|
||||
triton::CacheModifier modifier)
|
||||
: PTXCpAsyncInstrBase(builder) {
|
||||
o(triton::stringifyCacheModifier(modifier).str());
|
||||
o("shared");
|
||||
o("global");
|
||||
o("L2::" + triton::stringifyEvictionPolicy(policy).str());
|
||||
}
|
||||
};
|
||||
|
||||
|
Reference in New Issue
Block a user