Keren/tensor slice insert alloc (#94)

This branch defines three new `triton_gpu` operations to partially solve #87. Below is an overview:

```
%tensor = triton_gpu.alloc_tensor : tensor<2x16x16xf16, #A>
%b = triton_gpu.insert_slice_async %a_ptr, %tensor, %offset {axis = 0 : i32, cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<16x16x!tt.ptr<f16>, #AL> -> tensor<2x16x16xf16, #A>
%c = triton_gpu.extract_slice %b, %offset {axis = 0 : i32} : tensor<2x16x16xf16, #A> -> tensor<16x16xf16, #A>
```

We plan to fully replace `copy_async` with `insert_slice_async`. **This hasn't been done yet.**
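The host-side C++ model below illustrates what the three operations compute together. It is a minimal sketch built on several assumptions. `float` stands in for `f16`. The asynchronous copy runs synchronously. The `cache`/`evict`/`isVolatile` attributes and the `#A`/`#AL` layouts are ignored. All names (`allocTensor`, `insertSlice`, `extractSlice`, `PtrTile`) are hypothetical and are not part of Triton's API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the slice ops on the host; not Triton's implementation.
constexpr std::size_t kSlices = 2;  // axis 0 of tensor<2x16x16xf16>
constexpr std::size_t kRows = 16;
constexpr std::size_t kCols = 16;

using Slice = std::array<std::array<float, kCols>, kRows>;            // tensor<16x16xf16>, float as a stand-in
using Buffer = std::array<Slice, kSlices>;                             // tensor<2x16x16xf16>
using PtrTile = std::array<std::array<const float*, kCols>, kRows>;  // tensor<16x16x!tt.ptr<f16>>

// Models alloc_tensor: a 2x16x16 buffer. Zero-filled here for determinism.
Buffer allocTensor() { return Buffer{}; }

// Models insert_slice_async with axis = 0: loads one 16x16 tile through a
// tensor of pointers and writes it at index `offset` of the leading axis.
// The real op is asynchronous; this sketch copies immediately.
Buffer insertSlice(const PtrTile& ptrs, Buffer buf, std::size_t offset) {
  assert(offset < kSlices && "offset out of range along axis 0");
  for (std::size_t i = 0; i < kRows; ++i)
    for (std::size_t j = 0; j < kCols; ++j)
      buf[offset][i][j] = *ptrs[i][j];
  return buf;
}

// Models extract_slice with axis = 0: returns the 16x16 slice at `offset`.
Slice extractSlice(const Buffer& buf, std::size_t offset) {
  assert(offset < kSlices && "offset out of range along axis 0");
  return buf[offset];
}
```

Each function returns a new value instead of mutating its argument. This mirrors the SSA style of the MLIR snippet, where `%b` is a tensor value distinct from `%tensor`, and `extract_slice` at the same `%offset` reads back the tile that `insert_slice_async` wrote.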
2022-09-01 12:37:17 -07:00
parent d01353de07
commit 328b87aec6
10 changed files with 260 additions and 40 deletions
--- a/bin/triton-translate.cpp
+++ b/bin/triton-translate.cpp
@@ -37,8 +37,8 @@ OwningOpRef<ModuleOp> loadMLIRModule(llvm::StringRef inputFilename,

  mlir::DialectRegistry registry;
  registry
-      .insert<TritonDialect, gpu::TritonGPUDialect, arith::ArithmeticDialect,
-              StandardOpsDialect, scf::SCFDialect>();
+      .insert<TritonDialect, triton::gpu::TritonGPUDialect,
+              arith::ArithmeticDialect, StandardOpsDialect, scf::SCFDialect>();

  context.appendDialectRegistry(registry);