triton


Keren Zhou 328b87aec6 Keren/tensor slice insert alloc (#94)

This branch defines three new `triton_gpu` operations (`alloc_tensor`, `insert_slice_async`, and `extract_slice`) to partially solve #87. Below is an overview:

```
%tensor = triton_gpu.alloc_tensor : tensor<2x16x16xf16, #A>
%b = triton_gpu.insert_slice_async %a_ptr, %tensor, %offset {axis = 0 : i32, cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<16x16x!tt.ptr<f16>, #AL> -> tensor<2x16x16xf16, #A>
%c = triton_gpu.extract_slice %b, %offset {axis = 0 : i32} : tensor<2x16x16xf16, #A> -> tensor<16x16xf16, #A>
```
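
As a rough illustration of the data flow in the snippet above, here is a minimal pure-Python sketch. It is not Triton code: the helper functions are hypothetical stand-ins named after the ops, the asynchronous copy is modeled as a plain synchronous copy, and layouts (`#A`, `#AL`) and the `cache`, `evict`, and `isVolatile` attributes are ignored. It also assumes that `%offset` selects an index along `axis = 0`, and that each op yields a new value, mirroring the SSA form of the IR.

```python
# Hypothetical model of the three ops; none of these names are Triton APIs.

def alloc_tensor(num_slices, rows, cols):
    """Model of alloc_tensor: a buffer whose contents are not yet written.

    None marks an element that has not been written.
    """
    return [[[None] * cols for _ in range(rows)] for _ in range(num_slices)]


def insert_slice(src, dst, offset):
    """Model of insert_slice_async along axis 0, ignoring asynchrony.

    Returns a new buffer equal to dst with slice `offset` replaced by a
    copy of src; dst itself is left untouched.
    """
    if not 0 <= offset < len(dst):
        raise IndexError("offset out of range for axis 0")
    if len(src) != len(dst[0]) or len(src[0]) != len(dst[0][0]):
        raise ValueError("source shape does not match a slice of dst")
    result = [[row[:] for row in plane] for plane in dst]
    result[offset] = [row[:] for row in src]
    return result


def extract_slice(buf, offset):
    """Model of extract_slice along axis 0: a copy of slice `offset`."""
    return [row[:] for row in buf[offset]]


# Mirrors the IR: %tensor (2x16x16), %b (2x16x16), %c (16x16).
tensor = alloc_tensor(2, 16, 16)
# `a` stands in for the 16x16 values that would be loaded through %a_ptr.
a = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
offset = 0
b = insert_slice(a, tensor, offset)
c = extract_slice(b, offset)
```

In this model, `c` holds the same values as `a`, and the second slice of `b` is still unwritten.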

We plan to fully replace `copy_async` with `insert_slice_async`. **This hasn't been done yet.**

2022-09-01 12:37:17 -07:00

Triton

[TritonIR] simplify Load/StoreOps when mask is true/false (#79)

2022-08-24 12:55:49 -07:00

TritonGPU

Keren/tensor slice insert alloc (#94)

2022-09-01 12:37:17 -07:00

CMakeLists.txt

more progress on TritonGPU

2022-04-28 18:51:31 +08:00