donproc
f63be0e9b5
[TRITON-MLIR][BACKEND] Support atomic_cas (#914)
...
1. Support atomic_cas.
2. Add xchg support in atomic_rmw.
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-11-25 12:02:08 +08:00
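For context, the pattern these two primitives enable is a device-side spinlock; a minimal sketch of the Python-frontend usage (kernel and names are illustrative), assuming the `tl.atomic_cas`/`tl.atomic_xchg` API:
```py
import triton
import triton.language as tl


@triton.jit
def locked_increment(lock_ptr, count_ptr):
    # atomic_cas(ptr, cmp, val): spin until we swap the lock from 0 to 1.
    while tl.atomic_cas(lock_ptr, 0, 1) == 1:
        pass
    count = tl.load(count_ptr)
    tl.store(count_ptr, count + 1)
    # atomic_xchg exercises the new xchg mode of atomic_rmw; releases the lock.
    tl.atomic_xchg(lock_ptr, 0)
```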
donproc
5eee738df7
[Triton-MLIR][FRONTEND][BACKEND] Fix atomics (#879)
...
Minor fixes to the backend and frontend of atomics; we can now pass one test without a mask and with the shape aligned to the CTA size.
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-11-16 12:25:15 +08:00
Chenggang Zhao
57fd1864a7
[Triton-MLIR] Support FP8 (#864)
...
Co-authored-by: Superjomn <yanchunwei@outlook.com>
2022-11-10 15:53:06 +08:00
Da Yan
4946167241
[Triton-MLIR] tt.dot operands now must have DotOperand layout; also added prefetch pass prototype (#712)
...
Co-authored-by: Jokeren <kerenzhou@openai.com>
Co-authored-by: Phil Tillet <phil@openai.com>
Co-authored-by: Superjomn <yanchunwei@outlook.com>
2022-11-10 05:57:27 +00:00
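Illustratively, the constraint applies to any kernel whose loads feed `tl.dot`; in a sketch like the one below (hypothetical kernel), the Triton->TritonGPU conversion must now give both operands a DotOperand encoding rather than a plain blocked layout:
```py
import triton
import triton.language as tl


@triton.jit
def tiny_matmul(a_ptr, b_ptr, c_ptr, BLOCK: tl.constexpr):
    # BLOCK must be >= 16 for tl.dot.
    offs = tl.arange(0, BLOCK)
    idx = offs[:, None] * BLOCK + offs[None, :]
    # Both operands of the resulting tt.dot must carry a DotOperand
    # layout (opIdx = 0 for A, opIdx = 1 for B) after conversion.
    a = tl.load(a_ptr + idx)
    b = tl.load(b_ptr + idx)
    c = tl.dot(a, b)
    tl.store(c_ptr + idx, c)
```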
Philippe Tillet
a4ff0c362c
[FRONTEND] Fix issues with atomics (#849)
2022-11-06 20:52:11 -08:00
ben-zhang-609
5feb6e24f9
[Triton-MLIR] Add ptx vprintf support (#825)
...
We do not know how to write a unit test for this feature yet.
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-11-02 16:39:09 +08:00
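For reference, the feature surfaces as a device-side print that lowers to PTX `vprintf`; a minimal sketch, assuming the frontend exposed it as `tl.printf` (later Triton versions call it `tl.device_print`):
```py
import triton
import triton.language as tl


@triton.jit
def debug_kernel(x_ptr):
    pid = tl.program_id(axis=0)
    x = tl.load(x_ptr + pid)
    # A prefix string followed by values; each call lowers to a PTX
    # vprintf with the arguments packed into a buffer. The exact
    # formatting behavior varies across Triton versions.
    tl.printf("pid, x = ", pid, x)
```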
Ian Bearman
f2106d0aa2
[BUILD] Fix Warnings and Enable Warnings as Errors (#794)
2022-10-28 12:36:09 -07:00
Philippe Tillet
3e6cc6d66c
[FRONTEND] Made more tests pass (#805)
2022-10-26 17:47:33 -07:00
Philippe Tillet
a2cbe7af91
[FRONTEND] Enhanced support for binary operators (#801)
...
Disabled the modulo test (due to a change in the behavior of `frem` in nvptx between llvm-11 and llvm-14) and bfloat16 (which will require software emulation similar to how it's done in `master`).
2022-10-24 19:47:01 -07:00
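For the record, the semantics at issue: LLVM's `frem` (which float `%` lowers to) is specified to behave like C's `fmod`, where the result takes the sign of the dividend, unlike Python's `%`. A plain-Python illustration:
```py
import math

# fmod semantics (LLVM frem): the result takes the sign of the dividend.
print(math.fmod(-7.0, 3.0))  # -1.0
# Python's float %: the result takes the sign of the divisor.
print(-7.0 % 3.0)            # 2.0
```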
Philippe Tillet
bb0f9235d1
[OPTIMIZER] Made layout simplification pass efficient for fused attention kernels (#790)
2022-10-21 16:52:15 -07:00
Shintaro Iwasaki
5898352f97
[Triton-IR] Fix LoadOp definition (#771) (#777)
2022-10-13 18:53:00 -07:00
Da Yan
963d031247
[Triton-IR] Fix LoadOp Triton->TritonGPU conversion (#775)
2022-10-13 12:57:39 -07:00
Philippe Tillet
623c99609f
[Triton-IR] Added type inference and verifier for Triton-IR operations (#767)
2022-10-11 18:16:41 -07:00
Shintaro Iwasaki
43be75ad42
[FRONTEND] Add scalar type support for some ops (#661)
...
This PR adds basic support for scalar-type inputs to some ops (cast and pointer arithmetic) for Triton-MLIR. Also renames getelementptr -> addptr.
2022-09-15 16:12:52 -07:00
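A quick sketch of what the scalar path permits (the kernel is illustrative): pointer arithmetic and casts on single elements, with pointer `+` now lowering to the renamed `addptr` op:
```py
import triton
import triton.language as tl


@triton.jit
def scalar_copy(x_ptr, y_ptr):
    # Scalar pointer arithmetic lowers to tt.addptr
    # (the op formerly named getelementptr).
    x = tl.load(x_ptr + 1)
    # Scalar cast.
    y = x.to(tl.float32)
    tl.store(y_ptr, y)
```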
Shintaro Iwasaki
3c635449e5
[Triton] Support math and libdevice ops (#91)
...
This PR adds basic math ops by using `MathDialect` and `libdevice` ops by using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). This PR implements only the interface down to TritonGPU-MLIR (i.e., from the frontend to TritonGPU):
- It currently stops at TritonGPU; it cannot be lowered to PTX yet.
- No special optimizations (e.g., constant folding) are applied.
- LLVM 14.x does not define folders for many math ops, but 15.x seems to increase its coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td
- No constant folding etc. for `libdevice` ops.
```py
import triton
import triton.language as tl
import sys


@triton.jit
def add_kernel(
    x_ptr,
    y_ptr,
    BLOCK_SIZE: tl.constexpr,
):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = tl.sin(x)
    output = tl.libdevice.sin(x)
    output = tl.libdevice.fdiv_rn(output, output)
    output = tl.libdevice.fmaf_rd(output, output, output)
    tl.store(y_ptr + offsets, output)


if __name__ == "__main__" and len(sys.argv) >= 2:
    signature = "*fp32,*fp32"
    constants = {'BLOCK_SIZE': 1024}
    output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir")
    print(output)
```
->
```mlir
#blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-warps" = 4 : i32} {
  func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) {
    %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked>
    %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked>
    %4 = math.sin %3 : tensor<1024xf32, #blocked>
    %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    tt.store %9, %7 : tensor<1024xf32, #blocked>
    return
  }
}
```
2022-09-01 16:34:27 -07:00
goostavz
bedbf221c0
[BACKEND] Support optional mask in TritonGPUToLLVM (#80)
...
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-24 17:51:37 -07:00
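On the frontend side this corresponds to the optional `mask` (and `other`) arguments of `tl.load`/`tl.store`; a minimal sketch:
```py
import triton
import triton.language as tl


@triton.jit
def masked_copy(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    # The mask is optional; when present, masked-off lanes of the load
    # yield `other` and masked-off lanes of the store are skipped.
    x = tl.load(x_ptr + offs, mask=mask, other=0.0)
    tl.store(y_ptr + offs, x, mask=mask)
```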
Shintaro Iwasaki
0ebef11c77
[TritonIR] Make mask operand optional (#74)
2022-08-22 22:00:17 -07:00
Shintaro Iwasaki
9aa00249a6
[TritonIR] Make other optional and remove isOtherUnspecified (#67)
2022-08-18 18:19:55 -07:00
Philippe Tillet
192be76b3c
[OPTIMIZER] Rewrite patterns for layout conversions (#64)
2022-08-18 12:49:37 -07:00
Shintaro Iwasaki
d69ce77b19
[FRONTEND] Add an attr for masked load without explicit other (#55)
2022-08-18 09:51:37 -07:00
Yan Chunwei
83ef74f248
[BACKEND] Extracting numWarps from tritonGPU module (#39)
2022-08-08 09:40:20 -07:00
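This is the same `"triton_gpu.num-warps"` module attribute visible in the TTGIR dump above; it is set from the `num_warps` launch argument and read back by the backend. A hedged usage sketch:
```py
import torch
import triton
import triton.language as tl


@triton.jit
def scale(x_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    tl.store(x_ptr + offs, tl.load(x_ptr + offs) * 2.0)


x = torch.ones(1024, device="cuda")
# num_warps is recorded on the module as "triton_gpu.num-warps",
# which the backend extracts instead of threading it separately.
scale[(1,)](x, BLOCK=1024, num_warps=8)
```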
Philippe Tillet
78ebbe24c7
[FRONTEND] Added ExpandDimsOp primitive (#36)
2022-08-04 18:41:06 -07:00
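In the Python frontend, indexing with `None` is what creates `tt.expand_dims`; a minimal sketch:
```py
import triton
import triton.language as tl


@triton.jit
def tile_copy(x_ptr, y_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    # offs[:, None] and offs[None, :] lower to tt.expand_dims;
    # combining them broadcasts to a BLOCK x BLOCK tile of indices.
    idx = offs[:, None] * BLOCK + offs[None, :]
    tl.store(y_ptr + idx, tl.load(x_ptr + idx))
```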
Philippe Tillet
d1593e6ca8
[TritonGPU] Improved documentation and semantics of layout encodings (#30)
2022-07-31 13:59:44 -07:00
Philippe Tillet
6d62d88d4f
[CI] Run clang-format (#24)
2022-07-26 17:25:03 -07:00
Philippe Tillet
a633d2b403
[Analysis] Added Axis Info Analysis (#8)
2022-07-19 13:38:48 -07:00
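Axis info analysis tracks per-axis properties of values (e.g., contiguity and divisibility of pointer offsets) so memory accesses can be coalesced or vectorized; the frontend hints in the sketch below, assuming the `tl.multiple_of`/`tl.max_contiguous` API, feed it directly:
```py
import triton
import triton.language as tl


@triton.jit
def hinted_copy(x_ptr, y_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    # Promise the analysis that offsets are divisible by 16 and
    # contiguous in runs of 16, enabling wider vectorized accesses.
    offs = tl.max_contiguous(tl.multiple_of(offs, 16), 16)
    tl.store(y_ptr + offs, tl.load(x_ptr + offs))
```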
Yan Da
9d1b5e3f79
special encoding for broadcast
2022-06-18 21:16:45 +08:00
Yan Da
53cf93ce6a
Revert "Remove TypeConverter from TritonToTritonGPU conversion"
...
This reverts commit 64d0b87ef0.
2022-06-18 14:57:41 +08:00
Yan Da
64d0b87ef0
Remove TypeConverter from TritonToTritonGPU conversion
2022-06-18 14:34:59 +08:00
Yan Da
117a402c1b
more comments to TypeConverter & update warpTileSize
2022-06-08 16:20:07 +08:00
Yan Da
7b09b5f9e9
the pipeline pass now generates and accepts valid IR
2022-06-07 19:34:59 +08:00
Yan Da
9308e9c90c
A more general pipeliner
2022-05-25 21:52:51 +08:00
Yan Da
e6f89a5777
Fix ReduceOp conversion
2022-05-25 16:03:06 +08:00
Yan Da
9b670cfb9f
Add ReduceOp
2022-05-25 14:15:36 +08:00
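ReduceOp is the IR-level target of frontend reductions; a minimal sketch using `tl.sum` and `tl.max`:
```py
import triton
import triton.language as tl


@triton.jit
def row_stats(x_ptr, sum_ptr, max_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs)
    # Both reductions lower to tt.reduce, differing only in the combine op.
    tl.store(sum_ptr, tl.sum(x, axis=0))
    tl.store(max_ptr, tl.max(x, axis=0))
```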
Yan Da
96876a46d1
More progress on Triton=>TritonGPU conversion (works for matmul)
2022-05-09 21:19:53 +08:00
Yan Da
0c5319eed9
More progress on SCF type conversion
2022-05-05 20:56:55 +08:00
Yan Da
26c59e4718
More on SCF conversion
2022-05-04 21:50:32 +08:00
Yan Da
a96fe07e1c
DotOp conversion
2022-05-04 15:56:24 +08:00
Yan Da
2d281cbc0a
ConstantOp conversion pattern
2022-05-04 15:35:43 +08:00
Yan Da
b9279d2e3b
More progress on TritonGPU conversion
2022-05-04 14:54:31 +08:00
Yan Da
3ad7bee35e
More conversion patterns
2022-05-04 12:50:02 +08:00
Yan Da
5f08e2fdae
More arith patterns
2022-05-02 22:31:29 +08:00
Yan Da
75d32e2442
More on TritonGPU conversion
2022-05-02 21:51:00 +08:00
Yan Da
1428185c9c
More progress on TritonGPUTypeConverter & TritonGPUConversionTarget
2022-05-01 22:06:54 +08:00
Yan Da
4ece9fd1f3
Move dependentDialects from .cpp to .td
2022-05-01 13:06:51 +08:00
Phil Tillet
d9017f8593
add basic template for legalizing arithmetic op
2022-04-30 20:42:25 -07:00
Phil Tillet
2c6a213131
[TRITONGPU] Added template for Triton -> TritonGPU conversion
2022-04-30 16:00:39 -07:00