Shintaro Iwasaki
3c635449e5
[Triton] Support math and libdevice ops ( #91 )
This PR adds basic math ops by using `MathDialect` and `libdevice` ops by using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). This PR implements only the interface (from the frontend down to TritonGPU-MLIR), not the full lowering to PTX:
- Lowering currently stops at TritonGPU; it cannot be lowered to PTX yet.
- No special optimizations (e.g., constant folding) are applied.
- LLVM 14.x does not define folders for many math ops, but 15.x seems to increase their coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td
- No constant folding, etc. is applied to `libdevice` ops.
```py
import triton
import triton.language as tl
import sys


@triton.jit
def add_kernel(
    x_ptr,
    y_ptr,
    BLOCK_SIZE: tl.constexpr,
):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = tl.sin(x)                                  # math op -> math.sin
    output = tl.libdevice.sin(x)                   # libdevice op -> tt.ext_elemwise (__nv_sinf)
    output = tl.libdevice.fdiv_rn(output, output)  # __nv_fdiv_rn
    output = tl.libdevice.fmaf_rd(output, output, output)  # __nv_fmaf_rd
    tl.store(y_ptr + offsets, output)


if __name__ == "__main__" and len(sys.argv) >= 2:
    signature = "*fp32,*fp32"
    constants = {'BLOCK_SIZE': 1024}
    output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir")
    print(output)
```
This compiles to the following TritonGPU IR:
```mlir
#blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-warps" = 4 : i32} {
  func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) {
    %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked>
    %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked>
    %4 = math.sin %3 : tensor<1024xf32, #blocked>
    %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    tt.store %9, %7 : tensor<1024xf32, #blocked>
    return
  }
}
```
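For context, the `softmax` tutorial that motivates this PR relies on exactly these math ops. A minimal illustrative kernel (not part of this PR; `tl.exp` is expected to lower through `MathDialect` the same way `tl.sin` lowers to `math.sin` above) might look like:
```py
import triton
import triton.language as tl


@triton.jit
def softmax_row_kernel(x_ptr, y_ptr, BLOCK_SIZE: tl.constexpr):
    # Each program normalizes one row of BLOCK_SIZE elements.
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = x - tl.max(x, axis=0)  # subtract the row max for numerical stability
    num = tl.exp(x)            # math op; expected to lower via MathDialect (math.exp)
    den = tl.sum(num, axis=0)
    tl.store(y_ptr + offsets, num / den)
```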
2022-09-01 16:34:27 -07:00
Shintaro Iwasaki
d01353de07
[CI] add assert-enabled MLIR option ( #78 )
This deprecates the use of the release-build LLVM hosted by the LLVM project, whose lack of assertions makes debugging harder for developers.
This PR implements the following solution:
1. Create LLVM release tarballs with assertions enabled on our own (using Docker)
2. Host them in our own GitHub repositories
3. Use our LLVM for CI and/or development if `TRITON_USE_ASSERT_ENABLED_LLVM=1` is set (a sketch of this selection logic follows below)
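As a minimal Python sketch of the selection logic in step 3: the URLs and function name below are hypothetical placeholders; only the `TRITON_USE_ASSERT_ENABLED_LLVM` variable comes from this PR.
```py
import os

# Hypothetical placeholder URLs; the real tarball locations are set up by this PR's CI scripts.
RELEASE_LLVM_URL = "https://github.com/llvm/llvm-project/releases/download/..."
ASSERT_LLVM_URL = "https://github.com/<our-org>/<llvm-tarballs>/releases/download/..."


def llvm_tarball_url() -> str:
    # Prefer the assert-enabled LLVM build when the opt-in flag is set.
    if os.environ.get("TRITON_USE_ASSERT_ENABLED_LLVM") == "1":
        return ASSERT_LLVM_URL
    return RELEASE_LLVM_URL


if __name__ == "__main__":
    print(llvm_tarball_url())
```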
2022-08-31 18:55:32 -07:00
goostavz
bedbf221c0
[BACKEND] Support optional mask in TritonGPUToLLVM ( #80 )
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-24 17:51:37 -07:00
Yan Chunwei
1b513c9866
[BACKEND] Refactoring codegen for LoadOp with PTXFormat ( #77 )
This PR does the following things:
- Enhance PTXFormat by:
  - introducing PTXBuilder to enable multiple instructions in a single asm program
  - overriding PTXInstr's operator() method to enable the instr(opr0, opr1) style of setting operands for an instruction (a toy analogue of this pattern is sketched below)
- Refactor the PTX code used in LoadOpConversion with PTXFormat
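PTXBuilder and PTXInstr are C++ classes; as a rough, hypothetical Python analogue of the builder pattern described above (all names invented for illustration, not the actual API):
```py
class Instr:
    """A single instruction; calling it sets its operands, mirroring
    PTXInstr's overridden operator()."""

    def __init__(self, builder, opcode):
        self.builder = builder
        self.opcode = opcode

    def __call__(self, *operands):
        # instr(opr0, opr1) style: append the fully-formed instruction
        # to the enclosing asm program.
        self.builder.lines.append(f"{self.opcode} {', '.join(operands)};")
        return self


class AsmBuilder:
    """Collects multiple instructions into a single asm program,
    mirroring what PTXBuilder enables."""

    def __init__(self):
        self.lines = []

    def create(self, opcode):
        return Instr(self, opcode)

    def dump(self):
        return "\n".join(self.lines)


b = AsmBuilder()
b.create("ld.global.b32")("%r0", "[%r1]")
b.create("mov.b32")("%r2", "%r0")
print(b.dump())  # two instructions emitted as one asm program
```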
Authored-by: goostavz <gzhu@nvidia.com>
2022-08-23 15:51:13 -07:00
Shintaro Iwasaki
0ebef11c77
[TritonIR] Make mask operand optional ( #74 )
2022-08-22 22:00:17 -07:00
goostavz
de2dd04c8a
[BACKEND] two minor bugfix on StoreOpLowering and kernel launch & support optional other in LoadOpLowering ( #69 )
* Clean code
Co-authored-by: goostavz <gzhu@nvidia.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-08-22 21:47:09 -07:00
Yan Chunwei
10ba51c3bb
[FRONTEND] add python e2e launch empty kernel test ( #68 )
2022-08-19 10:46:01 -07:00
Shintaro Iwasaki
9aa00249a6
[TritonIR] make other optional and remove isOtherUnspecified ( #67 )
2022-08-18 18:19:55 -07:00
Philippe Tillet
192be76b3c
[OPTIMIZER] Rewrite patterns for layout conversions ( #64 )
2022-08-18 12:49:37 -07:00
Shintaro Iwasaki
d69ce77b19
[FRONTEND] add an attr for masked load without explicit other ( #55 )
2022-08-18 09:51:37 -07:00
goostavz
fc58250a06
[BACKEND] Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp & GEPOp and bugfix for SplatOp, StoreOp, FuncOp ( #60 )
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-18 20:46:45 +08:00
Yan Chunwei
b1673caaf6
[FRONTEND] Expose end-to-end compile to python frontend ( #58 )
2022-08-17 10:42:48 -07:00
Yan Chunwei
95bbac41e7
[BACKEND] Add LLVM-translation for store and splat ops ( #47 )
2022-08-15 00:46:37 -07:00
goostavz
993ba7035a
[BACKEND] Codegen bringup, index calculation of blocked_layout & support of LoadOp, BroadcastOp, ViewOp & MakeRangeOp ( #38 )
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-14 19:58:59 -07:00
Shintaro Iwasaki
2ba9a83465
[BUILD] fix minor issues with MLIR assert enabled ( #46 )
2022-08-11 21:20:47 -07:00
Yan Chunwei
83ef74f248
[BACKEND] Extracting numWarps from tritonGPU module ( #39 )
2022-08-08 09:40:20 -07:00
Yan Chunwei
920723cf3d
[BACKEND] add triton-translate to translate mlir to llvmir or PTX code ( #37 )
2022-08-07 22:34:36 -07:00
Philippe Tillet
78ebbe24c7
[FRONTEND] Added ExpandDimsOp primitive ( #36 )
2022-08-04 18:41:06 -07:00
Yan Chunwei
b988bae813
Init TritonGPU to LLVM dialect conversion ( #32 )
* add toLLVM pass
* update num-warps setting in mlir
2022-08-04 10:15:45 +08:00
Philippe Tillet
d1593e6ca8
[TritonGPU] Improved documentation and semantics of layout encodings ( #30 )
2022-07-31 13:59:44 -07:00
Philippe Tillet
6d62d88d4f
[CI] run clang-format ( #24 )
2022-07-26 17:25:03 -07:00
Philippe Tillet
a633d2b403
[Analysis] Added Axis Info Analysis ( #8 )
2022-07-19 13:38:48 -07:00
Yan Da
9d1b5e3f79
special encoding for broadcast
2022-06-18 21:16:45 +08:00
Yan Da
53cf93ce6a
Revert "Remove TypeConverter from TritonToTritonGPU conversion"
This reverts commit 64d0b87ef0.
2022-06-18 14:57:41 +08:00
Yan Da
64d0b87ef0
Remove TypeConverter from TritonToTritonGPU conversion
2022-06-18 14:34:59 +08:00
Yan Da
9feb256b71
op combine in Triton Dialect: broadcast(cst) -> cst
2022-06-17 16:19:47 +08:00
Yan Da
117a402c1b
more comments to TypeConverter & update warpTileSize
2022-06-08 16:20:07 +08:00
Yan Da
7b09b5f9e9
the pipeline pass now generates and accepts valid IR
2022-06-07 19:34:59 +08:00
Yan Da
9308e9c90c
A more general pipeliner
2022-05-25 21:52:51 +08:00
Yan Da
e6f89a5777
Fix ReduceOp conversion
2022-05-25 16:03:06 +08:00
Yan Da
9b670cfb9f
Add ReduceOp
2022-05-25 14:15:36 +08:00
Yan Da
1a4fbed25b
Skeleton for the pipeline pass
2022-05-11 16:13:53 +08:00
Yan Da
96876a46d1
More progress on Triton=>TritonGPU conversion (works for matmul)
2022-05-09 21:19:53 +08:00
Yan Da
0c5319eed9
More progress on SCF type conversion
2022-05-05 20:56:55 +08:00
Yan Da
26c59e4718
More on SCF conversion
2022-05-04 21:50:32 +08:00
Yan Da
a96fe07e1c
DotOp conversion
2022-05-04 15:56:24 +08:00
Yan Da
2d281cbc0a
ConstantOp conversion pattern
2022-05-04 15:35:43 +08:00
Yan Da
b9279d2e3b
More progress on TritonGPU conversion
2022-05-04 14:54:31 +08:00
Yan Da
3ad7bee35e
More conversion patterns
2022-05-04 12:50:02 +08:00
Yan Da
5f08e2fdae
More arith patterns
2022-05-02 22:31:29 +08:00
Yan Da
75d32e2442
More on TritonGPU conversion
2022-05-02 21:51:00 +08:00
Yan Da
1428185c9c
More progress on TritonGPUTypeConverter & TritonGPUConversionTarget
2022-05-01 22:06:54 +08:00
Yan Da
4ece9fd1f3
Move dependentDialects from .cpp to .td
2022-05-01 13:06:51 +08:00
Phil Tillet
d9017f8593
add basic template for legalizing arithmetic op
2022-04-30 20:42:25 -07:00
Phil Tillet
2c6a213131
[TRITONGPU] Added template for Triton -> TritonGPU conversion
2022-04-30 16:00:39 -07:00