triton

Author	SHA1	Message	Date
Shintaro Iwasaki	3c635449e5	[Triton] Support math and libdevice ops (#91 ) This PR adds basic math ops by using `MathDialect` and `libdevice` ops by using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). This PR implements only interface till PTX (so from frontend to TritonGPU-MLIR) - Currently till TritonGPU. It cannot be lowered to PTX now. - No special optimizations (e.g., constant folding etc) are applied. - 14.x does not define folders for many operators for math ops, but 15.x seems to increase its coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td - No constant folding etc for `libdevice` ops. ```py import triton import triton.language as tl import sys @triton.jit def add_kernel( x_ptr, y_ptr, BLOCK_SIZE: tl.constexpr, ): offsets = tl.arange(0, BLOCK_SIZE) x = tl.load(x_ptr + offsets) x = tl.sin(x) output = tl.libdevice.sin(x) output = tl.libdevice.fdiv_rn(output, output) output = tl.libdevice.fmaf_rd(output, output, output) tl.store(y_ptr + offsets, output) if __name__ == "__main__" and len(sys.argv) >= 2: signature = "fp32,fp32" constants = {'BLOCK_SIZE': 1024} output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir") print(output) ``` -> ```llvm #blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}> module attributes {"triton_gpu.num-warps" = 4 : i32} { func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) { %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked> %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked> %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked> %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked> %4 = math.sin %3 : tensor<1024xf32, #blocked> %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked> %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked> %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked> %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked> %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked> tt.store %9, %7 : tensor<1024xf32, #blocked> return } } ```	2022-09-01 16:34:27 -07:00
goostavz	bedbf221c0	[BACKEND] Support optional mask in TritonGPUToLLVM (#80 ) Co-authored-by: gzhu <gzhu@nvidia.com>	2022-08-24 17:51:37 -07:00
Shintaro Iwasaki	0ebef11c77	[TritonIR] Make mask operand optional (#74 )	2022-08-22 22:00:17 -07:00
goostavz	de2dd04c8a	[BACKEND] two minor bugfix on StoreOpLowering and kernel launch & support optional other in LoadOpLowering (#69 ) * [BACKEND] two minor bugfix on StoreOpLowering and kernel launch & support optional other in LoadOpLowering * Clean code Co-authored-by: goostavz <gzhu@nvidia.com> Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>	2022-08-22 21:47:09 -07:00
Yan Chunwei	10ba51c3bb	[FRONTEND] add python e2e launch empty kernel test (#68 )	2022-08-19 10:46:01 -07:00
Shintaro Iwasaki	9aa00249a6	[TritonIR] make other optional and remove isOtherUnspecified (#67 ) [Triton] make other optional and remove isOtherUnspecified	2022-08-18 18:19:55 -07:00
Shintaro Iwasaki	d69ce77b19	[FRONTEND] add an attr for masked load without explicit other (#55 )	2022-08-18 09:51:37 -07:00
goostavz	fc58250a06	[BACKEND] Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp & GEPOp and bugfix for SplatOp, StoreOp, FuncOp (#60 ) Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp, GEPOp and bugfix for SplatOp, StoreOp, FuncOp Co-authored-by: gzhu <gzhu@nvidia.com>	2022-08-18 20:46:45 +08:00
Yan Chunwei	b1673caaf6	[FRONTEND] Expose end-to-end compile to python frontend (#58 )	2022-08-17 10:42:48 -07:00
Philippe Tillet	78ebbe24c7	[FRONTEND] Added `ExpandDimsOp` primitive (#36 )	2022-08-04 18:41:06 -07:00
Yan Chunwei	b988bae813	Init TritonGPU to LLVM dialect conversion (#32 ) * add toLLVM pass * update num-warps setting in mlir	2022-08-04 10:15:45 +08:00
Philippe Tillet	d1593e6ca8	[TritonGPU] Improved documentation and semantics of layout encodings (#30 )	2022-07-31 13:59:44 -07:00
Philippe Tillet	6d62d88d4f	[CI] run clang-format (#24 )	2022-07-26 17:25:03 -07:00
Philippe Tillet	3265e0df5a	[PYTHON] Cleaned up legacy code; added simple standalone compilation API (#22 )	2022-07-26 11:06:45 -07:00
Philippe Tillet	a633d2b403	[Analysis] Added Axis Info Analysis (#8 )	2022-07-19 13:38:48 -07:00
Yan Da	0ee6e486f8	add cse pass to the pipeline & pass num-warps as an argument	2022-06-10 17:31:48 +08:00
Yan Da	366dddc3bc	update mma encoding & triton-opt	2022-06-06 21:03:58 +08:00
Yan Da	9b670cfb9f	Add ReduceOp	2022-05-25 14:15:36 +08:00
Yan Da	a2c9f919a8	TritonGPU verifier	2022-05-24 19:48:56 +08:00
Yan Da	36c45ec687	make numStages an option in PipelinePass	2022-05-23 12:47:55 +08:00
Yan Da	e3916c3a46	TritonGPU combiner	2022-05-16 19:16:01 +08:00
Yan Da	d23d7b244c	More on the pipeline pass	2022-05-11 20:31:08 +08:00
Phil Tillet	2c6a213131	[TRITONGPU] Added template for Triton -> TritonGPU conversion	2022-04-30 16:00:39 -07:00
Yan Da	8dfe78f6cf	Add TritonCombineOps	2022-04-27 19:28:21 +08:00
Yan Da	c70f6b666e	Merge previous changes	2022-04-27 14:06:55 +08:00
Yan Da	74585fb970	Add Triton CombineOps	2022-04-27 13:45:56 +08:00
Philippe Tillet	81001d318c	Putting Triton dialect in its own folder	2022-04-26 14:39:27 -07:00
Yan Da	1c52bd587d	Device function & PassManager	2022-04-15 14:41:57 +08:00
Yan Da	7e0fd97965	Add set_attr(...) to ir.OpState	2022-04-11 12:26:54 +08:00
Yan Da	19f81b7dea	Add scf-codegen tests	2022-04-10 15:49:09 +08:00
Yan Da	6002340456	Better textual representation	2022-04-07 20:44:41 +08:00
Yan Da	62f772123c	now kernel functions return nothing (instead of none)	2022-04-07 20:22:17 +08:00
Yan Da	040a2b6c75	Fix OpBuilder	2022-04-07 20:01:31 +08:00
Yan Da	6b4da6f016	Documentation	2022-04-07 16:00:53 +08:00
Yan Da	9cf4107990	Add TensorSizeTrait	2022-04-07 15:18:43 +08:00
Yan Da	39fad2b18a	More progress on WhileOp	2022-04-05 17:55:43 +08:00
Yan Da	d7fbddc7d4	Fix ret::reference issue	2022-04-05 16:09:09 +08:00
Yan Da	c7ad928e60	More progress on WhileOp codegen	2022-04-05 15:55:48 +08:00
Yan Da	9df899b291	Some progress on visit_If	2022-04-03 22:34:46 +08:00
Yan Da	c71c50cd0c	ForOp's SSA construction	2022-04-03 19:11:47 +08:00
Yan Da	61413b8a97	More python bindings	2022-04-01 22:22:39 +08:00
Yan Da	bde103fab0	Replace MlirType with mlir::Type	2022-04-01 18:46:46 +08:00
Yan Da	4ad432f1fc	More on scf Ops	2022-03-31 21:42:48 +08:00
Yan Da	2041b67fbf	Now vecadd works	2022-03-30 20:21:47 +08:00
Yan Da	e381dc72c5	Use mlir::Block to replace MlirBlock	2022-03-30 16:31:03 +08:00
Yan Da	e95d98a886	bindings for ModuleOp	2022-03-30 13:32:52 +08:00
Yan Da	38e67b4293	Add more Ops	2022-03-28 19:50:23 +08:00
Yan Da	0d139ec460	Introducing SCF	2022-03-26 17:02:32 +08:00
Yan Da	c53f3486e4	create shr	2022-03-26 16:41:49 +08:00
Yan Da	ba16116f96	Let python manage created objects	2022-03-26 16:31:01 +08:00

1 2 3

106 Commits