donproc
f63be0e9b5
[TRITON-MLIR][BACKEND] Support atomic_cas (#914)
...
1. Support atomic_cas.
2. Add xchg support in atomic_rmw.
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-11-25 12:02:08 +08:00
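For context, the pattern these two primitives enable is a device-side spinlock; a minimal sketch of the Python-frontend usage (kernel and names are illustrative), assuming the `tl.atomic_cas`/`tl.atomic_xchg` API:
```py
import triton
import triton.language as tl


@triton.jit
def locked_increment(lock_ptr, count_ptr):
    # atomic_cas(ptr, cmp, val): spin until we swap the lock from 0 to 1.
    while tl.atomic_cas(lock_ptr, 0, 1) == 1:
        pass
    count = tl.load(count_ptr)
    tl.store(count_ptr, count + 1)
    # atomic_xchg exercises the new xchg mode of atomic_rmw; releases the lock.
    tl.atomic_xchg(lock_ptr, 0)
```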
donproc
5eee738df7
[Triton-MLIR][FRONTEND][BACKEND] Fix atomics (#879)
...
Minor fixes to the backend and frontend of atomics; we can now pass one test without a mask and with the shape aligned to the CTA size.
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-11-16 12:25:15 +08:00
Chenggang Zhao
57fd1864a7
[Triton-MLIR] Support FP8 (#864)
...
Co-authored-by: Superjomn <yanchunwei@outlook.com>
2022-11-10 15:53:06 +08:00
Da Yan
4946167241
[Triton-MLIR] tt.dot operands now must have DotOperand layout; also added prefetch pass prototype (#712)
...
Co-authored-by: Jokeren <kerenzhou@openai.com>
Co-authored-by: Phil Tillet <phil@openai.com>
Co-authored-by: Superjomn <yanchunwei@outlook.com>
2022-11-10 05:57:27 +00:00
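Illustratively, the constraint applies to any kernel whose loads feed `tl.dot`; in a sketch like the one below (hypothetical kernel), the Triton->TritonGPU conversion must now give both operands a DotOperand encoding rather than a plain blocked layout:
```py
import triton
import triton.language as tl


@triton.jit
def tiny_matmul(a_ptr, b_ptr, c_ptr, BLOCK: tl.constexpr):
    # BLOCK must be >= 16 for tl.dot.
    offs = tl.arange(0, BLOCK)
    idx = offs[:, None] * BLOCK + offs[None, :]
    # Both operands of the resulting tt.dot must carry a DotOperand
    # layout (opIdx = 0 for A, opIdx = 1 for B) after conversion.
    a = tl.load(a_ptr + idx)
    b = tl.load(b_ptr + idx)
    c = tl.dot(a, b)
    tl.store(c_ptr + idx, c)
```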
Philippe Tillet
a4ff0c362c
[FRONTEND] Fix issues with atomics (#849)
2022-11-06 20:52:11 -08:00
ben-zhang-609
5feb6e24f9
[Triton-MLIR] Add ptx vprintf support (#825)
...
We do not know how to write a unit test for this feature yet.
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-11-02 16:39:09 +08:00
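For reference, the feature surfaces as a device-side print that lowers to PTX `vprintf`; a minimal sketch, assuming the frontend exposed it as `tl.printf` (later Triton versions call it `tl.device_print`):
```py
import triton
import triton.language as tl


@triton.jit
def debug_kernel(x_ptr):
    pid = tl.program_id(axis=0)
    x = tl.load(x_ptr + pid)
    # A prefix string followed by values; each call lowers to a PTX
    # vprintf with the arguments packed into a buffer. The exact
    # formatting behavior varies across Triton versions.
    tl.printf("pid, x = ", pid, x)
```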
Ian Bearman
f2106d0aa2
[BUILD] Fix Warnings and Enable Warnings as Errors (#794)
2022-10-28 12:36:09 -07:00
Philippe Tillet
3e6cc6d66c
[FRONTEND] Made more tests pass (#805)
2022-10-26 17:47:33 -07:00
Philippe Tillet
a2cbe7af91
[FRONTEND] Enhanced support for binary operators (#801)
...
Disabled the modulo test (due to a change in the behavior of `frem` in nvptx between llvm-11 and llvm-14) and bfloat16 (which will require software emulation similar to how it's done in `master`).
2022-10-24 19:47:01 -07:00
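For the record, the semantics at issue: LLVM's `frem` (which float `%` lowers to) is specified to behave like C's `fmod`, where the result takes the sign of the dividend, unlike Python's `%`. A plain-Python illustration:
```py
import math

# fmod semantics (LLVM frem): the result takes the sign of the dividend.
print(math.fmod(-7.0, 3.0))  # -1.0
# Python's float %: the result takes the sign of the divisor.
print(-7.0 % 3.0)            # 2.0
```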
Philippe Tillet
bb0f9235d1
[OPTIMIZER] Made layout simplification pass efficient for fused attention kernels (#790)
2022-10-21 16:52:15 -07:00
Shintaro Iwasaki
5898352f97
[Triton-IR] Fix LoadOp definition (#771) (#777)
2022-10-13 18:53:00 -07:00
Da Yan
963d031247
[Triton-IR] Fix LoadOp Triton->TritonGPU conversion (#775)
2022-10-13 12:57:39 -07:00
Philippe Tillet
623c99609f
[Triton-IR] Added type inference and verifier for Triton-IR operations (#767)
2022-10-11 18:16:41 -07:00
Shintaro Iwasaki
43be75ad42
[FRONTEND] Add scalar type support for some ops (#661)
...
This PR adds basic support for scalar-type inputs to some ops (cast and pointer arithmetic) for Triton-MLIR. Also renames getelementptr -> addptr.
2022-09-15 16:12:52 -07:00
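A quick sketch of what the scalar path permits (the kernel is illustrative): pointer arithmetic and casts on single elements, with pointer `+` now lowering to the renamed `addptr` op:
```py
import triton
import triton.language as tl


@triton.jit
def scalar_copy(x_ptr, y_ptr):
    # Scalar pointer arithmetic lowers to tt.addptr
    # (the op formerly named getelementptr).
    x = tl.load(x_ptr + 1)
    # Scalar cast.
    y = x.to(tl.float32)
    tl.store(y_ptr, y)
```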
Shintaro Iwasaki
3c635449e5
[Triton] Support math and libdevice ops (#91)
...
This PR adds basic math ops by using `MathDialect` and `libdevice` ops by using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). This PR implements only the interface down to TritonGPU-MLIR (i.e., from the frontend to TritonGPU):
- It currently stops at TritonGPU; it cannot be lowered to PTX yet.
- No special optimizations (e.g., constant folding) are applied.
- LLVM 14.x does not define folders for many math ops, but 15.x seems to increase its coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td
- No constant folding etc. for `libdevice` ops.
```py
import triton
import triton.language as tl
import sys


@triton.jit
def add_kernel(
    x_ptr,
    y_ptr,
    BLOCK_SIZE: tl.constexpr,
):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = tl.sin(x)
    output = tl.libdevice.sin(x)
    output = tl.libdevice.fdiv_rn(output, output)
    output = tl.libdevice.fmaf_rd(output, output, output)
    tl.store(y_ptr + offsets, output)


if __name__ == "__main__" and len(sys.argv) >= 2:
    signature = "*fp32,*fp32"
    constants = {'BLOCK_SIZE': 1024}
    output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir")
    print(output)
```
->
```mlir
#blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-warps" = 4 : i32} {
  func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) {
    %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked>
    %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked>
    %4 = math.sin %3 : tensor<1024xf32, #blocked>
    %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    tt.store %9, %7 : tensor<1024xf32, #blocked>
    return
  }
}
```
2022-09-01 16:34:27 -07:00
goostavz
bedbf221c0
[BACKEND] Support optional mask in TritonGPUToLLVM (#80)
...
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-24 17:51:37 -07:00
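On the frontend side this corresponds to the optional `mask` (and `other`) arguments of `tl.load`/`tl.store`; a minimal sketch:
```py
import triton
import triton.language as tl


@triton.jit
def masked_copy(x_ptr, y_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    # The mask is optional; when present, masked-off lanes of the load
    # yield `other` and masked-off lanes of the store are skipped.
    x = tl.load(x_ptr + offs, mask=mask, other=0.0)
    tl.store(y_ptr + offs, x, mask=mask)
```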
Shintaro Iwasaki
0ebef11c77
[TritonIR] Make mask operand optional (#74)
2022-08-22 22:00:17 -07:00
Shintaro Iwasaki
9aa00249a6
[TritonIR] Make other optional and remove isOtherUnspecified (#67)
2022-08-18 18:19:55 -07:00
Philippe Tillet
192be76b3c
[OPTIMIZER] Rewrite patterns for layout conversions (#64)
2022-08-18 12:49:37 -07:00
Shintaro Iwasaki
d69ce77b19
[FRONTEND] Add an attr for masked load without explicit other (#55)
2022-08-18 09:51:37 -07:00
Yan Chunwei
83ef74f248
[BACKEND] Extracting numWarps from tritonGPU module (#39)
2022-08-08 09:40:20 -07:00
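This is the same `"triton_gpu.num-warps"` module attribute visible in the TTGIR dump above; it is set from the `num_warps` launch argument and read back by the backend. A hedged usage sketch:
```py
import torch
import triton
import triton.language as tl


@triton.jit
def scale(x_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    tl.store(x_ptr + offs, tl.load(x_ptr + offs) * 2.0)


x = torch.ones(1024, device="cuda")
# num_warps is recorded on the module as "triton_gpu.num-warps",
# which the backend extracts instead of threading it separately.
scale[(1,)](x, BLOCK=1024, num_warps=8)
```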
Philippe Tillet
78ebbe24c7
[FRONTEND] Added ExpandDimsOp primitive (#36)
2022-08-04 18:41:06 -07:00
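In the Python frontend, indexing with `None` is what creates `tt.expand_dims`; a minimal sketch:
```py
import triton
import triton.language as tl


@triton.jit
def tile_copy(x_ptr, y_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    # offs[:, None] and offs[None, :] lower to tt.expand_dims;
    # combining them broadcasts to a BLOCK x BLOCK tile of indices.
    idx = offs[:, None] * BLOCK + offs[None, :]
    tl.store(y_ptr + idx, tl.load(x_ptr + idx))
```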
Philippe Tillet
d1593e6ca8
[TritonGPU] Improved documentation and semantics of layout encodings (#30)
2022-07-31 13:59:44 -07:00
Philippe Tillet
6d62d88d4f
[CI] Run clang-format (#24)
2022-07-26 17:25:03 -07:00
Philippe Tillet
a633d2b403
[Analysis] Added Axis Info Analysis (#8)
2022-07-19 13:38:48 -07:00
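Axis info analysis tracks per-axis properties of values (e.g., contiguity and divisibility of pointer offsets) so memory accesses can be coalesced or vectorized; the frontend hints in the sketch below, assuming the `tl.multiple_of`/`tl.max_contiguous` API, feed it directly:
```py
import triton
import triton.language as tl


@triton.jit
def hinted_copy(x_ptr, y_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    # Promise the analysis that offsets are divisible by 16 and
    # contiguous in runs of 16, enabling wider vectorized accesses.
    offs = tl.max_contiguous(tl.multiple_of(offs, 16), 16)
    tl.store(y_ptr + offs, tl.load(x_ptr + offs))
```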
Yan Da
9d1b5e3f79
special encoding for broadcast
2022-06-18 21:16:45 +08:00
Yan Da
53cf93ce6a
Revert "Remove TypeConverter from TritonToTritonGPU conversion"
...
This reverts commit 64d0b87ef0.
2022-06-18 14:57:41 +08:00
Yan Da
64d0b87ef0
Remove TypeConverter from TritonToTritonGPU conversion
2022-06-18 14:34:59 +08:00
Yan Da
117a402c1b
more comments to TypeConverter & update warpTileSize
2022-06-08 16:20:07 +08:00
Yan Da
7b09b5f9e9
the pipeline pass now generates and accepts valid IR
2022-06-07 19:34:59 +08:00
Yan Da
9308e9c90c
A more general pipeliner
2022-05-25 21:52:51 +08:00
Yan Da
e6f89a5777
Fix ReduceOp conversion
2022-05-25 16:03:06 +08:00
Yan Da
9b670cfb9f
Add ReduceOp
2022-05-25 14:15:36 +08:00
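ReduceOp is the IR-level target of frontend reductions; a minimal sketch using `tl.sum` and `tl.max`:
```py
import triton
import triton.language as tl


@triton.jit
def row_stats(x_ptr, sum_ptr, max_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    x = tl.load(x_ptr + offs)
    # Both reductions lower to tt.reduce, differing only in the combine op.
    tl.store(sum_ptr, tl.sum(x, axis=0))
    tl.store(max_ptr, tl.max(x, axis=0))
```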
Yan Da
96876a46d1
More progress on Triton=>TritonGPU conversion (works for matmul)
2022-05-09 21:19:53 +08:00
Yan Da
0c5319eed9
More progress on SCF type conversion
2022-05-05 20:56:55 +08:00
Yan Da
26c59e4718
More on SCF conversion
2022-05-04 21:50:32 +08:00
Yan Da
a96fe07e1c
DotOp conversion
2022-05-04 15:56:24 +08:00
Yan Da
2d281cbc0a
ConstantOp conversion pattern
2022-05-04 15:35:43 +08:00
Yan Da
b9279d2e3b
More progress on TritonGPU conversion
2022-05-04 14:54:31 +08:00
Yan Da
3ad7bee35e
More conversion patterns
2022-05-04 12:50:02 +08:00
Yan Da
5f08e2fdae
More arith patterns
2022-05-02 22:31:29 +08:00
Yan Da
75d32e2442
More on TritonGPU conversion
2022-05-02 21:51:00 +08:00
Yan Da
1428185c9c
More progress on TritonGPUTypeConverter & TritonGPUConversionTarget
2022-05-01 22:06:54 +08:00
Yan Da
4ece9fd1f3
Move dependentDialects from .cpp to .td
2022-05-01 13:06:51 +08:00
Phil Tillet
d9017f8593
add basic template for legalizing arithmetic op
2022-04-30 20:42:25 -07:00
Phil Tillet
2c6a213131
[TRITONGPU] Added template for Triton -> TritonGPU conversion
2022-04-30 16:00:39 -07:00