Commit Graph

467 Commits

Shintaro Iwasaki
13669b46a6 [DOCS] Correct spelling (#665)
This PR corrects spelling errors for Triton-MLIR, as #664 did. It should not break anything.
2022-09-16 15:07:34 -07:00
Shintaro Iwasaki
e9e1a4e682 [FRONTEND] Fix the implicit broadcasting rule (#663)
This PR fixes the cast issue that appears in some tutorial code.
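For context, a minimal sketch of the kind of implicit broadcast this rule governs; the kernel is hypothetical, not the tutorial code from the PR:

```py
import triton
import triton.language as tl

@triton.jit
def broadcast_kernel(x_ptr, y_ptr, BLOCK_SIZE: tl.constexpr):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    # The scalar 1.0 must be implicitly broadcast (and cast) to the block
    # shape of x before the add; the broadcasting rule decides how.
    y = x + 1.0
    tl.store(y_ptr + offsets, y)
```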
2022-09-16 10:49:15 -07:00
Philippe Tillet
80e3fb5270 [CI] Now using clang-format from pip (#662) 2022-09-15 16:24:37 -07:00
Shintaro Iwasaki
43be75ad42 [FRONTEND] Add scalar type support for some ops (#661)
This PR adds basic support for scalar-type inputs to some ops (cast and pointer arithmetic) in Triton-MLIR. It also renames `getelementptr` to `addptr`.
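As a hedged illustration of what scalar support enables (hypothetical kernel, not taken from the PR):

```py
import triton
import triton.language as tl

@triton.jit
def scalar_kernel(x_ptr, y_ptr):
    # Pointer arithmetic with a scalar offset, which now lowers through
    # `addptr` (formerly `getelementptr`).
    x = tl.load(x_ptr + 1)
    # A scalar cast: fp32 -> fp16 and back.
    y = x.to(tl.float16).to(tl.float32)
    tl.store(y_ptr, y)
```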
2022-09-15 16:12:52 -07:00
Yan Chunwei
a9464f4993 [Backend] Vectorize Load/Store Ops (#86)
This PR does the following (a sketch of the target access pattern appears below):

- Refactors the load and store op codegen so both are rewritten with the same logic and share most of their code
- Adds support for vectorized loads/stores
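When adjacent threads access adjacent, aligned elements, the backend can emit one wide memory instruction instead of several scalar ones. A hedged sketch of such a unit-stride pattern (hypothetical kernel, names illustrative):

```py
import triton
import triton.language as tl

@triton.jit
def copy_kernel(src_ptr, dst_ptr, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)
    # Contiguous, unit-stride offsets: adjacent threads touch adjacent
    # elements, which is what makes the load/store vectorizable.
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    x = tl.load(src_ptr + offsets)
    tl.store(dst_ptr + offsets, x)
```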
2022-09-06 12:28:09 -07:00
Shintaro Iwasaki
3c635449e5 [Triton] Support math and libdevice ops (#91)
This PR adds basic math ops using `MathDialect` and `libdevice` ops using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). It implements only the interface from the frontend down to TritonGPU-MLIR, not yet all the way to PTX:
- Lowering currently stops at TritonGPU; it cannot be lowered to PTX yet.
- No special optimizations (e.g., constant folding) are applied.
  - LLVM 14.x does not define folders for many math ops, but 15.x seems to increase coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td
  - There is no constant folding, etc., for `libdevice` ops either.

```py
import triton
import triton.language as tl
import sys

@triton.jit
def add_kernel(
    x_ptr,
    y_ptr,
    BLOCK_SIZE: tl.constexpr,
):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = tl.sin(x)  # lowers to MathDialect's math.sin
    # The tl.libdevice.* calls lower to extern_elementwise ops that call
    # into libdevice (see the TTGIR below).
    output = tl.libdevice.sin(x)
    output = tl.libdevice.fdiv_rn(output, output)
    output = tl.libdevice.fmaf_rd(output, output, output)
    tl.store(y_ptr + offsets, output)


if __name__ == "__main__" and len(sys.argv) >= 2:
    signature = "*fp32,*fp32"
    constants = {'BLOCK_SIZE': 1024}
    # Compile the kernel down to TritonGPU IR ("ttgir") and print it.
    output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir")
    print(output)
```
This compiles to:
```llvm
#blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-warps" = 4 : i32} {
  func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) {
    %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked>
    %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked>
    %4 = math.sin %3 : tensor<1024xf32, #blocked>
    %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    tt.store %9, %7 : tensor<1024xf32, #blocked>
    return
  }
}
```
2022-09-01 16:34:27 -07:00
Shintaro Iwasaki
d01353de07 [CI] add assert-enabled MLIR option (#78)
This PR deprecates the use of the release-build LLVM hosted by the LLVM project; those builds have assertions disabled, which makes debugging harder for developers.

This PR implements the following solution (a sketch of step 3 follows the list):
1. Create LLVM release tarballs with assert enabled on our own (using Docker)
2. Host them in our own GitHub repositories
3. Use our LLVM for CI and/or development if `TRITON_USE_ASSERT_ENABLED_LLVM=1` is set.
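A hedged Python sketch of how step 3 might look in a fetch script; the URLs and variable names here are illustrative assumptions, not the repository's actual tooling:

```py
import os

# Assumption: the build tooling branches on TRITON_USE_ASSERT_ENABLED_LLVM
# to decide which LLVM tarball to fetch (both URLs are placeholders).
use_asserts = os.environ.get("TRITON_USE_ASSERT_ENABLED_LLVM", "0") == "1"
if use_asserts:
    llvm_url = "https://github.com/.../assert-enabled-llvm.tar.xz"  # our own tarball
else:
    llvm_url = "https://github.com/llvm/llvm-project/releases/..."  # upstream release build
print(f"Fetching LLVM from: {llvm_url}")
```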
2022-08-31 18:55:32 -07:00
goostavz
bedbf221c0 [BACKEND] Support optional mask in TritonGPUToLLVM (#80)
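For reference, a hedged sketch of the two forms the lowering now handles: a load/store pair with a boundary mask and one with the mask omitted (hypothetical kernel):

```py
import triton
import triton.language as tl

@triton.jit
def masked_copy_kernel(src_ptr, dst_ptr, n, BLOCK_SIZE: tl.constexpr):
    offsets = tl.program_id(0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # With a mask: lanes past n are predicated off in the generated LLVM.
    mask = offsets < n
    x = tl.load(src_ptr + offsets, mask=mask)
    tl.store(dst_ptr + offsets, x, mask=mask)
    # Without a mask: the operand is simply absent, which this PR supports.
    first = tl.load(src_ptr)
    tl.store(dst_ptr, first)
```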
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-24 17:51:37 -07:00
Shintaro Iwasaki
0ebef11c77 [TritonIR] Make mask operand optional (#74) 2022-08-22 22:00:17 -07:00
goostavz
de2dd04c8a [BACKEND] Two minor bugfixes in StoreOpLowering and kernel launch, and support for an optional `other` in LoadOpLowering (#69)

Co-authored-by: goostavz <gzhu@nvidia.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-08-22 21:47:09 -07:00
Yan Chunwei
10ba51c3bb [FRONTEND] add python e2e launch empty kernel test (#68) 2022-08-19 10:46:01 -07:00
Shintaro Iwasaki
9aa00249a6 [TritonIR] make other optional and remove isOtherUnspecified (#67)
2022-08-18 18:19:55 -07:00
Philippe Tillet
192be76b3c [OPTIMIZER] Rewrite patterns for layout conversions (#64) 2022-08-18 12:49:37 -07:00
Shintaro Iwasaki
d69ce77b19 [FRONTEND] add an attr for masked load without explicit other (#55) 2022-08-18 09:51:37 -07:00
goostavz
fc58250a06 [BACKEND] Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp & GEPOp and bugfix for SplatOp, StoreOp, FuncOp (#60)
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-18 20:46:45 +08:00
Yan Chunwei
b1673caaf6 [FRONTEND] Expose end-to-end compile to python frontend (#58) 2022-08-17 10:42:48 -07:00
Da Yan
e5ec8e16ea [BUILD] Fix setup.py (#45) 2022-08-13 16:38:31 -07:00
Philippe Tillet
78ebbe24c7 [FRONTEND] Added ExpandDimsOp primitive (#36) 2022-08-04 18:41:06 -07:00
Yan Chunwei
b988bae813 Init TritonGPU to LLVM dialect conversion (#32)
* Add the TritonGPU-to-LLVM conversion pass

* Update the num-warps setting in MLIR
2022-08-04 10:15:45 +08:00
Philippe Tillet
3236642e8f [OPTIMIZER] Added memory coalescing pass (#31) 2022-07-31 20:59:31 -07:00
Philippe Tillet
d1593e6ca8 [TritonGPU] Improved documentation and semantics of layout encodings (#30) 2022-07-31 13:59:44 -07:00
Philippe Tillet
432c3df265 [BUILD] MacOS can now build compiler and run MLIR tests (#25) 2022-07-27 01:32:10 -07:00
Philippe Tillet
6d62d88d4f [CI] run clang-format (#24) 2022-07-26 17:25:03 -07:00
Philippe Tillet
25357083e6 [CI] Added basic CI skeletons (#23)
Includes minor fixes to make things compile and pass static checks properly
2022-07-26 14:16:30 -07:00
Philippe Tillet
3265e0df5a [PYTHON] Cleaned up legacy code; added simple standalone compilation API (#22) 2022-07-26 11:06:45 -07:00
Keren Zhou
7eda373a12 Add lit dependency (#9) 2022-07-24 19:14:52 -07:00
Philippe Tillet
a633d2b403 [Analysis] Added Axis Info Analysis (#8) 2022-07-19 13:38:48 -07:00
Yan Da
35736aa44e more progress on the testing infrastructure 2022-06-12 15:14:45 +08:00
Yan Da
0ee6e486f8 add cse pass to the pipeline & pass num-warps as an argument 2022-06-10 17:31:48 +08:00
Yan Da
366dddc3bc update mma encoding & triton-opt 2022-06-06 21:03:58 +08:00
Yan Da
9b670cfb9f Add ReduceOp 2022-05-25 14:15:36 +08:00
Yan Da
a2c9f919a8 TritonGPU verifier 2022-05-24 19:48:56 +08:00
Yan Da
36c45ec687 make numStages an option in PipelinePass 2022-05-23 12:47:55 +08:00
Yan Da
39b1235082 fix atomic_cas 2022-05-22 19:43:04 +08:00
Yan Da
e3916c3a46 TritonGPU combiner 2022-05-16 19:16:01 +08:00
Yan Da
7027af9666 The pipeline pass is now functional 2022-05-15 22:29:27 +08:00
Yan Da
d23d7b244c More on the pipeline pass 2022-05-11 20:31:08 +08:00
Phil Tillet
2c6a213131 [TRITONGPU] Added template for Triton -> TritonGPU conversion 2022-04-30 16:00:39 -07:00
Yan Da
8dfe78f6cf Add TritonCombineOps 2022-04-27 19:28:21 +08:00
Yan Da
c70f6b666e Merge previous changes 2022-04-27 14:06:55 +08:00
Yan Da
74585fb970 Add Triton CombineOps 2022-04-27 13:45:56 +08:00
Philippe Tillet
81001d318c Putting Triton dialect in its own folder 2022-04-26 14:39:27 -07:00
Yan Da
9e304cf79d Allow JITFunction to return multiple results 2022-04-15 15:38:19 +08:00
Yan Da
1c52bd587d Device function & PassManager 2022-04-15 14:41:57 +08:00
Keren Zhou
f51e0b1be4 [FRONTEND] Hot fix for lineno (#481)
Override `__reduce__` to make `CompilationError` picklable and print out error messages
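For background, a minimal sketch of the `__reduce__` pattern used here; this illustrative class is not Triton's actual definition:

```py
import pickle

class CompilationError(Exception):
    def __init__(self, src, lineno):
        self.src = src
        self.lineno = lineno
        super().__init__(f"error at line {lineno}:\n{src}")

    def __reduce__(self):
        # Tell pickle how to rebuild this exception: call the class again
        # with the same constructor arguments on unpickling.
        return (type(self), (self.src, self.lineno))

err = CompilationError("x = y +", 3)
roundtripped = pickle.loads(pickle.dumps(err))
assert str(roundtripped) == str(err)
```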
2022-04-12 13:02:33 +08:00
Yan Da
7e0fd97965 Add set_attr(...) to ir.OpState 2022-04-11 12:26:54 +08:00
Yan Da
4eb062f313 fix issues in visit_If 2022-04-10 16:28:45 +08:00
Yan Da
fcbbb3c10e Fix visit_While issues 2022-04-10 16:16:13 +08:00
Yan Da
19f81b7dea Add scf-codegen tests 2022-04-10 15:49:09 +08:00
Yan Da
9c7b3d5173 Manage insertion block with context manager 2022-04-10 15:02:12 +08:00