Philippe Tillet
dc0588a898
[OPTIMIZER] Improved layout simplification pass so it handles swizzled layouts better ( #789 )
...
Note: commented out `test_gemm`, since the backend has an issue with swizzling. It will get uncommented in a subsequent PR.
2022-10-20 19:03:37 -07:00
Shintaro Iwasaki
0d22d2bc03
[TritonMLIR] Disallow 0D tensor ( #788 )
2022-10-19 10:34:32 -07:00
Philippe Tillet
623c99609f
[Triton-IR] Added type inference and verifier for Triton-IR operations ( #767 )
2022-10-11 18:16:41 -07:00
Yan Chunwei
555f94f9b9
[triton-mlir][BACKEND] Support masked load/store ( #657 )
...
This PR does the following:
- fixes some bugs to support masked load/store,
- refines the frontend and supports the `and` and `or` syntax in masks (by extending the BoolOp handling in the Python AST visitor), e.g. `tl.store(..., mask=offset<n and other_conditions)`, as shown in the sketch below,
- adds `arith.cmpI` and `arith.cmpF` op conversions in the backend (required by masks),
- adds more test cases in vecadd.
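To illustrate the extended mask syntax (a minimal sketch; the kernel and its names are hypothetical, not taken from this PR's tests):
```py
import triton
import triton.language as tl

@triton.jit
def masked_copy(x_ptr, y_ptr, n, flag, BLOCK_SIZE: tl.constexpr):
    offsets = tl.arange(0, BLOCK_SIZE)
    # `and` inside a mask expression is handled by the extended BoolOp visitor
    # and relies on the arith.cmpI / arith.cmpF conversions in the backend
    mask = offsets < n and flag > 0
    x = tl.load(x_ptr + offsets, mask=mask)
    tl.store(y_ptr + offsets, x, mask=mask)
```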
2022-10-10 13:29:53 +08:00
goostavz
1d772cd843
[Triton-MLIR][Backend] Add SCF lowering in the backend ( #750 )
2022-10-08 18:36:37 +08:00
Philippe Tillet
1e91ed30d0
[RUNTIME] Major code cleanup ( #711 )
...
This PR does the following:
- CUDA utilities (e.g., cuGetInfo) are no longer compiled as part of libtriton.so.
- driver/llvm.cc is refactored to split it between PTX codegen and Python.
- By extension, this also deprecates include/external, so Triton no longer has to live with a copy of some CUDA/HIP headers.
- `triton-translate` becomes a `triton.tools.aot` Python utility that re-uses functions from the triton.compile sub-module (see the sketch below).
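For reference, the standalone compilation path can be driven straight from Python; this sketch mirrors the `triton.compile(...)` call quoted in the #91 entry further down this log (the kernel itself is a placeholder):
```py
import triton
import triton.language as tl

@triton.jit
def copy_kernel(x_ptr, y_ptr, BLOCK_SIZE: tl.constexpr):
    offsets = tl.arange(0, BLOCK_SIZE)
    tl.store(y_ptr + offsets, tl.load(x_ptr + offsets))

# Signature, constants, and the "ttgir" output stage follow the
# triton.compile(...) example shown in the #91 commit message below.
ir = triton.compile(copy_kernel, "*fp32,*fp32", device=0,
                    constants={'BLOCK_SIZE': 1024}, output="ttgir")
print(ir)
```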
2022-09-26 16:38:06 -07:00
Philippe Tillet
22ec22c257
[FRONTEND] Backport new runtime from master
( #706 )
...
This PR merges the new runtime back into the `triton-mlir` branch. This
adds caching and just-in-time compilation functionality to the
triton-mlir project, and paves the way for re-using tests from the
master branch.
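The caching idea, in spirit (a toy sketch, not Triton's actual runtime code): compiled artifacts are keyed by everything that affects code generation, so a warm cache skips recompilation entirely.
```py
import hashlib

# Toy compile cache: key the artifact by source, signature, and constants.
# `compile_to_binary` is a hypothetical stand-in for the real backend call.
_cache = {}

def cached_compile(src: str, signature: str, constants: dict):
    key = hashlib.sha256(
        f"{src}|{signature}|{sorted(constants.items())}".encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = compile_to_binary(src, signature, constants)
    return _cache[key]
```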
2022-09-23 16:09:43 -07:00
goostavz
15bfd0cb79
[BACKEND] Support of ConvertLayoutOp from blocked to blocked and SliceLayout with blocked parent ( #658 )
2022-09-17 14:58:42 -07:00
Shintaro Iwasaki
13669b46a6
[DOCS] Correct spelling ( #665 )
...
This PR corrects spelling for Triton-MLIR, in the same vein as #664. It should not break anything.
2022-09-16 15:07:34 -07:00
Shintaro Iwasaki
43be75ad42
[FRONTEND] Add scalar type support for some ops ( #661 )
...
This PR adds basic support for scalar-type inputs to some ops (cast and pointer arithmetic) for Triton-MLIR. It also renames `getelementptr` -> `addptr`; see the sketch below.
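For example (a minimal sketch of the newly supported patterns; the kernel is hypothetical):
```py
import triton
import triton.language as tl

@triton.jit
def scalar_ops(x_ptr):
    x = tl.load(x_ptr + 1)             # scalar offset: pointer arithmetic -> tt.addptr
    y = x.to(tl.float64)               # cast applied to a scalar value
    tl.store(x_ptr, y.to(tl.float32))
```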
2022-09-15 16:12:52 -07:00
Yan Chunwei
a9464f4993
[Backend] Vectorize Load/Store Ops ( #86 )
...
This PR does the following things:
- Refactors the load and store op codegen so both are rewritten with the same logic and share most of the code
- Supports vectorized load/store (see the sketch below)
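The vectorization decision can be pictured as follows (an illustrative model, not the backend's actual code): the widest access is bounded both by how many elements are provably contiguous and by the hardware's maximum access width.
```py
# Toy model: how many elements one vectorized load/store can cover,
# assuming a 128-bit maximum access width (typical for NVIDIA GPUs).
def vector_width(contiguity: int, elem_bits: int, max_access_bits: int = 128) -> int:
    return min(contiguity, max_access_bits // elem_bits)

assert vector_width(8, 32) == 4   # fp32, 8 contiguous elements -> 128-bit access
assert vector_width(2, 32) == 2   # contiguity caps the vector width
```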
2022-09-06 12:28:09 -07:00
Shintaro Iwasaki
3c635449e5
[Triton] Support math and libdevice ops ( #91 )
...
This PR adds basic math ops by using `MathDialect` and `libdevice` ops by using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). This PR implements only the frontend-to-TritonGPU part of the pipeline:
- Currently stops at TritonGPU; it cannot be lowered to PTX yet.
- No special optimizations (e.g., constant folding) are applied.
- LLVM 14.x does not define folders for many math ops, but 15.x seems to increase its coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td
- No constant folding etc. for `libdevice` ops.
```py
import triton
import triton.language as tl
import sys


@triton.jit
def add_kernel(
    x_ptr,
    y_ptr,
    BLOCK_SIZE: tl.constexpr,
):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = tl.sin(x)
    output = tl.libdevice.sin(x)
    output = tl.libdevice.fdiv_rn(output, output)
    output = tl.libdevice.fmaf_rd(output, output, output)
    tl.store(y_ptr + offsets, output)


if __name__ == "__main__" and len(sys.argv) >= 2:
    signature = "*fp32,*fp32"
    constants = {'BLOCK_SIZE': 1024}
    output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir")
    print(output)
```
->
```llvm
#blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-warps" = 4 : i32} {
  func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) {
    %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked>
    %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked>
    %4 = math.sin %3 : tensor<1024xf32, #blocked>
    %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    tt.store %9, %7 : tensor<1024xf32, #blocked>
    return
  }
}
```
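To unpack the `#blocked` encoding above, informally: `sizePerThread = [1]`, `threadsPerWarp = [32]`, and `warpsPerCTA = [4]` give 4 × 32 = 128 threads that stride round-robin over the 1024 elements, so thread `t` owns elements `t, t + 128, t + 256, ...`. A small sketch of that mapping (an illustration of the layout semantics, not code from this PR):
```py
# Toy model of the #blocked layout printed above:
# sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]
# => 4 * 32 = 128 threads cycle over the 1024 elements.
def elements_owned_by(thread_id, tensor_size=1024, threads_per_cta=128, size_per_thread=1):
    stride = threads_per_cta * size_per_thread
    start = thread_id * size_per_thread
    return [base + i
            for base in range(start, tensor_size, stride)
            for i in range(size_per_thread)]

assert elements_owned_by(0)[:3] == [0, 128, 256]
assert elements_owned_by(5)[:2] == [5, 133]
```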
2022-09-01 16:34:27 -07:00
goostavz
bedbf221c0
[BACKEND] Support optional mask in TritonGPUToLLVM ( #80 )
...
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-24 17:51:37 -07:00
Shintaro Iwasaki
0ebef11c77
[TritonIR] Make mask operand optional ( #74 )
2022-08-22 22:00:17 -07:00
goostavz
de2dd04c8a
[BACKEND] two minor bugfixes in StoreOpLowering and kernel launch & support for optional `other` in LoadOpLowering ( #69 )
...
* [BACKEND] two minor bugfixes in StoreOpLowering and kernel launch & support for optional `other` in LoadOpLowering
* Clean code
Co-authored-by: goostavz <gzhu@nvidia.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
2022-08-22 21:47:09 -07:00
Yan Chunwei
10ba51c3bb
[FRONTEND] add python e2e launch empty kernel test ( #68 )
2022-08-19 10:46:01 -07:00
Shintaro Iwasaki
9aa00249a6
[TritonIR] make other optional and remove isOtherUnspecified ( #67 )
...
[Triton] make other optional and remove isOtherUnspecified
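In frontend terms: `other` supplies the value produced for masked-off lanes, and after this change it can simply be omitted (a small sketch using Triton's load API; the kernel is hypothetical):
```py
import triton
import triton.language as tl

@triton.jit
def load_variants(x_ptr, n, BLOCK_SIZE: tl.constexpr):
    offsets = tl.arange(0, BLOCK_SIZE)
    mask = offsets < n
    a = tl.load(x_ptr + offsets, mask=mask)             # `other` omitted (now allowed)
    b = tl.load(x_ptr + offsets, mask=mask, other=0.0)  # explicit fallback value
    tl.store(x_ptr + offsets, a + b, mask=mask)
```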
2022-08-18 18:19:55 -07:00
Shintaro Iwasaki
d69ce77b19
[FRONTEND] add an attr for masked load without explicit other ( #55 )
2022-08-18 09:51:37 -07:00
goostavz
fc58250a06
[BACKEND] Add backend support for arith::AddIOp, arith::AddFOp, GetProgramIdOp & GEPOp and bugfixes for SplatOp, StoreOp, FuncOp ( #60 )
...
Add backend support for arith::AddIOp, arith::AddFOp, GetProgramIdOp, GEPOp and bugfixes for SplatOp, StoreOp, FuncOp
Co-authored-by: gzhu <gzhu@nvidia.com>
2022-08-18 20:46:45 +08:00
Yan Chunwei
b1673caaf6
[FRONTEND] Expose end-to-end compile to python frontend ( #58 )
2022-08-17 10:42:48 -07:00
Philippe Tillet
78ebbe24c7
[FRONTEND] Added ExpandDimsOp primitive ( #36 )
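In the Python frontend this is the primitive that `None`-indexing lowers to (a small illustration; the kernel is hypothetical):
```py
import triton
import triton.language as tl

@triton.jit
def outer_add(out_ptr, BLOCK: tl.constexpr):
    i = tl.arange(0, BLOCK)
    # indexing with None inserts a size-1 dimension, i.e. an ExpandDimsOp
    tile = i[:, None] * BLOCK + i[None, :]   # shape (BLOCK, BLOCK)
    tl.store(out_ptr + tile, tile)
```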
2022-08-04 18:41:06 -07:00
Yan Chunwei
b988bae813
Init TritonGPU to LLVM dialect conversion ( #32 )
...
* add toLLVM pass
* update num-warps setting in mlir
2022-08-04 10:15:45 +08:00
Philippe Tillet
d1593e6ca8
[TritonGPU] Improved documentation and semantics of layout encodings ( #30 )
2022-07-31 13:59:44 -07:00
Philippe Tillet
6d62d88d4f
[CI] run clang-format ( #24 )
2022-07-26 17:25:03 -07:00
Philippe Tillet
3265e0df5a
[PYTHON] Cleaned up legacy code; added simple standalone compilation API ( #22 )
2022-07-26 11:06:45 -07:00
Philippe Tillet
a633d2b403
[Analysis] Added Axis Info Analysis ( #8 )
2022-07-19 13:38:48 -07:00
Yan Da
0ee6e486f8
add cse pass to the pipeline & pass num-warps as an argument
2022-06-10 17:31:48 +08:00
Yan Da
366dddc3bc
update mma encoding & triton-opt
2022-06-06 21:03:58 +08:00
Yan Da
9b670cfb9f
Add ReduceOp
2022-05-25 14:15:36 +08:00
Yan Da
a2c9f919a8
TritonGPU verifier
2022-05-24 19:48:56 +08:00
Yan Da
36c45ec687
make numStages an option in PipelinePass
2022-05-23 12:47:55 +08:00
Yan Da
e3916c3a46
TritonGPU combiner
2022-05-16 19:16:01 +08:00
Yan Da
d23d7b244c
More on the pipeline pass
2022-05-11 20:31:08 +08:00
Phil Tillet
2c6a213131
[TRITONGPU] Added template for Triton -> TritonGPU conversion
2022-04-30 16:00:39 -07:00
Yan Da
8dfe78f6cf
Add TritonCombineOps
2022-04-27 19:28:21 +08:00
Yan Da
c70f6b666e
Merge previous changes
2022-04-27 14:06:55 +08:00
Yan Da
74585fb970
Add Triton CombineOps
2022-04-27 13:45:56 +08:00
Philippe Tillet
81001d318c
Putting Triton dialect in its own folder
2022-04-26 14:39:27 -07:00
Yan Da
1c52bd587d
Device function & PassManager
2022-04-15 14:41:57 +08:00
Yan Da
7e0fd97965
Add set_attr(...) to ir.OpState
2022-04-11 12:26:54 +08:00
Yan Da
19f81b7dea
Add scf-codegen tests
2022-04-10 15:49:09 +08:00
Yan Da
6002340456
Better textual representation
2022-04-07 20:44:41 +08:00
Yan Da
62f772123c
now kernel functions return nothing (instead of none)
2022-04-07 20:22:17 +08:00
Yan Da
040a2b6c75
Fix OpBuilder
2022-04-07 20:01:31 +08:00
Yan Da
6b4da6f016
Documentation
2022-04-07 16:00:53 +08:00
Yan Da
9cf4107990
Add TensorSizeTrait
2022-04-07 15:18:43 +08:00
Yan Da
39fad2b18a
More progress on WhileOp
2022-04-05 17:55:43 +08:00
Yan Da
d7fbddc7d4
Fix ret::reference issue
2022-04-05 16:09:09 +08:00
Yan Da
c7ad928e60
More progress on WhileOp codegen
2022-04-05 15:55:48 +08:00
Yan Da
9df899b291
Some progress on visit_If
2022-04-03 22:34:46 +08:00