triton

Author	SHA1	Message	Date
Ian Bearman	303790da88	[BUILD] use Python Var In Tests (#859)	2022-11-08 17:44:19 +00:00
Philippe Tillet	82834d34f9	[BUILD] No longer use `include((HandleLLVMOptions)` (#818)	2022-10-28 17:02:49 -07:00
Ian Bearman	f2106d0aa2	[BUILD] Fix Warnings and Enable Warnings as Errors (#794)	2022-10-28 12:36:09 -07:00
Philippe Tillet	3aa8296b06	[BUILD] Download pybind11 in setup.py (#703) (#797) Cherry-picks #703 and resolves conflicts Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>	2022-10-23 18:52:48 -07:00
Ian Bearman	ccc5ab6ac9	[BUILD] When set, use MLIR_DIR for finding both MLIR and LLVM (#755)	2022-10-09 13:11:20 -07:00
Ian Bearman	89f6e1db5e	[BUILD] use cmake to set include path when build isn't triggered by setup.py (#754)	2022-10-09 12:30:44 -07:00
Ian Bearman	863578a7fa	[BUILD] Enable current-dir inclusion (#753) This change enables `CMAKE_INCLUDE_CURRENT_DIR` when building Triton.	2022-10-09 18:09:49 +00:00
Ian Bearman	448d14a598	[BUILD] Add TRITON Prefix to build variables (#752)	2022-10-09 10:55:17 -07:00
Philippe Tillet	1e91ed30d0	[RUNTIME] Major code cleanup (#711) This PR does the following: - CUDA utilities (e.g., cuGetInfo) won't be compiled as part of libtriton.so anymore. - Refactoring driver/llvm.cc to split it between PTX codegen and python. - By extension this will also deprecate include/external so Triton won't have to live with a copy of some CUDA/Hip headers anymore. - `triton-translate` becomes a `triton.tools.aot` Python utility that re-uses functions from the triton.compile sub-module.	2022-09-26 16:38:06 -07:00
Philippe Tillet	c56f0198dd	Revert "[Triton-MLIR][pybind11] Update pybind11 to 2.10.0" (#702) Reverts openai/triton#694	2022-09-23 12:31:33 -07:00
Shintaro Iwasaki	23f424c660	[Triton-MLIR][pybind11] Update pybind11 to 2.10.0 (#694) This PR applies #691 to the Triton-MLIR branch.	2022-09-22 17:53:42 -07:00
Jun Yang	ea175f689e	[CI]Added initial framework of CXX unittest (#98) Based on the discussion in #53, I just added the initial flow of CXX unittests for this repo, with providing two dummy UTs as placeholder to show the usage, feel free to add your own CXX unittests. @Superjomn @ptillet @ptillet, in this PR, I also configure the integration-tests.yml to add the unittest into github CI check. Thanks	2022-09-04 12:50:27 +08:00
Shintaro Iwasaki	3c635449e5	[Triton] Support math and libdevice ops (#91)	2022-09-01 16:34:27 -07:00
This PR adds basic math ops by using `MathDialect` and `libdevice` ops by using `extern_elementwise`. This is needed to compile some tutorial code (e.g., `softmax`). This PR implements only interface till PTX (so from frontend to TritonGPU-MLIR)
- Currently till TritonGPU. It cannot be lowered to PTX now.
- No special optimizations (e.g., constant folding etc) are applied.
  - 14.x does not define folders for many operators for math ops, but 15.x seems to increase its coverage: https://github.com/llvm/llvm-project/blob/llvmorg-15.0.0-rc3/mlir/include/mlir/Dialect/Math/IR/MathOps.td
  - No constant folding etc for `libdevice` ops.
```py
import triton
import triton.language as tl
import sys

@triton.jit
def add_kernel(
    x_ptr,
    y_ptr,
    BLOCK_SIZE: tl.constexpr,
):
    offsets = tl.arange(0, BLOCK_SIZE)
    x = tl.load(x_ptr + offsets)
    x = tl.sin(x)
    output = tl.libdevice.sin(x)
    output = tl.libdevice.fdiv_rn(output, output)
    output = tl.libdevice.fmaf_rd(output, output, output)
    tl.store(y_ptr + offsets, output)


if __name__ == "__main__" and len(sys.argv) >= 2:
    signature = "fp32,fp32"
    constants = {'BLOCK_SIZE': 1024}
    output = triton.compile(add_kernel, signature, device=0, constants=constants, output="ttgir")
    print(output)
```
->
```llvm
#blocked = #triton_gpu.blocked<{sizePerThread = [1], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-warps" = 4 : i32} {
  func @add_kernel__Pfp32_Pfp32__2c1024(%arg0: !tt.ptr<f32>, %arg1: !tt.ptr<f32>) {
    %0 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32, #blocked>
    %1 = tt.splat %arg0 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %2 = tt.getelementptr %1, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    %3 = tt.load %2 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<1024xf32, #blocked>
    %4 = math.sin %3 : tensor<1024xf32, #blocked>
    %5 = tt.ext_elemwise %4 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_sinf"} : tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %6 = tt.ext_elemwise %5, %5 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fdiv_rn"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %7 = tt.ext_elemwise %6, %6, %6 {libname = "libdevice", libpath = "/home/siwasaki/triton/python/triton/language/libdevice.10.bc", symbol = "__nv_fmaf_rd"} : tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked>, tensor<1024xf32, #blocked> -> tensor<1024xf32, #blocked>
    %8 = tt.splat %arg1 : (!tt.ptr<f32>) -> tensor<1024x!tt.ptr<f32>, #blocked>
    %9 = tt.getelementptr %8, %0 : tensor<1024x!tt.ptr<f32>, #blocked>
    tt.store %9, %7 : tensor<1024xf32, #blocked>
    return
  }
}
```
Shintaro Iwasaki	d01353de07	[CI] add assert-enabled MLIR option (#78) This deprecates the use of release-build LLVM hosted by the LLVM project, which makes debugging harder for developers. This PR implements the following solution: 1. Create LLVM release tarballs with assert enabled on our own (using Docker) 2. Host them in our own GitHub repositories 3. Use our LLVM for CI and/or development if `TRITON_USE_ASSERT_ENABLED_LLVM=1` is set.	2022-08-31 18:55:32 -07:00
goostavz	fc58250a06	[BACKEND] Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp & GEPOp and bugfix for SplatOp, StoreOp, FuncOp (#60) Add backend support of arith::AddIOp, arith::AddFOp, GetProgramIdOp, GEPOp and bugfix for SplatOp, StoreOp, FuncOp Co-authored-by: gzhu <gzhu@nvidia.com>	2022-08-18 20:46:45 +08:00
Yan Chunwei	b1673caaf6	[FRONTEND] Expose end-to-end compile to python frontend (#58)	2022-08-17 10:42:48 -07:00
Philippe Tillet	490d34e0d5	[FRONTEND] Fixed python bindings link options (#40)	2022-08-07 13:09:12 -07:00
Philippe Tillet	432c3df265	[BUILD] MacOS can now build compiler and run MLIR tests (#25)	2022-07-27 01:32:10 -07:00
Philippe Tillet	3265e0df5a	[PYTHON] Cleaned up legacy code; added simple standalone compilation API (#22)	2022-07-26 11:06:45 -07:00
Philippe Tillet	a633d2b403	[Analysis] Added Axis Info Analysis (#8)	2022-07-19 13:38:48 -07:00
Yan Da	35736aa44e	more progress on the testing infrastructure	2022-06-12 15:14:45 +08:00
Yan Da	22c65a53d9	more progress on test/CMakeLists.txt	2022-06-10 21:37:56 +08:00
Yan Da	55cf9a0a97	Add triton's opt	2022-06-04 22:10:00 +08:00
Yan Da	1a4fbed25b	Skeleton for the pipeline pass	2022-05-11 16:13:53 +08:00
Phil Tillet	2c6a213131	[TRITONGPU] Added template for Triton -> TritonGPU conversion	2022-04-30 16:00:39 -07:00
Yan Da	8dfe78f6cf	Add TritonCombineOps	2022-04-27 19:28:21 +08:00
Philippe Tillet	62a64ff29b	Fixed Python link bug in CMakeLists	2022-04-26 14:39:18 -07:00
Yan Da	1c52bd587d	Device function & PassManager	2022-04-15 14:41:57 +08:00
Yan Da	07881b4d41	Update includes	2022-03-24 13:46:35 +08:00
Yan Da	f2ab318614	New python binding	2022-03-22 21:53:22 +08:00
Yan Da	419bbe0f6e	Reverts back to MLIR 14 & updates CMakeLists	2022-03-20 16:41:48 +08:00
Yan Da	a2c31ff434	Init commit	2022-03-17 20:40:55 +08:00
Philippe Tillet	98ed7db8c1	[CODEGEN] Improvements and bugfixes (#463)	2022-02-24 14:56:24 -08:00
Victor	73b04d71b2	Fixes for building on Windows (#382) * make C++ code compatible with Windows + MSVC * added dlfcn-win32 for cross-platform dlopen * fixed building and pip install on Windows * fixed shared library file name under Windows	2021-12-07 14:10:58 -08:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00
Philippe Tillet	c0bb895d9d	[BUILD] More portable detection of terminfo (#173)	2021-07-31 17:09:49 -07:00
Philippe Tillet	acd5e44611	[GENERAL] Some minor improvements here and there to build systems and docs (#148)	2021-07-28 01:51:17 -07:00
Philippe Tillet	57c1fd3366	[BUILD] Now downloading LLVM from web if system does not have `llvm-config-11` (#142)	2021-07-28 01:02:31 -07:00
Philippe Tillet	76c6f24fb6	[CI] Made build-wheels compatible with system LLVM setup (#138) This speeds up wheelhouse build time by ~10x	2021-07-27 12:38:49 -07:00
Philippe Tillet	8eb63bcb01	[CI] Various improvements to CI (#137) Add clean-up before CI runs. Now using static LLVM-11 libraries from system rather than recompilation. Still no run-time LLVM dependencies	2021-07-27 12:38:49 -07:00
Philippe Tillet	147675923e	[triton-ops] Minor build improvements (#106)	2021-07-27 12:38:49 -07:00
Philippe Tillet	39f4730305	Deprecation of Triton-C and Replacement by decorated Python functions (#86) This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes. See documentations for more information on the new API	2021-07-27 12:38:49 -07:00
Philippe Tillet	2f80a98776	[BUILD] Added automatic nightly build releases to pip in CI; removed build-time dependence on LLVM and PyTorch (#77) Recently there has been more and more report about installation issues: - Installing Triton before upgrading pytorch can create some issues because Triton uses some torch headers - llvm-10-dev not available on some platform; llvm-11-dev not available on e.g. Ubuntu. absence of nightly builds This PR should fix all these issues. Some CMake tricks are used to download and install llvm at build time. Triton Python bindings were modified to remove dependence on pytorch ops. Midnight CI job added to generate binary wheels for all Triton version and update them on pypi's new triton-nightly project. This PR will also make it very easy to use LLVM forks in the future for whatever needs we have.	2021-07-27 12:38:49 -07:00
Philippe Tillet	183878dce5	[DOCS] Added matrix multiplication tutorial	2021-07-27 12:38:49 -07:00
Philippe Tillet	eacbb73968	[PYTHON] CUTLASS wrapper for fair benchmarks (#75) Before this commit, the benchmarking infrastructure used heterogeneous protocols between library (e.g., CUTLASS uses a C++ binary that reports mean TFLOPS; torch and triton use python call and report 10th, 50th and 90th quantiles). For the sake of uniformity and fair benchmark practices, this PR adds a python wrapper for auto-tuned CUTLASS matrix multiplication. Benchmarks have been rewritten to use this wrapper with `triton.testing.do_bench` rather than system calls to CUTLASS profiler. Importantly, this also ensures that all the matmuls are done on the same input data which should stabilize clock across providers.	2021-07-27 12:38:49 -07:00
Philippe Tillet	2a02fabdac	[PYTHON] Some cleaning of the PyBind11 wrappers (#62)	2021-07-27 12:38:48 -07:00
Philippe Tillet	7cf358a352	[TUTORIALS] Fixed TYPO in CMakeLists.txt	2021-07-27 12:38:48 -07:00
Philippe Tillet	269ebc12e5	[PYTHON][TESTS][DOC] Various improvement of the API and code quality: * Simplified `triton.kernel` API to achieve lower latency: > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments) * C++ tests moved to `python/tests/` * C++ tutorial created in `tutorials/` * Python tutorial created in python/tutorials/ * Version changed to 1.0alpha * No longer copying C++ headers into the Python package * added python/triton/ops/ package for pre-written Triton ops	2021-07-27 12:38:48 -07:00
Philippe Tillet	50587bbf4b	[General] LLVM-9 -> LLVM-10	2021-07-27 12:38:48 -07:00
Philippe Tillet	cf80ccc798	[PYTHON] Fixed torch ABI issue	2021-07-27 12:38:48 -07:00

59 Commits