triton

Author	SHA1	Message	Date
Gregory Axler	2193bee94e	[Example] Fix the compile function in copy_strided.py (#1029)	2023-01-05 10:37:41 -08:00
Philippe Tillet	20100a7254	Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004) This PR merges the `triton-mlir` branch, in which we have been quietly rewriting the Triton backend from scratch to increase maintainability, stability and ultimately performance. Changes to the runtime are minimal, and this new version aims to remain backward-compatible with the previous commit. The legacy backend is now officially deprecated, but can still be accessed via the `legacy-backend` tag. Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: Yan Chunwei <yanchunwei@outlook.com> Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com> Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com> Co-authored-by: Yan Da <dyanab@connect.ust.hk> Co-authored-by: Jun Yang <yangjunpro@gmail.com> Co-authored-by: Ian Bearman <ianb@microsoft.com> Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Qingyi Liu <qingyil@nvidia.com> Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com> Co-authored-by: Chenggang Zhao <lyricz@yeah.net> Co-authored-by: ben-zhang-609 <benzh609@gmail.com> Co-authored-by: dongdongl <dongdongl@nvidia.com>	2022-12-21 01:30:50 -08:00
Philippe Tillet	269ebc12e5	[PYTHON][TESTS][DOC] Various improvement of the API and code quality: * Simplified `triton.kernel` API to achieve lower latency: > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments) * C++ tests moved to `python/tests/` * C++ tutorial created in `tutorials/` * Python tutorial created in python/tutorials/ * Version changed to 1.0alpha * No longer copying C++ headers into the Python package * added python/triton/ops/ package for pre-written Triton ops	2021-07-27 12:38:48 -07:00
Philippe Tillet	083bbd1e8d	[GENERAL] Merged v1.0alpha into master. Added features are: - A100 support via mma.16816 - Thread swizzling for conflict-free shared memory accesses without padding - Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering - Added debugging capabilities in the Python binding - Compilation error for kernels that spill	2021-07-27 12:38:48 -07:00
Philippe Tillet	c0bc7ed8b0	[PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily	2021-07-27 12:38:48 -07:00
Philippe Tillet	8f8d36c7a4	[GENERAL] Various bugfixes	2021-07-27 12:38:48 -07:00
Philippe Tillet	8f3ee53f24	[PYTHON] Added option to show PTX source code in Python	2021-07-27 12:38:48 -07:00
Philippe Tillet	049ab989b5	[GENERAL] Various improvements: * Sparse einsum in triton.ops.einsum * Hacky support for fixed-tile-size atomic-add * Various bugfixes in parser	2021-07-27 12:38:48 -07:00
Philippe Tillet	acff1b5e05	[RUNTIME] Lower-level interface for executing functions	2021-07-27 12:38:48 -07:00
Philippe Tillet	ba9955ae39	[CODEGEN][ANALYSIS] Fixed issue in layout inference	2021-07-27 12:38:48 -07:00
Philippe Tillet	89e456107b	[EXAMPLES] Improved mat_mul example	2021-07-27 12:38:48 -07:00
Philippe Tillet	68c18238a9	[EXAMPLES] Added conv2d example	2021-07-27 12:38:48 -07:00
Philippe Tillet	4ccd78f1a6	[EXAMPLES][TUTORIAL] Changed to new triton.kernel API	2021-07-27 12:38:48 -07:00
jack-willturner	180ed26b61	[DOCS] Transposition fix	2021-07-27 12:38:48 -07:00
jack-willturner	a98a2db2c2	[DOCS] Matrix copy and transpose	2021-07-27 12:38:48 -07:00
jack-willturner	32819dea51	[DOCS] Matmul and vecadd working examples	2021-07-27 12:38:48 -07:00
Philippe Tillet	c36ad6bf8a	[PYTHON][EXAMPLES][EINSUM] Updated configs for matmul	2021-07-27 12:38:48 -07:00
Philippe Tillet	7924642b78	[PYTHON][EXAMPLES][EINSUM] Added stride in CONV2D example	2021-07-27 12:38:48 -07:00
Philippe Tillet	f22ad0064c	[PYTHON][EXAMPLES][EINSUM] Added group-convolution test/benchmark	2021-07-27 12:38:48 -07:00
Philippe Tillet	5bb977173f	[PYTHON][EINSUM] re-established auto-tuning	2021-07-27 12:38:48 -07:00
Philippe Tillet	3304629de9	[CORE] Fixed several issues that arose in the development of the torch-blocksparse package: * Now using warp shuffle in reductions when possible * Various bugfixes in layout inference * Added INFINITY, exponential and select * Better error messages for unimplemented constructs	2021-07-27 12:38:48 -07:00
Philippe Tillet	9fda39f64c	[PYTHON][EXAMPLES] Removed BlockSparse examples; see https://github.com/ptillet/torch-blocksparse.git	2021-07-27 12:38:48 -07:00
Philippe Tillet	268894a5ce	[PYTHON] Merged blocksparse branch: * Example for blocksparse matrix multiplication * Simplified Triton kernel API * Revived auto-tuning in einsum	2021-07-27 12:38:48 -07:00
Philippe Tillet	dfb844bf41	[GENERAL] Improved caching mechanism: * Now computing hash in libtriton * Now only compiling a single pytorch hook per function signature	2021-07-27 12:38:48 -07:00
Philippe Tillet	9e54a03006	[PYTHON][EXAMPLES] Removed obsolete files	2021-07-27 12:38:48 -07:00
Philippe Tillet	3816f2f259	[PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of TK	2021-07-27 12:38:48 -07:00
Philippe Tillet	404dd18333	[PYTHON][CORE] Deprecating Tensorflow support	2021-07-27 12:38:48 -07:00
Philippe Tillet	558422c18a	[PYTHON][EXAMPLES] Changed shape of einsum examples	2021-07-27 12:38:48 -07:00
Philippe Tillet	6d7cf35123	History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space	2021-07-27 12:38:38 -07:00

29 Commits