triton

Author	SHA1	Message	Date
Philippe Tillet	22ec22c257	[FRONTEND] Backport new runtime from `master` (#706) This PR merges the new runtime back into the `triton-mlir` branch. This adds caching and just-in-time compilation functionality to the triton-mlir project, and paves the way for re-using tests from the master branch.	2022-09-23 16:09:43 -07:00
Philippe Tillet	3236642e8f	[OPTIMIZER] Added memory coalescing pass (#31)	2022-07-31 20:59:31 -07:00
Philippe Tillet	d1593e6ca8	[TritonGPU] Improved documentation and semantics of layout encodings (#30)	2022-07-31 13:59:44 -07:00
Philippe Tillet	25357083e6	[CI] Added basic CI skeletons (#23) Includes minor fixes to make things compile and pass static checks properly	2022-07-26 14:16:30 -07:00
Philippe Tillet	3265e0df5a	[PYTHON] Cleaned up legacy code; added simple standalone compilation API (#22)	2022-07-26 11:06:45 -07:00
Philippe Tillet	269ebc12e5	[PYTHON][TESTS][DOC] Various improvement of the API and code quality: * Simplified `triton.kernel` API to achieve lower latency: > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments) * C++ tests moved to `python/tests/` * C++ tutorial created in `tutorials/` * Python tutorial created in python/tutorials/ * Version changed to 1.0alpha * No longer copying C++ headers into the Python package * added python/triton/ops/ package for pre-written Triton ops	2021-07-27 12:38:48 -07:00
Philippe Tillet	083bbd1e8d	[GENERAL] Merged v1.0alpha into master. Added features are: - A100 support via mma.16816 - Thread swizzling for conflict-free shared memory accesses without padding - Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering - Added debugging capabilities in the Python binding - Compilation error for kernels that spill	2021-07-27 12:38:48 -07:00
Philippe Tillet	c0bc7ed8b0	[PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily	2021-07-27 12:38:48 -07:00
Philippe Tillet	8f8d36c7a4	[GENERAL] Various bugfixes	2021-07-27 12:38:48 -07:00
Philippe Tillet	8f3ee53f24	[PYTHON] Added option to show PTX source code in Python	2021-07-27 12:38:48 -07:00
Philippe Tillet	049ab989b5	[GENERAL] Various improvements: * Sparse einsum in triton.ops.einsum * Hacky support for fixed-tile-size atomic-add * Various bugfixes in parser	2021-07-27 12:38:48 -07:00
Philippe Tillet	acff1b5e05	[RUNTIME] Lower-level interface for executing functions	2021-07-27 12:38:48 -07:00
Philippe Tillet	ba9955ae39	[CODEGEN][ANALYSIS] Fixed issue in layout inference	2021-07-27 12:38:48 -07:00
Philippe Tillet	89e456107b	[EXAMPLES] Improved mat_mul example	2021-07-27 12:38:48 -07:00
Philippe Tillet	68c18238a9	[EXAMPLES] Added conv2d example	2021-07-27 12:38:48 -07:00
Philippe Tillet	4ccd78f1a6	[EXAMPLES][TUTORIAL] Changed to new triton.kernel API	2021-07-27 12:38:48 -07:00
jack-willturner	180ed26b61	[DOCS] Transposition fix	2021-07-27 12:38:48 -07:00
jack-willturner	a98a2db2c2	[DOCS] Matrix copy and transpose	2021-07-27 12:38:48 -07:00
jack-willturner	32819dea51	[DOCS] Matmul and vecadd working examples	2021-07-27 12:38:48 -07:00
Philippe Tillet	c36ad6bf8a	[PYTHON][EXAMPLES][EINSUM] Updated configs for matmul	2021-07-27 12:38:48 -07:00
Philippe Tillet	7924642b78	[PYTHON][EXAMPLES][EINSUM] Added stride in CONV2D example	2021-07-27 12:38:48 -07:00
Philippe Tillet	f22ad0064c	[PYTHON][EXAMPLES][EINSUM] Added group-convolution test/benchmark	2021-07-27 12:38:48 -07:00
Philippe Tillet	5bb977173f	[PYTHON][EINSUM] re-established auto-tuning	2021-07-27 12:38:48 -07:00
Philippe Tillet	3304629de9	[CORE] Fixed several issues that arose in the development of the torch-blocksparse package: * Now using warp shuffle in reductions when possible * Various bugfixes in layout inference * Added INFINITY, exponential and select * Better error messages for unimplemented constructs	2021-07-27 12:38:48 -07:00
Philippe Tillet	9fda39f64c	[PYTHON][EXAMPLES] Removed BlockSparse examples; see https://github.com/ptillet/torch-blocksparse.git	2021-07-27 12:38:48 -07:00
Philippe Tillet	268894a5ce	[PYTHON] Merged blocksparse branch: * Example for blocksparse matrix multiplication * Simplified Triton kernel API * Revived auto-tuning in einsum	2021-07-27 12:38:48 -07:00
Philippe Tillet	dfb844bf41	[GENERAL] Improved caching mechanism: * Now computing hash in libtriton * Now only compiling a single pytorch hook per function signature	2021-07-27 12:38:48 -07:00
Philippe Tillet	9e54a03006	[PYTHON][EXAMPLES] Removed obsolete files	2021-07-27 12:38:48 -07:00
Philippe Tillet	3816f2f259	[PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of TK	2021-07-27 12:38:48 -07:00
Philippe Tillet	404dd18333	[PYTHON][CORE] Deprecating Tensorflow support	2021-07-27 12:38:48 -07:00
Philippe Tillet	558422c18a	[PYTHON][EXAMPLES] Changed shape of einsum examples	2021-07-27 12:38:48 -07:00
Philippe Tillet	6d7cf35123	History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space	2021-07-27 12:38:38 -07:00

32 Commits