triton

Author	SHA1	Message	Date
Philippe Tillet	325ee38581	[PYTHON] Fixed bug in scoping mechanism (#111 ) Inline functions didn't restore scope of parents. Also some control flow structure still had the scoping semantics of C++	2021-07-27 12:38:49 -07:00
Philippe Tillet	9f30af76fb	[GENERAL] Minor improvements: (#110 ) * Load libcuda.so.1 if libcuda.so is not there. Error if both aren't there. * Support for multiple grad_to_none in triton.testing.do_bench * Benchmark dataframe printed along with name	2021-07-27 12:38:49 -07:00
Philippe Tillet	288b4f7f58	[PYTHON] Added frontend to print sass using turingas disasm.py (#109 )	2021-07-27 12:38:49 -07:00
daadaada	840d65d8c6	[CODEGEN] Clean up visit_mma884 (#107 )	2021-07-27 12:38:49 -07:00
daadaada	967e629c0c	[CODEGEN] Add a pass to prefetch operands of dot if applicable. (#105 ) * update membar pass when data is double buffered * Add instruction prefetch_s * prefetch tests pass (except the 1 warp case) * Fix the 1-warp bug * Add back prefetch files * Disable prefetch on a100 * Always add war barrier on sm>=80	2021-07-27 12:38:49 -07:00
Philippe Tillet	d10265f054	[CODEGEN] Bugfix for immediate offsets in inline PTX (#104 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	1e844ba78d	[CODEGEN] Switching to predicated inline PTX for LDGs (#103 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	840140bf26	[CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel (#101 ) This massively simplifies implementation of `reassociate` and also fixes a bunch of bugs. The pass could still be improved, but can already be used to generate constant pointer offsets in, e.g., the matmul epilogue	2021-07-27 12:38:49 -07:00
Philippe Tillet	6a9810ccf2	[codegen] small bugfix: (#97 ) * Added fp32 -> fp8 for ConstantFP = 0 * Added some more robust semantic check for atomic_add	2021-07-27 12:38:49 -07:00
Philippe Tillet	7355efa745	[LANG] Preliminary FP8 support (#96 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	4290be1ae8	[PYTHON] Various minor codegen fixes (#95 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	2b75158426	[PYTHON] Added atomic_add (#94 )	2021-07-27 12:38:49 -07:00
daadaada	f6688372db	[PYTHON] Allow triton.code_gen.Binary to print Triton-IR asm. (#89 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	39f4730305	Deprecation of Triton-C and Replacement by decorated Python functions (#86 ) This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes. See the documentation for more information on the new API	2021-07-27 12:38:49 -07:00
Philippe Tillet	3f6ba1020d	[CODEGEN] Make sure peephole is called before anything else in codegen	2021-07-27 12:38:49 -07:00
Philippe Tillet	5ba5a77561	[BUILD] Remove compilation warnings	2021-07-27 12:38:49 -07:00
Philippe Tillet	183878dce5	[DOCS] Added matrix multiplication tutorial	2021-07-27 12:38:49 -07:00
Philippe Tillet	5b9afaa688	[CODEGEN] Fixed bug that caused conditional operator to not always properly mask load operations. Also includes minor improvement to benchmarking infrastructure	2021-07-27 12:38:49 -07:00
Philippe Tillet	62835a0979	[RUNTIME] Added auto-alignment mechanism (#71 ) This PR adds an automatic memory alignment mechanism in the Triton runtime. Specifically, the JIT compiler detects the alignment (in bytes) of each pointer argument as well as the largest power of two divisor (between 1 and 16) of each integer argument. Proper .aligned and .multipleof attributes are then added to the Triton-IR on-the-fly for all auto-tunable kernels. There is a cache that remembers all the kernels compiled for each possible configuration. This PR also includes substantial cleaning of the Python API. This adds 2-3us overhead, mostly due to accessing integer #defines from the auto-tuned compilation options. The previous solution was slightly faster but hacky and potentially unsafe, so this is preferred for now.	2021-07-27 12:38:49 -07:00
Philippe Tillet	567a1a3d17	[CODEGEN] Bugfixes with FP32 async copy	2021-07-27 12:38:49 -07:00
Philippe Tillet	11215f0f03	[CODEGEN] Now initializing cp.async to zero when predicate is false. WARNING: case for non-zero initialization is still not handled. Will require manual copy to shared	2021-07-27 12:38:49 -07:00
Philippe Tillet	5b83259592	[CODEGEN] Major performance improvements on A100 (#70 ) Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks	2021-07-27 12:38:49 -07:00
Philippe Tillet	3ca40b05cf	[DRIVER] Added options for developers to cache PTX file so that it can be manually modified	2021-07-27 12:38:49 -07:00
Philippe Tillet	b8a52c70c9	[LANG] Now requiring tiles have power of 2 number of elements	2021-07-27 12:38:48 -07:00
Philippe Tillet	6fb4800f57	Improvements w/ Auto-Tuning and standard benchmarks (#57 ) [PYTHON] Bug-fixes in the auto-tuning module and improvement of the existing API for it	2021-07-27 12:38:48 -07:00
Philippe Tillet	ad5a30bae1	[LANG] Added __debug_barrier() call to force insertion of a CUDA __syncthreads	2021-07-27 12:38:48 -07:00
Philippe Tillet	3fde4b8f5b	[RUNTIME] Auto-tuning now works as expected when the values of autotune_key change	2021-07-27 12:38:48 -07:00
Philippe Tillet	0b025db2ee	[RUNTIME] Added option to print LLVM-IR. Also includes appropriate driver code change for that	2021-07-27 12:38:48 -07:00
Philippe Tillet	9f9d7b8840	[LANG] Fixed parsing error for built-in functions exp/log/sqrtf	2021-07-27 12:38:48 -07:00
Philippe Tillet	269ebc12e5	[PYTHON][TESTS][DOC] Various improvements to the API and code quality: * Simplified `triton.kernel` API to achieve lower latency: > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments) * C++ tests moved to `python/tests/` * C++ tutorial created in `tutorials/` * Python tutorial created in python/tutorials/ * Version changed to 1.0alpha * No longer copying C++ headers into the Python package * added python/triton/ops/ package for pre-written Triton ops	2021-07-27 12:38:48 -07:00
Philippe Tillet	a5a477c36b	[CODEGEN] Fixed bug in recoalesce_inst LLVM codegen	2021-07-27 12:38:48 -07:00
Philippe Tillet	376c876eb8	[RUNTIME] Disable error on spills	2021-07-27 12:38:48 -07:00
Philippe Tillet	3b36a1e60c	[CODEGEN] Fixed issue in traversal order for atomic_add and store_inst	2021-07-27 12:38:48 -07:00
Philippe Tillet	083bbd1e8d	[GENERAL] Merged v1.0alpha into master. Added features are: - A100 support via mma.16816 - Thread swizzling for conflict-free shared memory accesses without padding - Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering - Added debugging capabilities in the Python binding - Compilation error for kernels that spill	2021-07-27 12:38:48 -07:00
Philippe Tillet	c4fceeea49	[LANG] Added hacky min/max	2021-07-27 12:38:48 -07:00
Yan Da	27dc780871	[IR] Check constant_int type	2021-07-27 12:38:48 -07:00
Yan Da	01ef691b84	[LANG] Fix gep bug in INC	2021-07-27 12:38:48 -07:00
Yan Da	e9b2335224	[LANG] Add support for POSTFIX_INC and POSTFIX_DEC, and pointer type	2021-07-27 12:38:48 -07:00
Yan Da	05b95b7fa6	[LANG] Add support for PREFIX_INC and PREFIX_DEC.	2021-07-27 12:38:48 -07:00
Philippe Tillet	fd5c72d6a0	[LANG] Added some more atomic_add support	2021-07-27 12:38:48 -07:00
Philippe Tillet	5e8f4c934c	[DRIVER] Better exception handling of invalid ptx	2021-07-27 12:38:48 -07:00
Philippe Tillet	44ca2c0cb8	[DRIVER] Removed deprecated files and functions	2021-07-27 12:38:48 -07:00
Philippe Tillet	7ab2c2a356	[DRIVER] Removed obsolete SetArg	2021-07-27 12:38:48 -07:00
Philippe Tillet	4f08d87fed	[DRIVER] Simplified Driver API by substantially removing reliance on driver::context	2021-07-27 12:38:48 -07:00
Philippe Tillet	f42b04d925	[DRIVER] Added (slow) support for CUDA11 and Ampere	2021-07-27 12:38:48 -07:00
Philippe Tillet	baa858aa74	[CODEGEN] Fixed bug in atomic_add	2021-07-27 12:38:48 -07:00
Philippe Tillet	7d095ec686	[LANG] Added sqrtf support	2021-07-27 12:38:48 -07:00
Philippe Tillet	073fddffc1	[PYTHON] Compiling Triton in Release mode now...	2021-07-27 12:38:48 -07:00
Philippe Tillet	5d84fde733	tmp	2021-07-27 12:38:48 -07:00
Philippe Tillet	da287bb710	[CODEGEN] Progress on atom.add.f16x2	2021-07-27 12:38:48 -07:00

386 Commits