triton

Author	SHA1	Message	Date
Philippe Tillet	9c05ec148f	[BUILD] Added automatic nightly build releases to pip in CI; removed build-time dependence on LLVM and PyTorch (#77) Recently there have been more and more reports of installation issues: - Installing Triton before upgrading PyTorch can create some issues because Triton uses some torch headers - llvm-10-dev not available on some platforms; llvm-11-dev not available on e.g. Ubuntu - Absence of nightly builds. This PR should fix all these issues. Some CMake tricks are used to download and install LLVM at build time. Triton Python bindings were modified to remove dependence on PyTorch ops. Midnight CI job added to generate binary wheels for all Triton versions and update them on PyPI's new triton-nightly project. This PR will also make it very easy to use LLVM forks in the future for whatever needs we have.	2021-03-22 20:03:37 -04:00
Philippe Tillet	15f8e8c3b7	[CODEGEN] Major performance improvements on A100 (#70) Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks	2021-02-21 18:19:39 -05:00
Philippe Tillet	c847cc6320	[DRIVER] Added options for developers to cache PTX file so that it can be manually modified	2021-02-09 00:09:10 -05:00
Philippe Tillet	1726197bb4	Improvements w/ Auto-Tuning and standard benchmarks (#57) [PYTHON] Bug-fixes in the auto-tuning module and improvement of the existing API for it	2021-02-03 16:37:21 -05:00
Philippe Tillet	6e77538087	[RUNTIME] Auto-tuning now works as expected when the values of autotune_key change	2021-01-31 19:23:51 -05:00
Philippe Tillet	0b23f95b20	[RUNTIME] Added option to print LLVM-IR. Also includes appropriate driver code change for that	2021-01-31 01:01:32 -05:00
Philippe Tillet	79d098450f	[PYTHON][TESTS][DOC] Various improvements to the API and code quality: * Simplified `triton.kernel` API to achieve lower latency: > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments) * C++ tests moved to `python/tests/` * C++ tutorial created in `tutorials/` * Python tutorial created in `python/tutorials/` * Version changed to 1.0alpha * No longer copying C++ headers into the Python package * Added `python/triton/ops/` package for pre-written Triton ops	2021-01-29 17:27:16 -05:00

7 Commits