13 Commits

Author SHA1 Message Date
Nicholas Joseph
23c71538fc [DOCS] Improve tutorial readability (#185) 2021-08-05 09:27:06 -07:00
Philippe Tillet
acd5e44611 [GENERAL] Some minor improvements here and there to build systems and docs (#148) 2021-07-28 01:51:17 -07:00
Philippe Tillet
b253b77c71 [DOCS] Improved documentation and integration in CI (#139) 2021-07-27 12:38:49 -07:00
Philippe Tillet
bfc0a7587d [PYTHON] Renamed triton.core -> triton.language (#92) 2021-07-27 12:38:49 -07:00
Philippe Tillet
29e33e50b7 [DOCS] Updates and improvements (#87) 2021-07-27 12:38:49 -07:00
Philippe Tillet
39f4730305 Deprecation of Triton-C and Replacement by decorated Python functions (#86)
This PR implements a major overhaul of the Triton frontend and replaces Triton-C with a pure Python API in which kernels are defined as @triton.jit-decorated functions. The documentation and tutorials have also been updated to reflect these changes.

See the documentation for more information on the new API.
2021-07-27 12:38:49 -07:00
Philippe Tillet
183878dce5 [DOCS] Added matrix multiplication tutorial 2021-07-27 12:38:49 -07:00
Philippe Tillet
50e58d73db [DOCS] Improved plots in tutorials 2021-07-27 12:38:49 -07:00
Philippe Tillet
d1d09566b1 [DOCS] Improved tutorials documentation 2021-07-27 12:38:49 -07:00
Philippe Tillet
92242ace2c [DOCS] Re-structured documentation hierarchy 2021-07-27 12:38:49 -07:00
Philippe Tillet
ca04da3575 [DOCS] Switched tutorials to Python and use Sphinx Gallery 2021-07-27 12:38:49 -07:00
Philippe Tillet
3ecf834a69 [PYTHON] Deleted 01-vector-add.py: it is an unnecessary duplicate of 01-vector-add.ipynb
2021-07-27 12:38:49 -07:00
Philippe Tillet
62835a0979 [RUNTIME] Added auto-alignment mechanism (#71)
This PR adds an automatic memory alignment mechanism to the Triton runtime. Specifically, the JIT compiler detects the alignment (in bytes) of each pointer argument, as well as the largest power-of-two divisor (between 1 and 16) of each integer argument. The appropriate .aligned and .multipleof attributes are then added to the Triton-IR on the fly for all auto-tunable kernels. A cache remembers the kernels compiled for each possible configuration.

This PR also includes a substantial cleanup of the Python API. These changes add 2-3 µs of overhead, mostly due to accessing integer #defines from the auto-tuned compilation options. The previous solution was slightly faster but hacky and potentially unsafe, so the new approach is preferred for now.
2021-07-27 12:38:49 -07:00
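
The auto-alignment idea described in #71 can be illustrated with a minimal pure-Python sketch. This is not Triton's actual implementation: the names `pow2_divisor`, `specialization_key`, and `get_or_compile` are hypothetical, pointer arguments are modeled as plain integer addresses, and treating 0 as maximally divisible is an assumption. It only shows how a power-of-two divisor capped at 16 could serve as a cache key for compiled kernels.

```python
# Hypothetical sketch of the auto-alignment idea; not Triton's real code.

def pow2_divisor(n: int, cap: int = 16) -> int:
    """Return the largest power of two, between 1 and `cap`, that divides n."""
    if n == 0:
        return cap  # assumption: 0 is treated as divisible by the cap
    d = 1
    while d < cap and n % (d * 2) == 0:
        d *= 2
    return d


def specialization_key(args):
    """Build a cache key from the divisibility of each integer argument.

    Pointer arguments are modeled here as integer addresses, so the same
    helper yields their alignment in bytes (capped at 16).
    """
    return tuple(pow2_divisor(a) for a in args)


_kernel_cache = {}


def get_or_compile(args, compile_fn):
    """Compile once per distinct alignment configuration, then reuse."""
    key = specialization_key(args)
    if key not in _kernel_cache:
        _kernel_cache[key] = compile_fn(key)
    return _kernel_cache[key]
```

Under these assumptions, two calls whose arguments share the same divisibility pattern (for example sizes 1024 and 2048, both divisible by 16) map to the same key and reuse one compiled kernel, while a size of 1000 (divisible by 8 but not 16) would trigger a separate compilation.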