Commit Graph

7 Commits

Author SHA1 Message Date
Philippe Tillet
2f8f0042a9 [DOCS] Added matrix multiplication tutorial 2021-03-15 13:57:41 -04:00
Philippe Tillet
134e246117 [DOCS] Improved plots in tutorials 2021-03-11 00:42:29 -05:00
Philippe Tillet
dfa0d45ffe [DOCS] Improved tutorials documentation 2021-03-06 22:04:00 -05:00
Philippe Tillet
e78211c8f5 [DOCS] Re-structured documentation hierarchy 2021-03-06 17:26:49 -05:00
Philippe Tillet
85d1b02e16 [DOCS] Switched tutorials to Python and use Sphinx Gallery 2021-03-06 14:03:01 -05:00
Philippe Tillet
2b9b284026 [PYTHON] Deleted 01-vector-add.py: it is an unnecessary duplicate of 01-vector-add.ipynb 2021-03-04 02:06:57 -05:00
Philippe Tillet
a7437e14c5 [RUNTIME] Added auto-alignment mechanism (#71)
This PR adds an automatic memory alignment mechanism to the Triton runtime. Specifically, the JIT compiler detects the alignment (in bytes) of each pointer argument, as well as the largest power-of-two divisor (between 1 and 16) of each integer argument. The corresponding .aligned and .multipleof attributes are then added to the Triton-IR on the fly for all auto-tunable kernels. A cache remembers the kernels compiled for each possible configuration.

This PR also includes a substantial cleanup of the Python API. The cleanup adds 2-3 µs of overhead, mostly from accessing integer #defines in the auto-tuned compilation options. The previous solution was slightly faster but hacky and potentially unsafe, so the new approach is preferred for now.
2021-03-04 01:51:11 -05:00
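
The mechanism described in a7437e14c5 can be illustrated with a minimal Python sketch. This is not the Triton runtime code: `Pointer`, `pow2_divisor`, `specialization_key`, and `get_kernel` are hypothetical names invented here, and pointer alignment is assumed to be derived from the raw address in the same way as the divisor of an integer argument. Only the 1-to-16 range and the attribute names come from the commit message.

```python
MAX_DIVISOR = 16  # upper bound stated in the commit message


class Pointer:
    """Hypothetical stand-in for a device pointer argument."""

    def __init__(self, address):
        self.address = address


def pow2_divisor(n, limit=MAX_DIVISOR):
    """Return the largest power of two, capped at `limit`, that divides n."""
    if n == 0:
        return limit  # zero is divisible by every power of two
    # n & -n isolates the lowest set bit, i.e. the largest power of two dividing n
    return min(n & -n, limit)


def specialization_key(args):
    """Build a hashable description of the alignment properties of the arguments."""
    key = []
    for arg in args:
        if isinstance(arg, Pointer):
            key.append(("aligned", pow2_divisor(arg.address)))
        elif isinstance(arg, int):
            key.append(("multipleof", pow2_divisor(arg)))
        else:
            key.append(("other", None))
    return tuple(key)


_cache = {}  # (kernel name, specialization key) -> compiled kernel


def get_kernel(name, args, compile_fn):
    """Compile once per distinct configuration and reuse the result afterwards."""
    key = (name, specialization_key(args))
    if key not in _cache:
        _cache[key] = compile_fn(key)
    return _cache[key]
```

In this sketch, two calls whose arguments share the same alignment properties map to the same cache entry, so only the first call pays the compilation cost. Building the key on every call is the kind of per-call bookkeeping that could contribute to an overhead like the one the commit message mentions, although the message attributes most of it to accessing integer #defines.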