Commit Graph

10 Commits

Author SHA1 Message Date
Philippe Tillet
183878dce5 [DOCS] Added matrix multiplication tutorial 2021-07-27 12:38:49 -07:00
Philippe Tillet
50e58d73db [DOCS] Improved plots in tutorials 2021-07-27 12:38:49 -07:00
Philippe Tillet
d1d09566b1 [DOCS] Improved tutorials documentation 2021-07-27 12:38:49 -07:00
Philippe Tillet
92242ace2c [DOCS] Re-structured documentation hierarchy 2021-07-27 12:38:49 -07:00
Philippe Tillet
ca04da3575 [DOCS] Switched tutorials to Python and use Sphinx Gallery 2021-07-27 12:38:49 -07:00
Philippe Tillet
5172792543 [DOCS] Added .ipynb tutorials in docs 2021-07-27 12:38:49 -07:00
Philippe Tillet
3ecf834a69 [PYTHON] Deleted 01-vector-add.py: it is an unnecessary duplicate of
01-vector-add.ipynb
2021-07-27 12:38:49 -07:00
Philippe Tillet
62835a0979 [RUNTIME] Added auto-alignment mechanism (#71)
This PR adds an automatic memory alignment mechanism in the Triton runtime. Specifically, the JIT compiler detects the alignment (in bytes) of each pointer argument as well as the largest power of two divisor (between 1 and 16) of each integer argument. Proper .aligned and .multipleof attributes are then added to the Triton-IR on-the-fly for all auto-tunable kernels. There is a cache that remembers all the kernels compiled for each possible configuration.

This PR also includes substantial cleaning of the Python API. This adds 2-3us overhead, mostly due to accessing integer #defines from the auto-tuned compilation options. The previous solution was slightly faster but hacky and potentially unsafe, so this is preferred for now.
2021-07-27 12:38:49 -07:00
Philippe Tillet
50ff1aea86 [DOCS] Added Python 02-fused-softmax.ipynb tutorial 2021-07-27 12:38:49 -07:00
Philippe Tillet
269ebc12e5 [PYTHON][TESTS][DOC] Various improvement of the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
  > .data_ptr() must now be passed as kernel argument. No more implicit
conversion from torch.tensor
  > compilation options are now constant attributes, i.e., opt.d('VAR')
becomes opt.VAR
  > torch.device must now be passed explicitly to triton.kernel (no
longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* added python/triton/ops/ package for pre-written Triton ops
2021-07-27 12:38:48 -07:00