Commit Graph

6 Commits

Author SHA1 Message Date
Philippe Tillet
a7437e14c5 [RUNTIME] Added auto-alignment mechanism (#71)
This PR adds an automatic memory alignment mechanism in the Triton runtime. Specifically, the JIT compiler detects the alignment (in bytes) of each pointer argument as well as the largest power of two divisor (between 1 and 16) of each integer argument. Proper .aligned and .multipleof attributes are then added to the Triton-IR on-the-fly for all auto-tunable kernels. There is a cache that remembers all the kernels compiled for each possible configuration.

This PR also includes substantial cleaning of the Python API. This adds 2-3us overhead, mostly due to accessing integer #defines from the auto-tuned compilation options. The previous solution was slightly faster but hacky and potentially unsafe, so this is preferred for now.
2021-03-04 01:51:11 -05:00
Philippe Tillet
8f9233e546 [LANG] Fixed undefined behavior in replace_all_uses_with() 2020-05-12 20:31:10 -04:00
Philippe Tillet
cd21151b98 [GENERAL] Fixed some undefined behavior with GCC-9 2020-05-11 11:07:21 -04:00
Philippe Tillet
b81734553b [lang] added support for batched matrix multiplication 2019-10-21 15:41:50 -04:00
Philippe Tillet
a842d337c5 [general] various cleaning and bugfix:
* added copy1d and copy2d benchmark
* fixed issue in reassociation pass
2019-09-02 23:00:49 -04:00
Philippe Tillet
732156b942 [general] rename *.cpp -> *.cc 2019-08-23 19:06:39 -07:00