5a51f3e5294f6604fdccf4cddb5d2c676e71582a
Membar pass on top of master is buggy with asynchronous copy. For example, it doesn't wait for asynchronous copies to complete before recoalescing accumulator in GEMM, which leads to undefined behavior when the program doesn't enter the loop. This PR proposes
[BUILD] Added automatic nightly build releases to pip in CI; removed build-time dependence on LLVM and PyTorch (#77)
Triton
This is the development repository of Triton, a language and compiler for writing highly efficient custom Deep-Learning primitives. The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA, but also with higher flexibility than other existing DSLs.
The foundations of this project are described in the following MAPL2019 publication: Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations. Please consider citing us if you use our work!
The official documentation contains installation instructions and tutorials.
Compatibility
Supported Platforms:
- Linux
Supported Hardware:
- NVIDIA GPUs (Compute Capability 7.0+)
- Under development: AMD GPUs, CPUs
Description
Languages
C++
49.7%
Python
35.3%
MLIR
13.3%
CMake
1.7%