Triton
This is the development repository of Triton, a language and compiler for writing highly efficient custom deep-learning primitives. Triton aims to provide an open-source environment for writing fast code with higher productivity than CUDA and much greater flexibility than TVM, without requiring compute schedules to be specified manually.
The main components of Triton at the moment are:
- Triton-C: An imperative, single-threaded language for writing highly efficient compute kernels at a relatively high abstraction level (think numpy-like array operations in a C-like language).
- Triton-IR: A special-purpose intermediate representation that aids array-level program analysis and optimization of Triton-C programs.
- Triton-JIT: An optimizing just-in-time compiler for Triton-IR, which generates GPU code on par with state-of-the-art CUDA-C (e.g., CUTLASS). This includes transparent support for mixed precision and Tensor Cores.
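To give a rough feel for what "numpy-like array operations" on tiles means, here is a minimal sketch in plain NumPy. It is not Triton-C syntax and does not use any Triton API; the function name `tiled_matmul` and the `tile` parameter are hypothetical, chosen only for illustration. Each loop iteration works on a whole sub-block of the inputs rather than on a single scalar, which is the style of program the list above describes.

```python
import numpy as np

def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n) one tile at a time (illustrative only)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):          # tile rows of the output
        for j in range(0, n, tile):      # tile columns of the output
            # Accumulator for one output tile; edge tiles may be smaller.
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=c.dtype)
            for p in range(0, k, tile):  # walk along the reduction axis
                # Whole-tile operation: a small matrix product per step.
                acc += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
            c[i:i + tile, j:j + tile] = acc
    return c
```

In this sketch the tile size is a plain Python argument and the loops run sequentially on the CPU; it only shows the shape of the computation, not how a compiler would map it to a GPU.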
Bindings for automatic PyTorch custom op generation are included in PyTriton, along with a small DSL based on einsum that supports convolutions, shift-convolutions, direct einsums, etc.
The formal foundations of this project are described in the following MAPL 2019 publication: Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations. Please cite us if you use our work!
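For readers unfamiliar with einsum notation, the sketch below uses NumPy's `np.einsum`, not the PyTriton DSL, whose exact syntax is not shown here. The helper names `matmul_einsum` and `conv1d_einsum` are hypothetical. The second function illustrates one way a convolution can be phrased as an einsum, by first exposing the sliding windows as an extra axis; it assumes stride 1, no padding, and the cross-correlation convention common in deep learning.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def matmul_einsum(a, b):
    # "mk,kn->mn": sum over the shared index k.
    return np.einsum("mk,kn->mn", a, b)

def conv1d_einsum(x, w):
    """1D convolution written as an einsum (illustrative only).

    x: (batch, in_channels, length)
    w: (out_channels, in_channels, kernel)
    returns: (batch, out_channels, length - kernel + 1)
    """
    kernel = w.shape[-1]
    # windows has shape (batch, in_channels, out_length, kernel).
    windows = sliding_window_view(x, kernel, axis=-1)
    # Contract over input channels c and kernel offsets k.
    return np.einsum("nclk,fck->nfl", windows, w)
```

A direct einsum such as the matrix product needs no preprocessing, while the convolution needs the window axis to be made explicit first.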
Installation
Triton is a fairly self-contained package and uses its own parser (forked from wgtcc) and LLVM-8.0+ for code generation.
sudo apt-get install llvm-8-dev
git clone https://github.com/ptillet/triton.git;
cd triton/python/;
python setup.py develop;
cd examples;
python einsum.py
Getting Started
Please visit the documentation to get started with Triton.
Contributing
Please keep in mind that this is a project I have been carrying out completely on my own as part of my Ph.D. thesis. While I am confident in the approach, there are still many things to fix and to polish. Please contact me (ptillet AT g.harvard.edu) or raise an issue if you want to contribute!
ISAAC (deprecated) for fast inference
Before working on Triton, I wrote custom auto-tuned PTX code for fast, quantized inference on GPUs. While this project is now deprecated, you can use it at your own risk by checking out the "isaac" tag in this repository.