Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004)

This PR merges the `triton-mlir` branch, in which we have been quietly rewriting the Triton backend from scratch to increase maintainability, stability and ultimately performance. Changes to the runtime are minimal, and this new version aims to remain backward-compatible with the previous commit. The legacy backend is now officially deprecated, but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>

@@ -45,7 +45,7 @@ def setup(app):
 
         def wrapped(obj, **kwargs):
             import triton
-            if isinstance(obj, triton.runtime.JITFunction):
+            if isinstance(obj, triton.code_gen.JITFunction):
                 obj = obj.fn
             return old(obj)
 
@@ -56,7 +56,7 @@ def setup(app):
 
     def documenter(app, obj, parent):
         import triton
-        if isinstance(obj, triton.runtime.JITFunction):
+        if isinstance(obj, triton.code_gen.JITFunction):
             obj = obj.fn
         return old_documenter(app, obj, parent)
 
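
Both hunks above patch a Sphinx hook so that a JIT-wrapped object is replaced by its underlying function (`obj.fn`) before the original hook runs. The following is a minimal standalone sketch of that unwrap-then-delegate pattern. `FakeJIT` and `forward_wrapped_fn` are hypothetical names used only for illustration; they stand in for Triton's `JITFunction` and the helper in `setup`, and `inspect.signature` stands in for the Sphinx hook. Unlike the patched code, this sketch also forwards keyword arguments.

```python
import inspect


class FakeJIT:
    """Hypothetical stand-in for a JIT wrapper that keeps the original function in `.fn`."""

    def __init__(self, fn):
        self.fn = fn


def forward_wrapped_fn(func, wrapper_type):
    """Return a version of `func` that unwraps `wrapper_type` instances first."""
    old = func

    def wrapped(obj, **kwargs):
        # Swap the wrapper for the plain function so introspection works on it.
        if isinstance(obj, wrapper_type):
            obj = obj.fn
        return old(obj, **kwargs)

    return wrapped


@FakeJIT
def add_kernel(x_ptr, y_ptr, n_elements):
    """Toy kernel signature; the body is irrelevant here."""


# inspect.signature cannot handle a FakeJIT instance directly (it is not callable),
# but the forwarded version sees the wrapped function.
signature = forward_wrapped_fn(inspect.signature, FakeJIT)
params = list(signature(add_kernel).parameters)
print(params)  # ['x_ptr', 'y_ptr', 'n_elements']
```

The isinstance check is the only part that depends on where the wrapper class lives, which is why the hunks change nothing but the module path.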
@@ -34,13 +34,11 @@ You can install the Python package from source by running the following commands
 .. code-block:: bash
 
       git clone https://github.com/openai/triton.git;
-      cd triton;
-      git submodule update --init --recursive;
-      cd python;
+      cd triton/python;
       pip install cmake; # build time dependency
       pip install -e .
 
-Note that, if llvm-11 is not present on your system and you are on linux, the setup.py script will download the official LLVM11 static libraries link against that. For windows users, LLVM must be installed and configured in PATH.
+Note that, if llvm-11 is not present on your system, the setup.py script will download the official LLVM11 static libraries link against that.
 
 You can then test your installation by running the unit tests:
 
@@ -168,7 +168,7 @@ Scheduling languages are, without a doubt, one of the most popular approaches fo
 Limitations
 ++++++++++++
 
-This ease-of-development comes at a cost. First of all, existing systems that follow this paradigm tend to be noticeably slower than Triton on modern hardware when applicable (e.g., V100/A100 tensor cores w/ equal tile sizes). I do believe that this is not a fundamental issue of scheduling languages -- in the sense that it could probably be solved with more efforts -- but it could mean that these systems are harder to engineer. More importantly, existing scheduling languages generate loops whose bounds and increments cannot depend on surrounding loop indice without at least imposing severe constraints on possible schedules -- if not breaking the system entirely. This is problematic for sparse computations, whose iteration spaces may be irregular.
+This ease-of-development comes at a cost. First of all, existing systems that follow this paradigm tend to be noticeably slower than Triton on modern hardware when applicable (e.g., V100/A100 tensor cores w/ equal tile sizes). I do believe that this is not a fundamental issue of scheduling languages -- in the sense that it could probably be solved with more efforts -- but it could mean that these systems are harder to engineer. More importantly, existing scheduling languages generate loops whose bounds and increments cannot depend on surrounding loop indices without at least imposing severe constraints on possible schedules -- if not breaking the system entirely. This is problematic for sparse computations, whose iteration spaces may be irregular.
 
 .. table::
    :widths: 50 50
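
The paragraph in the hunk above says sparse computations have irregular iteration spaces because inner loop bounds depend on the surrounding loop index. As a rough illustration, here is a plain-Python sketch of a row-sum over a matrix stored in CSR form. The CSR layout and the names `csr_row_sums`, `indptr` and `data` are assumptions chosen for this example; they do not come from the documentation being patched.

```python
def csr_row_sums(indptr, data):
    """Sum the stored values of each row of a CSR-format sparse matrix.

    indptr[row] and indptr[row + 1] delimit the slice of `data` holding that row,
    so the inner loop bounds depend on the outer loop index.
    """
    sums = []
    for row in range(len(indptr) - 1):
        total = 0.0
        # Data-dependent bounds: each row may hold a different number of values.
        for k in range(indptr[row], indptr[row + 1]):
            total += data[k]
        sums.append(total)
    return sums


# Three rows holding 2, 0 and 3 stored values respectively.
indptr = [0, 2, 2, 5]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
row_sums = csr_row_sums(indptr, data)
print(row_sums)  # [3.0, 0.0, 12.0]
```

The inner trip count (2, 0, 3) is known only once `indptr` is read, which is the kind of loop nest the paragraph describes as hard to express in a schedule.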
@@ -106,13 +106,9 @@ Atomic Ops
:nosignatures:

atomic_cas
atomic_xchg
atomic_add
atomic_max
atomic_min
atomic_and
atomic_or
atomic_xor


Comparison ops