<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>The Triton-IR Intermediate Representation &mdash; Triton documentation</title>
<link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="../../_static/gallery.css" type="text/css" />
<link rel="stylesheet" href="../../_static/gallery-binder.css" type="text/css" />
<link rel="stylesheet" href="../../_static/gallery-dataframe.css" type="text/css" />
<link rel="stylesheet" href="../../_static/gallery-rendered-html.css" type="text/css" />
<link rel="stylesheet" href="../../_static/css/custom.css" type="text/css" />
<!--[if lt IE 9]>
<script src="../../_static/js/html5shiv.min.js"></script>
<![endif]-->
<script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script>
<script src="../../_static/jquery.js"></script>
<script src="../../_static/underscore.js"></script>
<script src="../../_static/doctools.js"></script>
<script async="async" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="prev" title="The Triton-C Language" href="../chapter-3/triton-c.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> Triton
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Getting Started</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../getting-started/installation.html">Installation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../getting-started/tutorials/index.html">Tutorials</a></li>
</ul>
<p class="caption"><span class="caption-text">Programming Guide</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../chapter-1/introduction.html">Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="../chapter-2/related-work.html">Related Work</a></li>
<li class="toctree-l1"><a class="reference internal" href="../chapter-3/triton-c.html">The Triton-C Language</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">The Triton-IR Intermediate Representation</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#structure-of-a-triton-ir-program">Structure of a Triton-IR Program</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#modules">Modules</a></li>
<li class="toctree-l3"><a class="reference internal" href="#functions">Functions</a></li>
<li class="toctree-l3"><a class="reference internal" href="#basic-blocks">Basic Blocks</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#block-level-dataflow-analysis">Block-Level Dataflow Analysis</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#types">Types</a></li>
<li class="toctree-l3"><a class="reference internal" href="#instructions">Instructions</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#block-level-control-flow-analysis">Block-Level Control Flow Analysis</a></li>
<li class="toctree-l2"><a class="reference internal" href="#references">References</a></li>
</ul>
</li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">Triton</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home"></a> &raquo;</li>
<li>The Triton-IR Intermediate Representation</li>
<li class="wy-breadcrumbs-aside">
<a href="../../_sources/programming-guide/chapter-4/triton-ir.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="the-triton-ir-intermediate-representation">
<h1>The Triton-IR Intermediate Representation<a class="headerlink" href="#the-triton-ir-intermediate-representation" title="Permalink to this headline"></a></h1>
<p>Triton-IR is an LLVM-based Intermediate Representation (IR) whose purpose is to provide an environment suitable for block-level program analysis, transformation and optimization.
In our implementation, Triton-IR programs are constructed directly from Triton-C after parsing, but they could also be generated by higher-level DSLs in the future.
Triton-IR and LLVM-IR programs share the same high-level structure, but the former also includes a number of extensions necessary for block-level data-flow analysis.
These extensions are crucial for carrying out the optimizations outlined in the next chapter of this document.</p>
<div class="section" id="structure-of-a-triton-ir-program">
<h2>Structure of a Triton-IR Program<a class="headerlink" href="#structure-of-a-triton-ir-program" title="Permalink to this headline"></a></h2>
<div class="section" id="modules">
<h3>Modules<a class="headerlink" href="#modules" title="Permalink to this headline"></a></h3>
<p>At the highest level, Triton-IR programs consist of one or more basic units of compilation known as <em>modules</em>. These modules are compiled independently of one another and are eventually aggregated by a linker, whose role is to resolve forward declarations and merge global definitions. Each module is itself composed of functions, global variables, constants and other miscellaneous symbols such as metadata and attributes.</p>
</div>
<div class="section" id="functions">
<h3>Functions<a class="headerlink" href="#functions" title="Permalink to this headline"></a></h3>
<p>Triton-IR function definitions consist of a return type, a name and a potentially empty argument list. Visibility, alignment and linkage specifiers can optionally be added. Function attributes (such as inlining hints) and parameter attributes (such as “readonly” or aliasing hints) can also be specified, allowing compiler backends to perform more aggressive optimizations, for instance by making better use of the non-coherent caches found on NVIDIA GPUs. This header is followed by a body composed of a list of basic blocks whose interdependencies form the Control Flow Graph (CFG) of the function.</p>
</div>
<div class="section" id="basic-blocks">
<h3>Basic Blocks<a class="headerlink" href="#basic-blocks" title="Permalink to this headline"></a></h3>
<p>Basic blocks are straight-line code sequences that may only contain so-called <em>terminator</em> instructions (e.g., branches, returns) at their end. To simplify program analysis, Triton-IR uses the Static Single Assignment (SSA) form, meaning that each variable must be (1) assigned exactly once and (2) defined before being used. As a result, the instructions of each basic block implicitly define a Data-Flow Graph (DFG). In our case, the SSA form is created directly from Triton-C’s Abstract Syntax Trees (ASTs) using an algorithm from the literature <a class="reference internal" href="#braun13" id="id1"><span>[BRAUN13]</span></a>.</p>
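<p>As a small illustration in plain C (this is not Triton-IR syntax), the two hypothetical functions below compute the same value. The first reassigns a single variable; the second is written in SSA style, where every intermediate value receives a fresh name that is assigned exactly once and defined before it is used. The example is straight-line code, so it only covers the single-basic-block case.</p>
<div class="highlight-C notranslate"><div class="highlight"><pre>
/* Non-SSA: the variable x is reassigned twice. */
int poly_reassign(int a, int b) {
    int x = a + b;
    x = x * 2;
    x = x - a;
    return x;
}

/* SSA style: each value has a unique, single-assignment name.
   The operands read by each statement are the edges of the
   block's data-flow graph (a and b feed x0, x0 feeds x1, ...). */
int poly_ssa(int a, int b) {
    const int x0 = a + b;
    const int x1 = x0 * 2;
    const int x2 = x1 - a;
    return x2;
}
</pre></div>
</div>
<p>For example, both functions return 11 for the arguments 3 and 4.</p>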
</div>
</div>
<div class="section" id="block-level-dataflow-analysis">
<h2>Block-Level Dataflow Analysis<a class="headerlink" href="#block-level-dataflow-analysis" title="Permalink to this headline"></a></h2>
<div class="section" id="types">
<h3>Types<a class="headerlink" href="#types" title="Permalink to this headline"></a></h3>
<p>Multi-dimensional blocks are at the center of data-flow analysis in Triton-JIT. They can be declared using a syntax similar to that of vector declarations in LLVM-IR. For example, <code class="code docutils literal notranslate"><span class="pre">i32&lt;8,</span> <span class="pre">8&gt;</span></code> is the type corresponding to <span class="math notranslate nohighlight">\(8 \times 8\)</span> blocks of 32-bit integers. Note that there is no preprocessor in Triton-IR, hence parametric shape values must be resolved before programs are generated. In our case, this is done by Triton-JIT’s auto-tuner.</p>
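<p>To make the notation concrete, the C sketch below mirrors <code class="code docutils literal notranslate"><span class="pre">i32&lt;8,</span> <span class="pre">8&gt;</span></code> in two hypothetical ways: as a fixed-size C array type, and as a small descriptor struct. The names <code class="code docutils literal notranslate"><span class="pre">block_type</span></code>, <code class="code docutils literal notranslate"><span class="pre">block_numel</span></code> and <code class="code docutils literal notranslate"><span class="pre">block_bytes</span></code> are illustrative only and are not part of Triton. In both cases every extent is a concrete integer, reflecting the requirement that shapes be resolved before the IR is generated.</p>
<div class="highlight-C notranslate"><div class="highlight"><pre>
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

/* C analogue of the Triton-IR type i32&lt;8, 8&gt;: an 8 x 8 block of 32-bit integers. */
typedef int32_t i32_block_8x8[8][8];

#define MAX_RANK 4

/* Hypothetical descriptor for a block type: element width plus a fully
   resolved shape (no symbolic extents). */
typedef struct {
    unsigned elem_bits;      /* e.g. 32 for i32 */
    size_t rank;             /* number of dimensions */
    size_t shape[MAX_RANK];  /* extent of each dimension */
} block_type;

/* Total number of elements in a block of type t. */
size_t block_numel(const block_type *t) {
    size_t n = 1;
    for (size_t i = 0; i &lt; t-&gt;rank; ++i)
        n *= t-&gt;shape[i];
    return n;
}

/* Storage size in bytes, assuming elements are packed with no padding. */
size_t block_bytes(const block_type *t) {
    return block_numel(t) * t-&gt;elem_bits / 8;
}
</pre></div>
</div>
<p>For instance, the descriptor <code class="code docutils literal notranslate"><span class="pre">{32,</span> <span class="pre">2,</span> <span class="pre">{8,</span> <span class="pre">8}}</span></code> describes 64 elements occupying 256 bytes, which matches the size of <code class="code docutils literal notranslate"><span class="pre">i32_block_8x8</span></code>.</p>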
</div>
<div class="section" id="instructions">
<h3>Instructions<a class="headerlink" href="#instructions" title="Permalink to this headline"></a></h3>
<p>Triton-IR introduces a set of <em>reblocking</em> instructions whose purpose is to support broadcasting semantics as described in the previous chapter. The <code class="code docutils literal notranslate"><span class="pre">reshape</span></code> instruction creates a block of the specified shape using the raw data from its input argument. This is particularly useful for reinterpreting variables as higher-dimensional arrays by padding their input shapes with ones in preparation for broadcasting. The <code class="code docutils literal notranslate"><span class="pre">broadcast</span></code> instruction creates a block of the specified shape by replicating its input argument as many times as necessary along dimensions of size 1, as shown below for the <code class="code docutils literal notranslate"><span class="pre">broadcast&lt;3,3&gt;</span></code> instruction.</p>
<p><a class="reference internal" href="../../_images/broadcast-1.png"><img alt="pic1" src="../../_images/broadcast-1.png" style="width: 40%;" /></a> and <a class="reference internal" href="../../_images/broadcast-2.png"><img alt="pic2" src="../../_images/broadcast-2.png" style="width: 40%;" /></a></p>
<p>Standard scalar instructions (<code class="code docutils literal notranslate"><span class="pre">cmp</span></code>, <code class="code docutils literal notranslate"><span class="pre">getelementptr</span></code>, <code class="code docutils literal notranslate"><span class="pre">add</span></code>, <code class="code docutils literal notranslate"><span class="pre">load</span></code>…) were preserved and extended to denote element-wise operations when applicable. Finally, Triton-IR also exposes specialized arithmetic instructions for reductions (<code class="code docutils literal notranslate"><span class="pre">reduce</span></code>) and matrix multiplications (<code class="code docutils literal notranslate"><span class="pre">dot</span></code>).</p>
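<p>The C functions below give hypothetical reference semantics for these three kinds of operations on row-major <code class="code docutils literal notranslate"><span class="pre">float</span></code> blocks: an element-wise addition, a sum reduction along the last axis, and a matrix multiplication. The function names, the choice of a sum as the reduction operator and the choice of reduction axis are assumptions made for illustration; the sketch says nothing about how Triton actually lowers <code class="code docutils literal notranslate"><span class="pre">add</span></code>, <code class="code docutils literal notranslate"><span class="pre">reduce</span></code> or <code class="code docutils literal notranslate"><span class="pre">dot</span></code> to GPU code.</p>
<div class="highlight-C notranslate"><div class="highlight"><pre>
#include &lt;stddef.h&gt;

/* Element-wise addition of two blocks of n elements: the block-level
   counterpart of the scalar add instruction. */
void block_add(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i &lt; n; ++i)
        c[i] = a[i] + b[i];
}

/* Sum-reduction of an (m x n) block along its last axis,
   producing a block of m elements. */
void block_reduce_sum_axis1(const float *a, float *out, size_t m, size_t n) {
    for (size_t i = 0; i &lt; m; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j &lt; n; ++j)
            acc += a[i * n + j];
        out[i] = acc;
    }
}

/* Matrix multiplication of an (m x k) block by a (k x n) block,
   producing an (m x n) block. */
void block_dot(const float *a, const float *b, float *c,
               size_t m, size_t k, size_t n) {
    for (size_t i = 0; i &lt; m; ++i) {
        for (size_t j = 0; j &lt; n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p &lt; k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}
</pre></div>
</div>
<p>With <span class="math notranslate nohighlight">\(a = [[1, 2], [3, 4]]\)</span> and <span class="math notranslate nohighlight">\(b = [[5, 6], [7, 8]]\)</span>, <code class="code docutils literal notranslate"><span class="pre">block_dot</span></code> produces <span class="math notranslate nohighlight">\([[19, 22], [43, 50]]\)</span> and <code class="code docutils literal notranslate"><span class="pre">block_reduce_sum_axis1</span></code> applied to <span class="math notranslate nohighlight">\(a\)</span> produces <span class="math notranslate nohighlight">\([3, 7]\)</span>.</p>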
</div>
</div>
<div class="section" id="block-level-control-flow-analysis">
<h2>Block-Level Control Flow Analysis<a class="headerlink" href="#block-level-control-flow-analysis" title="Permalink to this headline"></a></h2>
<p>In Triton-IR, operations on block variables are atomic: they execute either in full or not at all. As a result, traditional control-flow structures (e.g., conditionals, loops) are not applicable to individual block elements. This is problematic, since a program may need to, e.g., guard only some elements of a blocked load against memory access violations.</p>
<p>This could potentially be solved by using the Predicated SSA (PSSA) <a class="reference internal" href="#carter99" id="id2"><span>[CARTER99]</span></a> <a class="reference internal" href="#stoutchinin01" id="id3"><span>[STOUTCHININ01]</span></a> form for Triton-IR. However, this would create a lot of unnecessary complexity for GPUs, where the benefits of PSSA are close to none, as divergent program paths within warps are serialized anyway. Therefore, recent versions of Triton handle intra-block control flow in a much simpler way, using conditional instructions such as <code class="code docutils literal notranslate"><span class="pre">select</span></code>, <code class="code docutils literal notranslate"><span class="pre">masked_load</span></code> and <code class="code docutils literal notranslate"><span class="pre">masked_store</span></code>:</p>
<div class="highlight-C notranslate"><div class="highlight"><pre><span></span><span class="c1">// For all indices [idx], return cond[idx] ? true_value[idx] : false_value[idx];</span>
<span class="n">select</span> <span class="n">TYPE</span><span class="o">&lt;</span><span class="n">TS1</span><span class="p">,</span> <span class="p">...,</span> <span class="n">TSN</span><span class="o">&gt;</span> <span class="n">cond</span><span class="p">,</span> <span class="n">true_value</span><span class="p">,</span> <span class="n">false_value</span><span class="p">;</span>
<span class="c1">// For all indices [idx], return cond[idx] ? *true_addr[idx] : false_value[idx];</span>
<span class="n">masked_load</span> <span class="n">TYPE</span><span class="o">&lt;</span><span class="n">TS1</span><span class="p">,</span> <span class="p">...,</span> <span class="n">TSN</span><span class="o">&gt;</span> <span class="n">cond</span><span class="p">,</span> <span class="n">true_addr</span><span class="p">,</span> <span class="n">false_value</span><span class="p">;</span>
<span class="c1">// For all indices [idx], execute *true_addr[idx] = true_value[idx] if cond[idx]</span>
<span class="n">masked_store</span> <span class="n">TYPE</span><span class="o">&lt;</span><span class="n">TS1</span><span class="p">,</span> <span class="p">...,</span> <span class="n">TSN</span><span class="o">&gt;</span> <span class="n">cond</span><span class="p">,</span> <span class="n">true_addr</span><span class="p">,</span> <span class="n">true_value</span><span class="p">;</span>
</pre></div>
</div>
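<p>The C sketch below gives reference semantics for these three instructions on flat blocks of <code class="code docutils literal notranslate"><span class="pre">n</span></code> elements, with one address per element for the memory operations. The function names are hypothetical and the code is not Triton’s implementation. The property it illustrates is that masked-off addresses are never dereferenced, which is what makes these instructions suitable for guarding the out-of-bounds tail of a block.</p>
<div class="highlight-C notranslate"><div class="highlight"><pre>
#include &lt;stdbool.h&gt;
#include &lt;stddef.h&gt;

/* select: out[i] = cond[i] ? t[i] : f[i] */
void block_select(const bool *cond, const float *t, const float *f,
                  float *out, size_t n) {
    for (size_t i = 0; i &lt; n; ++i)
        out[i] = cond[i] ? t[i] : f[i];
}

/* masked_load: out[i] = cond[i] ? *addr[i] : other[i].
   Only the selected operand is evaluated, so addr[i] is never
   dereferenced when cond[i] is false. */
void block_masked_load(const bool *cond, const float *const *addr,
                       const float *other, float *out, size_t n) {
    for (size_t i = 0; i &lt; n; ++i)
        out[i] = cond[i] ? *addr[i] : other[i];
}

/* masked_store: *addr[i] = val[i] wherever cond[i] is true;
   other locations are left untouched. */
void block_masked_store(const bool *cond, float *const *addr,
                        const float *val, size_t n) {
    for (size_t i = 0; i &lt; n; ++i)
        if (cond[i])
            *addr[i] = val[i];
}
</pre></div>
</div>
<p>For example, if only the first 5 elements of an 8-element block are valid, setting <code class="code docutils literal notranslate"><span class="pre">cond[i]</span> <span class="pre">=</span> <span class="pre">i</span> <span class="pre">&lt;</span> <span class="pre">5</span></code> makes <code class="code docutils literal notranslate"><span class="pre">block_masked_load</span></code> return the fallback values for the 3 trailing elements instead of reading memory, and makes <code class="code docutils literal notranslate"><span class="pre">block_masked_store</span></code> leave those locations unchanged.</p>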
</div>
<div class="section" id="references">
<h2>References<a class="headerlink" href="#references" title="Permalink to this headline"></a></h2>
<dl class="citation">
<dt class="label" id="braun13"><span class="brackets"><a class="fn-backref" href="#id1">BRAUN13</a></span></dt>
<dd><ol class="upperalpha simple" start="13">
<li><p>Braun et al., “Simple and Efficient Construction of Static Single Assignment Form”, CC 2013</p></li>
</ol>
</dd>
<dt class="label" id="carter99"><span class="brackets"><a class="fn-backref" href="#id2">CARTER99</a></span></dt>
<dd><ol class="upperalpha simple" start="12">
<li><p>Carter et al., “Predicated Static Single Assignment”, PACT 1999</p></li>
</ol>
</dd>
<dt class="label" id="stoutchinin01"><span class="brackets"><a class="fn-backref" href="#id3">STOUTCHININ01</a></span></dt>
<dd><ol class="upperalpha simple">
<li><p>Stoutchinin et al., “Efficient Static Single Assignment Form for Predication”, MICRO 2001</p></li>
</ol>
</dd>
</dl>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../chapter-3/triton-c.html" class="btn btn-neutral float-left" title="The Triton-C Language" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
&#169; Copyright 2020, Philippe Tillet.
</p>
</div>
Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
<a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>