[GH-PAGES] Updated website

This commit is contained in:
Philippe Tillet
2022-05-04 00:43:20 +00:00
parent af77440e1b
commit d420763e0b
158 changed files with 290 additions and 290 deletions


@@ -321,7 +321,7 @@ for different problem sizes.</p>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vector-add-performance:
size Triton Torch
-0 4096.0 8.000000 9.600000
+0 4096.0 9.600000 9.600000
1 8192.0 19.200000 19.200000
2 16384.0 38.400001 38.400001
3 32768.0 76.800002 76.800002
@@ -339,7 +339,7 @@ for different problem sizes.</p>
15 134217728.0 849.737435 850.656574
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 42.959 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 46.854 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p>


@@ -374,16 +374,16 @@ We will then compare its performance against (1) <code class="code docutils lite
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance:
N Triton Torch (native) Torch (jit)
-0 256.0 512.000001 546.133347 190.511628
-1 384.0 585.142862 558.545450 151.703707
-2 512.0 655.360017 606.814814 154.566038
-3 640.0 682.666684 640.000002 158.759699
+0 256.0 512.000001 546.133347 188.321838
+1 384.0 585.142862 585.142862 153.600004
+2 512.0 655.360017 585.142849 154.566038
+3 640.0 682.666684 640.000002 160.000000
4 768.0 722.823517 664.216187 163.839992
.. ... ... ... ...
-93 12160.0 812.359066 405.755985 198.936606
+93 12160.0 812.359066 405.333344 198.936606
94 12288.0 814.111783 415.222812 199.096718
-95 12416.0 812.498981 411.722274 198.755369
-96 12544.0 812.566838 412.546756 198.913776
+95 12416.0 812.498981 411.296057 198.755369
+96 12544.0 812.566838 412.971190 198.913776
97 12672.0 812.633240 411.679167 199.069228
[98 rows x 4 columns]
@@ -397,7 +397,7 @@ We will then compare its performance against (1) <code class="code docutils lite
Note however that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li>
</ul>
</div></blockquote>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes 21.302 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes 25.630 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p>


@@ -573,37 +573,37 @@ torch_output=tensor([[ 1.1045, -36.9688, 31.4688, ..., -11.3906, 24.4531, -3
2 512.0 14.563555 ... 16.384000 16.384000
3 640.0 22.260869 ... 24.380953 24.380953
4 768.0 32.768000 ... 34.028308 34.028308
-5 896.0 37.971025 ... 39.025776 37.971025
+5 896.0 37.971025 ... 39.025776 39.025776
6 1024.0 49.932191 ... 52.428801 52.428801
7 1152.0 45.242181 ... 46.656000 46.656000
8 1280.0 51.200001 ... 56.888887 56.109587
9 1408.0 64.138541 ... 67.305878 66.485074
-10 1536.0 79.526831 ... 79.526831 78.643199
-11 1664.0 63.372618 ... 62.492442 62.061463
+10 1536.0 80.430545 ... 79.526831 78.643199
+11 1664.0 63.372618 ... 62.929456 62.061463
12 1792.0 72.983276 ... 72.512412 71.588687
-13 1920.0 69.120002 ... 70.172588 70.172588
+13 1920.0 68.776119 ... 70.172588 70.172588
14 2048.0 73.908442 ... 76.959706 76.608294
15 2176.0 83.155572 ... 85.998493 85.269692
-16 2304.0 68.446623 ... 77.057651 76.563695
-17 2432.0 71.305746 ... 85.393507 84.877538
-18 2560.0 77.833728 ... 81.715711 80.511054
-19 2688.0 84.108772 ... 90.316801 89.254248
-20 2816.0 83.074685 ... 84.035084 83.552120
-21 2944.0 82.237674 ... 82.373605 82.237674
-22 3072.0 82.360879 ... 88.473602 88.820552
-23 3200.0 84.656085 ... 95.380032 94.814812
-24 3328.0 83.034941 ... 85.096096 84.596116
-25 3456.0 81.849303 ... 91.097818 90.994998
-26 3584.0 85.715344 ... 91.099693 95.350361
-27 3712.0 83.247783 ... 83.317214 85.896254
-28 3840.0 84.809814 ... 89.912191 91.549669
-29 3968.0 88.938731 ... 91.335278 88.423140
-30 4096.0 93.629390 ... 92.436452 88.243079
+16 2304.0 68.056616 ... 77.057651 76.563695
+17 2432.0 71.125224 ... 85.134737 84.877538
+18 2560.0 77.833728 ... 80.908642 80.908642
+19 2688.0 83.369354 ... 89.888756 89.254248
+20 2816.0 83.873477 ... 83.712490 82.759409
+21 2944.0 81.967162 ... 82.784108 83.060049
+22 3072.0 82.661468 ... 89.593522 88.890270
+23 3200.0 84.432717 ... 96.385543 95.522391
+24 3328.0 83.516586 ... 82.275764 82.939284
+25 3456.0 81.766291 ... 91.615417 91.200871
+26 3584.0 83.915437 ... 96.891584 98.483450
+27 3712.0 82.491612 ... 89.755028 85.970176
+28 3840.0 84.744825 ... 92.313853 86.197974
+29 3968.0 93.648452 ... 88.231331 84.915752
+30 4096.0 91.553703 ... 88.417474 91.929954
[31 rows x 5 columns]
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes 18.180 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes 38.193 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p>


@@ -371,7 +371,7 @@ to explore the <cite>triton/language/random</cite> folder!</p>
<dd><p>Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014</p>
</dd>
</dl>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes 0.011 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes 0.010 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-04-low-memory-dropout-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/c9aed78977a4c05741d675a38dde3d7d/04-low-memory-dropout.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">04-low-memory-dropout.py</span></code></a></p>


@@ -194,36 +194,36 @@ to download the full example code</p>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm-backward:
N Triton Torch Apex
-0 1024.0 307.200008 97.912354 303.407414
-1 1536.0 351.085717 134.050910 341.333333
-2 2048.0 423.724127 160.627450 334.367350
-3 2560.0 461.954908 180.705883 330.322572
-4 3072.0 515.580429 191.999993 323.368415
-5 3584.0 551.384634 207.768111 310.527060
-6 4096.0 568.231237 220.412561 297.890900
-7 4608.0 498.162157 232.336141 287.251954
-8 5120.0 525.128191 242.366855 284.444444
-9 5632.0 538.517949 243.107920 289.438969
-10 6144.0 542.117638 248.661056 286.879370
-11 6656.0 528.953642 255.590406 285.257135
-12 7168.0 505.976473 260.063480 284.821192
-13 7680.0 485.052616 262.751252 280.121579
-14 8192.0 460.440290 266.406514 284.526763
-15 8704.0 416.127506 267.472468 284.987724
-16 9216.0 429.483477 271.724806 288.375482
-17 9728.0 437.213490 280.615388 289.667485
-18 10240.0 446.025405 286.433562 289.811322
-19 10752.0 429.364408 246.699797 290.267711
-20 11264.0 429.104745 245.536784 286.980888
-21 11776.0 423.089806 249.667843 288.981596
-22 12288.0 418.909088 254.453844 294.911986
-23 12800.0 414.016170 253.884294 288.180121
-24 13312.0 411.711355 253.160074 290.443638
-25 13824.0 406.090579 257.390218 292.056329
-26 14336.0 396.387109 255.051144 287.198654
-27 14848.0 386.498925 257.665934 289.717061
-28 15360.0 376.163261 257.790220 288.000007
-29 15872.0 368.046389 261.626369 290.562936
+0 1024.0 311.088617 99.497980 311.088617
+1 1536.0 354.461542 133.565214 341.333333
+2 2048.0 423.724127 159.067963 321.254900
+3 2560.0 461.954908 182.314537 326.808501
+4 3072.0 519.211251 191.005181 321.956335
+5 3584.0 551.384634 208.271186 308.301075
+6 4096.0 568.231237 220.907859 300.623865
+7 4608.0 498.162157 232.336141 287.999990
+8 5120.0 525.128191 241.414550 285.104413
+9 5632.0 538.517949 242.671458 288.204696
+10 6144.0 546.133354 251.631408 288.563606
+11 6656.0 534.260858 255.590406 284.242007
+12 7168.0 508.970395 255.619613 278.820105
+13 7680.0 485.052616 264.827585 281.404588
+14 8192.0 461.521112 267.493874 282.077471
+15 8704.0 416.958106 263.093202 281.152082
+16 9216.0 431.157889 271.724806 289.129410
+17 9728.0 438.857162 282.653752 291.840007
+18 10240.0 446.836366 285.104413 288.112552
+19 10752.0 432.966444 245.059832 287.999996
+20 11264.0 429.104745 243.107920 285.465683
+21 11776.0 421.826879 250.331271 288.981596
+22 12288.0 420.701865 254.673582 294.911986
+23 12800.0 414.016170 254.515329 288.993430
+24 13312.0 412.242569 252.161013 289.653667
+25 13824.0 403.130022 257.790206 293.347481
+26 14336.0 396.844280 254.109315 286.959121
+27 14848.0 383.999990 257.108233 288.777966
+28 15360.0 376.932517 261.261510 289.129401
+29 15872.0 368.046389 261.986243 291.229369
</pre></div>
</div>
<div class="line-block">
@@ -477,7 +477,7 @@ to download the full example code</p>
<span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">&#39;.&#39;</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes 11.085 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes 12.419 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p>


@@ -174,7 +174,7 @@
<div class="section" id="computation-times">
<span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline"></a></h1>
-<p><strong>12:33.536</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p>
+<p><strong>13:03.107</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 85%" />
@@ -183,23 +183,23 @@
</colgroup>
<tbody>
<tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td>
-<td><p>05:18.180</p></td>
+<td><p>05:38.193</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td>
-<td><p>03:21.302</p></td>
+<td><p>03:25.630</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td>
-<td><p>02:11.085</p></td>
+<td><p>02:12.419</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td>
-<td><p>01:42.959</p></td>
+<td><p>01:46.854</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td>
-<td><p>00:00.011</p></td>
+<td><p>00:00.010</p></td>
<td><p>0.0 MB</p></td>
</tr>
</tbody>
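The figures that change between the old and new benchmark tables above are throughput numbers derived from measured kernel runtimes: GB/s for the vector-add, softmax, and layer-norm tutorials, and TFLOPS for matrix multiplication. A minimal sketch of that conversion, assuming the memory-traffic and FLOP models used in the tutorial scripts (the helper names below are illustrative, not from the source):

```python
def vector_add_gbps(n_elements: int, ms: float) -> float:
    """Effective bandwidth for z = x + y on fp32 tensors:
    two reads plus one write of n_elements * 4 bytes each,
    divided by the measured time in milliseconds."""
    bytes_moved = 3 * n_elements * 4
    return bytes_moved * 1e-9 / (ms * 1e-3)

def matmul_tflops(m: int, n: int, k: int, ms: float) -> float:
    """Throughput for an (M x K) @ (K x N) matmul:
    2*M*N*K floating-point operations over the measured time."""
    return 2 * m * n * k * 1e-12 / (ms * 1e-3)
```

Under this model, the first vector-add row (size 4096 at 9.6 GB/s) corresponds to a measured runtime of roughly 5 microseconds, which is why tiny sizes are dominated by launch overhead and the large sizes converge toward memory bandwidth.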