[GH-PAGES] Updated website
@@ -325,22 +325,22 @@ for different problem sizes.</p>
0        4096.0    9.600000    9.600000
1        8192.0   19.200000   19.200000
2       16384.0   38.400001   38.400001
3       32768.0   63.999998   63.999998
3       32768.0   63.999998   76.800002
4       65536.0  127.999995  127.999995
5      131072.0  219.428568  219.428568
6      262144.0  341.333321  384.000001
6      262144.0  341.333321  341.333321
7      524288.0  472.615390  472.615390
8     1048576.0  614.400016  614.400016
9     2097152.0  722.823517  722.823517
9     2097152.0  722.823517  702.171410
10    4194304.0  780.190482  780.190482
11    8388608.0  812.429770  812.429770
12   16777216.0  833.084721  833.084721
13   33554432.0  842.004273  843.811163
13   33554432.0  842.004273  842.004273
14   67108864.0  847.448255  848.362445
15  134217728.0  849.737435  850.656574
</pre></div>
</div>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  39.155 seconds)</p>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  41.030 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p>

@@ -369,17 +369,17 @@ We will then compare its performance against (1) <code class="code docutils lite
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance:
          N      Triton  Torch (native)  Torch (jit)
0     256.0  512.000001      546.133347   190.511628
1     384.0  438.857137      558.545450   151.703707
0     256.0  512.000001      512.000001   190.511628
1     384.0  438.857137      585.142862   151.703707
2     512.0  481.882344      606.814814   154.566038
3     640.0  465.454542      640.000002   158.759699
4     768.0  463.698115      664.216187   163.839992
4     768.0  463.698115      664.216187   162.754967
..      ...         ...             ...          ...
93  12160.0  479.211815      405.333344   199.038365
94  12288.0  484.853264      415.222812   199.197579
95  12416.0  460.384708      412.149375   198.954424
96  12544.0  457.705824      412.546756   199.012395
97  12672.0  457.679461      411.679167   199.167004
93  12160.0  478.622374      405.333344   198.834951
94  12288.0  487.256521      415.661740   199.096718
95  12416.0  459.851851      411.722274   198.755369
96  12544.0  458.228323      412.971190   199.012395
97  12672.0  457.679461      412.097543   199.069228

[98 rows x 4 columns]
</pre></div>
@@ -392,7 +392,7 @@ We will then compare its performance against (1) <code class="code docutils lite
Note, however, that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li>
</ul>
</div></blockquote>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  8.135 seconds)</p>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  8.844 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p>

@@ -564,42 +564,42 @@ torch_output=tensor([[  1.1045, -36.9688,  31.4688,  ..., -11.3906,  24.4531, -3
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>matmul-performance:
         M     cuBLAS  ...     Triton  Triton (+ LeakyReLU)
0    256.0   2.730667  ...   3.276800              3.276800
1    384.0   7.372800  ...   7.899428              7.899428
2    512.0  14.563555  ...  16.384000             16.384000
0    256.0   2.730667  ...   2.978909              2.978909
1    384.0   7.372800  ...   8.507077              8.507077
2    512.0  14.563555  ...  15.420235             16.384000
3    640.0  22.260869  ...  24.380953             24.380953
4    768.0  32.768000  ...  35.389441             34.028308
5    896.0  37.971025  ...  40.140799             40.140799
6   1024.0  49.932191  ...  53.773130             53.773130
5    896.0  37.971025  ...  40.140799             39.025776
6   1024.0  49.932191  ...  53.773130             52.428801
7   1152.0  45.242181  ...  48.161033             47.396572
8   1280.0  51.200001  ...  57.690139             57.690139
9   1408.0  64.138541  ...  69.009825             68.147202
9   1408.0  64.138541  ...  69.009825             67.305878
10  1536.0  79.526831  ...  79.526831             79.526831
11  1664.0  63.372618  ...  63.372618             62.929456
11  1664.0  62.929456  ...  63.372618             62.929456
12  1792.0  72.983276  ...  63.499573             63.142831
13  1920.0  69.467336  ...  71.257735             71.257735
14  2048.0  73.262953  ...  78.033565             77.672296
15  2176.0  83.155572  ...  87.115360             86.739860
16  2304.0  68.251065  ...  78.064941             77.558029
17  2432.0  71.305746  ...  75.522751             75.320281
18  2560.0  77.833728  ...  82.331658             82.125311
19  2688.0  83.552988  ...  90.748936             90.316801
20  2816.0  84.035084  ...  83.873477             84.035084
21  2944.0  82.784108  ...  84.324925             84.040530
22  3072.0  82.540970  ...  89.170242             89.030036
23  3200.0  84.768213  ...  95.096582             95.380032
24  3328.0  83.808259  ...  85.500351             86.424125
25  3456.0  82.604067  ...  92.033756             91.719645
26  3584.0  87.466332  ...  92.696281             96.372338
27  3712.0  86.267139  ...  85.970176             88.248537
28  3840.0  82.716526  ...  86.400002             91.322872
29  3968.0  85.871877  ...  92.163097             87.472354
30  4096.0  93.924229  ...  94.055868             87.097813
13  1920.0  68.776119  ...  71.626943             71.257735
14  2048.0  73.584279  ...  78.398206             78.033565
15  2176.0  83.500614  ...  87.115360             86.739860
16  2304.0  68.251065  ...  77.810656             77.558029
17  2432.0  71.125224  ...  75.726318             75.522751
18  2560.0  77.833728  ...  82.331658             81.920002
19  2688.0  83.737433  ...  90.532356             90.316801
20  2816.0  82.290955  ...  83.712490             84.197315
21  2944.0  82.646820  ...  81.967162             83.477440
22  3072.0  82.062468  ...  85.662786             89.030036
23  3200.0  84.210524  ...  97.116842             95.952022
24  3328.0  83.905938  ...  86.946008             86.736504
25  3456.0  79.196043  ...  86.689860             91.407671
26  3584.0  87.211821  ...  94.947616             97.840469
27  3712.0  85.896254  ...  83.005689             88.404730
28  3840.0  81.738356  ...  88.297007             91.473945
29  3968.0  88.040360  ...  92.093539             84.797731
30  4096.0  93.336389  ...  91.491294             88.185107

[31 rows x 5 columns]
</pre></div>
</div>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 6 minutes  0.518 seconds)</p>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 6 minutes  27.164 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p>

@@ -372,7 +372,7 @@ to explore the <cite>triton/language/random</cite> folder!</p>
<dd><p>Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014</p>
</dd>
</dl>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes  0.012 seconds)</p>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes  0.325 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-04-low-memory-dropout-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/c9aed78977a4c05741d675a38dde3d7d/04-low-memory-dropout.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">04-low-memory-dropout.py</span></code></a></p>

@@ -194,36 +194,36 @@ to download the full example code</p>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm-backward:
          N      Triton       Torch        Apex
0    1024.0  114.306981   97.912354  303.407414
1    1536.0  118.153850  134.540150  341.333333
2    2048.0  125.068704  161.154101  334.367350
3    2560.0  119.766080  181.238943  330.322572
4    3072.0  124.121216  192.501302  323.368415
5    3584.0  127.242599  208.271186  311.652167
6    4096.0  130.549806  220.907859  296.990947
7    4608.0  105.526723  232.825259  287.251954
8    5120.0  108.743364  242.366855  284.444444
9    5632.0  109.625308  243.107920  290.060087
10   6144.0  112.133840  248.661056  286.879370
11   6656.0  112.733948  256.000009  285.767438
12   7168.0  114.917836  260.260201  284.821192
13   7680.0  115.128047  262.938666  280.121579
14   8192.0  115.856217  266.767970  284.526763
15   8704.0   94.182151  267.815384  285.377055
16   9216.0   96.713601  271.724806  287.999990
17   9728.0   97.687028  280.615388  290.027323
18  10240.0  100.065144  286.433562  290.153487
19  10752.0  101.115983  246.935876  290.594591
20  11264.0  103.024392  245.536784  286.980888
21  11776.0  103.373808  249.667843  288.981596
22  12288.0  106.083454  254.673582  294.617366
23  12800.0  106.004143  254.094291  288.180121
24  13312.0  107.067028  253.260416  290.443638
25  13824.0  107.440415  257.390218  292.056329
26  14336.0  109.296061  255.051144  286.959121
27  14848.0  109.143034  257.665934  289.952797
28  15360.0  110.802526  257.970599  288.000007
29  15872.0  110.702706  261.806182  290.341468
0    1024.0  114.306981   99.497980  315.076934
1    1536.0  117.776359  132.604320  341.333333
2    2048.0  124.751268  158.554837  321.254900
3    2560.0  119.766080  182.857144  325.079368
4    3072.0  123.912607  191.501303  319.168834
5    3584.0  126.308369  208.271186  309.410081
6    4096.0  130.549806  220.412561  298.796351
7    4608.0  105.325718  231.849059  285.767436
8    5120.0  108.647215  244.294240  286.433562
9    5632.0  109.803417  244.426754  291.310338
10   6144.0  111.878602  251.202731  286.879370
11   6656.0  112.733948  256.000009  286.793541
12   7168.0  114.917836  253.360829  277.470965
13   7680.0  115.128047  266.743841  284.444450
14   8192.0  115.583772  258.694729  277.303250
15   8704.0   93.928060  267.130429  286.158893
16   9216.0   96.650214  272.729961  289.129410
17   9728.0   97.564561  279.942444  288.950501
18  10240.0  100.024417  287.102804  290.153487
19  10752.0  100.997264  246.699797  289.941565
20  11264.0  102.985144  246.432094  287.897767
21  11776.0  103.373808  249.667843  289.573776
22  12288.0  106.007189  254.453844  294.029924
23  12800.0  105.967577  254.094291  290.084977
24  13312.0  107.138837  252.959629  289.391298
25  13824.0  107.370873  256.991469  292.056329
26  14336.0  109.157360  255.429842  288.160801
27  14848.0  109.461527  257.108233  288.544136
28  15360.0  110.669469  258.513318  288.225185
29  15872.0  110.960674  262.527914  290.562936
</pre></div>
</div>
<div class="line-block">
@@ -487,7 +487,7 @@ to download the full example code</p>
<span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  0.332 seconds)</p>
<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  2.154 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p>

@@ -174,7 +174,7 @@

  <div class="section" id="computation-times">
<span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
<p><strong>12:48.152</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p>
<p><strong>13:19.517</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 85%" />
@@ -183,23 +183,23 @@
</colgroup>
<tbody>
<tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td>
<td><p>06:00.518</p></td>
<td><p>06:27.164</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td>
<td><p>03:08.135</p></td>
<td><p>03:08.844</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td>
<td><p>02:00.332</p></td>
<td><p>02:02.154</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td>
<td><p>01:39.155</p></td>
<td><p>01:41.030</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td>
<td><p>00:00.012</p></td>
<td><p>00:00.325</p></td>
<td><p>0.0 MB</p></td>
</tr>
</tbody>

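As a quick consistency check on the *Computation times* summary, the reported total (12:48.152 before this update, 13:19.517 after) should equal the sum of the five per-tutorial running times in the table. The sketch below assumes the `MM:SS.mmm` format shown in that table; the helper names `to_seconds` and `to_mmss` are hypothetical and are not part of Sphinx-Gallery.

```python
from decimal import Decimal


def to_seconds(mmss: str) -> Decimal:
    """Parse an 'MM:SS.mmm' duration (as shown in the table) into exact seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + Decimal(seconds)


def to_mmss(total: Decimal) -> str:
    """Format a duration in seconds back into 'M:SS.mmm'."""
    minutes, seconds = divmod(total, 60)
    return f"{int(minutes)}:{seconds:06.3f}"


# Per-tutorial times copied from the table
# (matmul, softmax, layer norm, vector add, dropout).
old_times = ["06:00.518", "03:08.135", "02:00.332", "01:39.155", "00:00.012"]
new_times = ["06:27.164", "03:08.844", "02:02.154", "01:41.030", "00:00.325"]

old_total = to_mmss(sum(to_seconds(t) for t in old_times))
new_total = to_mmss(sum(to_seconds(t) for t in new_times))
print(old_total, new_total)  # 12:48.152 13:19.517
```

Using `Decimal` rather than `float` keeps the millisecond sums exact, so the reconstructed totals match the strings on the page character for character.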