[GH-PAGES] Updated website

This commit is contained in:
Philippe Tillet
2022-02-08 00:24:15 +00:00
parent 0f03cfcfd3
commit cf8c3ba438
23 changed files with 147 additions and 147 deletions

Binary file not shown. Image size: 23 KiB before, 25 KiB after.

Binary file not shown. Image size: 15 KiB before, 16 KiB after.

Binary file not shown. Image size: 37 KiB before, 37 KiB after.

Binary file not shown. Image size: 24 KiB before, 23 KiB after.

Binary file not shown. Image size: 58 KiB before, 58 KiB after.

Binary file not shown. Image size: 34 KiB before, 34 KiB after.

Binary file not shown. Image size: 27 KiB before, 27 KiB after.

Binary file not shown. Image size: 18 KiB before, 18 KiB after.

View File

@@ -235,10 +235,10 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
0 4096.0 9.600000 9.600000
1 8192.0 19.200000 19.200000
2 16384.0 38.400001 38.400001
-3 32768.0 76.800002 76.800002
+3 32768.0 63.999998 63.999998
4 65536.0 127.999995 127.999995
5 131072.0 219.428568 219.428568
-6 262144.0 341.333321 341.333321
+6 262144.0 341.333321 384.000001
7 524288.0 472.615390 472.615390
8 1048576.0 614.400016 614.400016
9 2097152.0 722.823517 722.823517
@@ -255,7 +255,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 1 minutes 50.312 seconds)
+**Total running time of the script:** ( 1 minutes 45.080 seconds)
.. _sphx_glr_download_getting-started_tutorials_01-vector-add.py:

View File

@@ -278,17 +278,17 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t
softmax-performance:
N Triton Torch (native) Torch (jit)
-0 256.0 546.133347 512.000001 190.511628
-1 384.0 614.400016 558.545450 153.600004
-2 512.0 655.360017 585.142849 154.566038
+0 256.0 512.000001 546.133347 188.321838
+1 384.0 614.400016 585.142862 153.600004
+2 512.0 655.360017 606.814814 154.566038
3 640.0 682.666684 640.000002 160.000000
4 768.0 722.823517 664.216187 162.754967
.. ... ... ... ...
-93 12160.0 814.058574 406.179533 198.530610
-94 12288.0 814.111783 415.661740 198.694297
-95 12416.0 814.163950 412.149375 198.457532
-96 12544.0 814.214963 412.546756 198.716830
-97 12672.0 814.265046 412.097543 198.679085
+93 12160.0 814.058574 406.179533 199.038365
+94 12288.0 814.955429 415.661740 199.298541
+95 12416.0 814.163950 412.149375 198.854847
+96 12544.0 814.214963 412.546756 199.111113
+97 12672.0 814.265046 412.097543 199.264875
[98 rows x 4 columns]
@@ -306,7 +306,7 @@ In the above plot, we can see that:
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 3 minutes 22.431 seconds)
+**Total running time of the script:** ( 3 minutes 19.691 seconds)
.. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py:

View File

@@ -459,36 +459,36 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
matmul-performance:
M cuBLAS ... Triton Triton (+ LeakyReLU)
0 256.0 2.730667 ... 2.978909 2.978909
-1 384.0 7.372800 ... 8.507077 7.899428
-2 512.0 14.563555 ... 15.420235 16.384000
+1 384.0 7.372800 ... 7.899428 7.899428
+2 512.0 14.563555 ... 15.420235 15.420235
3 640.0 22.260869 ... 24.380953 24.380953
4 768.0 32.768000 ... 34.028308 34.028308
-5 896.0 37.971025 ... 39.025776 39.025776
+5 896.0 39.025776 ... 39.025776 39.025776
6 1024.0 49.932191 ... 52.428801 52.428801
-7 1152.0 45.242181 ... 46.656000 46.656000
+7 1152.0 44.566925 ... 46.656000 46.656000
8 1280.0 51.200001 ... 56.888887 56.109587
-9 1408.0 64.138541 ... 66.485074 65.684049
-10 1536.0 79.526831 ... 79.526831 78.643199
+9 1408.0 64.138541 ... 67.305878 66.485074
+10 1536.0 80.430545 ... 79.526831 78.643199
11 1664.0 62.929456 ... 62.061463 62.061463
12 1792.0 72.983276 ... 72.047592 71.588687
13 1920.0 68.776119 ... 70.172588 69.818184
-14 2048.0 73.262953 ... 76.959706 76.608294
-15 2176.0 83.500614 ... 85.998493 85.269692
-16 2304.0 68.446623 ... 76.319081 75.834511
-17 2432.0 71.125224 ... 82.509438 84.877538
-18 2560.0 77.833728 ... 80.709358 81.108913
-19 2688.0 83.369354 ... 89.676257 89.464755
-20 2816.0 83.233226 ... 82.446516 81.827785
-21 2944.0 82.373605 ... 82.373605 81.298583
-22 3072.0 82.062468 ... 88.060814 88.473602
-23 3200.0 82.368085 ... 89.761569 94.955488
-24 3328.0 80.889094 ... 80.527177 82.939284
-25 3456.0 81.849303 ... 86.783176 91.304157
-26 3584.0 87.042978 ... 98.375705 90.364394
-27 3712.0 79.726532 ... 90.815768 85.820159
-28 3840.0 82.592983 ... 88.191387 91.398346
-29 3968.0 85.873762 ... 90.656713 83.867052
-30 4096.0 92.563952 ... 82.441739 82.291681
+14 2048.0 73.262953 ... 76.608294 76.260072
+15 2176.0 82.813365 ... 85.998493 85.269692
+16 2304.0 68.251065 ... 76.319081 75.834511
+17 2432.0 71.125224 ... 80.041209 84.621881
+18 2560.0 77.649287 ... 80.709358 80.709358
+19 2688.0 83.922689 ... 89.464755 89.464755
+20 2816.0 81.218262 ... 82.916747 82.135981
+21 2944.0 81.166173 ... 82.509987 82.921853
+22 3072.0 81.589488 ... 88.473602 87.924073
+23 3200.0 84.768213 ... 95.380032 95.238096
+24 3328.0 83.130825 ... 84.200347 83.516586
+25 3456.0 81.683457 ... 91.511426 90.994998
+26 3584.0 84.033077 ... 89.379119 94.448944
+27 3712.0 85.675250 ... 82.423549 85.019017
+28 3840.0 80.081098 ... 87.286505 91.022218
+29 3968.0 86.911637 ... 88.873953 84.915752
+30 4096.0 91.118618 ... 82.391920 84.599893
[31 rows x 5 columns]
@@ -498,7 +498,7 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 6 minutes 5.923 seconds)
+**Total running time of the script:** ( 5 minutes 29.708 seconds)
.. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py:

View File

@@ -240,7 +240,7 @@ References
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 0 minutes 0.477 seconds)
+**Total running time of the script:** ( 0 minutes 0.011 seconds)
.. _sphx_glr_download_getting-started_tutorials_04-low-memory-dropout.py:

View File

@@ -38,36 +38,36 @@ Layer Normalization
layer-norm-backward:
N Triton Torch
-0 1024.0 311.088617 99.497980
-1 1536.0 347.773587 133.565214
-2 2048.0 420.102553 162.217818
-3 2560.0 455.111129 182.857144
-4 3072.0 511.999982 191.501303
+0 1024.0 307.200008 98.303995
+1 1536.0 347.773587 134.540150
+2 2048.0 420.102553 161.154101
+3 2560.0 455.111129 181.238943
+4 3072.0 511.999982 192.501302
5 3584.0 551.384634 208.271186
-6 4096.0 568.231237 220.907859
-7 4608.0 502.690905 232.336141
-8 5120.0 527.381977 243.326731
-9 5632.0 540.671974 243.545956
-10 6144.0 544.118087 249.081070
-11 6656.0 528.953642 256.410903
-12 7168.0 507.469040 262.243907
-13 7680.0 481.253256 261.076480
-14 8192.0 461.521112 269.326017
-15 8704.0 417.791980 268.159180
-16 9216.0 431.157889 273.404206
-17 9728.0 442.181815 280.953074
-18 10240.0 448.467168 286.767793
-19 10752.0 427.231788 246.699797
-20 11264.0 427.071098 245.313973
-21 11776.0 420.571432 249.447482
-22 12288.0 420.102570 254.673582
-23 12800.0 414.016170 253.674644
-24 13312.0 410.652963 252.759501
-25 13824.0 403.620451 257.390218
-26 14336.0 396.387109 254.862216
-27 14848.0 382.351933 257.293872
-28 15360.0 374.253788 257.790220
-29 15872.0 368.046389 262.890274
+6 4096.0 564.965515 220.412561
+7 4608.0 504.986315 233.316456
+8 5120.0 527.381977 242.366855
+9 5632.0 542.843364 243.545956
+10 6144.0 546.133354 248.661056
+11 6656.0 532.479975 256.410903
+12 7168.0 507.469040 260.260201
+13 7680.0 482.513091 262.938666
+14 8192.0 463.698115 265.327937
+15 8704.0 416.958106 267.815384
+16 9216.0 431.157889 271.391419
+17 9728.0 438.857162 280.615388
+18 10240.0 448.467168 286.433562
+19 10752.0 428.651173 246.699797
+20 11264.0 427.071098 245.760001
+21 11776.0 423.089806 249.667843
+22 12288.0 419.504980 254.673582
+23 12800.0 413.458944 253.674644
+24 13312.0 411.711355 252.161013
+25 13824.0 405.594132 257.190689
+26 14336.0 394.568805 254.485198
+27 14848.0 386.080180 257.479779
+28 15360.0 374.634130 257.970599
+29 15872.0 368.402336 262.166551
@@ -339,7 +339,7 @@ Layer Normalization
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 1 minutes 23.770 seconds)
+**Total running time of the script:** ( 1 minutes 22.248 seconds)
.. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py:

View File

@@ -5,16 +5,16 @@
Computation times
=================
-**12:42.913** total execution time for **getting-started_tutorials** files:
+**11:56.738** total execution time for **getting-started_tutorials** files:
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:05.923 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 05:29.708 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:22.431 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:19.691 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:50.312 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:45.080 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 01:23.770 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 01:22.248 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``) | 00:00.477 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``) | 00:00.011 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+

View File

@@ -325,10 +325,10 @@ for different problem sizes.</p>
0 4096.0 9.600000 9.600000
1 8192.0 19.200000 19.200000
2 16384.0 38.400001 38.400001
-3 32768.0 76.800002 76.800002
+3 32768.0 63.999998 63.999998
4 65536.0 127.999995 127.999995
5 131072.0 219.428568 219.428568
-6 262144.0 341.333321 341.333321
+6 262144.0 341.333321 384.000001
7 524288.0 472.615390 472.615390
8 1048576.0 614.400016 614.400016
9 2097152.0 722.823517 722.823517
@@ -340,7 +340,7 @@ for different problem sizes.</p>
15 134217728.0 849.737435 850.656574
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 50.312 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 45.080 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p>

View File

@@ -369,17 +369,17 @@ We will then compare its performance against (1) <code class="code docutils lite
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance:
N Triton Torch (native) Torch (jit)
-0 256.0 546.133347 512.000001 190.511628
-1 384.0 614.400016 558.545450 153.600004
-2 512.0 655.360017 585.142849 154.566038
+0 256.0 512.000001 546.133347 188.321838
+1 384.0 614.400016 585.142862 153.600004
+2 512.0 655.360017 606.814814 154.566038
3 640.0 682.666684 640.000002 160.000000
4 768.0 722.823517 664.216187 162.754967
.. ... ... ... ...
-93 12160.0 814.058574 406.179533 198.530610
-94 12288.0 814.111783 415.661740 198.694297
-95 12416.0 814.163950 412.149375 198.457532
-96 12544.0 814.214963 412.546756 198.716830
-97 12672.0 814.265046 412.097543 198.679085
+93 12160.0 814.058574 406.179533 199.038365
+94 12288.0 814.955429 415.661740 199.298541
+95 12416.0 814.163950 412.149375 198.854847
+96 12544.0 814.214963 412.546756 199.111113
+97 12672.0 814.265046 412.097543 199.264875
[98 rows x 4 columns]
</pre></div>
@@ -392,7 +392,7 @@ We will then compare its performance against (1) <code class="code docutils lite
Note however that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li>
</ul>
</div></blockquote>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes 22.431 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes 19.691 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p>

View File

@@ -565,41 +565,41 @@ torch_output=tensor([[ 1.1045, -36.9688, 31.4688, ..., -11.3906, 24.4531, -3
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>matmul-performance:
M cuBLAS ... Triton Triton (+ LeakyReLU)
0 256.0 2.730667 ... 2.978909 2.978909
-1 384.0 7.372800 ... 8.507077 7.899428
-2 512.0 14.563555 ... 15.420235 16.384000
+1 384.0 7.372800 ... 7.899428 7.899428
+2 512.0 14.563555 ... 15.420235 15.420235
3 640.0 22.260869 ... 24.380953 24.380953
4 768.0 32.768000 ... 34.028308 34.028308
-5 896.0 37.971025 ... 39.025776 39.025776
+5 896.0 39.025776 ... 39.025776 39.025776
6 1024.0 49.932191 ... 52.428801 52.428801
-7 1152.0 45.242181 ... 46.656000 46.656000
+7 1152.0 44.566925 ... 46.656000 46.656000
8 1280.0 51.200001 ... 56.888887 56.109587
-9 1408.0 64.138541 ... 66.485074 65.684049
-10 1536.0 79.526831 ... 79.526831 78.643199
+9 1408.0 64.138541 ... 67.305878 66.485074
+10 1536.0 80.430545 ... 79.526831 78.643199
11 1664.0 62.929456 ... 62.061463 62.061463
12 1792.0 72.983276 ... 72.047592 71.588687
13 1920.0 68.776119 ... 70.172588 69.818184
-14 2048.0 73.262953 ... 76.959706 76.608294
-15 2176.0 83.500614 ... 85.998493 85.269692
-16 2304.0 68.446623 ... 76.319081 75.834511
-17 2432.0 71.125224 ... 82.509438 84.877538
-18 2560.0 77.833728 ... 80.709358 81.108913
-19 2688.0 83.369354 ... 89.676257 89.464755
-20 2816.0 83.233226 ... 82.446516 81.827785
-21 2944.0 82.373605 ... 82.373605 81.298583
-22 3072.0 82.062468 ... 88.060814 88.473602
-23 3200.0 82.368085 ... 89.761569 94.955488
-24 3328.0 80.889094 ... 80.527177 82.939284
-25 3456.0 81.849303 ... 86.783176 91.304157
-26 3584.0 87.042978 ... 98.375705 90.364394
-27 3712.0 79.726532 ... 90.815768 85.820159
-28 3840.0 82.592983 ... 88.191387 91.398346
-29 3968.0 85.873762 ... 90.656713 83.867052
-30 4096.0 92.563952 ... 82.441739 82.291681
+14 2048.0 73.262953 ... 76.608294 76.260072
+15 2176.0 82.813365 ... 85.998493 85.269692
+16 2304.0 68.251065 ... 76.319081 75.834511
+17 2432.0 71.125224 ... 80.041209 84.621881
+18 2560.0 77.649287 ... 80.709358 80.709358
+19 2688.0 83.922689 ... 89.464755 89.464755
+20 2816.0 81.218262 ... 82.916747 82.135981
+21 2944.0 81.166173 ... 82.509987 82.921853
+22 3072.0 81.589488 ... 88.473602 87.924073
+23 3200.0 84.768213 ... 95.380032 95.238096
+24 3328.0 83.130825 ... 84.200347 83.516586
+25 3456.0 81.683457 ... 91.511426 90.994998
+26 3584.0 84.033077 ... 89.379119 94.448944
+27 3712.0 85.675250 ... 82.423549 85.019017
+28 3840.0 80.081098 ... 87.286505 91.022218
+29 3968.0 86.911637 ... 88.873953 84.915752
+30 4096.0 91.118618 ... 82.391920 84.599893
[31 rows x 5 columns]
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 6 minutes 5.923 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes 29.708 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p>

View File

@@ -372,7 +372,7 @@ to explore the <cite>triton/language/random</cite> folder!</p>
<dd><p>Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014</p>
</dd>
</dl>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes 0.477 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes 0.011 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-04-low-memory-dropout-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/c9aed78977a4c05741d675a38dde3d7d/04-low-memory-dropout.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">04-low-memory-dropout.py</span></code></a></p>

View File

@@ -194,36 +194,36 @@ to download the full example code</p>
<p class="sphx-glr-script-out">Out:</p>
<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm-backward:
N Triton Torch
-0 1024.0 311.088617 99.497980
-1 1536.0 347.773587 133.565214
-2 2048.0 420.102553 162.217818
-3 2560.0 455.111129 182.857144
-4 3072.0 511.999982 191.501303
+0 1024.0 307.200008 98.303995
+1 1536.0 347.773587 134.540150
+2 2048.0 420.102553 161.154101
+3 2560.0 455.111129 181.238943
+4 3072.0 511.999982 192.501302
5 3584.0 551.384634 208.271186
-6 4096.0 568.231237 220.907859
-7 4608.0 502.690905 232.336141
-8 5120.0 527.381977 243.326731
-9 5632.0 540.671974 243.545956
-10 6144.0 544.118087 249.081070
-11 6656.0 528.953642 256.410903
-12 7168.0 507.469040 262.243907
-13 7680.0 481.253256 261.076480
-14 8192.0 461.521112 269.326017
-15 8704.0 417.791980 268.159180
-16 9216.0 431.157889 273.404206
-17 9728.0 442.181815 280.953074
-18 10240.0 448.467168 286.767793
-19 10752.0 427.231788 246.699797
-20 11264.0 427.071098 245.313973
-21 11776.0 420.571432 249.447482
-22 12288.0 420.102570 254.673582
-23 12800.0 414.016170 253.674644
-24 13312.0 410.652963 252.759501
-25 13824.0 403.620451 257.390218
-26 14336.0 396.387109 254.862216
-27 14848.0 382.351933 257.293872
-28 15360.0 374.253788 257.790220
-29 15872.0 368.046389 262.890274
+6 4096.0 564.965515 220.412561
+7 4608.0 504.986315 233.316456
+8 5120.0 527.381977 242.366855
+9 5632.0 542.843364 243.545956
+10 6144.0 546.133354 248.661056
+11 6656.0 532.479975 256.410903
+12 7168.0 507.469040 260.260201
+13 7680.0 482.513091 262.938666
+14 8192.0 463.698115 265.327937
+15 8704.0 416.958106 267.815384
+16 9216.0 431.157889 271.391419
+17 9728.0 438.857162 280.615388
+18 10240.0 448.467168 286.433562
+19 10752.0 428.651173 246.699797
+20 11264.0 427.071098 245.760001
+21 11776.0 423.089806 249.667843
+22 12288.0 419.504980 254.673582
+23 12800.0 413.458944 253.674644
+24 13312.0 411.711355 252.161013
+25 13824.0 405.594132 257.190689
+26 14336.0 394.568805 254.485198
+27 14848.0 386.080180 257.479779
+28 15360.0 374.634130 257.970599
+29 15872.0 368.402336 262.166551
</pre></div>
</div>
<div class="line-block">
@@ -487,7 +487,7 @@ to download the full example code</p>
<span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">&#39;.&#39;</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
</div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 23.770 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes 22.248 seconds)</p>
<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py">
<div class="sphx-glr-download sphx-glr-download-python docutils container">
<p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p>

View File

@@ -174,7 +174,7 @@
<div class="section" id="computation-times">
<span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline"></a></h1>
-<p><strong>12:42.913</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p>
+<p><strong>11:56.738</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p>
<table class="docutils align-default">
<colgroup>
<col style="width: 85%" />
@@ -183,23 +183,23 @@
</colgroup>
<tbody>
<tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td>
-<td><p>06:05.923</p></td>
+<td><p>05:29.708</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td>
-<td><p>03:22.431</p></td>
+<td><p>03:19.691</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td>
-<td><p>01:50.312</p></td>
+<td><p>01:45.080</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-even"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td>
-<td><p>01:23.770</p></td>
+<td><p>01:22.248</p></td>
<td><p>0.0 MB</p></td>
</tr>
<tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td>
-<td><p>00:00.477</p></td>
+<td><p>00:00.011</p></td>
<td><p>0.0 MB</p></td>
</tr>
</tbody>

File diff suppressed because one or more lines are too long