[GH-PAGES] Updated website

Philippe Tillet
2022-04-21 00:45:25 +00:00
parent 8d0e47d73e
commit ab04e47bf2
158 changed files with 312 additions and 312 deletions

@@ -234,8 +234,8 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
size Triton Torch
0 4096.0 9.600000 9.600000
1 8192.0 19.200000 19.200000
-2 16384.0 31.999999 31.999999
-3 32768.0 76.800002 76.800002
+2 16384.0 38.400001 38.400001
+3 32768.0 63.999998 63.999998
4 65536.0 127.999995 127.999995
5 131072.0 219.428568 219.428568
6 262144.0 341.333321 341.333321
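Note: the ``print_data=True`` flag in the hunk context above comes from Triton's ``triton.testing`` benchmarking harness. A minimal sketch of how a table like this one is generated, assuming the tutorial's ``add(x, y)`` wrapper around the Triton kernel is already defined and the 2022-era ``do_bench`` return convention of ``(median, min, max)`` milliseconds:

.. code-block:: python

    import torch
    import triton
    import triton.testing

    @triton.testing.perf_report(
        triton.testing.Benchmark(
            x_names=['size'],                        # benchmark argument swept along the x-axis
            x_vals=[2 ** i for i in range(12, 28)],  # 4096 ... ~134M elements, as in the table
            x_log=True,
            line_arg='provider',                     # one column/curve per provider
            line_vals=['triton', 'torch'],
            line_names=['Triton', 'Torch'],
            ylabel='GB/s',
            plot_name='vector-add-performance',
            args={},
        )
    )
    def benchmark(size, provider):
        x = torch.rand(size, device='cuda', dtype=torch.float32)
        y = torch.rand(size, device='cuda', dtype=torch.float32)
        # `add` is the tutorial's Triton vector-add wrapper (assumed defined earlier).
        fn = (lambda: x + y) if provider == 'torch' else (lambda: add(x, y))
        ms, min_ms, max_ms = triton.testing.do_bench(fn)
        # Each element touches 3 float32 tensors (x, y, output) = 12 bytes; ms -> GB/s.
        gbps = lambda t: 12 * size / t * 1e-6
        return gbps(ms), gbps(max_ms), gbps(min_ms)

    # Print the table shown above (and optionally the plot).
    benchmark.run(print_data=True, show_plots=True)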
@@ -255,7 +255,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 1 minutes 46.974 seconds)
+**Total running time of the script:** ( 1 minutes 37.409 seconds)
.. _sphx_glr_download_getting-started_tutorials_01-vector-add.py:

@@ -278,16 +278,16 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t
softmax-performance:
N Triton Torch (native) Torch (jit)
-0 256.0 512.000001 546.133347 188.321838
-1 384.0 585.142862 585.142862 151.703707
-2 512.0 655.360017 606.814814 156.038096
-3 640.0 682.666684 640.000002 160.000000
-4 768.0 722.823517 646.736871 163.839992
+0 256.0 512.000001 546.133347 186.181817
+1 384.0 585.142862 585.142862 153.600004
+2 512.0 655.360017 606.814814 154.566038
+3 640.0 682.666684 620.606056 160.000000
+4 768.0 722.823517 664.216187 162.754967
.. ... ... ... ...
93 12160.0 814.058574 405.755985 198.936606
-94 12288.0 814.111783 415.222812 199.197579
-95 12416.0 814.163950 412.149375 198.954424
-96 12544.0 814.214963 412.971190 199.111113
+94 12288.0 815.800825 415.661740 199.298541
+95 12416.0 814.163950 411.722274 198.854847
+96 12544.0 814.214963 412.546756 199.061730
97 12672.0 814.265046 412.097543 199.167004
[98 rows x 4 columns]
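The ``Torch (jit)`` column above measures the tutorial's naive, non-fused baseline. A sketch of that reference implementation (assuming the tutorial's formulation); because every elementary op materializes its result, each row makes several round-trips through DRAM, which is why the fused Triton kernel is so much faster:

.. code-block:: python

    import torch

    @torch.jit.script
    def naive_softmax(x: torch.Tensor) -> torch.Tensor:
        """Row-wise, numerically stable softmax built from elementary PyTorch ops."""
        x_max = x.max(dim=1)[0]                   # read MN elements, write M
        z = x - x_max[:, None]                    # read MN + M, write MN
        numerator = torch.exp(z)                  # read MN, write MN
        denominator = numerator.sum(dim=1)        # read MN, write M
        return numerator / denominator[:, None]   # read MN + M, write MN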
@@ -306,7 +306,7 @@ In the above plot, we can see that:
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 3 minutes 28.936 seconds)
+**Total running time of the script:** ( 3 minutes 27.320 seconds)
.. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py:

@@ -458,37 +458,37 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
matmul-performance:
M cuBLAS ... Triton Triton (+ LeakyReLU)
-0 256.0 2.730667 ... 3.276800 2.978909
-1 384.0 7.372800 ... 7.899428 8.507077
-2 512.0 14.563555 ... 16.384000 16.384000
+0 256.0 2.730667 ... 2.978909 2.978909
+1 384.0 7.372800 ... 7.899428 7.899428
+2 512.0 14.563555 ... 15.420235 16.384000
3 640.0 22.260869 ... 24.380953 24.380953
-4 768.0 32.768000 ... 35.389441 34.028308
-5 896.0 37.971025 ... 40.140799 40.140799
-6 1024.0 49.932191 ... 53.773130 53.773130
-7 1152.0 45.242181 ... 48.161033 47.396572
-8 1280.0 51.200001 ... 58.514284 57.690139
-9 1408.0 64.138541 ... 69.009825 69.009825
-10 1536.0 79.526831 ... 80.430545 80.430545
+4 768.0 32.768000 ... 34.028308 34.028308
+5 896.0 37.971025 ... 41.321411 40.140799
+6 1024.0 49.932191 ... 53.773130 52.428801
+7 1152.0 44.566925 ... 48.161033 47.396572
+8 1280.0 51.200001 ... 57.690139 57.690139
+9 1408.0 64.138541 ... 69.009825 68.147202
+10 1536.0 80.430545 ... 80.430545 79.526831
11 1664.0 62.929456 ... 63.372618 62.929456
-12 1792.0 72.983276 ... 63.142831 62.790080
-13 1920.0 68.776119 ... 71.626943 71.257735
-14 2048.0 73.584279 ... 78.398206 78.033565
+12 1792.0 72.983276 ... 63.499573 62.790080
+13 1920.0 69.120002 ... 71.626943 70.892307
+14 2048.0 73.262953 ... 78.033565 78.033565
15 2176.0 83.155572 ... 87.115360 86.739860
-16 2304.0 68.446623 ... 77.810656 77.558029
-17 2432.0 71.125224 ... 75.726318 75.320281
-18 2560.0 77.833728 ... 82.125311 81.715711
-19 2688.0 83.369354 ... 90.532356 90.748936
-20 2816.0 80.916902 ... 83.552120 83.552120
-21 2944.0 82.373605 ... 83.617504 83.617504
-22 3072.0 81.707223 ... 87.651868 89.593522
-23 3200.0 85.219705 ... 96.385543 96.385543
-24 3328.0 82.275764 ... 86.011103 85.703924
-25 3456.0 81.890873 ... 92.455926 92.138932
-26 3584.0 86.623693 ... 89.290361 94.250936
-27 3712.0 85.675250 ... 88.092894 84.874549
-28 3840.0 84.679936 ... 92.468225 85.136259
-29 3968.0 90.724116 ... 84.974886 90.522206
-30 4096.0 90.996029 ... 87.438257 92.309303
+16 2304.0 68.446623 ... 77.558029 77.307030
+17 2432.0 71.125224 ... 75.726318 74.918570
+18 2560.0 77.833728 ... 82.331658 81.715711
+19 2688.0 83.369354 ... 91.404957 91.185232
+20 2816.0 79.733474 ... 83.873477 83.392363
+21 2944.0 82.102191 ... 83.758038 84.182483
+22 3072.0 81.296638 ... 88.473602 87.516392
+23 3200.0 79.750779 ... 96.168294 95.952022
+24 3328.0 82.939284 ... 86.528001 85.602017
+25 3456.0 82.688790 ... 92.455926 86.689860
+26 3584.0 86.291162 ... 88.152348 94.250936
+27 3712.0 85.675250 ... 87.706180 87.246590
+28 3840.0 81.079177 ... 86.875096 91.398346
+29 3968.0 87.035620 ... 89.525997 84.917596
+30 4096.0 88.243079 ... 85.161847 91.616198
[31 rows x 5 columns]
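The figures above are TFLOP/s on square fp16 matrix multiplications; the ``Triton (+ LeakyReLU)`` column fuses a leaky-ReLU epilogue into the same kernel. A minimal sketch of how a single entry is typically measured, assuming the tutorial's Triton ``matmul`` wrapper is defined and the 2022-era ``do_bench`` return convention:

.. code-block:: python

    import torch
    import triton
    import triton.testing

    def bench_square_matmul(M: int, provider: str) -> float:
        """Time one provider on an M x M x M fp16 matmul and return TFLOP/s."""
        a = torch.randn((M, M), device='cuda', dtype=torch.float16)
        b = torch.randn((M, M), device='cuda', dtype=torch.float16)
        # `matmul` is the Triton kernel wrapper from the tutorial (assumed defined);
        # torch.matmul dispatches to cuBLAS for the reference column.
        fn = (lambda: torch.matmul(a, b)) if provider == 'cublas' else (lambda: matmul(a, b))
        ms, min_ms, max_ms = triton.testing.do_bench(fn)
        # A matmul performs 2 * M * N * K floating-point operations; convert ms to TFLOP/s.
        return 2 * M * M * M * 1e-12 / (ms * 1e-3)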
@@ -498,7 +498,7 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 6 minutes 10.290 seconds)
+**Total running time of the script:** ( 6 minutes 2.101 seconds)
.. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py:

@@ -240,7 +240,7 @@ References
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 0 minutes 0.012 seconds)
+**Total running time of the script:** ( 0 minutes 0.013 seconds)
.. _sphx_glr_download_getting-started_tutorials_04-low-memory-dropout.py:

@@ -38,36 +38,36 @@ Layer Normalization
layer-norm-backward:
N Triton Torch Apex
-0 1024.0 361.411758 99.902435 315.076934
-1 1536.0 409.599994 134.050910 344.523365
-2 2048.0 496.484863 159.067963 323.368435
-3 2560.0 461.954908 182.857144 325.079368
-4 3072.0 519.211251 191.501303 320.556515
-5 3584.0 554.941930 207.768111 310.527060
-6 4096.0 564.965515 220.907859 301.546004
-7 4608.0 502.690905 232.336141 287.251954
-8 5120.0 529.655159 243.809526 286.433562
-9 5632.0 542.843364 244.869560 291.310338
-10 6144.0 550.208948 251.202731 287.438593
-11 6656.0 534.260858 256.000009 286.793541
-12 7168.0 510.480705 253.734520 277.919225
-13 7680.0 487.619051 266.743841 285.325090
-14 8192.0 468.114289 258.354805 278.087683
-15 8704.0 415.300208 267.815384 285.377055
-16 9216.0 430.319054 272.059034 290.267724
-17 9728.0 438.033784 279.942444 288.950501
-18 10240.0 445.217381 287.438599 290.840246
-19 10752.0 427.231788 246.935876 289.941565
-20 11264.0 427.746848 245.536784 286.372873
-21 11776.0 419.323436 249.447482 288.981596
-22 12288.0 415.954875 254.673582 294.323369
-23 12800.0 410.695192 254.094291 289.811310
-24 13312.0 410.125805 252.559690 289.391298
-25 13824.0 404.604870 257.190689 291.799461
-26 14336.0 396.844280 256.000002 289.129416
-27 14848.0 386.080180 257.665934 289.012175
-28 15360.0 378.869469 258.332158 287.550706
-29 15872.0 372.363640 261.626369 290.562936
+0 1024.0 356.173905 98.303995 296.096389
+1 1536.0 409.599994 132.604320 341.333333
+2 2048.0 491.520012 161.684218 336.657521
+3 2560.0 461.954908 182.857144 330.322572
+4 3072.0 519.211251 191.501303 319.168834
+5 3584.0 558.545477 207.768111 310.527060
+6 4096.0 564.965515 220.907859 298.796351
+7 4608.0 502.690905 232.825259 288.751954
+8 5120.0 529.655159 243.326731 285.767451
+9 5632.0 545.032265 242.236559 290.060087
+10 6144.0 550.208948 251.202731 288.563606
+11 6656.0 530.710976 255.182111 285.257135
+12 7168.0 512.000004 251.877006 276.134819
+13 7680.0 483.779539 263.690977 276.756754
+14 8192.0 465.895721 267.130429 278.876591
+15 8704.0 414.476194 266.448988 283.440968
+16 9216.0 428.651187 270.396088 287.625496
+17 9728.0 438.445087 281.630872 289.667485
+18 10240.0 444.412281 286.433562 288.112552
+19 10752.0 427.940303 246.699797 290.267711
+20 11264.0 427.071098 243.985547 284.564206
+21 11776.0 421.826879 249.447482 288.981596
+22 12288.0 418.314886 254.015505 294.323369
+23 12800.0 413.458944 253.256381 289.538159
+24 13312.0 409.075539 252.559690 290.443638
+25 13824.0 403.620451 257.590056 292.056329
+26 14336.0 400.074432 255.051144 288.160801
+27 14848.0 384.414233 257.665934 289.717061
+28 15360.0 378.092318 259.971797 288.225185
+29 15872.0 370.913333 262.166551 290.562936
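The table above reports DRAM throughput (GB/s) for the layer-norm backward pass across Triton, native PyTorch, and Apex, with the row width ``N`` swept. A hypothetical sketch of how the ``Torch`` column's timings could be collected (the GB/s figures are then derived from the bytes touched per run); it assumes the tutorial-style setup of 4096 fp16 rows and the 2022-era ``do_bench`` signature, which accepts ``grad_to_none`` to reset gradients between timed runs:

.. code-block:: python

    import torch
    import triton
    import triton.testing

    def bench_torch_layernorm_bwd(N: int, M: int = 4096) -> float:
        """Median backward-pass time (ms) of F.layer_norm on an M x N fp16 input."""
        x = torch.randn(M, N, device='cuda', dtype=torch.float16, requires_grad=True)
        w = torch.randn(N, device='cuda', dtype=torch.float16, requires_grad=True)
        b = torch.randn(N, device='cuda', dtype=torch.float16, requires_grad=True)
        dy = 0.1 * torch.randn_like(x)
        y = torch.nn.functional.layer_norm(x, (N,), w, b, eps=1e-5)
        # Re-run backward on the retained graph; grad_to_none clears accumulated grads.
        ms, min_ms, max_ms = triton.testing.do_bench(
            lambda: y.backward(dy, retain_graph=True), grad_to_none=[x, w, b]
        )
        return ms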
@@ -339,7 +339,7 @@ Layer Normalization
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 2 minutes 14.494 seconds)
+**Total running time of the script:** ( 2 minutes 14.528 seconds)
.. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py:

@@ -5,16 +5,16 @@
Computation times
=================
-**13:40.705** total execution time for **getting-started_tutorials** files:
+**13:21.371** total execution time for **getting-started_tutorials** files:
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:10.290 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:02.101 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:28.936 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:27.320 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 02:14.494 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 02:14.528 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:46.974 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:37.409 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``) | 00:00.012 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``) | 00:00.013 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+