[GH-PAGES] Updated website

Philippe Tillet
2022-03-01 00:42:45 +00:00
parent 11bcbd3d04
commit 014137b675
156 changed files with 272 additions and 272 deletions


@@ -235,7 +235,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
 0 4096.0 9.600000 9.600000
 1 8192.0 19.200000 19.200000
 2 16384.0 38.400001 38.400001
-3 32768.0 63.999998 63.999998
+3 32768.0 76.800002 76.800002
 4 65536.0 127.999995 127.999995
 5 131072.0 219.428568 219.428568
 6 262144.0 341.333321 384.000001
@@ -245,7 +245,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
 10 4194304.0 780.190482 780.190482
 11 8388608.0 812.429770 812.429770
 12 16777216.0 833.084721 833.084721
-13 33554432.0 842.004273 843.811163
+13 33554432.0 842.004273 842.004273
 14 67108864.0 847.448255 848.362445
 15 134217728.0 849.737435 850.656574
@@ -255,7 +255,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
 .. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 1 minutes 45.983 seconds)
+**Total running time of the script:** ( 1 minutes 44.385 seconds)
 .. _sphx_glr_download_getting-started_tutorials_01-vector-add.py:
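For context on the numbers in the hunk above: the vector-add tutorial reports effective bandwidth in GB/s. A minimal sketch of that bandwidth model (assuming, as in the tutorial, 3 float32 accesses per element for `z = x + y`: two reads and one write; the 0.00512 ms timing below is a back-computed illustration, not a measured value):

```python
def vector_add_gbps(n_elements: int, ms: float, bytes_per_element: int = 4) -> float:
    """Effective bandwidth of z = x + y: 2 reads + 1 write per element.

    n_elements: vector length; ms: measured kernel time in milliseconds.
    """
    total_bytes = 3 * n_elements * bytes_per_element
    return total_bytes * 1e-9 / (ms * 1e-3)

# A kernel that adds two 4096-element float32 vectors in 0.00512 ms
# sustains ~9.6 GB/s, matching the first row of the table above.
bw = vector_add_gbps(4096, 0.00512)
```

Under this model, the benchmark varies only `n_elements` (the N column), so both the Triton and Torch columns converge to the device's memory bandwidth for large N.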


@@ -278,15 +278,15 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t
 softmax-performance:
 N Triton Torch (native) Torch (jit)
-0 256.0 512.000001 546.133347 190.511628
+0 256.0 512.000001 546.133347 188.321838
 1 384.0 614.400016 585.142862 153.600004
 2 512.0 655.360017 606.814814 154.566038
 3 640.0 706.206879 640.000002 160.000000
-4 768.0 722.823517 664.216187 162.754967
+4 768.0 722.823517 664.216187 163.839992
 .. ... ... ... ...
-93 12160.0 814.058574 405.755985 198.834951
+93 12160.0 814.058574 405.755985 198.936606
 94 12288.0 814.111783 415.661740 199.096718
-95 12416.0 814.163950 411.722274 198.755369
+95 12416.0 814.163950 411.296057 198.755369
 96 12544.0 814.214963 412.971190 198.913776
 97 12672.0 814.265046 412.097543 199.069228
@@ -306,7 +306,7 @@ In the above plot, we can see that:
 .. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 3 minutes 22.256 seconds)
+**Total running time of the script:** ( 3 minutes 23.498 seconds)
 .. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py:
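The softmax hunk above likewise reports GB/s. A minimal sketch of the throughput model (assuming the fused-softmax tutorial's setup: a 4096×N float32 tensor read once and written once; the 0.016384 ms timing is a back-computed illustration):

```python
def softmax_gbps(n_cols: int, ms: float, n_rows: int = 4096,
                 bytes_per_element: int = 4) -> float:
    """Effective bandwidth of row-wise softmax on an M x N tensor.

    The input is read once and the output written once: 2*M*N elements moved.
    """
    total_bytes = 2 * n_rows * n_cols * bytes_per_element
    return total_bytes * 1e-9 / (ms * 1e-3)

# At N=256, a 0.016384 ms run corresponds to 512 GB/s (cf. row 0 above).
bw = softmax_gbps(256, 0.016384)
```

This single-pass accounting is why a fused kernel can beat `torch.softmax` variants that make extra trips through memory: the denominator (time) shrinks while the byte count in the numerator is fixed.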


@@ -458,37 +458,37 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
 matmul-performance:
 M cuBLAS ... Triton Triton (+ LeakyReLU)
-0 256.0 2.978909 ... 3.276800 2.978909
-1 384.0 7.372800 ... 8.507077 7.899428
-2 512.0 14.563555 ... 16.384000 15.420235
+0 256.0 2.730667 ... 3.276800 2.978909
+1 384.0 7.372800 ... 7.899428 7.899428
+2 512.0 14.563555 ... 15.420235 15.420235
 3 640.0 22.260869 ... 24.380953 24.380953
 4 768.0 32.768000 ... 34.028308 34.028308
 5 896.0 39.025776 ... 40.140799 39.025776
-6 1024.0 51.150050 ... 53.773130 52.428801
+6 1024.0 49.932191 ... 52.428801 51.150050
 7 1152.0 45.242181 ... 46.656000 46.656000
 8 1280.0 51.200001 ... 56.888887 56.888887
 9 1408.0 64.138541 ... 67.305878 66.485074
-10 1536.0 79.526831 ... 79.526831 78.643199
-11 1664.0 62.929456 ... 62.492442 62.492442
+10 1536.0 80.430545 ... 79.526831 78.643199
+11 1664.0 62.929456 ... 62.929456 62.492442
 12 1792.0 72.983276 ... 72.047592 72.047592
-13 1920.0 69.467336 ... 70.172588 69.818184
+13 1920.0 68.776119 ... 70.172588 69.818184
 14 2048.0 73.262953 ... 76.608294 76.608294
-15 2176.0 83.500614 ... 86.367588 85.632545
-16 2304.0 68.446623 ... 76.809875 76.809875
-17 2432.0 71.396351 ... 83.614477 85.393507
-18 2560.0 78.019048 ... 81.108913 80.709358
-19 2688.0 83.737433 ... 89.676257 89.676257
-20 2816.0 82.135981 ... 82.916747 82.916747
-21 2944.0 82.237674 ... 82.237674 82.102191
-22 3072.0 80.089253 ... 88.473602 88.335577
-23 3200.0 83.116885 ... 95.238096 95.380032
-24 3328.0 82.369902 ... 84.003845 84.200347
-25 3456.0 80.220468 ... 86.503829 87.536988
-26 3584.0 86.457107 ... 98.268190 98.268190
-27 3712.0 80.757757 ... 87.783251 84.946722
-28 3840.0 83.027026 ... 90.574940 84.613126
-29 3968.0 87.976885 ... 85.212248 87.035620
-30 4096.0 91.992956 ... 92.755862 86.092193
+15 2176.0 83.155572 ... 85.998493 85.632545
+16 2304.0 68.251065 ... 76.809875 76.809875
+17 2432.0 71.305746 ... 85.393507 85.393507
+18 2560.0 77.283019 ... 80.908642 81.108913
+19 2688.0 83.369354 ... 90.532356 89.464755
+20 2816.0 83.873477 ... 83.233226 83.392363
+21 2944.0 82.034625 ... 82.373605 82.373605
+22 3072.0 82.062468 ... 88.890270 86.712254
+23 3200.0 81.632656 ... 94.955488 95.380032
+24 3328.0 83.034941 ... 84.200347 84.695641
+25 3456.0 81.766291 ... 90.586029 91.097818
+26 3584.0 85.879071 ... 92.600816 95.451583
+27 3712.0 85.601834 ... 89.835744 91.606915
+28 3840.0 84.679936 ... 85.996889 83.718392
+29 3968.0 92.864488 ... 85.034103 90.859224
+30 4096.0 85.818678 ... 84.947927 91.056800
 [31 rows x 5 columns]
@@ -498,7 +498,7 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
 .. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 6 minutes 2.552 seconds)
+**Total running time of the script:** ( 6 minutes 8.116 seconds)
 .. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py:
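Unlike the bandwidth-bound tutorials, the matmul hunk above reports compute throughput in TFLOP/s. A minimal sketch of the standard accounting (2·M·N·K floating-point operations per GEMM; the 1 ms timing below is an illustrative value, not from the table):

```python
def matmul_tflops(m: int, n: int, k: int, ms: float) -> float:
    """Throughput of an M x K @ K x N matmul in TFLOP/s.

    Each output element costs K multiplies and K adds: 2*M*N*K ops total.
    """
    return 2 * m * n * k * 1e-12 / (ms * 1e-3)

# A 1024x1024x1024 matmul finishing in 1 ms runs at ~2.147 TFLOP/s.
tput = matmul_tflops(1024, 1024, 1024, 1.0)
```

Because the op count grows cubically in the matrix dimension while memory traffic grows only quadratically, the larger M rows of the table approach the GPU's peak compute rather than its memory bandwidth, which is why cuBLAS and Triton cluster together there.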


@@ -38,36 +38,36 @@ Layer Normalization
 layer-norm-backward:
 N Triton Torch Apex
-0 1024.0 307.200008 99.902435 311.088617
-1 1536.0 351.085717 133.083026 341.333333
+0 1024.0 307.200008 99.497980 307.200008
+1 1536.0 347.773587 133.083026 338.201833
 2 2048.0 423.724127 159.067963 321.254900
-3 2560.0 451.764698 182.857144 323.368411
-4 3072.0 515.580429 191.501303 319.168834
-5 3584.0 551.384634 208.271186 310.527060
-6 4096.0 568.231237 220.412561 299.707322
-7 4608.0 504.986315 232.825259 286.507772
-8 5120.0 531.948056 244.294240 286.433562
-9 5632.0 542.843364 244.869560 291.939522
-10 6144.0 552.269672 251.631408 288.000001
-11 6656.0 537.858601 255.590406 286.793541
-12 7168.0 516.612607 254.485198 278.368936
-13 7680.0 487.619051 266.743841 284.884090
-14 8192.0 467.002371 257.003920 276.912679
-15 8704.0 418.629245 267.815384 286.158893
-16 9216.0 432.000001 273.404206 289.887291
-17 9728.0 442.181815 280.615388 289.667485
-18 10240.0 448.467168 287.102804 290.840246
-19 10752.0 428.651173 246.464170 289.616170
-20 11264.0 427.746848 246.432094 286.980888
-21 11776.0 421.826879 249.888595 288.981596
-22 12288.0 417.131525 254.893699 294.617366
-23 12800.0 415.696898 253.674644 290.359162
-24 13312.0 410.125805 252.559690 289.653667
-25 13824.0 402.640783 257.190689 292.056329
-26 14336.0 396.387109 255.240352 289.129416
-27 14848.0 383.174202 257.293872 287.844912
-28 15360.0 374.253788 258.513318 286.879376
-29 15872.0 368.402336 262.347108 290.120338
+3 2560.0 451.764698 183.402991 330.322572
+4 3072.0 508.468972 193.005236 315.076914
+5 3584.0 547.872604 208.271186 308.301075
+6 4096.0 564.965515 220.412561 301.546004
+7 4608.0 504.986315 232.825259 291.799469
+8 5120.0 529.655159 240.941184 285.767451
+9 5632.0 547.238891 241.371422 288.820505
+10 6144.0 552.269672 249.502530 286.879370
+11 6656.0 536.053693 254.369423 284.242007
+12 7168.0 515.065851 252.616738 276.134819
+13 7680.0 486.332448 263.314295 280.547947
+14 8192.0 463.698115 263.196793 280.467910
+15 8704.0 416.958106 265.096445 283.440968
+16 9216.0 431.157889 271.724806 287.625496
+17 9728.0 441.345926 280.615388 288.593329
+18 10240.0 446.836366 285.767451 289.469963
+19 10752.0 429.364408 246.464170 289.941565
+20 11264.0 423.724120 244.869560 284.864065
+21 11776.0 421.826879 250.109737 289.573776
+22 12288.0 419.504980 253.796902 294.323369
+23 12800.0 415.696898 253.256381 287.640454
+24 13312.0 409.599999 253.160074 290.707920
+25 13824.0 405.098897 256.593977 291.799461
+26 14336.0 397.761846 254.673567 287.438588
+27 14848.0 381.942121 256.922861 287.612590
+28 15360.0 376.932517 259.971797 288.676598
+29 15872.0 367.691129 264.717162 292.796308
@@ -339,7 +339,7 @@ Layer Normalization
 .. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 2 minutes 11.369 seconds)
+**Total running time of the script:** ( 2 minutes 11.911 seconds)
 .. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py:


@@ -5,16 +5,16 @@
 Computation times
 =================
-**13:22.642** total execution time for **getting-started_tutorials** files:
+**13:28.393** total execution time for **getting-started_tutorials** files:
 +---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:02.552 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:08.116 | 0.0 MB |
 +---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:22.256 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:23.498 | 0.0 MB |
 +---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 02:11.369 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 02:11.911 | 0.0 MB |
 +---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:45.983 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:44.385 | 0.0 MB |
 +---------------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``) | 00:00.483 | 0.0 MB |
 +---------------------------------------------------------------------------------------------------------+-----------+--------+