[GH-PAGES] Updated website

Philippe Tillet
2022-06-05 21:05:02 +00:00
parent a598db498f
commit fd3a9985ea
351 changed files with 43281 additions and 140 deletions

@@ -240,7 +240,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
6 262144.0 341.333321 341.333321
7 524288.0 472.615390 472.615390
8 1048576.0 614.400016 614.400016
-9 2097152.0 722.823517 722.823517
+9 2097152.0 722.823517 702.171410
10 4194304.0 780.190482 780.190482
11 8388608.0 812.429770 812.429770
12 16777216.0 833.084721 833.084721
@@ -254,7 +254,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 1 minutes 35.667 seconds)
+**Total running time of the script:** ( 1 minutes 42.455 seconds)
.. _sphx_glr_download_getting-started_tutorials_01-vector-add.py:

@@ -286,17 +286,17 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t
softmax-performance:
N Triton Torch (native) Torch (jit)
-0 256.0 512.000001 512.000001 188.321838
-1 384.0 585.142862 585.142862 153.600004
-2 512.0 655.360017 585.142849 154.566038
-3 640.0 682.666684 640.000002 158.759699
+0 256.0 512.000001 546.133347 190.511628
+1 384.0 585.142862 558.545450 153.600004
+2 512.0 655.360017 606.814814 154.566038
+3 640.0 682.666684 640.000002 160.000000
4 768.0 722.823517 664.216187 162.754967
.. ... ... ... ...
-93 12160.0 814.058574 406.179533 198.834951
-94 12288.0 814.111783 415.661740 199.197579
-95 12416.0 812.498981 412.149375 198.755369
-96 12544.0 812.566838 412.971190 199.012395
-97 12672.0 812.633240 412.097543 199.069228
+93 12160.0 814.058574 406.179533 199.140227
+94 12288.0 814.111783 415.661740 199.298541
+95 12416.0 812.498981 411.722274 198.854847
+96 12544.0 812.566838 412.971190 199.111113
+97 12672.0 812.633240 412.516771 199.167004
[98 rows x 4 columns]
@@ -314,7 +314,7 @@ In the above plot, we can see that:
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 3 minutes 20.961 seconds)
+**Total running time of the script:** ( 3 minutes 24.085 seconds)
.. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py:

@@ -462,37 +462,37 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
matmul-performance:
M cuBLAS ... Triton Triton (+ LeakyReLU)
-0 256.0 2.730667 ... 2.978909 2.978909
+0 256.0 2.978909 ... 3.276800 2.978909
1 384.0 7.372800 ... 8.507077 8.507077
-2 512.0 14.563555 ... 15.420235 15.420235
+2 512.0 14.563555 ... 16.384000 16.384000
3 640.0 22.260869 ... 24.380953 24.380953
4 768.0 32.768000 ... 34.028308 34.028308
5 896.0 37.971025 ... 40.140799 39.025776
-6 1024.0 49.932191 ... 52.428801 52.428801
+6 1024.0 49.932191 ... 53.773130 52.428801
7 1152.0 45.242181 ... 46.656000 46.656000
-8 1280.0 51.200001 ... 56.888887 56.109587
+8 1280.0 51.200001 ... 56.888887 56.888887
9 1408.0 64.138541 ... 67.305878 66.485074
10 1536.0 80.430545 ... 79.526831 78.643199
-11 1664.0 63.372618 ... 62.492442 62.061463
+11 1664.0 63.372618 ... 62.929456 62.061463
12 1792.0 72.983276 ... 72.512412 71.588687
-13 1920.0 69.120002 ... 70.530615 70.530615
+13 1920.0 69.467336 ... 70.530615 70.530615
14 2048.0 73.908442 ... 77.314362 76.959706
-15 2176.0 83.500614 ... 85.998493 85.269692
-16 2304.0 68.056616 ... 77.307030 76.809875
+15 2176.0 83.155572 ... 86.367588 84.909907
+16 2304.0 68.446623 ... 77.307030 76.809875
17 2432.0 71.305746 ... 85.653855 84.877538
-18 2560.0 77.833728 ... 81.310171 80.511054
-19 2688.0 83.552988 ... 89.464755 89.044730
-20 2816.0 82.759409 ... 83.873477 82.446516
-21 2944.0 81.298583 ... 82.784108 83.060049
-22 3072.0 82.661468 ... 89.451983 88.473602
-23 3200.0 82.156612 ... 95.522391 94.814812
-24 3328.0 84.003845 ... 81.254285 83.710812
-25 3456.0 81.849303 ... 91.097818 91.097818
-26 3584.0 87.893835 ... 92.696281 95.047985
-27 3712.0 86.192706 ... 88.326564 90.445760
-28 3840.0 81.859361 ... 90.798032 87.910967
-29 3968.0 91.816356 ... 89.133631 91.403695
-30 4096.0 86.563285 ... 89.418872 92.691803
+18 2560.0 77.833728 ... 81.310171 80.709358
+19 2688.0 83.737433 ... 89.888756 89.254248
+20 2816.0 83.233226 ... 83.074685 83.392363
+21 2944.0 82.715407 ... 83.198715 81.298583
+22 3072.0 82.420822 ... 89.593522 89.170242
+23 3200.0 84.544253 ... 96.096095 94.814812
+24 3328.0 83.808259 ... 81.994643 84.496824
+25 3456.0 82.604067 ... 91.511426 90.994998
+26 3584.0 85.552231 ... 95.858629 98.483450
+27 3712.0 83.317214 ... 90.939777 88.170647
+28 3840.0 82.654712 ... 90.798032 87.011801
+29 3968.0 89.855624 ... 87.158986 91.335278
+30 4096.0 91.743400 ... 90.504200 93.206754
[31 rows x 5 columns]
@@ -502,7 +502,7 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 5 minutes 22.344 seconds)
+**Total running time of the script:** ( 5 minutes 25.038 seconds)
.. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py:

@@ -38,36 +38,36 @@ Layer Normalization
layer-norm-backward:
N Triton Torch Apex
-0 1024.0 307.200008 99.096776 307.200008
-1 1536.0 347.773587 133.083026 338.201833
-2 2048.0 423.724127 162.217818 325.509933
-3 2560.0 461.954908 182.857144 325.079368
-4 3072.0 511.999982 191.005181 317.793096
+0 1024.0 307.200008 99.497980 307.200008
+1 1536.0 351.085717 133.565214 341.333333
+2 2048.0 423.724127 159.067963 319.168844
+3 2560.0 461.954908 183.402991 325.079368
+4 3072.0 511.999982 192.501302 319.168834
5 3584.0 551.384634 207.768111 309.410081
-6 4096.0 564.965515 220.412561 298.796351
-7 4608.0 495.928261 231.364016 286.507772
-8 5120.0 525.128191 242.845844 283.787523
-9 5632.0 536.380957 243.107920 290.683877
-10 6144.0 542.117638 248.242431 285.490817
-11 6656.0 527.207907 256.000009 286.536325
-12 7168.0 505.976473 261.844750 288.160801
-13 7680.0 481.253256 260.707203 277.172933
-14 8192.0 460.440290 268.957600 286.600589
-15 8704.0 416.958106 267.472468 284.987724
-16 9216.0 428.651187 272.729961 289.507855
-17 9728.0 438.857162 279.942444 288.950501
-18 10240.0 446.836366 286.767793 290.496460
-19 10752.0 428.651173 246.464170 290.267711
-20 11264.0 428.424741 244.869560 285.767446
-21 11776.0 421.198220 249.227509 288.686414
-22 12288.0 420.102570 254.344118 294.617366
-23 12800.0 415.135142 253.465340 289.811310
-24 13312.0 412.242569 252.559690 289.916513
-25 13824.0 404.604870 257.190689 292.571423
-26 14336.0 397.761846 254.673567 286.242939
-27 14848.0 384.414233 257.108233 289.012175
-28 15360.0 374.253788 257.610071 287.326580
-29 15872.0 366.982663 262.708969 291.229369
+6 4096.0 568.231237 220.412561 299.707322
+7 4608.0 498.162157 232.825259 286.507772
+8 5120.0 525.128191 240.941184 283.787523
+9 5632.0 538.517949 243.985547 290.060087
+10 6144.0 542.117638 249.502530 286.322318
+11 6656.0 527.207907 256.410903 285.767438
+12 7168.0 512.000004 258.306304 282.947381
+13 7680.0 485.052616 263.314295 282.266452
+14 8192.0 463.698115 264.613724 281.270376
+15 8704.0 415.300208 267.472468 286.158893
+16 9216.0 427.822068 273.066667 289.129410
+17 9728.0 437.213490 279.942444 289.308559
+18 10240.0 446.025405 287.102804 291.876482
+19 10752.0 430.797982 246.699797 290.267711
+20 11264.0 427.746848 246.656943 288.512281
+21 11776.0 422.457417 249.667843 288.686414
+22 12288.0 420.102570 254.234486 294.617366
+23 12800.0 414.016170 253.884294 289.811310
+24 13312.0 411.711355 252.360194 290.179836
+25 13824.0 403.620451 256.991469 292.056329
+26 14336.0 395.021816 254.862216 287.919661
+27 14848.0 383.999990 257.108233 289.481735
+28 15360.0 373.495460 258.332158 288.000007
+29 15872.0 365.573890 262.708969 290.562936
@@ -329,7 +329,7 @@ Layer Normalization
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 2 minutes 11.039 seconds)
+**Total running time of the script:** ( 2 minutes 13.195 seconds)
.. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py:

@@ -5,16 +5,16 @@
Computation times
=================
-**12:30.021** total execution time for **getting-started_tutorials** files:
+**12:44.785** total execution time for **getting-started_tutorials** files:
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 05:22.344 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 05:25.038 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:20.961 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``) | 03:24.085 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 02:11.039 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``) | 02:13.195 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:35.667 | 0.0 MB |
+| :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``) | 01:42.455 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``) | 00:00.011 | 0.0 MB |
+---------------------------------------------------------------------------------------------------------+-----------+--------+