[GH-PAGES] Updated website
| @@ -1,4 +1,4 @@ | |||||||
| # Sphinx build info version 1 | # Sphinx build info version 1 | ||||||
| # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||||||
| config: bab18fe850096d21a7ceed593584c803 | config: 48e2a8a61bb97d88b7f8ffce4bdf3b57 | ||||||
| tags: 645f666f9bcd5a90fca523b33c5a78b7 | tags: 645f666f9bcd5a90fca523b33c5a78b7 | ||||||
|   | |||||||
| @@ -238,10 +238,10 @@ We can now run the decorated function above. Pass `print_data=True` to see the p | |||||||
|     3       32768.0   76.800002   76.800002 |     3       32768.0   76.800002   76.800002 | ||||||
|     4       65536.0  127.999995  127.999995 |     4       65536.0  127.999995  127.999995 | ||||||
|     5      131072.0  219.428568  219.428568 |     5      131072.0  219.428568  219.428568 | ||||||
|     6      262144.0  341.333321  341.333321 |     6      262144.0  341.333321  384.000001 | ||||||
|     7      524288.0  472.615390  472.615390 |     7      524288.0  472.615390  472.615390 | ||||||
|     8     1048576.0  614.400016  614.400016 |     8     1048576.0  614.400016  614.400016 | ||||||
|     9     2097152.0  722.823517  702.171410 |     9     2097152.0  722.823517  722.823517 | ||||||
|     10    4194304.0  780.190482  780.190482 |     10    4194304.0  780.190482  780.190482 | ||||||
|     11    8388608.0  812.429770  812.429770 |     11    8388608.0  812.429770  812.429770 | ||||||
|     12   16777216.0  833.084721  833.084721 |     12   16777216.0  833.084721  833.084721 | ||||||
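The vector-add rows report throughput in GB/s, not runtime. The sketch below converts between the two. It assumes each element touches three fp32 tensors exactly once (read `x`, read `y`, write the output), so 12 bytes per element. The tutorial's benchmark appears to use this convention, but check `01-vector-add.py` to confirm. The helper names are hypothetical.

```python
def vector_add_gbps(num_elements: int, ms: float, bytes_per_element: int = 4) -> float:
    """Effective bandwidth in GB/s for out = x + y (hypothetical helper)."""
    # Assumption: two input reads plus one output write, each element touched once.
    total_bytes = 3 * num_elements * bytes_per_element
    # bytes / (ms * 1e-3 s) / 1e9 bytes-per-GB  ==  bytes / ms * 1e-6
    return total_bytes / ms * 1e-6


def vector_add_ms(num_elements: int, gbps: float, bytes_per_element: int = 4) -> float:
    """Inverse of vector_add_gbps: runtime in milliseconds implied by a GB/s figure."""
    total_bytes = 3 * num_elements * bytes_per_element
    return total_bytes / gbps * 1e-6


if __name__ == "__main__":
    # Row 6 of the table (262144 elements), Torch column before and after the rebuild.
    for label, gbps in (("before", 341.333321), ("after", 384.000001)):
        print(label, f"{vector_add_ms(262144, gbps) * 1e3:.3f} us")
```

Under these assumptions, the row-6 change from 341.33 to 384.00 GB/s corresponds to about 9.22 us versus 8.19 us. That is a difference of roughly 1 microsecond, so small differences between runs at this problem size should be read with that scale in mind.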
| @@ -255,7 +255,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 1 minutes  39.442 seconds) |    **Total running time of the script:** ( 1 minutes  41.877 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_01-vector-add.py: | .. _sphx_glr_download_getting-started_tutorials_01-vector-add.py: | ||||||
|   | |||||||
| @@ -278,17 +278,17 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t | |||||||
|  |  | ||||||
|     softmax-performance: |     softmax-performance: | ||||||
|               N      Triton  Torch (native)  Torch (jit) |               N      Triton  Torch (native)  Torch (jit) | ||||||
|     0     256.0  512.000001      546.133347   190.511628 |     0     256.0  512.000001      546.133347   186.181817 | ||||||
|     1     384.0  614.400016      558.545450   153.600004 |     1     384.0  614.400016      585.142862   153.600004 | ||||||
|     2     512.0  655.360017      606.814814   154.566038 |     2     512.0  655.360017      585.142849   154.566038 | ||||||
|     3     640.0  706.206879      640.000002   160.000000 |     3     640.0  706.206879      640.000002   160.000000 | ||||||
|     4     768.0  722.823517      664.216187   162.754967 |     4     768.0  722.823517      664.216187   162.754967 | ||||||
|     ..      ...         ...             ...          ... |     ..      ...         ...             ...          ... | ||||||
|     93  12160.0  812.359066      405.755985   199.038365 |     93  12160.0  812.359066      406.179533   198.530610 | ||||||
|     94  12288.0  812.429770      415.661740   199.298541 |     94  12288.0  812.429770      415.661740   198.895304 | ||||||
|     95  12416.0  812.498981      411.722274   198.854847 |     95  12416.0  812.498981      412.149375   198.457532 | ||||||
|     96  12544.0  812.566838      412.971190   199.111113 |     96  12544.0  810.925276      412.971190   198.815254 | ||||||
|     97  12672.0  812.633240      412.097543   199.167004 |     97  12672.0  811.007961      412.097543   198.873965 | ||||||
|  |  | ||||||
|     [98 rows x 4 columns] |     [98 rows x 4 columns] | ||||||
|  |  | ||||||
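The `softmax-performance` columns can be compared directly by taking the ratio of the Triton and Torch (native) throughputs in each row. The sketch below uses values copied from the after-run rows shown above. The helper name is hypothetical. Rows 5 to 92 are elided in the printout, so nothing is computed for them.

```python
# (N, Triton GB/s, Torch native GB/s) copied from the after-run rows above.
ROWS = [
    (256.0, 512.000001, 546.133347),
    (384.0, 614.400016, 585.142862),
    (512.0, 655.360017, 585.142849),
    (640.0, 706.206879, 640.000002),
    (768.0, 722.823517, 664.216187),
    (12160.0, 812.359066, 406.179533),
    (12288.0, 812.429770, 415.661740),
    (12416.0, 812.498981, 412.149375),
    (12544.0, 810.925276, 412.971190),
    (12672.0, 811.007961, 412.097543),
]


def speedups(rows):
    """Return (N, Triton / Torch-native throughput ratio) for each row."""
    return [(n, triton / torch_native) for n, triton, torch_native in rows]


if __name__ == "__main__":
    for n, ratio in speedups(ROWS):
        print(f"N={int(n):>6}  speedup={ratio:.2f}x")
```

On these rows the ratio is below 1 only for N = 256 (0.94x). It is close to 2x for the largest sizes shown. Drawing conclusions about the elided middle rows would require the full printout.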
| @@ -306,7 +306,7 @@ In the above plot, we can see that: | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 3 minutes  22.625 seconds) |    **Total running time of the script:** ( 3 minutes  22.722 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py: | .. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py: | ||||||
|   | |||||||
| @@ -459,12 +459,12 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we | |||||||
|  |  | ||||||
|     matmul-performance: |     matmul-performance: | ||||||
|              M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) |              M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) | ||||||
|     0    256.0   2.730667  ...   3.276800              2.978909 |     0    256.0   2.730667  ...   2.978909              3.276800 | ||||||
|     1    384.0   7.372800  ...   8.507077              8.507077 |     1    384.0   7.372800  ...   7.899428              7.899428 | ||||||
|     2    512.0  14.563555  ...  15.420235             15.420235 |     2    512.0  14.563555  ...  15.420235             15.420235 | ||||||
|     3    640.0  22.260869  ...  24.380953             24.380953 |     3    640.0  22.260869  ...  24.380953             24.380953 | ||||||
|     4    768.0  32.768000  ...  35.389441             34.028308 |     4    768.0  32.768000  ...  35.389441             34.028308 | ||||||
|     5    896.0  39.025776  ...  40.140799             39.025776 |     5    896.0  37.971025  ...  40.140799             39.025776 | ||||||
|     6   1024.0  49.932191  ...  53.773130             52.428801 |     6   1024.0  49.932191  ...  53.773130             52.428801 | ||||||
|     7   1152.0  45.242181  ...  48.161033             47.396572 |     7   1152.0  45.242181  ...  48.161033             47.396572 | ||||||
|     8   1280.0  51.200001  ...  57.690139             57.690139 |     8   1280.0  51.200001  ...  57.690139             57.690139 | ||||||
| @@ -472,24 +472,24 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we | |||||||
|     10  1536.0  80.430545  ...  81.355034             79.526831 |     10  1536.0  80.430545  ...  81.355034             79.526831 | ||||||
|     11  1664.0  63.372618  ...  63.372618             62.492442 |     11  1664.0  63.372618  ...  63.372618             62.492442 | ||||||
|     12  1792.0  72.983276  ...  73.460287             59.467852 |     12  1792.0  72.983276  ...  73.460287             59.467852 | ||||||
|     13  1920.0  69.467336  ...  71.257735             70.892307 |     13  1920.0  69.120002  ...  71.257735             70.892307 | ||||||
|     14  2048.0  73.262953  ...  78.033565             76.959706 |     14  2048.0  73.262953  ...  78.033565             76.959706 | ||||||
|     15  2176.0  83.155572  ...  87.494120             85.632545 |     15  2176.0  83.155572  ...  87.876193             85.998493 | ||||||
|     16  2304.0  68.446623  ...  78.064941             77.057651 |     16  2304.0  68.251065  ...  78.064941             77.307030 | ||||||
|     17  2432.0  71.305746  ...  86.711310             85.393507 |     17  2432.0  71.487187  ...  86.979769             85.915795 | ||||||
|     18  2560.0  77.833728  ...  82.331658             81.512437 |     18  2560.0  78.019048  ...  82.747477             81.108913 | ||||||
|     19  2688.0  83.922689  ...  90.748936             88.836198 |     19  2688.0  83.922689  ...  90.316801             88.836198 | ||||||
|     20  2816.0  79.879498  ...  84.197315             82.446516 |     20  2816.0  82.135981  ...  85.017948             84.035084 | ||||||
|     21  2944.0  82.509987  ...  83.198715             81.967162 |     21  2944.0  81.967162  ...  83.060049             81.832567 | ||||||
|     22  3072.0  82.062468  ...  88.750943             87.516392 |     22  3072.0  81.121923  ...  89.593522             88.060814 | ||||||
|     23  3200.0  84.880639  ...  93.158662             93.841640 |     23  3200.0  84.768213  ...  97.116842             95.380032 | ||||||
|     24  3328.0  81.530349  ...  85.857242             84.298943 |     24  3328.0  83.613586  ...  85.602017             84.101981 | ||||||
|     25  3456.0  82.435141  ...  91.771848             90.892410 |     25  3456.0  81.849303  ...  86.503829             83.893412 | ||||||
|     26  3584.0  85.552231  ...  88.496679             87.381330 |     26  3584.0  86.457107  ...  98.699661             97.205829 | ||||||
|     27  3712.0  85.675250  ...  93.187820             87.706180 |     27  3712.0  82.491612  ...  89.273764             84.444075 | ||||||
|     28  3840.0  81.798814  ...  90.723546             86.535214 |     28  3840.0  85.070769  ...  87.217666             91.247522 | ||||||
|     29  3968.0  89.921841  ...  85.451873             88.040360 |     29  3968.0  89.690508  ...  92.024087             85.004484 | ||||||
|     30  4096.0  92.691803  ...  93.271527             87.381330 |     30  4096.0  94.320258  ...  90.200084             82.241256 | ||||||
|  |  | ||||||
|     [31 rows x 5 columns] |     [31 rows x 5 columns] | ||||||
|  |  | ||||||
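The `matmul-performance` table does not state its units. The sketch below assumes the usual convention: 2 * M * N * K floating-point operations per product (one multiply and one add per inner-product term), reported in TFLOPS, on square problems with M = N = K. Under those assumptions, a value can be converted back to a runtime. The function names and the square-shape assumption are hypothetical and should be checked against `03-matrix-multiplication.py`.

```python
def matmul_tflops(m: int, n: int, k: int, ms: float) -> float:
    """Throughput in TFLOPS for an (m x k) @ (k x n) product that took `ms` milliseconds."""
    flops = 2 * m * n * k  # one multiply and one add per term of each dot product
    return flops * 1e-12 / (ms * 1e-3)


def matmul_ms(m: int, n: int, k: int, tflops: float) -> float:
    """Runtime in milliseconds implied by a TFLOPS figure (inverse of matmul_tflops)."""
    flops = 2 * m * n * k
    return flops * 1e-12 / tflops * 1e3


if __name__ == "__main__":
    # Row 0, assuming M = N = K = 256: cuBLAS at 2.730667 TFLOPS.
    print(f"{matmul_ms(256, 256, 256, 2.730667) * 1e3:.3f} us")
    # Row 26 (M = 3584), Triton column before and after this rebuild.
    for label, tf in (("before", 88.496679), ("after", 98.699661)):
        print(label, f"{matmul_ms(3584, 3584, 3584, tf):.3f} ms")
```

Under these assumptions, the row-26 change in the Triton column (88.50 to 98.70 TFLOPS) amounts to roughly 0.1 ms per product. This gives a sense of how much the large-M rows move between these two rebuilds.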
| @@ -499,7 +499,7 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 6 minutes  9.590 seconds) |    **Total running time of the script:** ( 6 minutes  6.038 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py: | .. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py: | ||||||
|   | |||||||
| @@ -240,7 +240,7 @@ References | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 0 minutes  0.468 seconds) |    **Total running time of the script:** ( 0 minutes  0.476 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_04-low-memory-dropout.py: | .. _sphx_glr_download_getting-started_tutorials_04-low-memory-dropout.py: | ||||||
|   | |||||||
| @@ -38,36 +38,36 @@ Layer Normalization | |||||||
|  |  | ||||||
|     layer-norm: |     layer-norm: | ||||||
|               N      Triton       Torch        Apex |               N      Triton       Torch        Apex | ||||||
|     0    1024.0  585.142849  277.694907  481.882344 |     0    1024.0  585.142849  277.694907  468.114273 | ||||||
|     1    1536.0  630.153868  323.368435  511.999982 |     1    1536.0  630.153868  323.368435  511.999982 | ||||||
|     2    2048.0  682.666643  337.814445  520.126988 |     2    2048.0  682.666643  334.367358  520.126988 | ||||||
|     3    2560.0  694.237267  362.477870  512.000013 |     3    2560.0  694.237267  362.477870  512.000013 | ||||||
|     4    3072.0  712.347810  378.092307  501.551037 |     4    3072.0  712.347810  375.206126  501.551037 | ||||||
|     5    3584.0  725.873439  384.859062  451.527536 |     5    3584.0  725.873439  384.859062  455.111115 | ||||||
|     6    4096.0  728.177767  381.023256  451.972420 |     6    4096.0  728.177767  381.023256  458.293714 | ||||||
|     7    4608.0  676.403666  396.387087  428.651163 |     7    4608.0  676.403666  396.387087  431.157877 | ||||||
|     8    5120.0  688.403381  395.748783  420.102563 |     8    5120.0  688.403381  397.669909  422.268057 | ||||||
|     9    5632.0  709.543270  395.228063  415.262685 |     9    5632.0  704.000002  396.969169  417.185184 | ||||||
|     10   6144.0  702.171410  402.885254  411.313806 |     10   6144.0  702.171410  402.885254  411.313806 | ||||||
|     11   6656.0  700.631610  400.360920  400.360920 |     11   6656.0  705.271522  400.360920  400.360920 | ||||||
|     12   7168.0  690.891575  388.772874  384.859062 |     12   7168.0  690.891575  396.844306  387.459443 | ||||||
|     13   7680.0  682.666656  392.587863  386.415087 |     13   7680.0  682.666656  393.846167  387.634072 | ||||||
|     14   8192.0  639.375598  390.095241  370.259899 |     14   8192.0  639.375598  393.609605  372.363633 | ||||||
|     15   8704.0  624.502255  389.005597  379.465939 |     15   8704.0  630.153861  389.005597  380.502740 | ||||||
|     16   9216.0  606.814809  406.214877  382.010363 |     16   9216.0  609.322328  407.337026  383.999986 | ||||||
|     17   9728.0  587.350922  408.524944  382.427505 |     17   9728.0  589.575753  409.599987  383.369452 | ||||||
|     18  10240.0  566.920437  409.600010  382.803739 |     18  10240.0  566.920437  408.578556  382.803739 | ||||||
|     19  10752.0  549.623009  411.559798  381.445676 |     19  10752.0  549.623009  411.559798  381.445676 | ||||||
|     20  11264.0  534.789310  403.185684  371.595879 |     20  11264.0  536.380957  406.826188  373.134567 | ||||||
|     21  11776.0  523.377770  410.492372  376.831982 |     21  11776.0  523.377770  409.599991  377.587162 | ||||||
|     22  12288.0  518.754611  413.911572  383.251457 |     22  12288.0  516.031509  414.784810  383.251457 | ||||||
|     23  12800.0  505.679014  409.599981  377.163903 |     23  12800.0  505.679014  410.420828  376.470582 | ||||||
|     24  13312.0  495.330249  405.699062  376.976995 |     24  13312.0  494.180982  405.699062  376.976995 | ||||||
|     25  13824.0  482.934503  412.656711  379.389355 |     25  13824.0  482.934503  411.888257  379.389355 | ||||||
|     26  14336.0  471.967074  403.830973  371.158581 |     26  14336.0  470.997935  406.695045  374.185964 | ||||||
|     27  14848.0  461.297068  406.794504  374.712936 |     27  14848.0  461.297068  408.192434  375.304904 | ||||||
|     28  15360.0  454.269882  406.887417  378.092307 |     28  15360.0  454.269882  406.214870  378.092307 | ||||||
|     29  15872.0  447.098578  406.974373  375.668625 |     29  15872.0  447.098578  407.627589  376.783377 | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
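This commit only regenerates benchmark output, so the before and after columns of each row show how much a measurement moved between the two builds. The sketch below computes that relative change. The helper name is hypothetical, and the sample values are copied from rows 0, 12 and 20 of the `layer-norm` table above.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change (after - before) / before, as a fraction."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


# (row, column, before GB/s, after GB/s) copied from the layer-norm table above.
SAMPLES = [
    (0, "Apex", 481.882344, 468.114273),
    (12, "Torch", 388.772874, 396.844306),
    (20, "Triton", 534.789310, 536.380957),
]

if __name__ == "__main__":
    for row, column, before, after in SAMPLES:
        print(f"row {row:>2} {column:<6} {relative_change(before, after):+.2%}")
```

On these samples, the movements are at most a few percent (about -2.9%, +2.1% and +0.3%). It may be worth keeping this in mind before treating small gaps between the Triton, Torch and Apex columns as meaningful.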
| @@ -389,7 +389,7 @@ Layer Normalization | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 5 minutes  24.904 seconds) |    **Total running time of the script:** ( 5 minutes  25.911 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py: | .. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py: | ||||||
|   | |||||||
| @@ -5,16 +5,16 @@ | |||||||
|  |  | ||||||
| Computation times | Computation times | ||||||
| ================= | ================= | ||||||
| **16:37.029** total execution time for **getting-started_tutorials** files: | **16:37.024** total execution time for **getting-started_tutorials** files: | ||||||
|  |  | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:09.590 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:06.038 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``)                       | 05:24.904 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``)                       | 05:25.911 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``)                 | 03:22.625 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``)                 | 03:22.722 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``)                       | 01:39.442 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``)                       | 01:41.877 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``)       | 00:00.468 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``)       | 00:00.476 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
|   | |||||||
| @@ -328,10 +328,10 @@ for different problem sizes.</p> | |||||||
| 3       32768.0   76.800002   76.800002 | 3       32768.0   76.800002   76.800002 | ||||||
| 4       65536.0  127.999995  127.999995 | 4       65536.0  127.999995  127.999995 | ||||||
| 5      131072.0  219.428568  219.428568 | 5      131072.0  219.428568  219.428568 | ||||||
| 6      262144.0  341.333321  341.333321 | 6      262144.0  341.333321  384.000001 | ||||||
| 7      524288.0  472.615390  472.615390 | 7      524288.0  472.615390  472.615390 | ||||||
| 8     1048576.0  614.400016  614.400016 | 8     1048576.0  614.400016  614.400016 | ||||||
| 9     2097152.0  722.823517  702.171410 | 9     2097152.0  722.823517  722.823517 | ||||||
| 10    4194304.0  780.190482  780.190482 | 10    4194304.0  780.190482  780.190482 | ||||||
| 11    8388608.0  812.429770  812.429770 | 11    8388608.0  812.429770  812.429770 | ||||||
| 12   16777216.0  833.084721  833.084721 | 12   16777216.0  833.084721  833.084721 | ||||||
| @@ -340,7 +340,7 @@ for different problem sizes.</p> | |||||||
| 15  134217728.0  849.737435  850.656574 | 15  134217728.0  849.737435  850.656574 | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  39.442 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  41.877 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -369,17 +369,17 @@ We will then compare its performance against (1) <code class="code docutils lite | |||||||
| <p class="sphx-glr-script-out">Out:</p> | <p class="sphx-glr-script-out">Out:</p> | ||||||
| <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance: | <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance: | ||||||
|           N      Triton  Torch (native)  Torch (jit) |           N      Triton  Torch (native)  Torch (jit) | ||||||
| 0     256.0  512.000001      546.133347   190.511628 | 0     256.0  512.000001      546.133347   186.181817 | ||||||
| 1     384.0  614.400016      558.545450   153.600004 | 1     384.0  614.400016      585.142862   153.600004 | ||||||
| 2     512.0  655.360017      606.814814   154.566038 | 2     512.0  655.360017      585.142849   154.566038 | ||||||
| 3     640.0  706.206879      640.000002   160.000000 | 3     640.0  706.206879      640.000002   160.000000 | ||||||
| 4     768.0  722.823517      664.216187   162.754967 | 4     768.0  722.823517      664.216187   162.754967 | ||||||
| ..      ...         ...             ...          ... | ..      ...         ...             ...          ... | ||||||
| 93  12160.0  812.359066      405.755985   199.038365 | 93  12160.0  812.359066      406.179533   198.530610 | ||||||
| 94  12288.0  812.429770      415.661740   199.298541 | 94  12288.0  812.429770      415.661740   198.895304 | ||||||
| 95  12416.0  812.498981      411.722274   198.854847 | 95  12416.0  812.498981      412.149375   198.457532 | ||||||
| 96  12544.0  812.566838      412.971190   199.111113 | 96  12544.0  810.925276      412.971190   198.815254 | ||||||
| 97  12672.0  812.633240      412.097543   199.167004 | 97  12672.0  811.007961      412.097543   198.873965 | ||||||
|  |  | ||||||
| [98 rows x 4 columns] | [98 rows x 4 columns] | ||||||
| </pre></div> | </pre></div> | ||||||
| @@ -392,7 +392,7 @@ We will then compare its performance against (1) <code class="code docutils lite | |||||||
| Note however that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li> | Note however that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li> | ||||||
| </ul> | </ul> | ||||||
| </div></blockquote> | </div></blockquote> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  22.625 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  22.722 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -565,12 +565,12 @@ torch_output=tensor([[  1.1045, -36.9688,  31.4688,  ..., -11.3906,  24.4531, -3 | |||||||
| <p class="sphx-glr-script-out">Out:</p> | <p class="sphx-glr-script-out">Out:</p> | ||||||
| <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>matmul-performance: | <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>matmul-performance: | ||||||
|          M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) |          M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) | ||||||
| 0    256.0   2.730667  ...   3.276800              2.978909 | 0    256.0   2.730667  ...   2.978909              3.276800 | ||||||
| 1    384.0   7.372800  ...   8.507077              8.507077 | 1    384.0   7.372800  ...   7.899428              7.899428 | ||||||
| 2    512.0  14.563555  ...  15.420235             15.420235 | 2    512.0  14.563555  ...  15.420235             15.420235 | ||||||
| 3    640.0  22.260869  ...  24.380953             24.380953 | 3    640.0  22.260869  ...  24.380953             24.380953 | ||||||
| 4    768.0  32.768000  ...  35.389441             34.028308 | 4    768.0  32.768000  ...  35.389441             34.028308 | ||||||
| 5    896.0  39.025776  ...  40.140799             39.025776 | 5    896.0  37.971025  ...  40.140799             39.025776 | ||||||
| 6   1024.0  49.932191  ...  53.773130             52.428801 | 6   1024.0  49.932191  ...  53.773130             52.428801 | ||||||
| 7   1152.0  45.242181  ...  48.161033             47.396572 | 7   1152.0  45.242181  ...  48.161033             47.396572 | ||||||
| 8   1280.0  51.200001  ...  57.690139             57.690139 | 8   1280.0  51.200001  ...  57.690139             57.690139 | ||||||
| @@ -578,29 +578,29 @@ torch_output=tensor([[  1.1045, -36.9688,  31.4688,  ..., -11.3906,  24.4531, -3 | |||||||
| 10  1536.0  80.430545  ...  81.355034             79.526831 | 10  1536.0  80.430545  ...  81.355034             79.526831 | ||||||
| 11  1664.0  63.372618  ...  63.372618             62.492442 | 11  1664.0  63.372618  ...  63.372618             62.492442 | ||||||
| 12  1792.0  72.983276  ...  73.460287             59.467852 | 12  1792.0  72.983276  ...  73.460287             59.467852 | ||||||
| 13  1920.0  69.467336  ...  71.257735             70.892307 | 13  1920.0  69.120002  ...  71.257735             70.892307 | ||||||
| 14  2048.0  73.262953  ...  78.033565             76.959706 | 14  2048.0  73.262953  ...  78.033565             76.959706 | ||||||
| 15  2176.0  83.155572  ...  87.494120             85.632545 | 15  2176.0  83.155572  ...  87.876193             85.998493 | ||||||
| 16  2304.0  68.446623  ...  78.064941             77.057651 | 16  2304.0  68.251065  ...  78.064941             77.307030 | ||||||
| 17  2432.0  71.305746  ...  86.711310             85.393507 | 17  2432.0  71.487187  ...  86.979769             85.915795 | ||||||
| 18  2560.0  77.833728  ...  82.331658             81.512437 | 18  2560.0  78.019048  ...  82.747477             81.108913 | ||||||
| 19  2688.0  83.922689  ...  90.748936             88.836198 | 19  2688.0  83.922689  ...  90.316801             88.836198 | ||||||
| 20  2816.0  79.879498  ...  84.197315             82.446516 | 20  2816.0  82.135981  ...  85.017948             84.035084 | ||||||
| 21  2944.0  82.509987  ...  83.198715             81.967162 | 21  2944.0  81.967162  ...  83.060049             81.832567 | ||||||
| 22  3072.0  82.062468  ...  88.750943             87.516392 | 22  3072.0  81.121923  ...  89.593522             88.060814 | ||||||
| 23  3200.0  84.880639  ...  93.158662             93.841640 | 23  3200.0  84.768213  ...  97.116842             95.380032 | ||||||
| 24  3328.0  81.530349  ...  85.857242             84.298943 | 24  3328.0  83.613586  ...  85.602017             84.101981 | ||||||
| 25  3456.0  82.435141  ...  91.771848             90.892410 | 25  3456.0  81.849303  ...  86.503829             83.893412 | ||||||
| 26  3584.0  85.552231  ...  88.496679             87.381330 | 26  3584.0  86.457107  ...  98.699661             97.205829 | ||||||
| 27  3712.0  85.675250  ...  93.187820             87.706180 | 27  3712.0  82.491612  ...  89.273764             84.444075 | ||||||
| 28  3840.0  81.798814  ...  90.723546             86.535214 | 28  3840.0  85.070769  ...  87.217666             91.247522 | ||||||
| 29  3968.0  89.921841  ...  85.451873             88.040360 | 29  3968.0  89.690508  ...  92.024087             85.004484 | ||||||
| 30  4096.0  92.691803  ...  93.271527             87.381330 | 30  4096.0  94.320258  ...  90.200084             82.241256 | ||||||
|  |  | ||||||
| [31 rows x 5 columns] | [31 rows x 5 columns] | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 6 minutes  9.590 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 6 minutes  6.038 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -372,7 +372,7 @@ to explore the <cite>triton/language/random</cite> folder!</p> | |||||||
| <dd><p>Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014</p> | <dd><p>Nitish Srivastava and Geoffrey Hinton and Alex Krizhevsky and Ilya Sutskever and Ruslan Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting”, JMLR 2014</p> | ||||||
| </dd> | </dd> | ||||||
| </dl> | </dl> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes  0.468 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 0 minutes  0.476 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-04-low-memory-dropout-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-04-low-memory-dropout-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/c9aed78977a4c05741d675a38dde3d7d/04-low-memory-dropout.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">04-low-memory-dropout.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/c9aed78977a4c05741d675a38dde3d7d/04-low-memory-dropout.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">04-low-memory-dropout.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -194,36 +194,36 @@ to download the full example code</p> | |||||||
| <p class="sphx-glr-script-out">Out:</p> | <p class="sphx-glr-script-out">Out:</p> | ||||||
| <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm: | <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm: | ||||||
|           N      Triton       Torch        Apex |           N      Triton       Torch        Apex | ||||||
| 0    1024.0  585.142849  277.694907  481.882344 | 0    1024.0  585.142849  277.694907  468.114273 | ||||||
| 1    1536.0  630.153868  323.368435  511.999982 | 1    1536.0  630.153868  323.368435  511.999982 | ||||||
| 2    2048.0  682.666643  337.814445  520.126988 | 2    2048.0  682.666643  334.367358  520.126988 | ||||||
| 3    2560.0  694.237267  362.477870  512.000013 | 3    2560.0  694.237267  362.477870  512.000013 | ||||||
| 4    3072.0  712.347810  378.092307  501.551037 | 4    3072.0  712.347810  375.206126  501.551037 | ||||||
| 5    3584.0  725.873439  384.859062  451.527536 | 5    3584.0  725.873439  384.859062  455.111115 | ||||||
| 6    4096.0  728.177767  381.023256  451.972420 | 6    4096.0  728.177767  381.023256  458.293714 | ||||||
| 7    4608.0  676.403666  396.387087  428.651163 | 7    4608.0  676.403666  396.387087  431.157877 | ||||||
| 8    5120.0  688.403381  395.748783  420.102563 | 8    5120.0  688.403381  397.669909  422.268057 | ||||||
| 9    5632.0  709.543270  395.228063  415.262685 | 9    5632.0  704.000002  396.969169  417.185184 | ||||||
| 10   6144.0  702.171410  402.885254  411.313806 | 10   6144.0  702.171410  402.885254  411.313806 | ||||||
| 11   6656.0  700.631610  400.360920  400.360920 | 11   6656.0  705.271522  400.360920  400.360920 | ||||||
| 12   7168.0  690.891575  388.772874  384.859062 | 12   7168.0  690.891575  396.844306  387.459443 | ||||||
| 13   7680.0  682.666656  392.587863  386.415087 | 13   7680.0  682.666656  393.846167  387.634072 | ||||||
| 14   8192.0  639.375598  390.095241  370.259899 | 14   8192.0  639.375598  393.609605  372.363633 | ||||||
| 15   8704.0  624.502255  389.005597  379.465939 | 15   8704.0  630.153861  389.005597  380.502740 | ||||||
| 16   9216.0  606.814809  406.214877  382.010363 | 16   9216.0  609.322328  407.337026  383.999986 | ||||||
| 17   9728.0  587.350922  408.524944  382.427505 | 17   9728.0  589.575753  409.599987  383.369452 | ||||||
| 18  10240.0  566.920437  409.600010  382.803739 | 18  10240.0  566.920437  408.578556  382.803739 | ||||||
| 19  10752.0  549.623009  411.559798  381.445676 | 19  10752.0  549.623009  411.559798  381.445676 | ||||||
| 20  11264.0  534.789310  403.185684  371.595879 | 20  11264.0  536.380957  406.826188  373.134567 | ||||||
| 21  11776.0  523.377770  410.492372  376.831982 | 21  11776.0  523.377770  409.599991  377.587162 | ||||||
| 22  12288.0  518.754611  413.911572  383.251457 | 22  12288.0  516.031509  414.784810  383.251457 | ||||||
| 23  12800.0  505.679014  409.599981  377.163903 | 23  12800.0  505.679014  410.420828  376.470582 | ||||||
| 24  13312.0  495.330249  405.699062  376.976995 | 24  13312.0  494.180982  405.699062  376.976995 | ||||||
| 25  13824.0  482.934503  412.656711  379.389355 | 25  13824.0  482.934503  411.888257  379.389355 | ||||||
| 26  14336.0  471.967074  403.830973  371.158581 | 26  14336.0  470.997935  406.695045  374.185964 | ||||||
| 27  14848.0  461.297068  406.794504  374.712936 | 27  14848.0  461.297068  408.192434  375.304904 | ||||||
| 28  15360.0  454.269882  406.887417  378.092307 | 28  15360.0  454.269882  406.214870  378.092307 | ||||||
| 29  15872.0  447.098578  406.974373  375.668625 | 29  15872.0  447.098578  407.627589  376.783377 | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <div class="line-block"> | <div class="line-block"> | ||||||
| @@ -537,7 +537,7 @@ to download the full example code</p> | |||||||
| <span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> | <span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes  24.904 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes  25.911 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -174,7 +174,7 @@ | |||||||
|              |              | ||||||
|   <div class="section" id="computation-times"> |   <div class="section" id="computation-times"> | ||||||
| <span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1> | <span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1> | ||||||
| <p><strong>16:37.029</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p> | <p><strong>16:37.024</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p> | ||||||
| <table class="docutils align-default"> | <table class="docutils align-default"> | ||||||
| <colgroup> | <colgroup> | ||||||
| <col style="width: 85%" /> | <col style="width: 85%" /> | ||||||
| @@ -183,23 +183,23 @@ | |||||||
| </colgroup> | </colgroup> | ||||||
| <tbody> | <tbody> | ||||||
| <tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td> | <tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td> | ||||||
| <td><p>06:09.590</p></td> | <td><p>06:06.038</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-even"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td> | <tr class="row-even"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td> | ||||||
| <td><p>05:24.904</p></td> | <td><p>05:25.911</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-odd"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td> | <tr class="row-odd"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td> | ||||||
| <td><p>03:22.625</p></td> | <td><p>03:22.722</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-even"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td> | <tr class="row-even"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td> | ||||||
| <td><p>01:39.442</p></td> | <td><p>01:41.877</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td> | <tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td> | ||||||
| <td><p>00:00.468</p></td> | <td><p>00:00.476</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| </tbody> | </tbody> | ||||||
|   | |||||||
| @@ -1,4 +1,4 @@ | |||||||
| # Sphinx build info version 1 | # Sphinx build info version 1 | ||||||
| # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||||||
| config: bae9558105859570a135817d997f9621 | config: 1e7d2af2caf91a8ce34dbf668ec1ae0e | ||||||
| tags: 645f666f9bcd5a90fca523b33c5a78b7 | tags: 645f666f9bcd5a90fca523b33c5a78b7 | ||||||
|   | |||||||