[GH-PAGES] Updated website
| @@ -1,4 +1,4 @@ | |||||||
| # Sphinx build info version 1 | # Sphinx build info version 1 | ||||||
| # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||||||
| config: c24d43c3d9203b5bc4a3756159b70958 | config: bce1be3691c7f89ed1b8fc62439dfb46 | ||||||
| tags: 645f666f9bcd5a90fca523b33c5a78b7 | tags: 645f666f9bcd5a90fca523b33c5a78b7 | ||||||
|   | |||||||
| @@ -245,7 +245,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p | |||||||
|     10    4194304.0  780.190482  780.190482 |     10    4194304.0  780.190482  780.190482 | ||||||
|     11    8388608.0  812.429770  812.429770 |     11    8388608.0  812.429770  812.429770 | ||||||
|     12   16777216.0  833.084721  833.084721 |     12   16777216.0  833.084721  833.084721 | ||||||
|     13   33554432.0  842.004273  842.004273 |     13   33554432.0  842.004273  843.811163 | ||||||
|     14   67108864.0  847.448255  848.362445 |     14   67108864.0  847.448255  848.362445 | ||||||
|     15  134217728.0  849.737435  850.656574 |     15  134217728.0  849.737435  850.656574 | ||||||
|  |  | ||||||
| @@ -255,7 +255,7 @@ We can now run the decorated function above. Pass `print_data=True` to see the p | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 1 minutes  36.291 seconds) |    **Total running time of the script:** ( 1 minutes  43.473 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_01-vector-add.py: | .. _sphx_glr_download_getting-started_tutorials_01-vector-add.py: | ||||||
|   | |||||||
| @@ -278,17 +278,17 @@ We will then compare its performance against (1) :code:`torch.softmax` and (2) t | |||||||
|  |  | ||||||
|     softmax-performance: |     softmax-performance: | ||||||
|               N      Triton  Torch (native)  Torch (jit) |               N      Triton  Torch (native)  Torch (jit) | ||||||
|     0     256.0  546.133347      546.133347   190.511628 |     0     256.0  512.000001      546.133347   188.321838 | ||||||
|     1     384.0  585.142862      585.142862   153.600004 |     1     384.0  614.400016      585.142862   153.600004 | ||||||
|     2     512.0  655.360017      585.142849   156.038096 |     2     512.0  655.360017      606.814814   154.566038 | ||||||
|     3     640.0  682.666684      640.000002   160.000000 |     3     640.0  706.206879      640.000002   160.000000 | ||||||
|     4     768.0  722.823517      664.216187   163.839992 |     4     768.0  722.823517      664.216187   162.754967 | ||||||
|     ..      ...         ...             ...          ... |     ..      ...         ...             ...          ... | ||||||
|     93  12160.0  814.058574      405.755985   198.936606 |     93  12160.0  815.765209      406.179533   198.936606 | ||||||
|     94  12288.0  814.111783      415.661740   198.995960 |     94  12288.0  814.111783      415.661740   199.197579 | ||||||
|     95  12416.0  814.163950      411.722274   198.755369 |     95  12416.0  814.163950      412.149375   198.755369 | ||||||
|     96  12544.0  814.214963      412.971190   198.913776 |     96  12544.0  814.214963      412.971190   199.012395 | ||||||
|     97  12672.0  814.265046      412.516771   199.069228 |     97  12672.0  814.265046      412.097543   198.971549 | ||||||
|  |  | ||||||
|     [98 rows x 4 columns] |     [98 rows x 4 columns] | ||||||
|  |  | ||||||
| @@ -306,7 +306,7 @@ In the above plot, we can see that: | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 3 minutes  20.551 seconds) |    **Total running time of the script:** ( 3 minutes  21.420 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py: | .. _sphx_glr_download_getting-started_tutorials_02-fused-softmax.py: | ||||||
|   | |||||||
| @@ -458,37 +458,37 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we | |||||||
|  |  | ||||||
|     matmul-performance: |     matmul-performance: | ||||||
|              M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) |              M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) | ||||||
|     0    256.0   2.978909  ...   2.978909              3.276800 |     0    256.0   2.730667  ...   2.978909              2.978909 | ||||||
|     1    384.0   7.372800  ...   8.507077              8.507077 |     1    384.0   7.372800  ...   8.507077              8.507077 | ||||||
|     2    512.0  14.563555  ...  16.384000             16.384000 |     2    512.0  14.563555  ...  16.384000             15.420235 | ||||||
|     3    640.0  22.260869  ...  24.380953             24.380953 |     3    640.0  22.260869  ...  24.380953             24.380953 | ||||||
|     4    768.0  32.768000  ...  34.028308             34.028308 |     4    768.0  32.768000  ...  34.028308             34.028308 | ||||||
|     5    896.0  37.971025  ...  40.140799             39.025776 |     5    896.0  39.025776  ...  40.140799             39.025776 | ||||||
|     6   1024.0  49.932191  ...  52.428801             52.428801 |     6   1024.0  49.932191  ...  52.428801             52.428801 | ||||||
|     7   1152.0  45.242181  ...  46.656000             46.656000 |     7   1152.0  45.242181  ...  46.656000             46.656000 | ||||||
|     8   1280.0  51.200001  ...  56.888887             56.888887 |     8   1280.0  51.200001  ...  56.888887             56.888887 | ||||||
|     9   1408.0  64.138541  ...  67.305878             66.485074 |     9   1408.0  64.138541  ...  67.305878             66.485074 | ||||||
|     10  1536.0  80.430545  ...  79.526831             79.526831 |     10  1536.0  80.430545  ...  79.526831             79.526831 | ||||||
|     11  1664.0  62.929456  ...  62.492442             62.061463 |     11  1664.0  62.929456  ...  62.061463             62.061463 | ||||||
|     12  1792.0  72.512412  ...  72.047592             72.047592 |     12  1792.0  72.512412  ...  72.047592             72.047592 | ||||||
|     13  1920.0  69.120002  ...  70.172588             70.530615 |     13  1920.0  69.467336  ...  70.172588             70.172588 | ||||||
|     14  2048.0  73.584279  ...  76.959706             76.608294 |     14  2048.0  73.262953  ...  76.608294             76.260072 | ||||||
|     15  2176.0  83.155572  ...  86.367588             85.269692 |     15  2176.0  83.155572  ...  85.998493             85.269692 | ||||||
|     16  2304.0  68.446623  ...  77.057651             76.076024 |     16  2304.0  68.251065  ...  76.319081             76.809875 | ||||||
|     17  2432.0  71.305746  ...  84.367759             85.134737 |     17  2432.0  71.487187  ...  74.521127             84.877538 | ||||||
|     18  2560.0  78.019048  ...  80.908642             80.709358 |     18  2560.0  78.019048  ...  81.310171             81.108913 | ||||||
|     19  2688.0  83.369354  ...  89.676257             89.254248 |     19  2688.0  83.369354  ...  89.254248             89.676257 | ||||||
|     20  2816.0  79.733474  ...  83.392363             83.233226 |     20  2816.0  83.392363  ...  82.135981             83.392363 | ||||||
|     21  2944.0  81.967162  ...  82.237674             82.102191 |     21  2944.0  81.967162  ...  81.967162             81.698415 | ||||||
|     22  3072.0  81.825298  ...  88.612060             88.473602 |     22  3072.0  80.659693  ...  88.750943             88.750943 | ||||||
|     23  3200.0  84.880639  ...  95.096582             95.096582 |     23  3200.0  84.432717  ...  94.955488             95.096582 | ||||||
|     24  3328.0  83.808259  ...  84.101981             83.905938 |     24  3328.0  82.939284  ...  84.695641             81.254285 | ||||||
|     25  3456.0  78.655188  ...  85.950501             88.400840 |     25  3456.0  80.300370  ...  82.435141             88.497878 | ||||||
|     26  3584.0  87.296493  ...  97.628001             98.268190 |     26  3584.0  83.177979  ...  88.586589             94.947616 | ||||||
|     27  3712.0  80.757757  ...  88.326564             85.822459 |     27  3712.0  85.675250  ...  89.035062             87.170458 | ||||||
|     28  3840.0  83.027026  ...  91.059692             84.613126 |     28  3840.0  78.433173  ...  84.355978             91.511791 | ||||||
|     29  3968.0  90.724116  ...  87.347124             90.791620 |     29  3968.0  86.849777  ...  90.791620             84.040329 | ||||||
|     30  4096.0  86.313653  ...  86.424811             91.056800 |     30  4096.0  93.077479  ...  86.480498             83.365047 | ||||||
|  |  | ||||||
|     [31 rows x 5 columns] |     [31 rows x 5 columns] | ||||||
|  |  | ||||||
| @@ -498,7 +498,7 @@ We can now compare the performance of our kernel against that of cuBLAS. Here we | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 5 minutes  56.072 seconds) |    **Total running time of the script:** ( 6 minutes  1.040 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py: | .. _sphx_glr_download_getting-started_tutorials_03-matrix-multiplication.py: | ||||||
|   | |||||||
| @@ -38,36 +38,36 @@ Layer Normalization | |||||||
|  |  | ||||||
|     layer-norm-backward: |     layer-norm-backward: | ||||||
|               N      Triton       Torch        Apex |               N      Triton       Torch        Apex | ||||||
|     0    1024.0  307.200008   99.902435  307.200008 |     0    1024.0  307.200008   98.303995  307.200008 | ||||||
|     1    1536.0  351.085717  135.032961  341.333333 |     1    1536.0  347.773587  134.540150  341.333333 | ||||||
|     2    2048.0  420.102553  162.754967  327.679984 |     2    2048.0  420.102553  161.684218  325.509933 | ||||||
|     3    2560.0  458.507457  182.857144  330.322572 |     3    2560.0  458.507457  181.775141  325.079368 | ||||||
|     4    3072.0  515.580429  191.501303  319.168834 |     4    3072.0  511.999982  192.501302  320.556515 | ||||||
|     5    3584.0  547.872604  207.768111  311.652167 |     5    3584.0  551.384634  208.271186  311.652167 | ||||||
|     6    4096.0  568.231237  221.905193  301.546004 |     6    4096.0  568.231237  220.412561  299.707322 | ||||||
|     7    4608.0  504.986315  232.336141  287.999990 |     7    4608.0  507.302750  232.825259  287.999990 | ||||||
|     8    5120.0  531.948056  242.366855  285.104413 |     8    5120.0  529.655159  242.845844  287.775181 | ||||||
|     9    5632.0  538.517949  243.545956  290.683877 |     9    5632.0  545.032265  243.545956  289.438969 | ||||||
|     10   6144.0  546.133354  250.349744  288.000001 |     10   6144.0  548.163546  248.661056  286.879370 | ||||||
|     11   6656.0  536.053693  256.000009  286.279570 |     11   6656.0  534.260858  256.000009  285.767438 | ||||||
|     12   7168.0  510.480705  252.988236  277.024148 |     12   7168.0  507.469040  260.654538  286.242939 | ||||||
|     13   7680.0  482.513091  267.130429  284.444450 |     13   7680.0  479.999983  262.564106  278.850215 | ||||||
|     14   8192.0  463.698115  269.326017  282.482757 |     14   8192.0  462.607053  267.493874  284.939124 | ||||||
|     15   8704.0  417.791980  264.425310  282.673891 |     15   8704.0  417.791980  267.815384  284.599455 | ||||||
|     16   9216.0  431.157889  274.762727  291.031570 |     16   9216.0  431.157889  272.729961  289.129410 | ||||||
|     17   9728.0  439.683593  281.630872  290.027323 |     17   9728.0  439.683593  280.278512  290.027323 | ||||||
|     18  10240.0  446.025405  286.100109  289.811322 |     18  10240.0  450.109870  286.433562  290.153487 | ||||||
|     19  10752.0  425.120247  246.935876  289.941565 |     19  10752.0  426.525614  247.172406  290.922209 | ||||||
|     20  11264.0  425.056596  243.765566  283.371073 |     20  11264.0  427.071098  245.760001  286.676558 | ||||||
|     21  11776.0  423.089806  249.888595  289.129414 |     21  11776.0  423.089806  249.888595  288.981596 | ||||||
|     22  12288.0  421.302872  254.673582  295.207195 |     22  12288.0  419.504980  254.673582  294.323369 | ||||||
|     23  12800.0  414.574901  254.515329  290.909089 |     23  12800.0  414.016170  253.674644  288.180121 | ||||||
|     24  13312.0  414.381327  253.561895  289.129403 |     24  13312.0  411.181478  252.759501  289.916513 | ||||||
|     25  13824.0  406.090579  257.790206  293.088338 |     25  13824.0  404.112047  257.190689  292.056329 | ||||||
|     26  14336.0  394.116833  256.381525  290.349381 |     26  14336.0  393.215988  254.485198  286.719986 | ||||||
|     27  14848.0  385.245405  255.999999  287.844912 |     27  14848.0  385.245405  257.665934  289.246765 | ||||||
|     28  15360.0  376.932517  262.751252  291.184839 |     28  15360.0  373.874218  257.970599  286.211174 | ||||||
|     29  15872.0  370.913333  262.166551  291.118085 |     29  15872.0  370.913333  261.806182  289.899545 | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
| @@ -339,7 +339,7 @@ Layer Normalization | |||||||
|  |  | ||||||
| .. rst-class:: sphx-glr-timing | .. rst-class:: sphx-glr-timing | ||||||
|  |  | ||||||
|    **Total running time of the script:** ( 2 minutes  11.560 seconds) |    **Total running time of the script:** ( 2 minutes  12.002 seconds) | ||||||
|  |  | ||||||
|  |  | ||||||
| .. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py: | .. _sphx_glr_download_getting-started_tutorials_05-layer-norm.py: | ||||||
|   | |||||||
| @@ -5,16 +5,16 @@ | |||||||
|  |  | ||||||
| Computation times | Computation times | ||||||
| ================= | ================= | ||||||
| **13:04.956** total execution time for **getting-started_tutorials** files: | **13:18.417** total execution time for **getting-started_tutorials** files: | ||||||
|  |  | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 05:56.072 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_03-matrix-multiplication.py` (``03-matrix-multiplication.py``) | 06:01.040 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``)                 | 03:20.551 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_02-fused-softmax.py` (``02-fused-softmax.py``)                 | 03:21.420 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``)                       | 02:11.560 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_05-layer-norm.py` (``05-layer-norm.py``)                       | 02:12.002 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``)                       | 01:36.291 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_01-vector-add.py` (``01-vector-add.py``)                       | 01:43.473 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
| | :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``)       | 00:00.482 | 0.0 MB | | | :ref:`sphx_glr_getting-started_tutorials_04-low-memory-dropout.py` (``04-low-memory-dropout.py``)       | 00:00.482 | 0.0 MB | | ||||||
| +---------------------------------------------------------------------------------------------------------+-----------+--------+ | +---------------------------------------------------------------------------------------------------------+-----------+--------+ | ||||||
|   | |||||||
| @@ -335,12 +335,12 @@ for different problem sizes.</p> | |||||||
| 10    4194304.0  780.190482  780.190482 | 10    4194304.0  780.190482  780.190482 | ||||||
| 11    8388608.0  812.429770  812.429770 | 11    8388608.0  812.429770  812.429770 | ||||||
| 12   16777216.0  833.084721  833.084721 | 12   16777216.0  833.084721  833.084721 | ||||||
| 13   33554432.0  842.004273  842.004273 | 13   33554432.0  842.004273  843.811163 | ||||||
| 14   67108864.0  847.448255  848.362445 | 14   67108864.0  847.448255  848.362445 | ||||||
| 15  134217728.0  849.737435  850.656574 | 15  134217728.0  849.737435  850.656574 | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  36.291 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  43.473 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-01-vector-add-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/62d97d49a32414049819dd8bb8378080/01-vector-add.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">01-vector-add.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -369,17 +369,17 @@ We will then compare its performance against (1) <code class="code docutils lite | |||||||
| <p class="sphx-glr-script-out">Out:</p> | <p class="sphx-glr-script-out">Out:</p> | ||||||
| <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance: | <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>softmax-performance: | ||||||
|           N      Triton  Torch (native)  Torch (jit) |           N      Triton  Torch (native)  Torch (jit) | ||||||
| 0     256.0  546.133347      546.133347   190.511628 | 0     256.0  512.000001      546.133347   188.321838 | ||||||
| 1     384.0  585.142862      585.142862   153.600004 | 1     384.0  614.400016      585.142862   153.600004 | ||||||
| 2     512.0  655.360017      585.142849   156.038096 | 2     512.0  655.360017      606.814814   154.566038 | ||||||
| 3     640.0  682.666684      640.000002   160.000000 | 3     640.0  706.206879      640.000002   160.000000 | ||||||
| 4     768.0  722.823517      664.216187   163.839992 | 4     768.0  722.823517      664.216187   162.754967 | ||||||
| ..      ...         ...             ...          ... | ..      ...         ...             ...          ... | ||||||
| 93  12160.0  814.058574      405.755985   198.936606 | 93  12160.0  815.765209      406.179533   198.936606 | ||||||
| 94  12288.0  814.111783      415.661740   198.995960 | 94  12288.0  814.111783      415.661740   199.197579 | ||||||
| 95  12416.0  814.163950      411.722274   198.755369 | 95  12416.0  814.163950      412.149375   198.755369 | ||||||
| 96  12544.0  814.214963      412.971190   198.913776 | 96  12544.0  814.214963      412.971190   199.012395 | ||||||
| 97  12672.0  814.265046      412.516771   199.069228 | 97  12672.0  814.265046      412.097543   198.971549 | ||||||
|  |  | ||||||
| [98 rows x 4 columns] | [98 rows x 4 columns] | ||||||
| </pre></div> | </pre></div> | ||||||
| @@ -392,7 +392,7 @@ We will then compare its performance against (1) <code class="code docutils lite | |||||||
| Note however that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li> | Note however that the PyTorch <cite>softmax</cite> operation is more general and will work on tensors of any shape.</p></li> | ||||||
| </ul> | </ul> | ||||||
| </div></blockquote> | </div></blockquote> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  20.551 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  21.420 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-02-fused-softmax-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/d91442ac2982c4e0cc3ab0f43534afbc/02-fused-softmax.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">02-fused-softmax.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -564,42 +564,42 @@ torch_output=tensor([[  1.1045, -36.9688,  31.4688,  ..., -11.3906,  24.4531, -3 | |||||||
| <p class="sphx-glr-script-out">Out:</p> | <p class="sphx-glr-script-out">Out:</p> | ||||||
| <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>matmul-performance: | <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>matmul-performance: | ||||||
|          M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) |          M     cuBLAS  ...     Triton  Triton (+ LeakyReLU) | ||||||
| 0    256.0   2.978909  ...   2.978909              3.276800 | 0    256.0   2.730667  ...   2.978909              2.978909 | ||||||
| 1    384.0   7.372800  ...   8.507077              8.507077 | 1    384.0   7.372800  ...   8.507077              8.507077 | ||||||
| 2    512.0  14.563555  ...  16.384000             16.384000 | 2    512.0  14.563555  ...  16.384000             15.420235 | ||||||
| 3    640.0  22.260869  ...  24.380953             24.380953 | 3    640.0  22.260869  ...  24.380953             24.380953 | ||||||
| 4    768.0  32.768000  ...  34.028308             34.028308 | 4    768.0  32.768000  ...  34.028308             34.028308 | ||||||
| 5    896.0  37.971025  ...  40.140799             39.025776 | 5    896.0  39.025776  ...  40.140799             39.025776 | ||||||
| 6   1024.0  49.932191  ...  52.428801             52.428801 | 6   1024.0  49.932191  ...  52.428801             52.428801 | ||||||
| 7   1152.0  45.242181  ...  46.656000             46.656000 | 7   1152.0  45.242181  ...  46.656000             46.656000 | ||||||
| 8   1280.0  51.200001  ...  56.888887             56.888887 | 8   1280.0  51.200001  ...  56.888887             56.888887 | ||||||
| 9   1408.0  64.138541  ...  67.305878             66.485074 | 9   1408.0  64.138541  ...  67.305878             66.485074 | ||||||
| 10  1536.0  80.430545  ...  79.526831             79.526831 | 10  1536.0  80.430545  ...  79.526831             79.526831 | ||||||
| 11  1664.0  62.929456  ...  62.492442             62.061463 | 11  1664.0  62.929456  ...  62.061463             62.061463 | ||||||
| 12  1792.0  72.512412  ...  72.047592             72.047592 | 12  1792.0  72.512412  ...  72.047592             72.047592 | ||||||
| 13  1920.0  69.120002  ...  70.172588             70.530615 | 13  1920.0  69.467336  ...  70.172588             70.172588 | ||||||
| 14  2048.0  73.584279  ...  76.959706             76.608294 | 14  2048.0  73.262953  ...  76.608294             76.260072 | ||||||
| 15  2176.0  83.155572  ...  86.367588             85.269692 | 15  2176.0  83.155572  ...  85.998493             85.269692 | ||||||
| 16  2304.0  68.446623  ...  77.057651             76.076024 | 16  2304.0  68.251065  ...  76.319081             76.809875 | ||||||
| 17  2432.0  71.305746  ...  84.367759             85.134737 | 17  2432.0  71.487187  ...  74.521127             84.877538 | ||||||
| 18  2560.0  78.019048  ...  80.908642             80.709358 | 18  2560.0  78.019048  ...  81.310171             81.108913 | ||||||
| 19  2688.0  83.369354  ...  89.676257             89.254248 | 19  2688.0  83.369354  ...  89.254248             89.676257 | ||||||
| 20  2816.0  79.733474  ...  83.392363             83.233226 | 20  2816.0  83.392363  ...  82.135981             83.392363 | ||||||
| 21  2944.0  81.967162  ...  82.237674             82.102191 | 21  2944.0  81.967162  ...  81.967162             81.698415 | ||||||
| 22  3072.0  81.825298  ...  88.612060             88.473602 | 22  3072.0  80.659693  ...  88.750943             88.750943 | ||||||
| 23  3200.0  84.880639  ...  95.096582             95.096582 | 23  3200.0  84.432717  ...  94.955488             95.096582 | ||||||
| 24  3328.0  83.808259  ...  84.101981             83.905938 | 24  3328.0  82.939284  ...  84.695641             81.254285 | ||||||
| 25  3456.0  78.655188  ...  85.950501             88.400840 | 25  3456.0  80.300370  ...  82.435141             88.497878 | ||||||
| 26  3584.0  87.296493  ...  97.628001             98.268190 | 26  3584.0  83.177979  ...  88.586589             94.947616 | ||||||
| 27  3712.0  80.757757  ...  88.326564             85.822459 | 27  3712.0  85.675250  ...  89.035062             87.170458 | ||||||
| 28  3840.0  83.027026  ...  91.059692             84.613126 | 28  3840.0  78.433173  ...  84.355978             91.511791 | ||||||
| 29  3968.0  90.724116  ...  87.347124             90.791620 | 29  3968.0  86.849777  ...  90.791620             84.040329 | ||||||
| 30  4096.0  86.313653  ...  86.424811             91.056800 | 30  4096.0  93.077479  ...  86.480498             83.365047 | ||||||
|  |  | ||||||
| [31 rows x 5 columns] | [31 rows x 5 columns] | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes  56.072 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 6 minutes  1.040 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-03-matrix-multiplication-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/d5fee5b55a64e47f1b5724ec39adf171/03-matrix-multiplication.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">03-matrix-multiplication.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -194,36 +194,36 @@ to download the full example code</p> | |||||||
| <p class="sphx-glr-script-out">Out:</p> | <p class="sphx-glr-script-out">Out:</p> | ||||||
| <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm-backward: | <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>layer-norm-backward: | ||||||
|           N      Triton       Torch        Apex |           N      Triton       Torch        Apex | ||||||
| 0    1024.0  307.200008   99.902435  307.200008 | 0    1024.0  307.200008   98.303995  307.200008 | ||||||
| 1    1536.0  351.085717  135.032961  341.333333 | 1    1536.0  347.773587  134.540150  341.333333 | ||||||
| 2    2048.0  420.102553  162.754967  327.679984 | 2    2048.0  420.102553  161.684218  325.509933 | ||||||
| 3    2560.0  458.507457  182.857144  330.322572 | 3    2560.0  458.507457  181.775141  325.079368 | ||||||
| 4    3072.0  515.580429  191.501303  319.168834 | 4    3072.0  511.999982  192.501302  320.556515 | ||||||
| 5    3584.0  547.872604  207.768111  311.652167 | 5    3584.0  551.384634  208.271186  311.652167 | ||||||
| 6    4096.0  568.231237  221.905193  301.546004 | 6    4096.0  568.231237  220.412561  299.707322 | ||||||
| 7    4608.0  504.986315  232.336141  287.999990 | 7    4608.0  507.302750  232.825259  287.999990 | ||||||
| 8    5120.0  531.948056  242.366855  285.104413 | 8    5120.0  529.655159  242.845844  287.775181 | ||||||
| 9    5632.0  538.517949  243.545956  290.683877 | 9    5632.0  545.032265  243.545956  289.438969 | ||||||
| 10   6144.0  546.133354  250.349744  288.000001 | 10   6144.0  548.163546  248.661056  286.879370 | ||||||
| 11   6656.0  536.053693  256.000009  286.279570 | 11   6656.0  534.260858  256.000009  285.767438 | ||||||
| 12   7168.0  510.480705  252.988236  277.024148 | 12   7168.0  507.469040  260.654538  286.242939 | ||||||
| 13   7680.0  482.513091  267.130429  284.444450 | 13   7680.0  479.999983  262.564106  278.850215 | ||||||
| 14   8192.0  463.698115  269.326017  282.482757 | 14   8192.0  462.607053  267.493874  284.939124 | ||||||
| 15   8704.0  417.791980  264.425310  282.673891 | 15   8704.0  417.791980  267.815384  284.599455 | ||||||
| 16   9216.0  431.157889  274.762727  291.031570 | 16   9216.0  431.157889  272.729961  289.129410 | ||||||
| 17   9728.0  439.683593  281.630872  290.027323 | 17   9728.0  439.683593  280.278512  290.027323 | ||||||
| 18  10240.0  446.025405  286.100109  289.811322 | 18  10240.0  450.109870  286.433562  290.153487 | ||||||
| 19  10752.0  425.120247  246.935876  289.941565 | 19  10752.0  426.525614  247.172406  290.922209 | ||||||
| 20  11264.0  425.056596  243.765566  283.371073 | 20  11264.0  427.071098  245.760001  286.676558 | ||||||
| 21  11776.0  423.089806  249.888595  289.129414 | 21  11776.0  423.089806  249.888595  288.981596 | ||||||
| 22  12288.0  421.302872  254.673582  295.207195 | 22  12288.0  419.504980  254.673582  294.323369 | ||||||
| 23  12800.0  414.574901  254.515329  290.909089 | 23  12800.0  414.016170  253.674644  288.180121 | ||||||
| 24  13312.0  414.381327  253.561895  289.129403 | 24  13312.0  411.181478  252.759501  289.916513 | ||||||
| 25  13824.0  406.090579  257.790206  293.088338 | 25  13824.0  404.112047  257.190689  292.056329 | ||||||
| 26  14336.0  394.116833  256.381525  290.349381 | 26  14336.0  393.215988  254.485198  286.719986 | ||||||
| 27  14848.0  385.245405  255.999999  287.844912 | 27  14848.0  385.245405  257.665934  289.246765 | ||||||
| 28  15360.0  376.932517  262.751252  291.184839 | 28  15360.0  373.874218  257.970599  286.211174 | ||||||
| 29  15872.0  370.913333  262.166551  291.118085 | 29  15872.0  370.913333  261.806182  289.899545 | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <div class="line-block"> | <div class="line-block"> | ||||||
| @@ -487,7 +487,7 @@ to download the full example code</p> | |||||||
| <span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> | <span class="n">bench_layer_norm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">save_path</span><span class="o">=</span><span class="s1">'.'</span><span class="p">,</span> <span class="n">print_data</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> | ||||||
| </pre></div> | </pre></div> | ||||||
| </div> | </div> | ||||||
| <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  11.560 seconds)</p> | <p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  12.002 seconds)</p> | ||||||
| <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py"> | <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-getting-started-tutorials-05-layer-norm-py"> | ||||||
| <div class="sphx-glr-download sphx-glr-download-python docutils container"> | <div class="sphx-glr-download sphx-glr-download-python docutils container"> | ||||||
| <p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p> | <p><a class="reference download internal" download="" href="../../_downloads/935c0dd0fbeb4b2e69588471cbb2d4b2/05-layer-norm.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">05-layer-norm.py</span></code></a></p> | ||||||
|   | |||||||
| @@ -174,7 +174,7 @@ | |||||||
|              |              | ||||||
|   <div class="section" id="computation-times"> |   <div class="section" id="computation-times"> | ||||||
| <span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1> | <span id="sphx-glr-getting-started-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1> | ||||||
| <p><strong>13:04.956</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p> | <p><strong>13:18.417</strong> total execution time for <strong>getting-started_tutorials</strong> files:</p> | ||||||
| <table class="docutils align-default"> | <table class="docutils align-default"> | ||||||
| <colgroup> | <colgroup> | ||||||
| <col style="width: 85%" /> | <col style="width: 85%" /> | ||||||
| @@ -183,19 +183,19 @@ | |||||||
| </colgroup> | </colgroup> | ||||||
| <tbody> | <tbody> | ||||||
| <tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td> | <tr class="row-odd"><td><p><a class="reference internal" href="03-matrix-multiplication.html#sphx-glr-getting-started-tutorials-03-matrix-multiplication-py"><span class="std std-ref">Matrix Multiplication</span></a> (<code class="docutils literal notranslate"><span class="pre">03-matrix-multiplication.py</span></code>)</p></td> | ||||||
| <td><p>05:56.072</p></td> | <td><p>06:01.040</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-even"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td> | <tr class="row-even"><td><p><a class="reference internal" href="02-fused-softmax.html#sphx-glr-getting-started-tutorials-02-fused-softmax-py"><span class="std std-ref">Fused Softmax</span></a> (<code class="docutils literal notranslate"><span class="pre">02-fused-softmax.py</span></code>)</p></td> | ||||||
| <td><p>03:20.551</p></td> | <td><p>03:21.420</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-odd"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td> | <tr class="row-odd"><td><p><a class="reference internal" href="05-layer-norm.html#sphx-glr-getting-started-tutorials-05-layer-norm-py"><span class="std std-ref">Layer Normalization</span></a> (<code class="docutils literal notranslate"><span class="pre">05-layer-norm.py</span></code>)</p></td> | ||||||
| <td><p>02:11.560</p></td> | <td><p>02:12.002</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-even"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td> | <tr class="row-even"><td><p><a class="reference internal" href="01-vector-add.html#sphx-glr-getting-started-tutorials-01-vector-add-py"><span class="std std-ref">Vector Addition</span></a> (<code class="docutils literal notranslate"><span class="pre">01-vector-add.py</span></code>)</p></td> | ||||||
| <td><p>01:36.291</p></td> | <td><p>01:43.473</p></td> | ||||||
| <td><p>0.0 MB</p></td> | <td><p>0.0 MB</p></td> | ||||||
| </tr> | </tr> | ||||||
| <tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td> | <tr class="row-odd"><td><p><a class="reference internal" href="04-low-memory-dropout.html#sphx-glr-getting-started-tutorials-04-low-memory-dropout-py"><span class="std std-ref">Low-Memory Dropout</span></a> (<code class="docutils literal notranslate"><span class="pre">04-low-memory-dropout.py</span></code>)</p></td> | ||||||
|   | |||||||
| @@ -1,4 +1,4 @@ | |||||||
| # Sphinx build info version 1 | # Sphinx build info version 1 | ||||||
| # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||||||
| config: 68db870c73f557363c04552ac96820c6 | config: 230ac3d7f462d25f7d66d70d6722c182 | ||||||
| tags: 645f666f9bcd5a90fca523b33c5a78b7 | tags: 645f666f9bcd5a90fca523b33c5a78b7 | ||||||
|   | |||||||