.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "getting-started/tutorials/07-libdevice-function.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_getting-started_tutorials_07-libdevice-function.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_getting-started_tutorials_07-libdevice-function.py:

Libdevice function
==================
Triton can invoke a custom function from an external library.
In this example, we will use the `libdevice` library to apply `asin` on a tensor.
Please refer to https://docs.nvidia.com/cuda/libdevice-users-guide/index.html regarding the semantics of all available libdevice functions.

In `triton/language/libdevice.py`, we aggregate functions that perform the same computation but operate on different data types.
For example, both `__nv_asin` and `__nv_asinf` calculate the principal value of the arc sine of the input, but `__nv_asin` operates on `double` and `__nv_asinf` operates on `float`.
Using Triton, you can simply call `tl.libdevice.asin`;
Triton automatically selects the correct underlying device function to invoke based on the input and output types.
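
As an aside, this dispatch is driven by the element type of the arguments alone. The sketch below is not part of the tutorial source; it reuses the same `tl.libdevice.asin` call on a `float64` tensor, for which Triton should select `__nv_asin` rather than `__nv_asinf` (the kernel name and sizes here are illustrative).

.. code-block:: python

    # Hypothetical sketch: the same tl.libdevice.asin call is lowered to
    # __nv_asin for float64 inputs and to __nv_asinf for float32 inputs.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def asin_f64_kernel(x_ptr, y_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)  # float64 elements
        tl.store(y_ptr + offsets, tl.libdevice.asin(x), mask=mask)

    x64 = torch.rand(4096, device='cuda', dtype=torch.float64)
    y64 = torch.empty_like(x64)
    grid = lambda meta: (triton.cdiv(x64.numel(), meta['BLOCK_SIZE']),)
    asin_f64_kernel[grid](x64, y64, x64.numel(), BLOCK_SIZE=1024)
    assert torch.allclose(y64, torch.asin(x64))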

.. GENERATED FROM PYTHON SOURCE LINES 15-17

asin Kernel
--------------------------

.. GENERATED FROM PYTHON SOURCE LINES 17-39

.. code-block:: default


    import torch

    import triton
    import triton.language as tl


    @triton.jit
    def asin_kernel(
        x_ptr,
        y_ptr,
        n_elements,
        BLOCK_SIZE: tl.constexpr,
    ):
        pid = tl.program_id(axis=0)
        block_start = pid * BLOCK_SIZE
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        x = tl.libdevice.asin(x)
        tl.store(y_ptr + offsets, x, mask=mask)

.. GENERATED FROM PYTHON SOURCE LINES 40-43

Using the default libdevice library path
----------------------------------------
We can use the default libdevice library path encoded in `triton/language/libdevice.py`.

.. GENERATED FROM PYTHON SOURCE LINES 43-61

.. code-block:: default


    torch.manual_seed(0)
    size = 98432
    x = torch.rand(size, device='cuda')
    output_triton = torch.zeros(size, device='cuda')
    output_torch = torch.asin(x)
    assert x.is_cuda and output_triton.is_cuda
    n_elements = output_torch.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
    asin_kernel[grid](x, output_triton, n_elements, BLOCK_SIZE=1024)
    print(output_torch)
    print(output_triton)
    print(
        f'The maximum difference between torch and triton is '
        f'{torch.max(torch.abs(output_torch - output_triton))}'
    )

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    tensor([0.4105, 0.5430, 0.0249, ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    tensor([0.4105, 0.5430, 0.0249, ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    The maximum difference between torch and triton is 2.384185791015625e-07
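
For a programmatic check instead of eyeballing the printed tensors, a minimal follow-up (not part of the generated tutorial) could assert closeness with `torch.allclose`:

.. code-block:: python

    # Hypothetical sanity check; assumes output_torch and output_triton
    # from the run above are still in scope.
    assert torch.allclose(output_torch, output_triton, atol=1e-6), (
        'Triton asin deviates from torch.asin beyond tolerance'
    )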

.. GENERATED FROM PYTHON SOURCE LINES 62-65

Customize the libdevice library path
------------------------------------
We can also customize the libdevice library path by passing the path to the `libdevice` library to the `asin` kernel.

.. GENERATED FROM PYTHON SOURCE LINES 65-75

.. code-block:: default


    output_triton = torch.empty_like(x)
    asin_kernel[grid](x, output_triton, n_elements, BLOCK_SIZE=1024,
                      extern_libs={'libdevice': '/usr/local/cuda/nvvm/libdevice/libdevice.10.bc'})
    print(output_torch)
    print(output_triton)
    print(
        f'The maximum difference between torch and triton is '
        f'{torch.max(torch.abs(output_torch - output_triton))}'
    )

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    tensor([0.4105, 0.5430, 0.0249, ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    tensor([0.4105, 0.5430, 0.0249, ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    The maximum difference between torch and triton is 2.384185791015625e-07
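
The hard-coded bitcode path above only matches a default CUDA installation. As a hedged convenience (hypothetical helper, not part of the tutorial), the path can be resolved from the `CUDA_HOME` environment variable before launching the kernel:

.. code-block:: python

    import glob
    import os

    def find_libdevice(cuda_home=None):
        # Hypothetical helper: assumes the standard CUDA toolkit layout,
        # <CUDA_HOME>/nvvm/libdevice/libdevice*.bc.
        cuda_home = cuda_home or os.environ.get('CUDA_HOME', '/usr/local/cuda')
        candidates = sorted(glob.glob(
            os.path.join(cuda_home, 'nvvm', 'libdevice', 'libdevice*.bc')))
        if not candidates:
            raise FileNotFoundError(f'no libdevice bitcode found under {cuda_home}')
        return candidates[-1]

    # Usage with the kernel above:
    # asin_kernel[grid](x, output_triton, n_elements, BLOCK_SIZE=1024,
    #                   extern_libs={'libdevice': find_libdevice()})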

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.501 seconds)

.. _sphx_glr_download_getting-started_tutorials_07-libdevice-function.py:

.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: 07-libdevice-function.py <07-libdevice-function.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: 07-libdevice-function.ipynb <07-libdevice-function.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_