.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "getting-started/tutorials/07-libdevice-function.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_getting-started_tutorials_07-libdevice-function.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_getting-started_tutorials_07-libdevice-function.py:

Libdevice function
==================

Triton can invoke a custom function from an external library.
In this example, we will use the `libdevice` library to apply `asin` on a tensor.
Please refer to https://docs.nvidia.com/cuda/libdevice-users-guide/index.html regarding the semantics of all available libdevice functions.

In `triton/language/libdevice.py`, functions that perform the same computation on different data types are grouped together.
For example, both `__nv_asin` and `__nv_asinf` calculate the principal value of the arc sine of the input, but `__nv_asin` operates on `double` and `__nv_asinf` operates on `float`.
Using Triton, you can simply call `tl.libdevice.asin`.
Triton automatically selects the correct underlying device function to invoke based on input and output types.

.. GENERATED FROM PYTHON SOURCE LINES 15-17
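The type-based selection described above can be pictured with a small lookup table. This is only an illustrative sketch: the two symbol names come from the text, while the dtype keys and the helper name `select_asin_symbol` are hypothetical and do not reflect how Triton implements the dispatch internally.

```python
# Illustrative sketch only: a hypothetical lookup table, not Triton's implementation.
# The symbol names are the ones quoted in the text; the keys and helper are assumptions.
ASIN_SYMBOLS = {
    "fp32": "__nv_asinf",  # variant that operates on float
    "fp64": "__nv_asin",   # variant that operates on double
}


def select_asin_symbol(dtype: str) -> str:
    """Return the libdevice symbol that would handle asin for the given dtype."""
    try:
        return ASIN_SYMBOLS[dtype]
    except KeyError:
        raise TypeError(f"asin has no libdevice variant for dtype {dtype!r}") from None


print(select_asin_symbol("fp32"))
print(select_asin_symbol("fp64"))
```

The caller only names the operation once; the data type of the argument decides which device function is used.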
asin Kernel
-----------

.. GENERATED FROM PYTHON SOURCE LINES 17-39

.. code-block:: default


    import torch

    import triton
    import triton.language as tl


    @triton.jit
    def asin_kernel(
            x_ptr,
            y_ptr,
            n_elements,
            BLOCK_SIZE: tl.constexpr,
    ):
        # Each program instance handles one contiguous block of BLOCK_SIZE elements.
        pid = tl.program_id(axis=0)
        block_start = pid * BLOCK_SIZE
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        # Mask out offsets past the end of the input in the last block.
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        x = tl.libdevice.asin(x)
        tl.store(y_ptr + offsets, x, mask=mask)

.. GENERATED FROM PYTHON SOURCE LINES 40-43
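The offset and mask arithmetic in the kernel can be checked on the CPU. The sketch below emulates it in plain Python, with no Triton or GPU involved, for the sizes this tutorial launches with (98432 elements, `BLOCK_SIZE=1024`). The helper names `cdiv` and `program_mask` are made up for illustration; `cdiv` is assumed to mirror the ceiling division that `triton.cdiv` performs.

```python
# Pure-Python emulation of the index arithmetic inside asin_kernel (no GPU needed).
# Helper names are hypothetical; cdiv is assumed to match triton.cdiv's ceiling division.
def cdiv(a: int, b: int) -> int:
    """Ceiling division: the number of blocks of size b needed to cover a elements."""
    return (a + b - 1) // b


def program_mask(pid: int, n_elements: int, block_size: int) -> list:
    """Return the per-lane mask that program `pid` would compute."""
    block_start = pid * block_size
    offsets = [block_start + lane for lane in range(block_size)]
    return [offset < n_elements for offset in offsets]


n_elements = 98432
block_size = 1024

num_programs = cdiv(n_elements, block_size)
last_mask = program_mask(num_programs - 1, n_elements, block_size)

print(num_programs)    # 97 programs in the launch grid
print(sum(last_mask))  # 128 lanes of the last program are in bounds
```

Because 98432 is not a multiple of 1024, the last program would read and write out of bounds without the mask.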
Using the default libdevice library path
----------------------------------------

We can use the default libdevice library path encoded in `triton/language/libdevice.py`.

.. GENERATED FROM PYTHON SOURCE LINES 43-61

.. code-block:: default


    torch.manual_seed(0)
    size = 98432
    x = torch.rand(size, device='cuda')
    output_triton = torch.zeros(size, device='cuda')
    output_torch = torch.asin(x)
    assert x.is_cuda and output_triton.is_cuda
    n_elements = output_torch.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
    asin_kernel[grid](x, output_triton, n_elements, BLOCK_SIZE=1024)
    print(output_torch)
    print(output_triton)
    print(
        f'The maximum difference between torch and triton is '
        f'{torch.max(torch.abs(output_torch - output_triton))}'
    )

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    tensor([0.4105, 0.5430, 0.0249,  ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    tensor([0.4105, 0.5430, 0.0249,  ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    The maximum difference between torch and triton is 2.384185791015625e-07
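The reported maximum difference can be put in perspective with a little float32 arithmetic. The sketch below uses only the standard library, and `float32_spacing` is a hypothetical helper. It shows that the printed value is exactly 2^-22, which is twice the gap between adjacent float32 values just above 1.0. That is consistent with the two implementations disagreeing only by rounding in the last place or two, although the output does not show which element produced the maximum.

```python
import struct


def float32_spacing(value: float) -> float:
    """Gap between `value` (as a positive float32) and the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current


reported_diff = 2.384185791015625e-07  # value printed by the tutorial
ulp_near_one = float32_spacing(1.0)    # float32 spacing in the interval [1, 2)

print(reported_diff == 2.0 ** -22)
print(reported_diff / ulp_near_one)
```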
.. GENERATED FROM PYTHON SOURCE LINES 62-65
Customize the libdevice library path
------------------------------------

We can also customize the libdevice library path by passing the path to the `libdevice` library to the `asin` kernel.

.. GENERATED FROM PYTHON SOURCE LINES 65-75

.. code-block:: default


    output_triton = torch.empty_like(x)
    asin_kernel[grid](x, output_triton, n_elements, BLOCK_SIZE=1024,
                      extern_libs={'libdevice': '/usr/local/cuda/nvvm/libdevice/libdevice.10.bc'})
    print(output_torch)
    print(output_triton)
    print(
        f'The maximum difference between torch and triton is '
        f'{torch.max(torch.abs(output_torch - output_triton))}'
    )

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    tensor([0.4105, 0.5430, 0.0249,  ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    tensor([0.4105, 0.5430, 0.0249,  ..., 0.0424, 0.5351, 0.8149], device='cuda:0')
    The maximum difference between torch and triton is 2.384185791015625e-07
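A hard-coded libdevice path may not exist on every machine. One possible guard, sketched below, builds the `extern_libs` argument only when the file is present and otherwise returns nothing, so the launch uses the default path. The helper name `libdevice_launch_kwargs` is hypothetical; only the `extern_libs={'libdevice': ...}` form is taken from the example above.

```python
import os


def libdevice_launch_kwargs(path: str) -> dict:
    """Return extra launch kwargs for a custom libdevice path, or {} if it is missing."""
    if os.path.isfile(path):
        return {"extern_libs": {"libdevice": path}}
    # An empty dict leaves the kernel launch on the default libdevice path.
    return {}


kwargs = libdevice_launch_kwargs("/usr/local/cuda/nvvm/libdevice/libdevice.10.bc")
print(kwargs)

# Intended use with the tutorial's kernel (needs a CUDA GPU and Triton):
# asin_kernel[grid](x, output_triton, n_elements, BLOCK_SIZE=1024, **kwargs)
```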

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.010 seconds)


.. _sphx_glr_download_getting-started_tutorials_07-libdevice-function.py:

.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: 07-libdevice-function.py <07-libdevice-function.py>`

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: 07-libdevice-function.ipynb <07-libdevice-function.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_