triton/v1.1.2/.doctrees/getting-started/tutorials/01-vector-add.doctree
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "getting-started/tutorials/01-vector-add.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_getting-started_tutorials_01-vector-add.py>`
        to download the full example code.

.. _sphx_glr_getting-started_tutorials_01-vector-add.py:

Vector Addition
===============

In this tutorial, you will write a simple vector addition using Triton and learn about:

- The basic programming model of Triton
- The `triton.jit` decorator, which is used to define Triton kernels.
- The best practices for validating and benchmarking your custom ops against native reference implementations

.. GENERATED FROM PYTHON SOURCE LINES 12-14

Compute Kernel
--------------

.. GENERATED FROM PYTHON SOURCE LINES 14-49

.. code-block:: python

    import torch

    import triton
    import triton.language as tl


    @triton.jit
    def add_kernel(
        x_ptr,  # *Pointer* to first input vector
        y_ptr,  # *Pointer* to second input vector
        output_ptr,  # *Pointer* to output vector
        n_elements,  # Size of the vector
        **meta,  # Optional meta-parameters for the kernel
    ):
        BLOCK_SIZE = meta['BLOCK_SIZE']  # How many inputs each program should process
        # There are multiple 'program's processing different data. We identify which
        # program we are here:
        pid = tl.program_id(axis=0)  # We use a 1D launch grid, so the axis is 0.
        # This program will process inputs that are offset from the initial data.
        # For instance, if you had a vector of length 256 and a block size of 64, the
        # programs would each access the elements [0:64, 64:128, 128:192, 192:256].
        # Note that offsets is a list of pointers:
        block_start = pid * BLOCK_SIZE
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        # Create a mask to guard memory operations against out-of-bounds accesses.
        mask = offsets < n_elements
        # Load x and y from DRAM, masking out any extra elements in case the input
        # is not a multiple of the block size.
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        output = x + y
        # Write x + y back to DRAM.
        tl.store(output_ptr + offsets, output, mask=mask)
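The per-program offset and mask arithmetic above can be sketched in plain Python, with no GPU involved. The values of `n_elements` and `BLOCK_SIZE` below are illustrative, not taken from the kernel:

```python
# Plain-Python sketch of the indexing scheme used by add_kernel:
# each "program" pid handles BLOCK_SIZE contiguous offsets, and the
# mask discards offsets that fall past the end of the vector.
n_elements = 200          # illustrative: deliberately not a multiple of the block size
BLOCK_SIZE = 64
num_programs = (n_elements + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division

covered = []
for pid in range(num_programs):
    block_start = pid * BLOCK_SIZE
    offsets = [block_start + i for i in range(BLOCK_SIZE)]
    mask = [off < n_elements for off in offsets]
    covered.extend(off for off, m in zip(offsets, mask) if m)

# Every element is covered exactly once and nothing out of bounds is touched.
assert covered == list(range(n_elements))
```

Only the last program has a partially-true mask here; the first three process full blocks.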
.. GENERATED FROM PYTHON SOURCE LINES 50-52

Let's also declare a helper function that (1) allocates the output tensor
and (2) enqueues the above kernel with appropriate grid/block sizes.

.. GENERATED FROM PYTHON SOURCE LINES 52-73

.. code-block:: python

    def add(x: torch.Tensor, y: torch.Tensor):
        # We need to preallocate the output.
        output = torch.empty_like(x)
        assert x.is_cuda and y.is_cuda and output.is_cuda
        n_elements = output.numel()
        # The SPMD launch grid denotes the number of kernel instances that run in parallel.
        # It is analogous to CUDA launch grids. It can be either Tuple[int] or
        # Callable(metaparameters) -> Tuple[int].
        # In this case, we use a 1D grid where the size is the number of blocks:
        grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
        # NOTE:
        #  - Each torch.tensor object is implicitly converted into a pointer to its first element.
        #  - `triton.jit`'ed functions can be indexed with a launch grid to obtain a callable GPU kernel.
        #  - Don't forget to pass meta-parameters as keyword arguments.
        pgm = add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
        # We return a handle to the output but, since `torch.cuda.synchronize()` hasn't
        # been called, the kernel is still running asynchronously at this point.
        return output

.. GENERATED FROM PYTHON SOURCE LINES 74-75
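The launch grid uses ceiling division (`triton.cdiv`) so that a trailing partial block still gets its own program. A quick sketch of the same arithmetic in plain Python; the `cdiv` helper below is a stand-in for illustration, not Triton's implementation:

```python
def cdiv(x: int, y: int) -> int:
    # Ceiling division: the smallest number of blocks of size y that cover x elements.
    return (x + y - 1) // y

# With the sizes used later in this tutorial: 98432 elements, BLOCK_SIZE=1024.
assert cdiv(98432, 1024) == 97        # 96 full blocks plus one partial block of 128
assert 96 * 1024 + 128 == 98432
assert cdiv(98304, 1024) == 96        # an exact multiple needs no extra block
```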
We can now use the above function to compute the element-wise sum of two
`torch.tensor` objects and test its correctness:

.. GENERATED FROM PYTHON SOURCE LINES 75-89

.. code-block:: python

    torch.manual_seed(0)
    size = 98432
    x = torch.rand(size, device='cuda')
    y = torch.rand(size, device='cuda')
    output_torch = x + y
    output_triton = add(x, y)
    print(output_torch)
    print(output_triton)
    print(
        f'The maximum difference between torch and triton is '
        f'{torch.max(torch.abs(output_torch - output_triton))}'
    )

Out:

.. code-block:: none

    tensor([1.3713, 1.3076, 0.4940,  ..., 0.6724, 1.2141, 0.9733], device='cuda:0')
    tensor([1.3713, 1.3076, 0.4940,  ..., 0.6724, 1.2141, 0.9733], device='cuda:0')
    The maximum difference between torch and triton is 0.0
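The max-absolute-difference check above is a general validation pattern for custom ops. A CPU-only sketch of the same idea, with plain Python lists standing in for the CUDA tensors and a trivial loop standing in for the kernel under test:

```python
import random

random.seed(0)

def reference_add(x, y):
    # Native reference implementation (analogous to `x + y` in torch).
    return [a + b for a, b in zip(x, y)]

def custom_add(x, y):
    # Stand-in for the custom op under test (here intentionally identical).
    return [a + b for a, b in zip(x, y)]

size = 1024
x = [random.random() for _ in range(size)]
y = [random.random() for _ in range(size)]

out_ref = reference_add(x, y)
out_custom = custom_add(x, y)
max_diff = max(abs(a - b) for a, b in zip(out_ref, out_custom))
# Identical float operations on identical inputs differ by exactly zero; a
# kernel that reorders or fuses arithmetic would need a small tolerance instead.
assert max_diff == 0.0
```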
.. GENERATED FROM PYTHON SOURCE LINES 90-91

Seems like we're good to go!

.. GENERATED FROM PYTHON SOURCE LINES 93-98

Benchmark
---------

We can now benchmark our custom op on vectors of increasing sizes to get a sense of how it does
relative to PyTorch. To make things easier, Triton has a set of built-in utilities that allow us
to concisely plot the performance of our custom ops for different problem sizes.

.. GENERATED FROM PYTHON SOURCE LINES 98-127
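For intuition about what a benchmarking helper does, here is a minimal CPU-side timing loop with the same general shape (warm up, repeat, report median/min/max). This is a rough sketch, not Triton's utility:

```python
import time

def bench(fn, warmup=5, rep=20):
    # Warm up so one-time costs (caches, JIT, allocation) don't skew the numbers,
    # then time several repetitions and report median, min, and max in milliseconds.
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(rep):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    times.sort()
    return times[len(times) // 2], times[0], times[-1]

ms, min_ms, max_ms = bench(lambda: sum(range(10_000)))
assert min_ms <= ms <= max_ms
```

Note that naive wall-clock timing like this is wrong for GPU code, where kernel launches return before the work finishes; that is one reason to use `triton.testing.do_bench` below rather than rolling your own.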
.. code-block:: python

    @triton.testing.perf_report(
        triton.testing.Benchmark(
            x_names=['size'],  # argument names to use as an x-axis for the plot
            x_vals=[
                2 ** i for i in range(12, 28, 1)
            ],  # different possible values for `x_name`
            x_log=True,  # x axis is logarithmic
            line_arg='provider',  # argument name whose value corresponds to a different line in the plot
            line_vals=['triton', 'torch'],  # possible values for `line_arg`
            line_names=['Triton', 'Torch'],  # label name for the lines
            styles=[('blue', '-'), ('green', '-')],  # line styles
            ylabel='GB/s',  # label name for the y-axis
            plot_name='vector-add-performance',  # name for the plot. Also used as a file name when saving the plot.
            args={},  # values for function arguments not in `x_names` and `y_name`
        )
    )
    def benchmark(size, provider):
        x = torch.rand(size, device='cuda', dtype=torch.float32)
        y = torch.rand(size, device='cuda', dtype=torch.float32)
        if provider == 'torch':
            ms, min_ms, max_ms = triton.testing.do_bench(lambda: x + y)
        if provider == 'triton':
            ms, min_ms, max_ms = triton.testing.do_bench(lambda: add(x, y))
        # 2 float32 loads + 1 float32 store = 12 bytes moved per element.
        gbps = lambda ms: 12 * size / ms * 1e-6
        return gbps(ms), gbps(max_ms), gbps(min_ms)

.. GENERATED FROM PYTHON SOURCE LINES 128-130
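The `gbps` conversion assumes each element moves 12 bytes (two float32 reads plus one float32 write). A quick check of the unit arithmetic; the millisecond value below is illustrative, not a measurement:

```python
def gbps(ms: float, size: int) -> float:
    # bytes moved = 12 * size; time in seconds = ms * 1e-3;
    # GB/s = 12 * size / (ms * 1e-3) / 1e9 == 12 * size / ms * 1e-6.
    return 12 * size / ms * 1e-6

size = 134_217_728            # the largest size in the sweep (2**27)
ms = 1.9                      # illustrative runtime in milliseconds
value = gbps(ms, size)
assert abs(value - 12 * size * 1e-6 / ms) < 1e-9   # both spellings agree
assert 845 < value < 850                           # roughly 848 GB/s
```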
We can now run the decorated function above. Pass `print_data=True` to see the performance numbers,
`show_plots=True` to plot them, and/or `save_path='/path/to/results/'` to save them to disk along
with raw CSV data.

.. GENERATED FROM PYTHON SOURCE LINES 130-131
.. code-block:: python

    benchmark.run(print_data=True, show_plots=True)

.. image:: /getting-started/tutorials/images/sphx_glr_01-vector-add_001.png
    :alt: 01 vector add
    :class: sphx-glr-single-img

Out:

.. code-block:: none

    vector-add-performance:
               size        Triton         Torch
    0        4096.0      9.600000      9.600000
    1        8192.0     19.200000     19.200000
    2       16384.0     38.400001     31.999999
    3       32768.0     76.800002     76.800002
    4       65536.0    127.999995    127.999995
    5      131072.0    219.428568    219.428568
    6      262144.0    341.333321    384.000001
    7      524288.0    472.615390    472.615390
    8     1048576.0    614.400016    614.400016
    9     2097152.0    722.823517    722.823517
    10    4194304.0    780.190482    780.190482
    11    8388608.0    812.429770    812.429770
    12   16777216.0    833.084721    833.084721
    13   33554432.0    842.004273    843.811163
    14   67108864.0    847.448255    848.362445
    15  134217728.0    849.737435    850.656574

**Total running time of the script:** (1 minute 38.029 seconds)

.. _sphx_glr_download_getting-started_tutorials_01-vector-add.py:
.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: 01-vector-add.py <01-vector-add.py>`

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: 01-vector-add.ipynb <01-vector-add.ipynb>`

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_