.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "getting-started/tutorials/04-low-memory-dropout.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_getting-started_tutorials_04-low-memory-dropout.py>`
        to download the full example code

.. _sphx_glr_getting-started_tutorials_04-low-memory-dropout.py:

Low-Memory Dropout
==================

In this tutorial, you will write a memory-efficient implementation of dropout whose state
will be composed of a single int32 seed. This differs from more traditional implementations of dropout,
whose state is generally composed of a bit mask tensor of the same shape as the input. You will learn about:

- The limitations of naive implementations of dropout in PyTorch
- Parallel pseudo-random number generation in Triton

.. GENERATED FROM PYTHON SOURCE LINES 14-29
Baseline
--------

The *dropout* operator was first introduced in [SRIVASTAVA2014]_ as a way to improve the performance
of deep neural networks in the low-data regime (i.e. regularization).

It takes a vector as input and produces a vector of the same shape as output. Each scalar in the
output has a probability :math:`p` of being changed to zero, and otherwise it is copied from the input.
This forces the network to perform well even when only :math:`1 - p` scalars from the input are available.

At evaluation time we want to use the full power of the network, so we set :math:`p = 0`. Naively, this would
increase the norm of the output (which can be a bad thing, e.g. it can lead to an artificial decrease
in the output softmax temperature). To prevent this, we multiply the output by :math:`\frac{1}{1 - p}`, which
keeps the norm consistent regardless of the dropout probability.
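With this rescaling, the expected value of each output element matches its input, so the scale of the activations is preserved on average:

.. math::

    \mathbb{E}[\mathrm{output}_i] = (1 - p)\,\frac{x_i}{1 - p} + p \cdot 0 = x_i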
Let's first take a look at the baseline implementation.

.. GENERATED FROM PYTHON SOURCE LINES 29-82
.. code-block:: python

    import tabulate
    import torch

    import triton
    import triton.language as tl


    @triton.jit
    def _dropout(
        x_ptr,  # pointer to the input
        x_keep_ptr,  # pointer to a mask of 0s and 1s
        output_ptr,  # pointer to the output
        n_elements,  # number of elements in the `x` tensor
        p,  # probability that an element of `x` is changed to zero
        BLOCK_SIZE: tl.constexpr,
    ):
        pid = tl.program_id(axis=0)
        block_start = pid * BLOCK_SIZE
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        # Load data
        x = tl.load(x_ptr + offsets, mask=mask)
        x_keep = tl.load(x_keep_ptr + offsets, mask=mask)
        # The line below is the crucial part, described in the paragraph above!
        output = tl.where(x_keep, x / (1 - p), 0.0)
        # Write-back output
        tl.store(output_ptr + offsets, output, mask=mask)


    def dropout(x, x_keep, p):
        output = torch.empty_like(x)
        assert x.is_contiguous()
        n_elements = x.numel()
        grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
        _dropout[grid](x, x_keep, output, n_elements, p, BLOCK_SIZE=1024)
        return output


    # Input tensor
    x = torch.randn(size=(10,)).cuda()
    # Dropout mask
    p = 0.5
    x_keep = (torch.rand(size=(10,)) > p).to(torch.int32).cuda()
    output = dropout(x, x_keep=x_keep, p=p)
    print(tabulate.tabulate([
        ["input"] + x.tolist(),
        ["keep mask"] + x_keep.tolist(),
        ["output"] + output.tolist(),
    ]))

Out:

.. code-block:: none

    ---------  -------  ---------  --------  --------  --------  --------  --------  --------  ---------  ---------
    input      1.541    -0.293429  -2.17879  0.568431  -1.08452  -1.3986   0.403347  0.838026  -0.719258  -0.403344
    keep mask  1        1          0         1         0         1         1         0         0          0
    output     3.08199  -0.586858  0         1.13686   0         -2.79719  0.806694  0         0          0
    ---------  -------  ---------  --------  --------  --------  --------  --------  --------  ---------  ---------

.. GENERATED FROM PYTHON SOURCE LINES 83-101
Seeded dropout
--------------

The above implementation of dropout works fine, but it can be a bit awkward to deal with. Firstly,
we need to store the dropout mask for backpropagation. Secondly, dropout state management can get
very tricky when using recompute/checkpointing (e.g. see all the notes about `preserve_rng_state` in
https://pytorch.org/docs/1.9.0/checkpoint.html). In this tutorial we'll describe an alternative implementation
that (1) has a smaller memory footprint, (2) requires less data movement, and (3) simplifies the management
of persisting randomness across multiple invocations of the kernel.
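To make the memory argument concrete, here is a small, hypothetical comparison (the tensor shape is purely illustrative): a mask-based implementation has to keep a keep-mask the size of the activation alive until the backward pass, while a seeded implementation only has to remember a single int32.

.. code-block:: python

    import torch

    x = torch.randn(4096, 4096, device='cuda')
    p = 0.5

    # Naive approach: the keep-mask must be stored until the backward pass.
    x_keep = torch.rand_like(x) > p
    mask_bytes = x_keep.numel() * x_keep.element_size()
    print(f"mask state: {mask_bytes / 2**20:.1f} MiB")  # ~16 MiB for this activation

    # Seeded approach: the only state that must persist is the seed itself.
    print("seed state: 4 bytes")  # a single int32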
Pseudorandom number generation in Triton is simple! In this tutorial we will use the
:code:`triton.language.rand` function, which generates a block of uniformly distributed :code:`float32`
values in [0, 1), given a seed and a block of :code:`int32` offsets. But if you need it, Triton also provides
other :ref:`random number generation strategies <Random Number Generation>`.

.. note::

    Triton's implementation of PRNG is based on the Philox algorithm (described in [SALMON2011]_).

Let's put it all together.

.. GENERATED FROM PYTHON SOURCE LINES 101-149
.. code-block:: python

    @triton.jit
    def _seeded_dropout(
        x_ptr,
        output_ptr,
        n_elements,
        p,
        seed,
        BLOCK_SIZE: tl.constexpr,
    ):
        # compute memory offsets of elements handled by this instance
        pid = tl.program_id(axis=0)
        block_start = pid * BLOCK_SIZE
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        # load data from x
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        # randomly prune it
        random = tl.rand(seed, offsets)
        x_keep = random > p
        # write-back
        output = tl.where(x_keep, x / (1 - p), 0.0)
        tl.store(output_ptr + offsets, output, mask=mask)


    def seeded_dropout(x, p, seed):
        output = torch.empty_like(x)
        assert x.is_contiguous()
        n_elements = x.numel()
        grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)
        _seeded_dropout[grid](x, output, n_elements, p, seed, BLOCK_SIZE=1024)
        return output


    x = torch.randn(size=(10,)).cuda()
    # Compare this to the baseline - dropout mask is never instantiated!
    output = seeded_dropout(x, p=0.5, seed=123)
    output2 = seeded_dropout(x, p=0.5, seed=123)
    output3 = seeded_dropout(x, p=0.5, seed=512)
    print(tabulate.tabulate([
        ["input"] + x.tolist(),
        ["output (seed = 123)"] + output.tolist(),
        ["output (seed = 123)"] + output2.tolist(),
        ["output (seed = 512)"] + output3.tolist(),
    ]))

Out:

.. code-block:: none

    -------------------  ---------  --------  --------  -------  --------  --------  ---------  ---------  ---------  ---------
    input                -0.952835  0.371721  0.408716  1.42142  0.149397  -0.67086  -0.214186  -0.431969  -0.707878  -0.106434
    output (seed = 123)  0          0.743443  0         0        0         -1.34172  0          0          -1.41576   -0.212868
    output (seed = 123)  0          0.743443  0         0        0         -1.34172  0          0          -1.41576   -0.212868
    output (seed = 512)  0          0         0.817432  2.84284  0         -1.34172  -0.428372  0          0          0
    -------------------  ---------  --------  --------  -------  --------  --------  ---------  ---------  ---------  ---------

.. GENERATED FROM PYTHON SOURCE LINES 150-153
Et Voilà! We have a Triton kernel that applies the same dropout mask provided the seed is the same!
If you'd like to explore further applications of pseudorandomness in GPU programming, we encourage you
to explore the `triton/language/random` folder!

.. GENERATED FROM PYTHON SOURCE LINES 155-160
Exercises
---------

1. Extend the kernel to operate over a matrix and use a vector of seeds, one per row (a sketch of one possible starting point is given below).
2. Add support for striding.
3. (challenge) Implement a kernel for a sparse Johnson-Lindenstrauss transform which generates the projection matrix on the fly each time using a seed.

.. GENERATED FROM PYTHON SOURCE LINES 162-167
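As a starting point for the first exercise, here is a minimal sketch of a row-seeded variant. The names ``_row_seeded_dropout`` and ``row_seeded_dropout``, the ``seeds`` vector argument, and the contiguous row-major layout are illustrative assumptions, and the sketch further assumes that ``tl.rand`` accepts a seed value loaded from memory.

.. code-block:: python

    @triton.jit
    def _row_seeded_dropout(
        x_ptr,
        seeds_ptr,  # one int32 seed per row
        output_ptr,
        n_cols,
        p,
        BLOCK_SIZE: tl.constexpr,
    ):
        # one program instance per (row, block of columns)
        row = tl.program_id(axis=0)
        col_block = tl.program_id(axis=1)
        col_offsets = col_block * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = col_offsets < n_cols
        # contiguous row-major layout: element (row, col) lives at row * n_cols + col
        offsets = row * n_cols + col_offsets
        x = tl.load(x_ptr + offsets, mask=mask)
        # each row draws its randomness from its own seed
        seed = tl.load(seeds_ptr + row)
        random = tl.rand(seed, col_offsets)
        x_keep = random > p
        output = tl.where(x_keep, x / (1 - p), 0.0)
        tl.store(output_ptr + offsets, output, mask=mask)


    def row_seeded_dropout(x, p, seeds):
        assert x.ndim == 2 and x.is_contiguous() and seeds.numel() == x.shape[0]
        output = torch.empty_like(x)
        n_rows, n_cols = x.shape
        grid = lambda meta: (n_rows, triton.cdiv(n_cols, meta['BLOCK_SIZE']))
        _row_seeded_dropout[grid](x, seeds, output, n_cols, p, BLOCK_SIZE=1024)
        return output

Reusing the same ``seeds`` vector reproduces the same per-row masks, just as a single seed did above.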
References
----------

.. [SALMON2011] John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw, "Parallel Random Numbers: As Easy as 1, 2, 3", 2011
.. [SRIVASTAVA2014] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 2014

**Total running time of the script:** ( 0 minutes 0.483 seconds)

.. _sphx_glr_download_getting-started_tutorials_04-low-memory-dropout.py:

.. only:: html

   .. container:: sphx-glr-footer
      :class: sphx-glr-footer-example

      .. container:: sphx-glr-download sphx-glr-download-python

         :download:`Download Python source code: 04-low-memory-dropout.py <04-low-memory-dropout.py>`

      .. container:: sphx-glr-download sphx-glr-download-jupyter

         :download:`Download Jupyter notebook: 04-low-memory-dropout.ipynb <04-low-memory-dropout.ipynb>`

.. only:: html

   .. rst-class:: sphx-glr-signature

      `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_