[DOCS] Fix spelling (#664)
This PR applies minor spelling fixes in comments and string literals to `master`. It shouldn't hurt anything.
@@ -78,7 +78,7 @@ def softmax_kernel(
     input_ptrs = row_start_ptr + col_offsets
     # Load the row into SRAM, using a mask since BLOCK_SIZE may be > than n_cols
     row = tl.load(input_ptrs, mask=col_offsets < n_cols, other=-float('inf'))
-    # Substract maximum for numerical stability
+    # Subtract maximum for numerical stability
     row_minus_max = row - tl.max(row, axis=0)
     # Note that exponentials in Triton are fast but approximate (i.e., think __expf in CUDA)
     numerator = tl.exp(row_minus_max)
@@ -18,7 +18,7 @@ You will specifically learn about:
 # They are notoriously hard to optimize, hence their implementation is generally done by
 # hardware vendors themselves as part of so-called "kernel libraries" (e.g., cuBLAS).
 # Unfortunately, these libraries are often proprietary and cannot be easily customized
-# to accomodate the needs of modern deep learning workloads (e.g., fused activation functions).
+# to accommodate the needs of modern deep learning workloads (e.g., fused activation functions).
 # In this tutorial, you will learn how to implement efficient matrix multiplications by
 # yourself with Triton, in a way that is easy to customize and extend.
 #
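For context, the comment corrected in the first hunk describes the standard max-subtraction trick for a numerically stable softmax: subtracting the row maximum before exponentiating leaves the result unchanged while preventing overflow. A minimal NumPy sketch of the same idea (the function name and the use of NumPy are illustrative here, not part of this PR or the Triton kernel):

    import numpy as np

    def stable_softmax(row):
        # Subtracting the row maximum makes the largest exponent exp(0) = 1,
        # avoiding overflow for large inputs; the output is unchanged because
        # softmax is invariant to adding a constant to every element.
        row_minus_max = row - np.max(row)
        numerator = np.exp(row_minus_max)
        return numerator / np.sum(numerator)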