diff --git a/docs/tutorials/custom-operation.rst b/docs/tutorials/custom-operation.rst
index 0412a99f4..30619bef9 100644
--- a/docs/tutorials/custom-operation.rst
+++ b/docs/tutorials/custom-operation.rst
@@ -57,7 +57,8 @@ As you will see, a wrapper for the above Triton function can be created in just
     }
     """
     # create callable kernel for the source-code
-    kernel = triton.kernel(src)
+    # options: 4 warps and a -DTILE=1024
+    kernel = triton.kernel(src, defines = {'TILE': 1024}, num_warps = [4])
 
     # Forward pass
     @staticmethod
@@ -72,11 +73,7 @@ As you will see, a wrapper for the above Triton function can be created in just
         N = x.numel()
         grid = lambda opt: (triton.cdiv(N, opt.d('TILE')), )
         # launch kernel
-        # options: 4 warps and a -DTILE=1024
-        _add.kernel(z, x, y, N,
-                    grid = grid,
-                    num_warps = 4,
-                    defines = {'TILE': 1024})
+        _add.kernel(z, x, y, N, grid = grid)
         # return output
         return z
 
diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst
index b49e262c1..1cd7548ce 100644
--- a/docs/tutorials/index.rst
+++ b/docs/tutorials/index.rst
@@ -8,4 +8,3 @@ Tutorials
    triton-vs-cuda
    matrix-transposition
    matrix-multiplication
-   putting-it-all-together
diff --git a/docs/tutorials/triton-vs-cuda.rst b/docs/tutorials/triton-vs-cuda.rst
index 4d563a583..c90190313 100644
--- a/docs/tutorials/triton-vs-cuda.rst
+++ b/docs/tutorials/triton-vs-cuda.rst
@@ -97,12 +97,10 @@ Auto-Tuning
 Now assume that you want to tune the above code for different data types, tile sizes and thread block sizes. This is doable in CUDA but would require you to write cumbersome machinery to handle different vector sizes and loop unrolling factors. In Triton, this can be trivially done by adjusting some compilation parameters. For example:
 
 .. code-block:: python
+
+  kernel = triton.kernel(src, defines = {'TILE': [256, 512, 1024]}, num_warps = [2, 4, 8])
 
-  _vector_add.kernel(y, x, N, grid=grid,
-                     defines={'TILE': [256, 512, 1024]},
-                     num_warps = [2, 4, 8])
-
-would benchmark our above triton-code for tile sizes of 256, 512 and 1024 executed with 2, 4 or 8 warps -- and cache the fastest kernel.
+would benchmark our above Triton source-code for tile sizes of 256, 512 and 1024 executed with 2, 4 or 8 warps -- and cache the fastest kernel.
 
 =============================
 Going Further
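
Taken together, these hunks move the compilation options (``defines`` and ``num_warps``) from every launch call into the ``triton.kernel`` constructor. Below is a minimal before/after sketch of the change, reusing the ``src``, ``_add`` and ``grid`` names from the tutorial code touched by this diff; it is illustrative only and assumes the surrounding wrapper class from ``custom-operation.rst`` is in scope.

.. code-block:: python

    # old API: compile-time options repeated at every launch
    kernel = triton.kernel(src)
    _add.kernel(z, x, y, N, grid = grid, num_warps = 4, defines = {'TILE': 1024})

    # new API: options bound once, when the kernel object is created
    kernel = triton.kernel(src, defines = {'TILE': 1024}, num_warps = [4])
    _add.kernel(z, x, y, N, grid = grid)

When list-valued options are given, as in the ``triton-vs-cuda.rst`` hunk, the kernel is benchmarked over every combination of values and the fastest configuration is cached.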