[Triton-MLIR][Backend] Fix the definition of MmaEncodingAttr v1, and the output sequence of DotConversion in MMAv1 (#941)
@@ -293,7 +293,7 @@ partitioned between warps.
 // -------------------------------- version = 1 --------------------------- //

 For first-gen tensor cores, the implicit warpTileSize is [16, 16].
-Information about this layout can be found in the official PTX documentation
+Note: the layout is different from the recommended in PTX ISA
 https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
 (mma.884 section, FP32 accumulator).

@@ -301,29 +301,29 @@ For example, the matrix L corresponding to blockTileSize=[32,16] is:

 warp 0
 --------------------------------/\-------------------------------
-[ 0 0 2 2 0 0 2 2 4 4 6 6 4 4 6 6 ]
-[ 1 1 3 3 1 1 3 3 5 5 7 7 5 5 7 7 ]
-[ 0 0 2 2 0 0 2 2 4 4 6 6 4 4 6 6 ]
-[ 1 1 3 3 1 1 3 3 5 5 7 7 5 5 7 7 ]
-[ 16 16 18 18 16 16 18 18 20 20 22 22 20 20 22 22]
-[ 17 17 19 19 17 17 19 19 21 21 23 23 21 21 23 23]
-[ 16 16 18 18 16 16 18 18 20 20 22 22 20 20 22 22]
-[ 17 17 19 19 17 17 19 19 21 21 23 23 21 21 23 23]
-[ 8 8 10 10 8 8 10 10 12 12 14 14 12 12 14 14]
-[ 9 9 11 11 9 9 11 11 13 13 15 15 13 13 15 15]
-[ ..............................................................
-[ ..............................................................
-[ 24 24 26 26 24 24 26 26 28 28 30 30 28 28 30 30]
-[ 25 25 27 27 25 25 27 27 29 29 31 31 29 29 31 31]
+[ 0 0 2 2 8 8 10 10 0 0 2 2 8 8 10 10 ]
+[ 1 1 3 3 9 9 11 11 1 1 3 3 9 9 11 11 ]
+[ 0 0 2 2 8 8 10 10 0 0 2 2 8 8 10 10 ]
+[ 1 1 3 3 9 9 11 11 1 1 3 3 9 9 11 11 ]
+[ 4 4 6 6 12 12 14 14 4 4 6 6 12 12 14 14 ]
+[ 5 5 7 7 13 13 15 15 5 5 7 7 13 13 15 15 ]
+[ 4 4 6 6 12 12 14 14 4 4 6 6 12 12 14 14 ]
+[ 5 5 7 7 13 13 15 15 5 5 7 7 13 13 15 15 ]
+[ 16 16 18 18 20 20 22 22 16 16 18 18 20 20 22 22 ]
+[ 17 17 19 19 21 21 23 23 17 17 19 19 21 21 23 23 ]
+[ 16 16 18 18 20 20 22 22 16 16 18 18 20 20 22 22 ]
+[ 17 17 19 19 21 21 23 23 17 17 19 19 21 21 23 23 ]
+[ 24 24 26 26 28 28 30 30 24 24 26 26 28 28 30 30 ]
+[ 25 25 27 27 29 29 31 31 25 25 27 27 29 29 31 31 ]
+[ 24 24 26 26 28 28 30 30 24 24 26 26 28 28 30 30 ]
+[ 25 25 27 27 29 29 31 31 25 25 27 27 29 29 31 31 ]

-warp 1 = warp0 + 32
+warp 1 = warp0 + 32
 --------------------------------/\-------------------------------
-[ 32 32 34 34 32 32 34 34 36 36 38 38 36 36 38 38]
-[ 33 33 35 35 33 33 35 35 37 37 39 39 37 37 39 39]
-[ ..............................................................
-[ ..............................................................
-[ 56 56 58 58 56 56 58 58 60 60 62 62 60 60 62 62]
-[ 57 57 59 59 57 57 59 59 61 61 63 63 61 61 63 63]
+[ 32 32 34 34 40 40 42 42 32 32 34 34 40 40 42 42 ]
+[ 33 33 35 35 41 41 43 43 33 33 35 35 41 41 43 43 ]
+[ ............................................................... ]
+

 // -------------------------------- version = 2 --------------------------- //

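The second version = 1 matrix listed above (the one whose first row reads 0 0 2 2 8 8 10 10 ...) follows a regular bit pattern, which the short Python sketch below reproduces. The function name `mma_v1_thread` and the bit decomposition are assumptions read off the printed rows by inspection; this is not code from the Triton backend, and it only covers the blockTileSize=[32,16] example shown in the comment.

```python
def mma_v1_thread(row, col):
    """Thread id owning element (row, col) of the [32, 16] block tile.

    Hypothetical helper: the formula is fitted to the rows printed in the
    doc comment, under the stated implicit warpTileSize of [16, 16].
    """
    # Each warp covers 16 rows; warp 1 holds the same pattern shifted by 32.
    warp, r = divmod(row, 16)

    # Bit 0 of the thread id alternates by row; bit 1 alternates every 2 columns.
    lane = (r & 1) + 2 * ((col >> 1) & 1)

    row_bit = (r >> 2) & 1    # selects rows 4-7 (or 12-15) within a half
    col_bit = (col >> 2) & 1  # selects columns 4-7 and 12-15

    if r < 8:
        # Upper half of the warp tile: threads 0-15.
        lane += 4 * row_bit + 8 * col_bit
    else:
        # Lower half: threads 16-31, with the row/column weights swapped,
        # exactly as the printed rows show.
        lane += 16 + 8 * row_bit + 4 * col_bit

    return 32 * warp + lane


def layout_rows(num_rows=32, num_cols=16):
    """Return the full matrix L as a list of rows of thread ids."""
    return [[mma_v1_thread(r, c) for c in range(num_cols)]
            for r in range(num_rows)]


if __name__ == "__main__":
    for line in layout_rows():
        print("[ " + " ".join(str(t) for t in line) + " ]")
```

Running the script prints all 32 rows, including the warp 1 rows that the comment elides with dots, so it can be used to check the printed rows against each other.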