[Triton-MLIR][Backend] Fix the order in linear/delinear and a few bugs in reduce conversion (#851)

1. Fix the dimension order in linearize/delinearize, which fixes the
ordering error in emitIndices;
2. Fix the selection of the fast implementation in reduce codegen;
3. Fix the redundant barrier in reduce codegen;
4. Fix the index mapping of the second round of warp_shuffle in the shuffle
version of reduce codegen.
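
For readers unfamiliar with item 1, the following is a minimal Python sketch of order-aware linearize/delinearize. It is an illustration only, not the backend's C++ code: the function names, signatures, and the convention that `order[0]` names the fastest-varying dimension are assumptions made for this example.

```python
from typing import List, Sequence


def linearize(multi_dim: Sequence[int], shape: Sequence[int],
              order: Sequence[int]) -> int:
    """Map a multi-dimensional index to a flat index.

    Assumption for this sketch: order[0] is the fastest-varying dimension.
    """
    linear = 0
    # Walk from the slowest-varying dimension to the fastest one.
    for dim in reversed(order):
        linear = linear * shape[dim] + multi_dim[dim]
    return linear


def delinearize(linear: int, shape: Sequence[int],
                order: Sequence[int]) -> List[int]:
    """Inverse of linearize under the same order convention."""
    multi_dim = [0] * len(shape)
    # Peel off the fastest-varying dimension first.
    for dim in order:
        multi_dim[dim] = linear % shape[dim]
        linear //= shape[dim]
    return multi_dim


shape = (4, 8)
row_major = (1, 0)  # dimension 1 varies fastest
col_major = (0, 1)  # dimension 0 varies fastest
flat_row = linearize((2, 3), shape, row_major)
flat_col = linearize((2, 3), shape, col_major)
```

The same index `(2, 3)` maps to different flat offsets under the two orders, so applying the order inconsistently between the two functions yields wrong indices. That is the general class of error item 1 refers to, under the assumptions stated above.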

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
goostavz
2022-11-09 02:10:09 +08:00
committed by GitHub
parent 303790da88
commit 080b4addf8
4 changed files with 282 additions and 246 deletions


@@ -97,9 +97,7 @@ reduce2d_configs = [
     (op, dtype, shape, axis)
     for op in ['sum', 'min', 'max']
     for dtype in dtypes
-    for shape in [(1, 4), (1, 8), (1, 16), (1, 32), (2, 32), (4, 32)]
-    # TODO: fix and uncomment
-    #, (4, 128), (32, 64)]
+    for shape in [(1, 4), (1, 8), (1, 16), (1, 32), (2, 32), (4, 32), (4, 128), (32, 64)]
     for axis in [0, 1]
 ]
@@ -128,7 +126,6 @@ def test_reduce2d(op, dtype, shape, axis):
golden_z = torch.min(x, dim=axis, keepdim=False)[0].to(reduced_dtype)
else:
golden_z = torch.max(x, dim=axis, keepdim=False)[0].to(reduced_dtype)
if dtype.is_floating_point and op == 'sum':
if shape[axis] >= 256:
assert_close(z, golden_z, rtol=0.05, atol=0.1)