Currently, Triton's argmax/argmin reductions return tensors with the input element type rather than i32, even though the results are indices.
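To illustrate the expected behavior, here is a minimal pure-Python sketch that does not use Triton at all. It models typed tensors with the standard-library `array` module. The helper names `arg_reduce`, `argmax_reduce`, and `argmin_reduce` are hypothetical and exist only for this example. The point is that the result's element type should be an integer index type regardless of the input's element type.

```python
from array import array
import operator


def arg_reduce(values: array, better) -> array:
    """Return the index of the 'best' element as a one-element integer array.

    `better(a, b)` returns True when `a` should replace the current best `b`.
    Hypothetical helper, used only to illustrate the expected result type.
    """
    if len(values) == 0:
        raise ValueError("arg reduction of an empty input is undefined")
    best = 0
    for i in range(1, len(values)):
        if better(values[i], values[best]):
            best = i
    # Typecode 'i' is a signed C int (32 bits on common platforms).
    # The index type is chosen independently of the input's typecode.
    return array("i", [best])


def argmax_reduce(values: array) -> array:
    # Strict comparison keeps the first occurrence on ties.
    return arg_reduce(values, operator.gt)


def argmin_reduce(values: array) -> array:
    return arg_reduce(values, operator.lt)


x = array("f", [0.5, 2.25, -1.0, 2.0])  # float input
imax = argmax_reduce(x)
imin = argmin_reduce(x)
```

In this sketch, `imax` and `imin` have typecode `'i'` even though `x` has typecode `'f'`, which mirrors the i32 result type the report asks for.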
triton-mlir