[Triton-MLIR] Replace triton.extract_slice with tensor.extract_slice and support more general tensor slicing (#837)
## Features

- Allow taking a block slice of a tensor, as long as each dimension is contiguous (unit stride).
- Fix some problems in the semantics of `insert_slice_async`.
- More general verification for ops that return a shared layout encoding.

## Known Limitations

- `insert_slice_async` still uses the old semantics. A later PR may give it semantics similar to `tensor.extract_slice`.
- No encoding verification for `tensor.extract_slice`.
- 3D tensor ops are broken.
- Strided accesses are not allowed.
- May cause a slight performance slowdown, since strides are passed as values rather than constants (e.g., int). Passing strides as attributes would be difficult with control flow, because a block argument may accept tensors with different strides.
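To illustrate the unit-stride restriction described above, here is a minimal pure-Python sketch. It is not the MLIR implementation: `extract_block` is a hypothetical helper, and the `offsets`/`sizes`/`strides` parameters only mirror the general shape of a slice specification. The tensor is modeled as a list of rows.

```python
def extract_block(tensor, offsets, sizes, strides):
    """Return a 2D block of `tensor` (a list of rows).

    Hypothetical helper for illustration only: it accepts a slice only
    when every dimension has unit stride, i.e. is contiguous.
    """
    if any(s != 1 for s in strides):
        raise ValueError("strided accesses are not allowed")
    row0, col0 = offsets
    n_rows, n_cols = sizes
    # Take n_rows consecutive rows, then n_cols consecutive columns of each.
    return [row[col0:col0 + n_cols] for row in tensor[row0:row0 + n_rows]]


# A 4x4 tensor holding the values 0..15 in row-major order.
x = [[r * 4 + c for c in range(4)] for r in range(4)]

# A 2x2 block starting at row 1, column 2.
block = extract_block(x, offsets=(1, 2), sizes=(2, 2), strides=(1, 1))
```

A call with a non-unit stride, such as `strides=(2, 1)`, raises `ValueError` in this sketch, which corresponds to the "strided accesses are not allowed" limitation.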
@@ -137,7 +137,7 @@ def kernel(X0, X1, Y, BLOCK: tl.constexpr):
     # reference result

     if expr == "cdiv":
-        y_ref = (x0 + x1 - 1) // x1
+        y_ref = torch.div(x0 + x1 - 1, x1, rounding_mode='trunc')
     elif expr == "umulhi":
         y_ref = ((x0.to(torch.int64) * x1) >> 32).to(torch.int32)
     else:
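The commit does not say why the `cdiv` reference changed. As background, the two expressions in the hunk name different rounding modes: Python's `//` floors, while `rounding_mode='trunc'` rounds toward zero. The sketch below uses plain Python integers in place of tensors, and `cdiv_floor` and `cdiv_trunc` are hypothetical names. It shows that the two modes agree for non-negative operands and can differ when the numerator is negative.

```python
def cdiv_floor(a, b):
    """(a + b - 1) divided by b, rounding toward negative infinity."""
    return (a + b - 1) // b


def cdiv_trunc(a, b):
    """(a + b - 1) divided by b, rounding toward zero."""
    num = a + b - 1
    q = abs(num) // abs(b)
    # The quotient is non-negative when both operands share a sign.
    return q if (num < 0) == (b < 0) else -q


same = (cdiv_floor(7, 2), cdiv_trunc(7, 2))        # numerator 8, divisor 2
differs = (cdiv_floor(-7, 3), cdiv_trunc(-7, 3))   # numerator -5, divisor 3
```

Here `same` holds two equal values, while `differs` holds `-2` for the floor variant and `-1` for the truncating one.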