[BACKEND][FRONTEND] Fix problems with test_matmul (#973)

1. Handle induction variable when step is negative
2. Restore async_wait that was accidentally deleted
3. Add missing induction variable in prefetch
4. Add device property functions
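Item 1 concerns loops whose step is negative. The actual fix lives in Triton's compiler passes, which this page does not show. What follows is only a plain-Python sketch, assuming Python `range` semantics, of why the iteration count and the per-iteration induction value must take the sign of the step into account. The helper names `trip_count` and `induction_value` are hypothetical.

```python
def trip_count(lb: int, ub: int, step: int) -> int:
    """Number of iterations of `for i in range(lb, ub, step)` (hypothetical helper)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    if step > 0:
        span = ub - lb          # ascending loop: distance from lb up to ub
    else:
        span = lb - ub          # descending loop: distance from lb down to ub
        step = -step            # work with the step's magnitude from here on
    # Ceiling division of span by step; an empty range (span <= 0) gives 0.
    return max(0, -(-span // step))


def induction_value(lb: int, step: int, k: int) -> int:
    """Value of the induction variable on iteration k (0-based), for either sign of step."""
    return lb + k * step


for lb, ub, step in [(0, 10, 3), (10, 0, -3), (7, -8, -5)]:
    n = trip_count(lb, ub, step)
    print(lb, ub, step, n, [induction_value(lb, step, k) for k in range(n)])
```

Comparing against `len(range(...))` is a convenient reference because Python defines `range` for both signs of the step.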

Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
Keren Zhou
2022-12-10 20:34:58 -08:00
committed by GitHub
parent 24fd953f9a
commit be2f70699c
12 changed files with 217 additions and 58 deletions


@@ -68,7 +68,7 @@ def test_vecadd_scf_mask(shape, num_warps, block_size, iter_size):
     @num_elements: number of elements
     '''
     pid = tl.program_id(axis=0)
-    for i in range(math.ceil(block_size / iter_size)):
+    for i in range(tl.cdiv(block_size, iter_size)):
         # TODO: a bug here, if put the offset outside the forloop, there will be a GPU mis-aligned error.
         offset = pid * block_size + tl.arange(0, iter_size)
         x_ptrs = x_ptr + offset
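The hunk above swaps `math.ceil(block_size / iter_size)` for `tl.cdiv(block_size, iter_size)`. Triton's `tl.cdiv` computes ceiling division as `(x + div - 1) // div`. The sketch below is a plain-Python stand-in for positive integers only (the real `tl.cdiv` operates on Triton values inside a `@triton.jit` kernel). It checks that the two forms agree on this test's kind of inputs and shows one general property of the integer form: it avoids rounding through a float. That property is an illustration, not a stated motivation of this commit.

```python
import math


def cdiv(x: int, div: int) -> int:
    """Ceiling division for positive integers, same formula as tl.cdiv."""
    return (x + div - 1) // div


# Loop trip counts for a few (block_size, iter_size) pairs: both forms agree here.
for block_size, iter_size in [(128, 32), (100, 32), (1, 32)]:
    n_iters = cdiv(block_size, iter_size)
    assert n_iters == math.ceil(block_size / iter_size)
    print(block_size, iter_size, n_iters)

# For very large integers, true division rounds to a float first and can lose precision.
big = 2**53 + 1
print(cdiv(big, 1), math.ceil(big / 1))
```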