Yan Chunwei
4464646efb
[Triton-MLIR][BACKEND] Fix masked load store op vector size (#785)
Correct the Load/Store Op's vector size with the mask's alignment
correctly considered.
Some cases:
```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
%out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32 {tt.divisibility = 16 : i32}) {
// mask = make_range(128) < n_element
}
```
This should get the vec=2 `ld`/`st` instructions.
While the following example
```mlir
// num_warp = 2
// block_size = 128
func @vecadd_mask_align_16(%a_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %b_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32},
%out_ptr: !tt.ptr<f32> {tt.divisibility = 16 : i32}, %n_elements: i32) {
// mask = make_range(128) < n_element
}
```
it should get the vec=1 `ld`/`st` instructions.
2022-10-18 11:43:50 +08:00
..
2022-10-18 11:43:50 +08:00
2022-10-18 11:43:50 +08:00
2022-09-09 12:03:41 -07:00
2022-09-26 16:38:06 -07:00
2022-09-22 03:26:40 +00:00
2022-10-16 21:19:42 -07:00
2022-10-09 10:55:17 -07:00
2022-07-07 16:53:19 -07:00
2022-06-12 15:14:45 +08:00