This PR does the following things:
- Code refactoring on Load and Store op codegen, rewrite with same logic
and share much code
- Support the vectorized load/store
The purpose of this PR is analyzing shared memory aliases so that we can
fix memory allocation bugs and save memory allocations in triton code
involving complex control flows.
Changes to memory bar and allocation are on the way.
Co-authored-by: Philippe Tillet <phil@openai.com>