This PR merges the new runtime back into the `triton-mlir` branch. This
adds caching and just-in-time compilation functionality to the
triton-mlir project, and paves the way for re-using tests from the
master branch.
This PR does the following things:
- Code refactoring on Load and Store op codegen, rewrite with same logic
and share much code
- Support the vectorized load/store