This PR
- apply minimal modification to decouple the Dot helper related code
from TritonGPUToLLVM.cpp to a separate local header file to make it
easier to share some data structure for Dot
- add some patch necessary for transA and transB
- add some patch necessary for MMA v1 execution in backend