This PR:
- Applies a minimal modification to decouple the Dot helper code from TritonGPUToLLVM.cpp into a separate local header file, making it easier to share Dot-related data structures.
- Adds the patches necessary for transA and transB.
- Adds the patches necessary for MMA v1 execution in the backend.