Reverts openai/triton#694
This PR applies #691 to the Triton-MLIR branch.
* make C++ code compatible with Windows + MSVC
* added dlfcn-win32 for cross-platform dlopen
* fixed building and pip install on Windows
* fixed shared library file name under Windows
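
For reviewers unfamiliar with the file name issue, here is a minimal sketch of how a platform-dependent shared library name might be chosen. The helper `shared_lib_name` and the stem `"example"` are hypothetical and only illustrate the usual naming conventions (`.dll` on Windows, `lib*.dylib` on macOS, `lib*.so` elsewhere); this is not the code changed in this PR.

```python
import platform
from typing import Optional


def shared_lib_name(stem: str, system: Optional[str] = None) -> str:
    """Return a conventional shared library file name for `stem`.

    `system` defaults to platform.system(); passing it explicitly
    makes the helper easy to exercise on any host.
    """
    system = system or platform.system()
    if system == "Windows":
        # Windows libraries conventionally have no "lib" prefix.
        return f"{stem}.dll"
    if system == "Darwin":
        return f"lib{stem}.dylib"
    # Fall back to the ELF convention used on Linux and most other Unixes.
    return f"lib{stem}.so"


if __name__ == "__main__":
    # Prints the name that would be used on the current host.
    print(shared_lib_name("example"))
```

Passing `system` explicitly keeps the logic checkable without a Windows machine, for instance `shared_lib_name("example", "Windows")`.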