[CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copies, scheduling and synchronization for A100. Now achieves CUTLASS-like performance on large square dense matrix multiplications.
Committed by Philippe Tillet
Parent: 045ab5d62a
Commit: 5b83259592
@@ -282,6 +282,7 @@ std::string cu_module::compile_llvm_module(std::unique_ptr<llvm::Module> module,
void cu_module::init_from_ptx(const std::string& ptx) {
// JIT compile source-code
// std::cout << ptx << std::endl;
try{
// // compile ptx with ptxas