Philippe Tillet
15f8e8c3b7
[CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-02-21 18:19:39 -05:00
Philippe Tillet
de6fdd5625
[general] removed useless files and includes
2019-10-20 19:29:48 -04:00
Philippe Tillet
650c43ca07
[codegen] more cleaning
2019-10-07 18:06:54 -04:00
Philippe Tillet
43d88154bd
[codegen] cleaning-up / formalizing shared-memory passes
2019-09-20 16:01:12 -04:00
Philippe Tillet
e35be1ddcf
[ir][instruction] added identifier for each instruction
2019-09-19 16:25:36 -04:00
Philippe Tillet
8d37a55a21
[codegen][analysis] cleaned-up tiling formalism
2019-09-15 21:14:14 -04:00
Philippe Tillet
495163e0e8
some more cleaning
2019-09-14 16:53:13 -04:00
Philippe Tillet
0c41bade07
[codegen] basic recoalescing working
2019-09-10 23:25:47 -04:00
Philippe Tillet
a842d337c5
[general] various cleaning and bugfix:
* added copy1d and copy2d benchmark
* fixed issue in reassociation pass
2019-09-02 23:00:49 -04:00
Philippe Tillet
37cbcfabd0
[examples] back to 96 TFLOPS on V100
2019-08-26 22:49:14 -07:00
Philippe Tillet
732156b942
[general] rename *.cpp -> *.cc
2019-08-23 19:06:39 -07:00