Philippe Tillet
15f8e8c3b7
[CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks.
2021-02-21 18:19:39 -05:00
Philippe Tillet
bcc5745ea0
[CODEGEN] Fixed bug in atomic_add
2020-11-19 18:19:55 -05:00
Philippe Tillet
8e9d793d11
[CODEGEN] Fixed various issues in alignment inference pass
2020-06-06 11:28:43 -04:00
Philippe Tillet
547434d7f0
[CODEGEN] Fixed bug in alignment inference that prevented vectorization in some cases
2020-06-06 01:13:38 -04:00
Philippe Tillet
a9efb27fde
[CODEGEN][ANALYSIS] bugfix in alignment analysis
2020-05-01 17:38:23 -04:00
Philippe Tillet
f278d9741a
[GENERAL] Merged einsum feature branch. Various features, performance improvements and bugfixes:
* Added preliminary support for extended Einstein summation in PyTriton
* Significant performance improvement on FP32 kernels containing matrix multiplication
* Added re-coalescing pass for FP16 kernels containing matrix multiplication
* Various bugfixes
2020-01-20 12:42:48 -05:00
Philippe Tillet
de6fdd5625
[general] removed useless files and includes
2019-10-20 19:29:48 -04:00
Philippe Tillet
650c43ca07
[codegen] more cleaning
2019-10-07 18:06:54 -04:00
Philippe Tillet
ed1b2bc563
more work on padding
2019-09-27 22:15:30 -04:00
Philippe Tillet
c24d55db23
[codegen] more work on hmma coalescing
2019-09-23 20:38:27 -04:00
Philippe Tillet
43d88154bd
[codegen] cleaning-up / formalizing shared-memory passes
2019-09-20 16:01:12 -04:00
Philippe Tillet
e35be1ddcf
[ir][instruction] added identifier for each instruction
2019-09-19 16:25:36 -04:00
Philippe Tillet
e184bad9a1
[auto-coalesce] more bugfixes
2019-09-16 13:28:23 -04:00
Philippe Tillet
495163e0e8
some more cleaning
2019-09-14 16:53:13 -04:00
Philippe Tillet
0c41bade07
[codegen] basic recoalescing working
2019-09-10 23:25:47 -04:00
Philippe Tillet
32234c2612
ugh
2019-09-08 17:35:24 -04:00
Philippe Tillet
5e03f0a065
[codegen][align] reverted some changes
2019-09-03 15:28:07 -04:00
Philippe Tillet
97fdb5b6be
[tests] added missing files
2019-09-03 12:44:35 -04:00
Philippe Tillet
a842d337c5
[general] various cleaning and bugfix:
* added copy1d and copy2d benchmark
* fixed issue in reassociation pass
2019-09-02 23:00:49 -04:00