triton

Author	SHA1	Message	Date
Philippe Tillet	15f8e8c3b7	[CODEGEN] Major performance improvements on A100 (#70) Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks	2021-02-21 18:19:39 -05:00
Philippe Tillet	af080740f2	[GENERAL] Merged v1.0alpha into master. Added features are: - A100 support via mma.16816 - Thread swizzling for conflict-free shared memory accesses without padding - Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering - Added debugging capabilities in the Python binding - Compilation error for kernels that spill	2021-01-11 19:23:24 -05:00
Philippe Tillet	2fcf5cec5b	[TRITON][CODEGEN] Fixed flawed assert()	2020-01-24 15:25:00 -05:00
Philippe Tillet	78b98fb7cf	[GENERAL] Cleaned polymorphic structure of layouts analysis pass	2020-01-21 11:38:39 -05:00
Philippe Tillet	382ca2c745	[CODEGEN][ANALYSIS] cleaning: moving towards better polymorphism for tile layouts	2020-01-20 12:43:04 -05:00
Philippe Tillet	f278d9741a	[GENERAL] Merged einsum feature branch. Various features, performance improvements and bugfixes: * Added preliminary support for extended Einstein summation in PyTriton * Significant performance improvement on FP32 kernels containing matrix multiplication * Added re-coalescing pass for FP16 kernels containing matrix multiplication * Various bugfixes	2020-01-20 12:42:48 -05:00
Philippe Tillet	de6fdd5625	[general] removed useless files and includes	2019-10-20 19:29:48 -04:00
Philippe Tillet	b43454c9b7	[codegen] [membar] view do not write to shared memory	2019-10-17 22:38:41 -04:00
Philippe Tillet	4bfe998cc8	[codegen] [selection] everything is now implemented with visitor	2019-10-16 18:10:03 -04:00
Philippe Tillet	7d77f34db0	[codegen] more cleaning	2019-10-11 23:40:27 -04:00
Philippe Tillet	323c90e431	ugh	2019-10-11 19:05:54 -04:00
Philippe Tillet	ed1b2bc563	more work on padding	2019-09-27 22:15:30 -04:00
Philippe Tillet	575dd06be3	[codegen] more progress towards unified dot implementation	2019-09-26 14:01:28 -04:00
Philippe Tillet	001973630e	[codegen] cleaned up shared memory and double-buffering logic	2019-09-21 22:21:40 -04:00
Philippe Tillet	43d88154bd	[codegen] cleaning-up / formalizing shared-memory passes	2019-09-20 16:01:12 -04:00
Philippe Tillet	e35be1ddcf	[ir][instruction] added identifier for each instruction	2019-09-19 16:25:36 -04:00
Philippe Tillet	495163e0e8	some more cleaning	2019-09-14 16:53:13 -04:00
Philippe Tillet	a842d337c5	[general] various cleanups and bugfixes: * added copy1d and copy2d benchmark * fixed issue in reassociation pass	2019-09-02 23:00:49 -04:00

18 Commits