triton

Author	SHA1	Message	Date
Philippe Tillet	7c09ff80eb	[CORE] Fixed several issues that arose in the development of the torch-blocksparse package: * Now using warp shuffle in reductions when possible * Various bugfixes in layout inference * Added INFINITY, exponential and select * Better error messages for unimplemented constructs	2020-03-31 18:57:28 -04:00
Philippe Tillet	fbf2a3f56f	[CODEGEN][TRANSFORM] some bug-fixes for FP32 einsum	2020-01-20 12:42:53 -05:00
Philippe Tillet	f278d9741a	[GENERAL] Merged einsum feature branch. Various feature, performance improvements and bugfixes: * Added preliminary support for extended Einstein summation in PyTriton * Significant performance improvement on FP32 kernels containing matrix multiplication * Added re-coalescing pass for FP16 kernels containing matrix multiplication * Various bugfixes	2020-01-20 12:42:48 -05:00
Philippe Tillet	de6fdd5625	[general] removed useless files and includes	2019-10-20 19:29:48 -04:00
Philippe Tillet	323c90e431	ugh	2019-10-11 19:05:54 -04:00
Philippe Tillet	856e7baa04	[test] added tests for copy	2019-09-23 12:07:24 -04:00
Philippe Tillet	001973630e	[codegen] cleaned up shared memory and double-buffering logic	2019-09-21 22:21:40 -04:00
Philippe Tillet	43d88154bd	[codegen] cleaning-up / formalizing shared-memory passes	2019-09-20 16:01:12 -04:00