triton

Author	SHA1	Message	Date
Philippe Tillet	4a399a7e40	[BACKEND] Fix some bugs (atomics, a segfault...) (#577) This should fix #558, #573 and #574	2022-07-06 20:03:04 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col-major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00
Jason Ansel	6b9756532f	[BACKEND] Remove print in coalesce.cc (#551)	2022-06-15 13:13:20 -07:00
Philippe Tillet	8876e53206	[BACKEND] Restored reduction bugfixes	2022-06-03 11:38:52 -07:00
Philippe Tillet	a60374a597	Revert "[BACKEND] Various bug fixes; making reductions faster (#533)". This is a more stable commit that produces bitwise identical code to earlier versions. Using commits after this one may lead to slightly different numerics.	2022-06-03 11:36:06 -07:00
Philippe Tillet	3e7500dfe6	[BACKEND] Various bug fixes; making reductions faster (#533)	2022-05-31 17:14:44 -07:00
Philippe Tillet	2acaa4d0dd	[LANG] Added support for constexpr (#361)	2021-10-30 00:32:58 -07:00
Philippe Tillet	2c287544cb	[OPS] Faster and cleaner block-sparse implementation (#311)	2021-09-27 18:25:16 -07:00
Philippe Tillet	ec2e7b8f48	[CODEGEN] Fixed nasty bug in coalesce pass (#303)	2021-09-23 17:05:11 -07:00
Philippe Tillet	2849e7a773	[CODEGEN] now re-coalescing before atomics (#298)	2021-09-22 13:35:53 -07:00
Philippe Tillet	4ff3714d61	[CODEGEN] Various bugfixes and stability improvements in compiler backend (#240)	2021-08-30 11:50:35 -07:00
Philippe Tillet	083bbd1e8d	[GENERAL] Merged v1.0alpha into master. Added features are: - A100 support via mma.16816 - Thread swizzling for conflict-free shared memory accesses without padding - Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering - Added debugging capabilities in the Python binding - Compilation error for kernels that spill	2021-07-27 12:38:48 -07:00
Philippe Tillet	a8f1b85c5f	[CODEGEN] Removed unnecessary coalescing rematerialization	2021-07-27 12:38:48 -07:00
Philippe Tillet	0516ea96d0	[CODEGEN] Fixed bug that caused missing recoalescing for some transpose operations	2021-07-27 12:38:48 -07:00
Philippe Tillet	6d7cf35123	History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space	2021-07-27 12:38:38 -07:00

15 Commits