Commit Graph

6 Commits

Author SHA1 Message Date
Phil Wang
7394d732ad [DOCS] support for variable head dimensions in flash attention triton tutorial (#623) 2022-08-15 19:16:49 -07:00
Philippe Tillet
86cab58d89 [CI] Changed dev wheel date to UTC time to match CRON schedule (#587) 2022-07-18 14:54:13 -07:00
Phil Tillet
5b04331dd2 [TUTORIALS] Added more credits in fused attention tutorial 2022-07-13 23:48:58 -07:00
Phil Tillet
971f5782b4 [tutorials] Added flash attention credits in tutorial 2022-07-11 18:56:48 -07:00
Philippe Tillet
d5eb9bc230 [tutorial] Added bwd in fused attention example (#579)
Doesn't work on V100
2022-07-11 15:43:46 -07:00
Philippe Tillet
5b4c8f221e [BACKEND] Compiler improvements (#557)
This PR adds several optimization capabilities to the compiler backend:
- `tl.store` now uses inline PTX, making it possible to use eviction policies such as evict_last
- On A100, the mma layout can be converted directly to shared memory
- On A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used as both row-major and col-major
- Fixed liveness analysis, which was previously broken
- mma-layout tensors can now be loaded/stored directly, without conversion; useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop
- `tl.dot` can now take its LHS input in registers when it comes from a previous `tl.dot` instruction; useful for e.g. fused attention (see the sketch after this entry)
2022-06-27 11:49:19 -07:00
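
A minimal sketch (not from the repository) of two of the features listed in #557: chaining `tl.dot` so the second dot's LHS operand comes straight from a previous dot's register-resident result, and passing an eviction policy to `tl.store`. The kernel name, tile shape, and dtypes are illustrative assumptions.

```python
# A minimal sketch, assuming contiguous fp16 inputs and a square BLOCK x BLOCK tile;
# the kernel name and shapes are illustrative, not part of the repository.
import triton
import triton.language as tl


@triton.jit
def chained_dot_kernel(q_ptr, k_ptr, v_ptr, out_ptr, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    idx = offs[:, None] * BLOCK + offs[None, :]
    q = tl.load(q_ptr + idx)
    k = tl.load(k_ptr + idx)
    v = tl.load(v_ptr + idx)
    # First dot; feeding its result into a second dot is the chained-dot
    # pattern used in fused attention.
    s = tl.dot(q, k)
    o = tl.dot(s.to(tl.float16), v)
    # `eviction_policy` hints the cache behaviour of the store
    # (the inline-PTX store path mentioned in the PR).
    tl.store(out_ptr + idx, o.to(tl.float16), eviction_policy="evict_last")
```

Launched e.g. as `chained_dot_kernel[(1,)](q, k, v, out, BLOCK=64)` with contiguous fp16 CUDA tensors of shape (64, 64); note that `tl.dot` requires tile dimensions of at least 16.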