Commit Graph

25 Commits

Phil Tillet
520b69fe70 more reassociation 2023-01-05 16:05:11 -08:00
Phil Tillet
764134ee34 trying to decrease register pressure 2023-01-05 13:02:38 -08:00
Phil Tillet
268d2cd18d better convert + write-back 2023-01-04 17:12:35 -08:00
Phil Tillet
36da342893 . 2023-01-04 11:25:03 -08:00
Phil Tillet
645fa5c1cd . 2023-01-03 18:34:05 -08:00
Phil Tillet
5c01c567b9 . 2023-01-02 23:13:12 -08:00
Phil Tillet
05920e0b8b reduced some spilling 2023-01-02 19:28:54 -08:00
Phil Tillet
c11fe351e1 . 2023-01-02 19:16:06 -08:00
Phil Tillet
b246d85fad trying to figure out spilling root cause 2022-12-30 15:21:00 -08:00
Phil Tillet
7388fb1de9 manual ttgir in bwd pass 2022-12-29 15:53:38 -08:00
Phil Tillet
71e3143eaf . 2022-12-29 14:40:27 -08:00
Phil Tillet
54ae3e8d6e cleanup 2022-12-28 13:42:43 -08:00
Phil Tillet
eefc9d1274 Added TTGIR kernel 2022-12-27 21:49:28 -08:00
Phil Tillet
0d6e6cf578 trying more things 2022-12-27 20:58:31 -08:00
Philippe Tillet
4182e90862 less math 2022-12-24 00:31:05 -08:00
Philippe Tillet
033e82060d . 2022-12-21 14:02:10 -08:00
Phil Tillet
88e572e54d . 2022-12-21 13:54:30 -08:00
Philippe Tillet
20100a7254 Merge triton-mlir branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability, and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Phil Tillet
b244db06da [TUTORIALS] Attention tutorial fixup 2022-09-30 19:31:43 -07:00
Phil Wang
7394d732ad [DOCS] support for variable head dimensions in flash attention triton tutorial (#623) 2022-08-15 19:16:49 -07:00
Philippe Tillet
86cab58d89 [CI] Changed dev wheel date to UTC time to match CRON schedule (#587) 2022-07-18 14:54:13 -07:00
Phil Tillet
5b04331dd2 [TUTORIALS] Added more credits in fused attention tutorial 2022-07-13 23:48:58 -07:00
Phil Tillet
971f5782b4 [tutorials] Added flash attention credits in tutorial 2022-07-11 18:56:48 -07:00
Philippe Tillet
d5eb9bc230 [tutorial] Added bwd in fused attention example (#579)
Doesn't work on V100
2022-07-11 15:43:46 -07:00
Philippe Tillet
5b4c8f221e [BACKEND] Compiler improvements (#557)
This PR adds several optimization capabilities to the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use cache-eviction hints such as `evict_last`
- For A100, the mma layout can be converted directly to shared memory
- For A100, an additional "transpose" argument in `dot` allows a tensor to be loaded once and used as both row- and column-major
- Fixed a broken liveness analysis
- The mma layout can now be loaded/stored directly without conversion, which is useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop
- `tl.dot` can now take its LHS input from registers when it comes from a previous `tl.dot` instruction, which is useful for e.g. fused attention
2022-06-27 11:49:19 -07:00
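The chained-`tl.dot` pattern mentioned in the last bullet above is the core of the fused-attention kernel this history iterates on: the output of the first dot (the score matrix) feeds the second dot without a round trip through memory. The following is a minimal NumPy sketch of that math only, not the Triton/TTGIR code from these commits; the function name `fused_attention_block` is hypothetical.

```python
import numpy as np

def fused_attention_block(q, k, v):
    # First dot: attention scores. In the fused Triton kernel this result
    # stays in registers and is consumed by the second dot directly.
    qk = q @ k.T
    # Numerically stable softmax over the key dimension.
    m = qk.max(axis=-1, keepdims=True)
    p = np.exp(qk - m)
    p /= p.sum(axis=-1, keepdims=True)
    # Second dot: weighted sum of values, consuming the first dot's output.
    return p @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = fused_attention_block(q, k, v)
```

In the actual kernel this runs blockwise over the sequence with a running softmax rescaling, which is what makes keeping the intermediate in registers (rather than spilling it, the recurring concern in the commit messages above) matter for performance.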