triton

Author	SHA1	Message	Date
Philippe Tillet	20100a7254	Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004) This PR merges the `triton-mlir` branch, in which we have been quietly rewriting the Triton backend from scratch to increase maintainability, stability and ultimately performance. Changes to the runtime are minimal, and this new version aims to remain backward-compatible with the previous commit. The legacy backend is now officially deprecated, but can still be accessed via the `legacy-backend` tag. Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: Yan Chunwei <yanchunwei@outlook.com> Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com> Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com> Co-authored-by: Yan Da <dyanab@connect.ust.hk> Co-authored-by: Jun Yang <yangjunpro@gmail.com> Co-authored-by: Ian Bearman <ianb@microsoft.com> Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: Qingyi Liu <qingyil@nvidia.com> Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com> Co-authored-by: Chenggang Zhao <lyricz@yeah.net> Co-authored-by: ben-zhang-609 <benzh609@gmail.com> Co-authored-by: dongdongl <dongdongl@nvidia.com>	2022-12-21 01:30:50 -08:00
Phil Tillet	b244db06da	[TUTORIALS] Attention tutorial fixup	2022-09-30 19:31:43 -07:00
Phil Wang	7394d732ad	[DOCS] support for variable head dimensions in flash attention triton tutorial (#623)	2022-08-15 19:16:49 -07:00
Philippe Tillet	86cab58d89	[CI] Changed dev wheel date to UTC time to match CRON schedule (#587)	2022-07-18 14:54:13 -07:00
Phil Tillet	5b04331dd2	[TUTORIALS] Added more credits in fused attention tutorial	2022-07-13 23:48:58 -07:00
Phil Tillet	971f5782b4	[tutorials] Added flash attention credits in tutorial	2022-07-11 18:56:48 -07:00
Philippe Tillet	d5eb9bc230	[tutorial] Added bwd in fused attention example (#579) Doesn't work on V100	2022-07-11 15:43:46 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00

8 Commits