Phil Tillet
c98c889d7f
.
2023-01-09 19:08:51 -08:00
Phil Tillet
fc1007278d
.
2023-01-09 18:45:44 -08:00
Phil Tillet
0c101e0c33
.
2023-01-09 16:30:28 -08:00
Phil Tillet
3fefcd78d4
.
2023-01-09 16:29:45 -08:00
Phil Tillet
137e866bd2
more work
2023-01-09 16:20:10 -08:00
Phil Tillet
8ebb593bbb
more work
2023-01-09 15:45:06 -08:00
Phil Tillet
6c750b6856
Added verifier for trans
2023-01-08 14:29:17 -08:00
Phil Tillet
42421fabc5
.
2023-01-06 20:35:57 -08:00
Phil Tillet
600bcefb12
more optimizations
2023-01-06 20:27:49 -08:00
Philippe Tillet
18c7a72973
more pass template
2023-01-06 14:26:06 -08:00
Phil Tillet
a81345f7c1
SinkConversionsFromShared template
2023-01-06 13:01:08 -08:00
Philippe Tillet
874ee11ab5
More optimizations
2023-01-06 11:04:20 -08:00
Philippe Tillet
e6f1a9ad34
commenting dq but not load/store
2023-01-05 23:25:41 -08:00
Philippe Tillet
6f997f4ecb
dq now mma
2023-01-05 21:14:55 -08:00
Phil Tillet
520b69fe70
more reassociation
2023-01-05 16:05:11 -08:00
Phil Tillet
764134ee34
trying to decrease register pressure
2023-01-05 13:02:38 -08:00
Phil Tillet
1bde80b1e8
Added ptx code
2023-01-04 17:23:16 -08:00
Phil Tillet
268d2cd18d
better convert + write-back
2023-01-04 17:12:35 -08:00
Phil Tillet
29a1e20b58
tweak convert + trans
2023-01-04 17:12:28 -08:00
Phil Tillet
36da342893
.
2023-01-04 11:25:03 -08:00
Phil Tillet
e70e1e76b4
swizzling
2023-01-04 11:21:19 -08:00
Phil Tillet
e3c3d9fc65
16 spills
2023-01-04 00:01:22 -08:00
Phil Tillet
ee86ea9c90
100 spills
2023-01-03 20:52:00 -08:00
Phil Tillet
645fa5c1cd
.
2023-01-03 18:34:05 -08:00
Phil Tillet
8df1fa5e5b
Merge remote-tracking branch 'origin/master' into phil/fused-attention-perf-fixup
2023-01-03 18:31:34 -08:00
Keren Zhou
8460ea3df1
[Frontend] Fix import for libdevice (#1028)
This is a hotfix for issue 1 in
https://github.com/openai/triton/issues/1017
2023-01-03 15:48:05 -08:00
Phil Tillet
737e43a627
more tests
2023-01-03 09:48:08 -08:00
Phil Tillet
5c01c567b9
.
2023-01-02 23:13:12 -08:00
Phil Tillet
05920e0b8b
reduced some spilling
2023-01-02 19:28:54 -08:00
Phil Tillet
c11fe351e1
.
2023-01-02 19:16:06 -08:00
Phil Tillet
b246d85fad
trying to figure out spilling root cause
2022-12-30 15:21:00 -08:00
Phil Tillet
4dce8dd709
Merge remote-tracking branch 'origin/master' into phil/fused-attention-perf-fixup
2022-12-30 11:53:49 -08:00
Phil Tillet
7388fb1de9
manual ttgir in bwd pass
2022-12-29 15:53:38 -08:00
fdrocha
194ba103b1
[BUILD] Fixed error when compiling in systems with multiple versions of python installed (#1019)
2022-12-29 15:10:34 -08:00
Phil Tillet
71e3143eaf
.
2022-12-29 14:40:27 -08:00
Phil Tillet
54ae3e8d6e
cleanup
2022-12-28 13:42:43 -08:00
Phil Tillet
7aba2a60d6
trying out another change
2022-12-27 21:51:51 -08:00
Phil Tillet
eefc9d1274
Added TTGIR kernel
2022-12-27 21:49:28 -08:00
Phil Tillet
0d6e6cf578
trying more things
2022-12-27 20:58:31 -08:00
Philippe Tillet
4182e90862
less math
2022-12-24 00:31:05 -08:00
Keren Zhou
fd2da4aff6
[BACKEND] Support splat constant on the DotOperandLayout (#1008)
2022-12-22 00:48:46 -08:00
Sharad Vikram
925d3d7f98
[FRONTEND] Export broadcast and broadcast_to in triton.language (#1007)
2022-12-22 01:57:33 +00:00
Philippe Tillet
033e82060d
.
2022-12-21 14:02:10 -08:00
Phil Tillet
88e572e54d
.
2022-12-21 13:54:30 -08:00
Keren Zhou
b5aafb0dab
[FRONTEND] Fix 3d indexing (#1006)
2022-12-21 12:52:32 -08:00
Philippe Tillet
20100a7254
Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Yang Hau
8650b4d1cb
[DRIVER] Fix typos (#939)
2022-12-02 11:13:46 -08:00
Crutcher Dunnavant
44f577984d
Fix format double substitution bug: {i} => {{i}} (#886)
The previous `{i}` was silently expanding to the `i` from the
enumeration loop on `regular_args` (when it wasn't empty).
2022-11-20 11:44:42 -08:00
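The brace-escaping behavior described in #886 is the standard Python f-string rule; below is a minimal sketch of the hazard, assuming a hypothetical code-generation loop (only the name `regular_args` is taken from the commit message):

```python
# Hedged sketch of the double-substitution hazard (hypothetical generator,
# not Triton's actual launcher code).
regular_args = ["x_ptr", "y_ptr"]

src = ""
for i, arg in enumerate(regular_args):
    # Buggy: inside an f-string, {i} silently expands to the loop index,
    # so the literal placeholder "{i}" never reaches the generated source.
    src += f"store(out_{i}, {arg})\n"    # -> store(out_0, x_ptr), ...
    # Fixed: {{i}} escapes the braces, so a literal "{i}" survives for a
    # later formatting pass to fill in.
    src += f"store(out_{{i}}, {arg})\n"  # -> store(out_{i}, x_ptr), ...

print(src)
```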
Crutcher Dunnavant
0e4691e6dd
[FRONTEND] Fix ExternLibrary(format=) bug; type annotate build_extern.py (#883)
Ran mypy over `build_extern.py`, cleaned up type annotations.
Found and fixed a bug where `ExternLibrary(format=)` was being ignored.
2022-11-17 18:45:30 +01:00
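A hedged illustration of the kind of bug #883 describes: a constructor keyword argument that is accepted but never used. The class body below is hypothetical, not the actual `build_extern.py` code:

```python
# Hypothetical sketch only; the real ExternLibrary in build_extern.py may
# have a different signature and fields.
class ExternLibrary:
    def __init__(self, name: str, path: str, format: bool = True) -> None:
        self.name = name
        self.path = path
        # Buggy version: the flag was effectively hardcoded, so passing
        # ExternLibrary(..., format=False) changed nothing.
        # self._format = True
        # Fixed version: honor the caller's argument.
        self._format = format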
Natalia Gimelshein
0d7e753227
[TESTING] use torch.int for autotuning cache (#840)
For stupid reasons, ops on int8 are 3 times slower than on int, and for
another set of stupid reasons we are not using cudaMemset for `zero_`,
so using an `int8` buffer in `do_bench` makes it slow.
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-11-04 18:05:16 -07:00
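A minimal sketch of the buffer-dtype choice behind #840, using a simplified stand-in for `do_bench`'s cache-clearing step; the buffer size and helper name are assumptions, not the library's actual code:

```python
# Hedged sketch: simplified stand-in for the L2-cache-clearing buffer used
# between timed runs; size and helper name are assumptions.
import torch

N_BYTES = 256 * 1024 * 1024  # scratch region large enough to evict L2

# torch.int (int32) instead of torch.int8: per the commit message, int8 ops
# are ~3x slower and zero_() does not use cudaMemset, so an int8 buffer makes
# the zeroing step itself a bottleneck during autotuning.
cache = torch.empty(N_BYTES // 4, dtype=torch.int, device="cuda")

def clear_cache() -> None:
    # Overwrite the scratch buffer so lines cached by the previous kernel
    # run are evicted before the next timed launch.
    cache.zero_()
```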