triton

Author	SHA1	Message	Date
Philippe Tillet	8c3d4d5749	[RUNTIME] now decoupling entry point from cubin (#696 )	2022-09-22 16:44:22 -07:00
Shintaro Iwasaki	df67068bb0	[pybind11] Update pybind11 to 2.10.0 (#691 ) This PR updates the version of pybind11 to 2.10.0 (the latest stable).	2022-09-21 20:18:02 -07:00
Philippe Tillet	677ddae618	[FRONTEND] Add warmup for triton.jit() (#684 ) This revives #671 , removing the static functions that may unnecessarily hold a reference to the grid and the JITFunction object. Co-authored-by: Jason Ansel <jansel@jansel.net>	2022-09-21 19:13:20 +00:00
Jason Ansel	6abe813d1c	Fix issue breaking cudagraphs (#685 ) @ngimel figured this one out. The errors we were seeing from cudagraphs capture were coming from `cuStreamGetCtx` which is not allowed while a stream is capturing. It appears the result of `cuStreamGetCtx()` isn't even used, so I believe it can just be removed.	2022-09-21 10:20:48 -07:00
Philippe Tillet	e318185eb4	[DOCS] Improved README.md wording (#683 ) Initial wording dates from a time where nobody knew Triton, and comparing it to CUDA helped differentiate it from other existing DSLs. But nowadays this comparison doesn't make much sense; Triton is its own thing, and some people may even still be more productive in CUDA than Triton -- language preferences are subjective after all.	2022-09-20 18:09:43 -07:00
Philippe Tillet	7dc2a70edb	Revert "Add .warmup() for triton.jit()" (#682 ) Reverts openai/triton#671. It seems like for some reason this caused out-of-memory errors on some of our internal workloads. I'm reverting this so that HEAD can be used in production at OpenAI, and I will work on digging into this issue asynchronously.	2022-09-20 16:05:14 -07:00
Philippe Tillet	48f30550f1	[FRONTEND] Now using raw compiler syscalls when possible (#678 )	2022-09-19 21:01:36 -07:00
Jason Ansel	93b1adc53b	[FRONTEND] Add .warmup() for triton.jit() (#671 )	2022-09-18 23:09:34 -07:00
Phil Tillet	82956e5d6b	[PACKAGING] Added missing package	2022-09-18 17:34:05 -07:00
Philippe Tillet	2baf333d44	[DOCS] Fixed typos (#670 )	2022-09-18 17:13:12 -07:00
Jason Ansel	49f6bc3f2b	[FRONTEND] Fix filename too long error in new runtime (#669 )	2022-09-18 21:26:29 +00:00
Phil Tillet	00f4ef6958	[CI] wheel/docs workflows now only run on V100 machine	2022-09-18 13:28:35 -07:00
Jason Ansel	e647402fd3	Fix warning in generated C code (#667 )	2022-09-18 12:57:32 -07:00
Philippe Tillet	4a77dfb042	[FRONTEND] Complete rewrite of the runtime (#644 ) This PR completely rewrites the runtime of Triton to be more lean and clearly separate the compilation step from the just-in-time caching logic. This should substantially reduce launch overhead.	2022-09-18 08:51:48 -07:00
Ian Bearman	889d9e34a1	[REPO] update gitignore (#666 ) Update `.gitignore` to include `.vs` and `.vscode`	2022-09-17 14:25:28 -07:00
Shintaro Iwasaki	c668d6596e	[DOCS] Fix spelling (#664 ) This PR applies minor spelling fix in comments and string literals to `master`. It shouldn't hurt anything.	2022-09-16 12:26:40 -07:00
Sophia Wisdom	4580a04710	[FRONTEND] Improve error message for CPU tensors (#654 ) Redo of #651 against master. Fixes #525 by catching CUDA error when we check pytorch tensor size and rethrowing a more informative error that says why we failed.	2022-09-14 14:26:42 -07:00
Philippe Tillet	cfbbc7b43a	[CI] Added V100 tag to disambiguate self-hosted runners (#653 )	2022-09-14 13:47:50 -07:00
Yunxing Dai	59a8e25f43	[DOCS] Fix typo (#650 )	2022-09-14 12:17:05 -07:00
Da Yan	437ced38c2	fp8 <> bf16 conversion (#637 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2022-08-30 14:20:12 -07:00
Da Yan	210a296699	[BACKEND] bf16 flash-attention (#636 )	2022-08-26 20:40:55 -07:00
Daniil Fukalov	fe0c29b9ec	Fix inconsistent struct declaration instead of class. (#632 ) Looks like typo.	2022-08-26 16:20:21 -07:00
Phil Wang	7394d732ad	[DOCS] support for variable head dimensions in flash attention triton tutorial (#623 )	2022-08-15 19:16:49 -07:00
Da Yan	3e2953f357	Allow multiple_of and max_contiguous to accept n-d values (#617 )	2022-08-10 09:59:32 -07:00
Daniil Fukalov	cc79376222	Fix deprecation warning on CreateGEP(Value *, ArrayRef<Value *>, const Twine &) (#608 ) This variant of CreateGEP() is already removed in LLVM 14.	2022-08-07 17:10:18 -07:00
Daniil Fukalov	7b91c7befd	Fix "warning: control reaches end of non-void function". (#607 )	2022-08-02 16:12:48 -07:00
Sharad Vikram	968f59027e	Expose `module.print` in pybind (#604 )	2022-07-29 21:36:08 -07:00
Anton Kostin	923d468187	Update LICENSE (#602 )	2022-07-25 09:30:03 -07:00
Jason Ansel	027321cdcf	[FRONTEND] Make tl.rand() 1-exclusive (#601 )	2022-07-24 17:47:23 -07:00
Jason Ansel	e02e56dc63	[FRONTEND] Add missing rfloordiv (#598 ) * [FRONTEND] Add missing rfloordiv * fix tests	2022-07-23 21:54:12 -07:00
Philippe Tillet	ab56d310dd	[BACKEND][IR] Fixed up internal dtype size for booleans (1bit -> 8bit) (#600 )	2022-07-23 20:08:03 -07:00
Da Yan	f28caddbf8	[FRONTEND] Allow tl.where to select pointers (#595 )	2022-07-21 09:54:27 -07:00
Keren Zhou	af85f5fa46	[FRONTEND] Refresh cache when the source code of outlined functions is changed (#590 )	2022-07-20 17:34:07 -07:00
daadaada	9b2bc88d11	[BACKEND] Better bf16 support (#588 )	2022-07-19 21:22:37 -07:00
Philippe Tillet	86cab58d89	[CI] Changed dev wheel date to UTC time to match CRON schedule (#587 )	2022-07-18 14:54:13 -07:00
Phil Tillet	5b04331dd2	[TUTORIALS] Added more credits in fused attention tutorial	2022-07-13 23:48:58 -07:00
Jason Ansel	0a3f3d5f25	[PACKAGING] Include triton/language/libdevice.10.bc in package data (#582 )	2022-07-13 23:45:27 -07:00
Keren Zhou	4912916c11	[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) (#562 )	2022-07-13 15:52:21 -07:00
Phil Tillet	971f5782b4	[tutorials] Added flash attention credits in tutorial	2022-07-11 18:56:48 -07:00
Philippe Tillet	d5eb9bc230	[tutorial] Added bwd in fused attention example (#579 ) Doesn't work on V100	2022-07-11 15:43:46 -07:00
Jason Ansel	c9a2b9c7d4	[FRONTEND] Add missing args to get_simd_tflops() (#578 )	2022-07-11 14:37:59 -07:00
Philippe Tillet	4a399a7e40	[BACKEND] Fix some bugs (atomics, a segfault...) (#577 ) This should fix #558 , #573 and #574	2022-07-06 20:03:04 -07:00
vesuppi	22105bc33b	[FRONTEND] Added type check in semantic arange (#572 )	2022-07-03 15:25:37 -07:00
Keren Zhou	4bf509889b	[BUILD] Change the default build type to Release (#571 )	2022-07-01 12:17:22 -07:00
Keren Zhou	a74cce375f	[FRONTEND] Raise broadcast error (#555 )	2022-06-30 17:32:07 -07:00
Philippe Tillet	f733327ba4	[BACKEND][CODEGEN] Disabling L2 residency control by default (#570 )	2022-06-29 17:05:13 -07:00
Natalia Gimelshein	1bbb2430d9	[TUTORIALS] adjust heuristics for dwdb kernel (#565 )	2022-06-29 17:00:22 -07:00
Kashif Rasul	1895ceaa2d	[TUTORIAL] Fix f-string for older python (#569 ) fixes issue #568	2022-06-29 09:39:10 -07:00
Philippe Tillet	feb7a2a0dc	[FRONTEND] Hotfix for `store` argument order (#567 )	2022-06-28 00:24:02 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557 ) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00

594 Commits