Without this patch, a debug build of Python aborts with:
```
Fatal Python error: Python memory allocator called without holding the GIL
Python runtime state: initialized
```
Based on the discussion in #700, this PR makes `setup.py` download pybind11 directly, rather than relying on a `git submodule` or copy-pasted pybind11 code. The downloaded pybind11 is cached in `~/.triton/pybind` (like `llvm`).
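A rough sketch of the idea, for illustration only (the helper name, pybind11 version, and URL scheme are assumptions, not the exact code in `setup.py`):
```python
import os
import tarfile
import urllib.request

def get_pybind11(version="2.10.0"):
    # Hypothetical helper: cache a pybind11 release tarball under ~/.triton/pybind,
    # mirroring how the LLVM download is cached under ~/.triton.
    cache_dir = os.path.join(os.path.expanduser("~"), ".triton", "pybind")
    src_dir = os.path.join(cache_dir, f"pybind11-{version}")
    if not os.path.exists(src_dir):
        os.makedirs(cache_dir, exist_ok=True)
        url = f"https://github.com/pybind/pybind11/archive/refs/tags/v{version}.tar.gz"
        tar_path = os.path.join(cache_dir, f"pybind11-{version}.tar.gz")
        urllib.request.urlretrieve(url, tar_path)
        with tarfile.open(tar_path) as tar:
            tar.extractall(cache_dir)  # extracts to pybind11-<version>/
    return src_dir  # add <src_dir>/include to the extension's include dirs
```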
I suspect this was the cause of the "new compiles even on a warm cache"
behavior I was seeing, though I haven't 100% confirmed it.
Python `set()` iteration order can differ from one interpreter process to
the next, so the same arguments could produce different
`instance_descriptor`s and cause spurious cache misses.
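For illustration, a minimal script showing the effect (the element names are hypothetical): Python randomizes the hash of strings per process by default, so a set of strings can iterate in a different order in each new interpreter process, while a sorted view is stable.
```python
# Run this twice in separate interpreter processes (default PYTHONHASHSEED):
# the first line's ordering will often differ between runs, the second never will.
args = {"x_ptr", "y_ptr", "n_elements", "BLOCK_SIZE"}  # hypothetical argument names
print(tuple(args))          # order depends on the per-process hash seed
print(tuple(sorted(args)))  # stable, process-independent order for cache keys
```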
This PR changes how the `pybind11` sources are managed: instead of a
copy-pasted snapshot, pybind11 is now a package controlled by git submodule.
See the discussion in #694 for details.
This revives #671, removing the static functions that may unnecessarily hold references to the grid and the `JITFunction` object.
Co-authored-by: Jason Ansel <jansel@jansel.net>
@ngimel figured this one out.
The errors we were seeing from CUDA graph capture were coming from
`cuStreamGetCtx`, which is not allowed while a stream is capturing.
Since the result of `cuStreamGetCtx()` isn't even used, I believe the
call can simply be removed.
Reverts openai/triton#671
For reasons we don't yet understand, this change caused out-of-memory
errors on some of our internal workloads. I'm reverting it so that HEAD
can be used in production at OpenAI, and I will dig into the issue
asynchronously.
This PR completely rewrites the Triton runtime to be leaner and to
clearly separate the compilation step from the just-in-time caching logic.
This should substantially reduce launch overhead.
Redo of #651 against master. Fixes #525 by catching the CUDA error raised
when we check a PyTorch tensor's size and re-raising a more informative
error that explains why the check failed.
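A minimal sketch of the pattern (a hypothetical helper, not the actual code in this PR):
```python
import torch

def check_tensor_arg(x: torch.Tensor):
    # Hypothetical helper: inspecting a CUDA tensor can surface a deferred error
    # from earlier asynchronous work; catch it and re-raise with context so the
    # user can tell where and why the check failed.
    try:
        return tuple(x.shape), x.data_ptr()
    except RuntimeError as e:
        raise RuntimeError(
            f"Triton failed while inspecting a kernel argument on {x.device}; "
            f"the underlying CUDA error was: {e}"
        ) from e
```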
This PR adds several optimization capabilities to the compiler backend:
- `tl.store` now uses inline PTX, making it possible to use cache hints such as `evict_last`.
- On A100, the mma layout can be converted directly to shared memory.
- On A100, an additional "transpose" argument in `dot` allows a tensor to be loaded once and used as both row-major and column-major.
- Fixed liveness analysis, which was broken.
- The mma layout can now be loaded/stored directly, without a layout conversion. This is useful when the `tl.dot` accumulator is initialized with data from DRAM inside an inner loop.
- `tl.dot` can now take its LHS input in registers when it comes from a previous `tl.dot` instruction, which is useful for e.g. fused attention (see the sketch after this list).
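As a rough illustration of the last point, a schematic kernel (not code from this PR; the memory layout and block sizes are assumptions) in which the output of one `tl.dot` feeds the LHS of the next without leaving registers:
```python
import triton
import triton.language as tl

@triton.jit
def chained_dot_kernel(q_ptr, k_ptr, v_ptr, out_ptr,
                       BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_D: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, BLOCK_D)
    q = tl.load(q_ptr + offs_m[:, None] * BLOCK_D + offs_d[None, :])  # (M, D)
    k = tl.load(k_ptr + offs_d[:, None] * BLOCK_N + offs_n[None, :])  # (D, N)
    v = tl.load(v_ptr + offs_n[:, None] * BLOCK_D + offs_d[None, :])  # (N, D)
    s = tl.dot(q, k)                  # (M, N) accumulator, stays in registers
    o = tl.dot(s.to(tl.float16), v)   # LHS comes straight from the previous dot
    tl.store(out_ptr + offs_m[:, None] * BLOCK_D + offs_d[None, :], o)
```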
This PR addresses the following PyTorch deprecation warning:
> UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc').
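For reference, a minimal sketch of the substitution the warning asks for (the values are illustrative):
```python
import torch

a = torch.tensor([7, -7])
b = torch.tensor([2, 2])

# Deprecated spelling: integer tensor // tensor emits the warning above and
# truncates toward zero.
# q = a // b

# Equivalent replacement that keeps the current (truncating) behavior:
q_trunc = torch.div(a, b, rounding_mode="trunc")  # tensor([ 3, -3])

# Use "floor" if Python-style floor division is what is actually wanted:
q_floor = torch.div(a, b, rounding_mode="floor")  # tensor([ 3, -4])
```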