triton

Author	SHA1	Message	Date
Philippe Tillet	33e6f0df7f	[DRIVER] Bumped CUDA requirement to 11.4+. This is to avoid bad performance surprises as older `ptxas` are much slower. (#769 ) This also makes codegen simpler by avoiding special handling of eviction policies	2022-10-12 12:02:30 -07:00
Philippe Tillet	af76c989eb	[RUNTIME] Make entry point cache key depend on triton version hash (#765 )	2022-10-11 13:24:30 -07:00
Bin Bao	09cc2d454b	[FRONTEND] Fix a bool tensor storing problem (#746 )	2022-10-10 12:11:50 -07:00
Felipe Petroski Such	5d4b26d380	[RUNTIME] support multiple devices in the same process (#757 )	2022-10-09 20:30:04 -07:00
Chris	9a11a567ce	[DOCS] Fixed typos in 01-vector-add.py (#751 )	2022-10-09 18:12:46 -07:00
Keren Zhou	11345e9b74	[RUNTIME] Add callback functions for external tools (#738 )	2022-10-05 14:46:55 -07:00
Philippe Tillet	bdfdb9a1d2	[RUNTIME] Fixed JIT bug that led some constexpr values to be overridden by specialization parameters (#742 )	2022-10-05 11:00:32 -07:00
shenggan	77c752dc78	[RUNTIME] remove fixed cu_include_dir (#739 ) Use environment variable `CUDA_HOME` with default value `/usr/local/cuda` for `cu_include_dir` #731	2022-10-04 19:49:57 -07:00
Natalia Gimelshein	d3c925db8a	[FRONTEND] properly broadcast scalar where condition (#736 )	2022-10-04 12:44:03 -07:00
fdrocha	2b0f877fad	[RUNTIME] Support environments with multiple cudalibs (#733 )	2022-10-03 18:36:24 +00:00
Keren Zhou	4a2d3b7d79	[RUNTIME] Dump llvm, ttir, and sass to help debugging (#732 )	2022-10-03 00:39:52 +00:00
Natalia Gimelshein	f55960e773	[FRONTEND] fix broadcasting for where (#729 ) Fixes #532, all 3 inputs to where have to be broadcast together.	2022-10-01 13:18:47 -07:00
Phil Tillet	b244db06da	[TUTORIALS] Attention tutorial fixup	2022-09-30 19:31:43 -07:00
Shintaro Iwasaki	7b61303ea1	[CODEGEN] Fix extract_N_bufferable in layout analysis (#728 )	2022-09-30 12:21:22 -07:00
Shintaro Iwasaki	ae59f51c2d	[CODEGEN] Fix an inliner to call a function with a phi-node (#727 )	2022-09-29 21:36:40 -07:00
albanD	f45e31ba7c	[FRONTEND] Make sure to hold the gil when creating python objects (#726 ) Without this patch, a debug version of python complains that: ``` Fatal Python error: Python memory allocator called without holding the GIL Python runtime state: initialized ```	2022-09-29 18:06:22 -07:00
Philippe Tillet	dad97528b2	[TESTING] allclose fixup (#724 )	2022-09-28 22:49:05 +00:00
Jason Ansel	998fd5f9af	[FRONTEND] Make triton.compile work without a cuda context (#708 ) This allows compiling in a subprocess. I'm not seeing a ton of speedup from this, but figure it is a good change anyway.	2022-09-24 13:41:47 -07:00
Shintaro Iwasaki	3ac929b48b	[BUILD] Download pybind11 in setup.py (#703 ) Based on the discussion in #700, this PR enables downloading pybind11 in `setup.py` without `git submodule` instead of copy-pasting pybind11 code. The downloaded pybind11 will be in `~/.triton/pybind` (like `llvm`).	2022-09-23 15:54:07 -07:00
Jason Ansel	579c03615d	[FRONTEND] Reduce number of compiles in JITFunction (#704 ) I suspect this was the cause of the "new compiles even on a warm cache" behavior I was seeing, though haven't 100% confirmed it. Python `set()` iteration order is nondeterministic when you create a new process. So the same args could produce different `instance_descriptor`s and have false cache misses.	2022-09-23 21:44:52 +00:00
Philippe Tillet	25e1b36785	Revert "[pybind11] Use git-submodule for pybind11" (#701 ) Reverts openai/triton#699	2022-09-23 12:25:38 -07:00
Shintaro Iwasaki	61d104ab3a	[FRONTEND] Use git-submodule for pybind11 (#699 ) This PR changes the `pybind11` source code management from copy-paste to a package controlled by git-submodule. See the discussion in #694 for details.	2022-09-23 09:55:03 -07:00
Philippe Tillet	8c3d4d5749	[RUNTIME] now decoupling entry point from cubin (#696 )	2022-09-22 16:44:22 -07:00
Shintaro Iwasaki	df67068bb0	[pybind11] Update pybind11 to 2.10.0 (#691 ) This PR updates the version of pybind11 to 2.10.0 (the latest stable).	2022-09-21 20:18:02 -07:00
Philippe Tillet	677ddae618	[FRONTEND] Add warmup for triton.jit() (#684 ) This revives #671 , removing the static functions that may unnecessarily hold a reference to the grid and the JITFunction object Co-authored-by: Jason Ansel <jansel@jansel.net>	2022-09-21 19:13:20 +00:00
Jason Ansel	6abe813d1c	Fix issue breaking cudagraphs (#685 ) @ngimel figured this one out. The errors we were seeing from cudagraphs capture were coming from `cuStreamGetCtx` which is not allowed while a stream is capturing. It appears the result of `cuStreamGetCtx()` isn't even used, so I believe it can just be removed.	2022-09-21 10:20:48 -07:00
Philippe Tillet	e318185eb4	[DOCS] Improved README.md wording (#683 ) Initial wording dates from a time when nobody knew Triton, and comparing it to CUDA helped differentiate it from other existing DSLs. But nowadays this comparison doesn't make much sense; Triton is its own thing, and some people may even still be more productive in CUDA than Triton -- language preferences are subjective after all.	2022-09-20 18:09:43 -07:00
Philippe Tillet	7dc2a70edb	Revert "Add .warmup() for triton.jit()" (#682 ) Reverts openai/triton#671 It seems like for some reason this caused out-of-memory errors on some of our internal workloads. I'm reverting this so that HEAD can be used in production at OpenAI, and I will work on digging into this issue asynchronously.	2022-09-20 16:05:14 -07:00
Philippe Tillet	48f30550f1	[FRONTEND] Now using raw compiler syscalls when possible (#678 )	2022-09-19 21:01:36 -07:00
Jason Ansel	93b1adc53b	[FRONTEND] Add .warmup() for triton.jit() (#671 )	2022-09-18 23:09:34 -07:00
Phil Tillet	82956e5d6b	[PACKAGING] Added missing package	2022-09-18 17:34:05 -07:00
Philippe Tillet	2baf333d44	[DOCS] Fixed typos (#670 )	2022-09-18 17:13:12 -07:00
Jason Ansel	49f6bc3f2b	[FRONTEND] Fix filename too long error in new runtime (#669 )	2022-09-18 21:26:29 +00:00
Phil Tillet	00f4ef6958	[CI] wheel/docs workflows now only run on V100 machine	2022-09-18 13:28:35 -07:00
Jason Ansel	e647402fd3	Fix warning in generated C code (#667 )	2022-09-18 12:57:32 -07:00
Philippe Tillet	4a77dfb042	[FRONTEND] Complete rewrite of the runtime (#644 ) This PR completely rewrites the runtime of Triton to be more lean and clearly separate the compilation step from the just-in-time caching logic. This should substantially reduce launch overhead.	2022-09-18 08:51:48 -07:00
Ian Bearman	889d9e34a1	[REPO] update gitignore (#666 ) Update `.gitignore` to include `.vs` and `.vscode`	2022-09-17 14:25:28 -07:00
Shintaro Iwasaki	c668d6596e	[DOCS] Fix spelling (#664 ) This PR applies minor spelling fixes in comments and string literals to `master`. It shouldn't hurt anything.	2022-09-16 12:26:40 -07:00
Sophia Wisdom	4580a04710	[FRONTEND] Improve error message for CPU tensors (#654 ) Redo of #651 against master. Fixes #525 by catching CUDA error when we check pytorch tensor size and rethrowing a more informative error that says why we failed.	2022-09-14 14:26:42 -07:00
Philippe Tillet	cfbbc7b43a	[CI] Added V100 tag to disambiguate self-hosted runners (#653 )	2022-09-14 13:47:50 -07:00
Yunxing Dai	59a8e25f43	[DOCS] Fix typo (#650 )	2022-09-14 12:17:05 -07:00
Da Yan	437ced38c2	fp8 <> bf16 conversion (#637 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2022-08-30 14:20:12 -07:00
Da Yan	210a296699	[BACKEND] bf16 flash-attention (#636 )	2022-08-26 20:40:55 -07:00
Daniil Fukalov	fe0c29b9ec	Fix inconsistent struct declaration instead of class. (#632 ) Looks like typo.	2022-08-26 16:20:21 -07:00
Phil Wang	7394d732ad	[DOCS] support for variable head dimensions in flash attention triton tutorial (#623 )	2022-08-15 19:16:49 -07:00
Da Yan	3e2953f357	Allow multiple_of and max_contiguous to accept n-d values (#617 )	2022-08-10 09:59:32 -07:00
Daniil Fukalov	cc79376222	Fix deprecation warning on CreateGEP(Value *, ArrayRef<Value *>, const Twine &) (#608 ) This variant of CreateGEP() is already removed in LLVM 14.	2022-08-07 17:10:18 -07:00
Daniil Fukalov	7b91c7befd	Fix "warning: control reaches end of non-void function". (#607 )	2022-08-02 16:12:48 -07:00
Sharad Vikram	968f59027e	Expose `module.print` in pybind (#604 )	2022-07-29 21:36:08 -07:00
Anton Kostin	923d468187	Update LICENSE (#602 )	2022-07-25 09:30:03 -07:00
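
Commit 579c03615d attributes false cache misses to Python `set()` iteration order changing between processes. The following is a minimal sketch of one way to avoid that class of problem, assuming the cache key is derived from sets of argument indices: sort each set before hashing so the key no longer depends on iteration order. The names `make_cache_key`, `divisible_by_16`, and `equal_to_1` are hypothetical and are not taken from Triton's source.

```python
import hashlib


def make_cache_key(divisible_by_16, equal_to_1):
    """Build a cache key that does not depend on set iteration order.

    Hypothetical helper for illustration only; not Triton's implementation.
    """
    # Sorting turns each set into a canonical, order-independent tuple.
    parts = (tuple(sorted(divisible_by_16)), tuple(sorted(equal_to_1)))
    # repr() of a tuple of sorted ints is stable across processes.
    return hashlib.sha256(repr(parts).encode("utf-8")).hexdigest()


# Two sets with the same members give the same key,
# regardless of how they were constructed.
key_a = make_cache_key({3, 1, 2}, {5})
key_b = make_cache_key({2, 3, 1}, {5})
print(key_a == key_b)
```

Hashing a canonical form rather than the raw container is the design choice here: any container whose iteration order can vary (sets, or dicts built in a different order) is normalized before it reaches the key.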


566 Commits
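
Commit 77c752dc78 replaces a fixed `cu_include_dir` with a lookup of the `CUDA_HOME` environment variable, defaulting to `/usr/local/cuda`. A minimal sketch of that lookup pattern might look as follows. The function name `cuda_include_dir`, its `env` parameter, and the `include` subdirectory are assumptions made for illustration and may not match the actual change.

```python
import os


def cuda_include_dir(env=None):
    """Resolve the CUDA include directory from CUDA_HOME.

    Hypothetical helper; the "include" subdirectory is an assumption.
    """
    # Allow a mapping to be passed in so the lookup is easy to test.
    env = os.environ if env is None else env
    # Fall back to the default named in the commit message.
    cuda_home = env.get("CUDA_HOME", "/usr/local/cuda")
    return os.path.join(cuda_home, "include")


# With CUDA_HOME unset, the default location is used.
print(cuda_include_dir({}))
# With CUDA_HOME set, the custom installation is used instead.
print(cuda_include_dir({"CUDA_HOME": "/opt/cuda"}))
```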