triton

Author	SHA1	Message	Date
Yunxing Dai	59a8e25f43	[DOCS] Fix typo (#650 )	2022-09-14 12:17:05 -07:00
Phil Wang	7394d732ad	[DOCS] support for variable head dimensions in flash attention triton tutorial (#623 )	2022-08-15 19:16:49 -07:00
Keren Zhou	af85f5fa46	[FRONTEND] Refresh cache when the source code of outlined functions is changed (#590 )	2022-07-20 17:34:07 -07:00
Philippe Tillet	86cab58d89	[CI] Changed dev wheel date to UTC time to match CRON schedule (#587 )	2022-07-18 14:54:13 -07:00
Phil Tillet	5b04331dd2	[TUTORIALS] Added more credits in fused attention tutorial	2022-07-13 23:48:58 -07:00
Keren Zhou	4912916c11	[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) (#562 )	2022-07-13 15:52:21 -07:00
Phil Tillet	971f5782b4	[tutorials] Added flash attention credits in tutorial	2022-07-11 18:56:48 -07:00
Philippe Tillet	d5eb9bc230	[tutorial] Added bwd in fused attention example (#579 ) Doesn't work on V100	2022-07-11 15:43:46 -07:00
Natalia Gimelshein	1bbb2430d9	[TUTORIALS] adjust heuristics for dwdb kernel (#565 )	2022-06-29 17:00:22 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557 ) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00
Philippe Tillet	751e325d2e	[TUTORIALS] Fixed typo	2022-06-05 13:33:21 -07:00
Philippe Tillet	801c8a4c92	[TUTORIALS] Fixed typo	2022-06-05 12:32:07 -07:00
Philippe Tillet	8876e53206	[BACKEND] Restored reduction bugfixes	2022-06-03 11:38:52 -07:00
Philippe Tillet	a60374a597	Revert "[BACKEND] Various bug fixes; making reductions faster (#533 )". This is a more stable commit that produces bitwise identical code to earlier versions. Using commits after this one may lead to slightly different numerics	2022-06-03 11:36:06 -07:00
Philippe Tillet	3e7500dfe6	[BACKEND] Various bug fixes; making reductions faster (#533 )	2022-05-31 17:14:44 -07:00
Philippe Tillet	0835a4fb05	[TUTORIALS] Removed #noformat in layer norm tutorial	2022-05-12 12:41:25 -07:00
Philippe Tillet	c736ba7c3e	[TUTORIALS] Fixed formatting	2022-05-12 12:31:23 -07:00
Philippe Tillet	cd30a99aa2	[TUTORIALS] fixed formatting	2022-05-12 12:28:22 -07:00
Philippe Tillet	d87435e536	[TUTORIALS] Layer norm tutorial now uses residency control (#510 )	2022-05-05 19:53:54 -07:00
Philippe Tillet	5c7122004c	[TUTORIALS] Tutorial shouldn't expose `clock`. Just removed it.	2022-04-14 17:33:44 -07:00
Philippe Tillet	bace26143d	[TUTORIALS] Removed leftover print	2022-03-28 16:53:23 -07:00
Philippe Tillet	e0cc488055	[FRONTEND] Added `tl.clock` and `tl.globaltimer` (#485 )	2022-03-28 16:15:43 -07:00
Philippe Tillet	7b48340ffd	[CI] Some fixes for the build (#451 )	2022-02-06 19:11:33 -08:00
Philippe Tillet	2922dc141c	Merge branch 'master' into v2.0	2022-01-30 20:25:01 -08:00
Madeleine Thompson	efdabe6073	[STYLE] check python with flake8 (#424 ) I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.	2022-01-07 15:28:36 -08:00
Madeleine Thompson	a70acfec77	[STYLE] add isort and autopep8 config files and check on CI (#423 ) Also fix a few more style issues from the "aggressive" mode of autopep8.	2022-01-07 13:11:34 -08:00
Madeleine Thompson	9801aa7b56	[DOCS] fix tutorials for v2.0 (#422 ) - Fix meta-parameter usage on tutorials. - Install tutorial dependencies on CI. - Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies. - Make some performance tests deterministic.	2022-01-07 12:34:38 -08:00
Madeleine Thompson	8bf551ae7a	[STYLE] run autopep8 and isort (#421 ) Run `isort ./python` and `autopep8 -i --ignore E501,E701,E731 $(find ./python/ -name '*.py')` with an `.isort.cfg` and then clean up a few warts. This PR should be a no-op; the idea is that this is all boring whitespace changes, and any config file changes will be in a different change to make it easier to review.	2022-01-06 14:34:17 -08:00
Noah Ziems	3edc2633e9	[TUTORIALS] Fix 01-vector-add.py typo (#406 )	2021-12-29 15:09:34 -08:00
Philippe Tillet	2acaa4d0dd	[LANG] Added support for constexpr (#361 )	2021-10-30 00:32:58 -07:00
Philippe Tillet	90ded16c32	[DOCS] Added placeholder docstring for layernorm tutorial	2021-10-15 19:04:01 -07:00
Philippe Tillet	d4baad426d	[DOCS] Added layer norm example (#326 )	2021-10-08 11:02:10 -07:00
Philippe Tillet	4163d32c49	[DOCS] Fixed leftover exit() in 01-vector-add tutorial	2021-09-10 15:52:26 -07:00
Philippe Tillet	ac10551d55	[PYTHON] Now providing triton.next_power_of_2 (#273 )	2021-09-10 11:05:44 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268 ) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse 's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00
Szymon Sidor	8bedcce9be	[LANG] Added seeded random number generation - philox (#261 )	2021-09-02 22:02:40 -07:00
Sasank Chilamkurthy	6aa5720d75	[DOCS] use numel for num_elements in elementwise tutorial (#228 )	2021-08-19 19:35:12 -07:00
Philippe Tillet	f26a48a3b4	[DOCS] Various improvements (#224 ) - Added docstr for autotune, Config, heuristics - Added docstr for atomics - Hiding internal _builder argument used for built-in language primitives - Re-factor docstr to use common templates between similar functions.	2021-08-18 11:15:53 -07:00
Philippe Tillet	70e28ff380	[DOCS] Minor modifications of the matmul tutorial (#199 ) Making the code more compact and fixing inconsistencies between text variable names and final python program.	2021-08-11 18:59:15 -07:00
Philippe Tillet	398d4b4aeb	[DOCS] softmax tutorial fixup (#198 )	2021-08-11 17:35:00 -07:00
Nicholas Joseph	6cd1ec3955	[DOCS] Fix formatting mistakes (#192 )	2021-08-06 12:58:43 -07:00
Nicholas Joseph	68f7eeba92	[DOCS] Improve matmul tutorial readability (#188 )	2021-08-05 16:05:56 -07:00
Nicholas Joseph	4e6f667c2f	[DOCS] Improve readability of 02-fused-softmax.py (#186 )	2021-08-05 09:39:07 -07:00
Nicholas Joseph	23c71538fc	[DOCS] Improve tutorial readability (#185 )	2021-08-05 09:27:06 -07:00
Xiangru Lian	9967e9d4b4	[DOCS] Fix fused softmax example script naive softmax implementation (#178 )	2021-08-02 09:37:31 -07:00
Philippe Tillet	acd5e44611	[GENERAL] Some minor improvements here and there to build systems and docs (#148 )	2021-07-28 01:51:17 -07:00
Philippe Tillet	b253b77c71	[DOCS] Improved documentation and integration in CI (#139 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	b7b05a560e	[DRIVER] Now giving the option to use system ptxas through environment variable (#123 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	840140bf26	[CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel (#101 ) This massively simplifies implementation of `reassociate` and also fixes a bunch of bugs. The pass could still be improved, but can already be used to generate constant pointer offsets in, e.g., the matmul epilogue	2021-07-27 12:38:49 -07:00
Philippe Tillet	bfc0a7587d	[PYTHON] Renamed triton.core -> triton.language (#92 )	2021-07-27 12:38:49 -07:00

63 Commits