triton

Author	SHA1	Message	Date
Keren Zhou	16e973edf2	[BACKEND] Fix dependency analysis in pipeline (#946)	2022-12-06 09:08:55 -08:00
Philippe Tillet	8edfe813a5	[FRONTEND][BACKEND] Added `trans` instruction; made flash attention bwd pass work (#943)	2022-12-03 09:58:24 -08:00
Philippe Tillet	6461254fb5	[BACKEND] Make flash attention forward pass work (#928) This also simplifies BroadcastOp codegen	2022-11-30 10:13:24 +00:00
Chenggang Zhao	516a241234	[Triton-MLIR] Fix some typos (#874)	2022-11-13 18:15:53 -08:00
Keren Zhou	2da71b2aaa	[Triton-MLIR] Increase block size K to completely eliminate shared memory bank conflicts (#862)	2022-11-08 17:39:23 -08:00
Philippe Tillet	91a9773b38	[OPTIMIZER] Minor bugfixes that affected matmul codegen performance (#834)	2022-11-02 22:58:09 -07:00
Philippe Tillet	fcb228d1d4	Merge select commits from `master` branch into `triton-mlir` (#799) Co-authored-by: Keren Zhou <kerenzhou@openai.com> Co-authored-by: vesuppi <zt9465@gmail.com> Co-authored-by: Jason Ansel <jansel@jansel.net> Co-authored-by: daadaada <dyanab@connect.ust.hk> Co-authored-by: Anton Kostin <masguit42@users.noreply.github.com> Co-authored-by: Yunxing Dai <nov503@gmail.com> Co-authored-by: Shintaro Iwasaki <shintaro.iwasaki.work@gmail.com>	2022-10-24 14:52:37 -07:00
Philippe Tillet	bb0f9235d1	[OPTIMIZER] Made layout simplification pass efficient for fused attention kernels (#790)	2022-10-21 16:52:15 -07:00
Shintaro Iwasaki	13669b46a6	[DOCS] Correct spelling (#665) This PR corrects spelling like #664 for Triton-MLIR. It should not break anything.	2022-09-16 15:07:34 -07:00
Yan Da	e95d98a886	bindings for ModuleOp	2022-03-30 13:32:52 +08:00
Philippe Tillet	7b48340ffd	[CI] Some fixes for the build (#451)	2022-02-06 19:11:33 -08:00
Philippe Tillet	2922dc141c	Merge branch 'master' into v2.0	2022-01-30 20:25:01 -08:00
Madeleine Thompson	efdabe6073	[STYLE] check python with flake8 (#424) I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.	2022-01-07 15:28:36 -08:00
Madeleine Thompson	a70acfec77	[STYLE] add isort and autopep8 config files and check on CI (#423) Also fix a few more style issues from the "aggressive" mode of autopep8.	2022-01-07 13:11:34 -08:00
Madeleine Thompson	9801aa7b56	[DOCS] fix tutorials for v2.0 (#422) - Fix meta-parameter usage on tutorials. - Install tutorial dependencies on CI. - Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies. - Make some performance tests deterministic.	2022-01-07 12:34:38 -08:00
Madeleine Thompson	8bf551ae7a	[STYLE] run autopep8 and isort (#421) Run: ``` isort ./python autopep8 -i --ignore E501,E701,E731 $(find ./python/ -name '*.py') ``` with an `.isort.cfg` and then clean up a few warts. This PR should be a no-op; the idea is that this is all boring whitespace changes, and any config file changes will be in a different change to make it easier to review.	2022-01-06 14:34:17 -08:00
Noah Ziems	3edc2633e9	[TUTORIALS] Fix 01-vector-add.py typo (#406)	2021-12-29 15:09:34 -08:00
Philippe Tillet	2acaa4d0dd	[LANG] Added support for constexpr (#361)	2021-10-30 00:32:58 -07:00
Philippe Tillet	90ded16c32	[DOCS] Added placeholder docstring for layernorm tutorial	2021-10-15 19:04:01 -07:00
Philippe Tillet	d4baad426d	[DOCS] Added layer norm example (#326)	2021-10-08 11:02:10 -07:00
Philippe Tillet	4163d32c49	[DOCS] Fixed leftover exit() in 01-vector-add tutorial	2021-09-10 15:52:26 -07:00
Philippe Tillet	ac10551d55	[PYTHON] Now providing triton.next_power_of_2 (#273)	2021-09-10 11:05:44 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00
Szymon Sidor	8bedcce9be	[LANG] Added seeded random number generation - philox (#261)	2021-09-02 22:02:40 -07:00
Sasank Chilamkurthy	6aa5720d75	[DOCS] use numel for num_elements in elementwise tutorial (#228)	2021-08-19 19:35:12 -07:00
Philippe Tillet	f26a48a3b4	[DOCS] Various improvements (#224) - Added docstr for autotune, Config, heuristics - Added docstr for atomics - Hiding internal _builder argument used for built-in language primitives - Re-factor docstr to use common templates between similar functions.	2021-08-18 11:15:53 -07:00
Philippe Tillet	70e28ff380	[DOCS] Minor modifications of the matmul tutorial (#199) Making the code more compact and fixing inconsistencies between text variable names and final python program.	2021-08-11 18:59:15 -07:00
Philippe Tillet	398d4b4aeb	[DOCS] softmax tutorial fixup (#198)	2021-08-11 17:35:00 -07:00
Nicholas Joseph	6cd1ec3955	[DOCS] Fix formatting mistakes (#192)	2021-08-06 12:58:43 -07:00
Nicholas Joseph	68f7eeba92	[DOCS] Improve matmul tutorial readability (#188)	2021-08-05 16:05:56 -07:00
Nicholas Joseph	4e6f667c2f	[DOCS] Improve readability of 02-fused-softmax.py (#186)	2021-08-05 09:39:07 -07:00
Nicholas Joseph	23c71538fc	[DOCS] Improve tutorial readability (#185)	2021-08-05 09:27:06 -07:00
Xiangru Lian	9967e9d4b4	[DOCS] Fix fused softmax example script naive softmax implementation (#178)	2021-08-02 09:37:31 -07:00
Philippe Tillet	acd5e44611	[GENERAL] Some minor improvements here and there to build systems and docs (#148)	2021-07-28 01:51:17 -07:00
Philippe Tillet	b253b77c71	[DOCS] Improved documentation and integration in CI (#139)	2021-07-27 12:38:49 -07:00
Philippe Tillet	b7b05a560e	[DRIVER] Now giving the option to use system ptxas through environment variable (#123)	2021-07-27 12:38:49 -07:00
Philippe Tillet	840140bf26	[CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel (#101) This massively simplifies implementation of `reassociate` and also fixes a bunch of bugs. The pass could still be improved, but can already be used to generate constant pointer offsets in e.g. the matmul epilogue	2021-07-27 12:38:49 -07:00
Philippe Tillet	bfc0a7587d	[PYTHON] Renamed triton.core -> triton.language (#92)	2021-07-27 12:38:49 -07:00
Philippe Tillet	29e33e50b7	[DOCS] Updates and improvements (#87)	2021-07-27 12:38:49 -07:00
Philippe Tillet	39f4730305	Deprecation of Triton-C and Replacement by decorated Python functions (#86) This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes. See the documentation for more information on the new API	2021-07-27 12:38:49 -07:00
Philippe Tillet	1fdb465b71	[DOCS] Various improvements and typo fixes	2021-07-27 12:38:49 -07:00
Philippe Tillet	183878dce5	[DOCS] Added matrix multiplication tutorial	2021-07-27 12:38:49 -07:00
Philippe Tillet	50e58d73db	[DOCS] Improved plots in tutorials	2021-07-27 12:38:49 -07:00
Philippe Tillet	d1d09566b1	[DOCS] Improved tutorials documentation	2021-07-27 12:38:49 -07:00
Philippe Tillet	92242ace2c	[DOCS] Re-structured documentation hierarchy	2021-07-27 12:38:49 -07:00
Philippe Tillet	ca04da3575	[DOCS] Switched tutorials to Python and use Sphinx Gallery	2021-07-27 12:38:49 -07:00
Philippe Tillet	5172792543	[DOCS] Added .ipynb tutorials in docs	2021-07-27 12:38:49 -07:00
Philippe Tillet	3ecf834a69	[PYTHON] Deleted 01-vector-add.py: it is an unnecessary duplicate of 01-vector-add.ipynb	2021-07-27 12:38:49 -07:00
Philippe Tillet	62835a0979	[RUNTIME] Added auto-alignment mechanism (#71) This PR adds an automatic memory alignment mechanism in the Triton runtime. Specifically, the JIT compiler detects the alignment (in bytes) of each pointer argument as well as the largest power of two divisor (between 1 and 16) of each integer argument. Proper .aligned and .multipleof attributes are then added to the Triton-IR on-the-fly for all auto-tunable kernels. There is a cache that remembers all the kernels compiled for each possible configuration. This PR also includes substantial cleaning of the Python API. This adds 2-3us overhead, mostly due to accessing integer #defines from the auto-tuned compilation options. The previous solution was slightly faster but hacky and potentially unsafe, so this is preferred for now.	2021-07-27 12:38:49 -07:00
Philippe Tillet	50ff1aea86	[DOCS] Added Python 02-fused-softmax.ipynb tutorial	2021-07-27 12:38:49 -07:00

51 Commits