triton

Author	SHA1	Message	Date
Philippe Tillet	4941bc7001	[DOCS] Some more fixes (#455 )	2022-02-08 16:53:56 -08:00
Philippe Tillet	2fdf0a4fe8	[DOCS] changed build command	2022-02-08 11:45:21 -08:00
Philippe Tillet	077d6c8ff0	[DOCS] re-activated tutorials	2022-02-08 11:42:39 -08:00
Philippe Tillet	822ddcd14b	[DOCS] Added versioning (#453 )	2022-02-08 11:28:18 -08:00
Philippe Tillet	7b48340ffd	[CI] Some fixes for the build (#451 )	2022-02-06 19:11:33 -08:00
Philippe Tillet	5a8a544d10	[OPS][BLOCKSPARSE] Improved robustness, clarity and performance (#450 ) * dds layout now internally re-uses dsd code path for increased code * at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where they were needed beyond is_causal. And if there is one, we should probably find a way to get it implemented statically so that users don't have to materialize masks. * fixed bug in blocksparse matmul that caused trouble when the layout had a full row/col of zeros * blocksparse softmax no longer modifies any data in-place * blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention. * unit tests now test backward pass	2022-02-06 18:00:45 -08:00
Philippe Tillet	69ff52ea1f	[CODEGEN] removed buggy (and mostly useless) optimization in peephole pass (#449 )	2022-02-05 21:37:23 -08:00
TC	137bb67fad	[LANG] Add fp16 to fp8 conversion (#444 )	2022-02-02 20:42:09 -08:00
Philippe Tillet	3b20170fa3	Merge pull request #448 from openai/v2.0 `v2.0` is now merged into `master`	2022-01-30 20:49:08 -08:00
Philippe Tillet	b0d6e2f322	[STYLE] run autopep	2022-01-30 20:27:44 -08:00
Philippe Tillet	2922dc141c	Merge branch 'master' into v2.0	2022-01-30 20:25:01 -08:00
Philippe Tillet	807d8a1945	[ALL] Merge master (#447 )	2022-01-30 20:21:20 -08:00
Philippe Tillet	bef76b142a	[BACKEND] float division is now approximate by default (#446 )	2022-01-29 18:29:29 -08:00
Philippe Tillet	bd52e530a0	[OPS][BLOCKSPARSE] Fix padding issue in DSD LUT (#445 )	2022-01-28 21:40:30 -08:00
daadaada	e68d6a7776	[BACKEND] Making the warp-level tile "more square" to increase data-reuse for tl.dot. (#442 ) * Increase smem data-reuse for some layouts * tweak * Keep the original tiling logic for sm < 80 Co-authored-by: Philippe Tillet <phil@openai.com>	2022-01-27 09:59:54 -08:00
daadaada	59d371c6eb	[BACKEND] Added Int8 mma (#440 )	2022-01-27 09:12:44 -08:00
Benjamin Lefaudeux	3a23c1dd33	[BACKEND] minor, hotfix for gcc compilation (#439 )	2022-01-23 14:24:02 -08:00
Philippe Tillet	ccf9abe0ba	[FRONTEND][RANDOM] Improved backward compatibility of RNG (#438 ) The unsigned int PR definitely improved our RNG. However, it requires different floating-point arithmetic, which means the results are not bit-wise identical to how they were before. This commit restores backward compatibility, but we should change it back to the "right" way later.	2022-01-21 18:05:55 -08:00
Philippe Tillet	4c97d1ecd7	[FRONTEND] Bunch of fixes here and there (#436 )	2022-01-20 10:55:59 -08:00
Philippe Tillet	e0c5709cc8	[FRONTEND] Fixed semantics bug on ptr to bool conversions (#432 )	2022-01-17 18:00:03 -08:00
daadaada	2a944ded53	[TESTS] Added bfloat16 tests (#430 )	2022-01-13 23:38:32 -08:00
Philippe Tillet	4c94359199	[FRONTEND] Alignment fix-up (#428 )	2022-01-11 23:11:58 -08:00
Philippe Tillet	bbc78f6516	[FRONTEND][RANDOM] Make sure offset dtype is always uint32 before calling uint32_to_uniform_float (#427 )	2022-01-11 11:08:49 -08:00
Botao Yu	bf32205edc	[OPS][BLOCKSPARSE] Remove unnecessary loop and add cuda bool layout support (#425 )	2022-01-11 11:07:16 -08:00
daadaada	94a2e10fe5	[BACKEND] Add bf16 & tf32 mma support (on A100) (#426 )	2022-01-11 10:20:31 -08:00
Madeleine Thompson	efdabe6073	[STYLE] check python with flake8 (#424 ) I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.	2022-01-07 15:28:36 -08:00
Madeleine Thompson	a70acfec77	[STYLE] add isort and autopep8 config files and check on CI (#423 ) Also fix a few more style issues from the "aggressive" mode of autopep8.	2022-01-07 13:11:34 -08:00
Madeleine Thompson	9801aa7b56	[DOCS] fix tutorials for v2.0 (#422 ) - Fix meta-parameter usage on tutorials. - Install tutorial dependencies on CI. - Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies. - Make some performance tests deterministic.	2022-01-07 12:34:38 -08:00
Madeleine Thompson	8bf551ae7a	[STYLE] run autopep8 and isort (#421 ) Run: ``` isort ./python autopep8 -i --ignore E501,E701,E731 $(find ./python/ -name '*.py') ``` with an `.isort.cfg` and then clean up a few warts. This PR should be a no-op; the idea is that this is all boring whitespace changes, and any config file changes will be in a different change to make it easier to review.	2022-01-06 14:34:17 -08:00
Shantanu	6f7acad48f	[CODEGEN] Avoid use of deprecated AST nodes (#418 ) Co-authored-by: hauntsaninja <>	2022-01-06 12:04:33 -08:00
Madeleine Thompson	120cda015e	[FRONTEND] use unsigned integers to simplify RNG (#417 )	2022-01-06 10:49:09 -08:00
Philippe Tillet	001fb757fe	[OPS][BLOCKSPARSE] Added `.contiguous()` in blocksparse inputs when necessary (#420 )	2022-01-06 09:56:22 -08:00
Madeleine Thompson	0ab9d67bad	uint8, uint16, uint32, and uint64 in kernels (#413 ) A forthcoming PR will update the RNG to use these types. Also: - Add tests for the `//`, `<<`, and `>>` operators. - Change `TensorWrapper` to unwrap objects when the resulting object would be simpler. - Clean up `throw_unreachable`, since it was triggering compiler warnings.	2022-01-05 15:27:17 -08:00
Madeleine Thompson	d8db0308cb	[TEST] use numpy for reference results in test_core.py (#409 ) Since numpy supports unsigned integers, and pytorch doesn't, this will make it easier to test unsigned integer support. This adds an explicit requirement for numpy in tests, but we already required scipy, so it was already an implicit dependency.	2022-01-04 13:07:29 -08:00
Philippe Tillet	03f1256f60	[FRONTEND] Added `volatile` flag for load (#407 )	2021-12-30 22:33:24 -08:00
Noah Ziems	3edc2633e9	[TUTORIALS] Fix 01-vector-add.py typo (#406 )	2021-12-29 15:09:34 -08:00
Madeleine Thompson	985798f101	add missing bfloat16 repr and improve assertions (#403 ) - `BF16TyID` was missing a repr implementation. - Throw a better exception on impossible casts. - Add a few assertions. Tested with a debug build. - Add `pointer_dtype.__str__` to aid kernel debugging.	2021-12-23 17:01:17 -08:00
Philippe Tillet	d8fce83e7a	[FRONTEND] Made exception picklable again	2021-12-21 22:14:06 -08:00
Philippe Tillet	a425f24d54	[FRONTEND] Better cache hook (#400 ) Added an additional `repr` argument to the cache hook, which represents a human-readable string representation of the signature and argument attributes associated with the compiled binary.	2021-12-21 21:29:47 -08:00
Philippe Tillet	2509124dd0	[DRIVER] Fixed some issues with how ptxas is used (#399 ) Now using tmpnam and properly deleting temporaries when an exception is raised	2021-12-21 14:31:51 -08:00
daadaada	39d4bfed83	[OPS] Add performance model for gemm/gemv (#397 ) Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.	2021-12-21 09:56:10 -08:00
Madeleine Thompson	5cdb948c05	[FRONTEND] signed-integer math fixes and testing (#395 ) - Promote 16-bit floating-point `/` and `%` to 32-bit; we have to anyway. - Do not force result of integer binary operations to be the LHS type. There used to be a bug in pytorch that did this, which Triton matched, but that bug is fixed now. - When testing signed integer operations, use random numbers from the full range of the type. - Add an optional `seed` argument to `triton.testing.random` so binary operations are not tested with both sides equal when the LHS and RHS have the same type. - Fix a bad `CompilationError` invocation. - Fix a warning suppression that causes tests to fail if you run them with `-W error` on python 3.8.	2021-12-21 09:46:05 -08:00
daadaada	4a8953efa3	[FRONTEND] Replace the legacy print call in triton.cc with the SlotTracker-based one. (#396 ) The legacy print call will assign names (e.g., %10) to values, which can be undesirable in some cases.	2021-12-18 18:03:22 -08:00
Madeleine Thompson	fa62b4a8f6	[FRONTEND] better stringification (#394 ) - Don't override `self.args` in `CompilationError`, and show the line number and column in error messages. This causes it to generate an easier-to-read backtrace. - Better `__str__` on `TensorWrapper`, `dtype`, and `block`.	2021-12-17 20:11:45 -08:00
Philippe Tillet	4e93b41c52	[GENERAL] Some minor fixups (#393 ) * [RUNTIME] Now displaying error message when generated PTX is invalid * [CODEGEN] Now converting `if` condition to bool implicitly	2021-12-17 18:06:21 -08:00
Philippe Tillet	e062812969	[CODEGEN] Disabled peephole for masked load + select -- masked_load doesn't work as expected when vectorized	2021-12-17 12:44:47 -08:00
Victor	eb077fc993	[RUNTIME] fixed NVIDIA DLL names on Windows (#392 )	2021-12-16 22:09:52 -08:00
Philippe Tillet	e0b92c1380	[FRONTEND] Reverted `from .random import *`. There are still some namespace errors in the Triton frontend apparently	2021-12-16 18:37:51 -08:00
Philippe Tillet	558555630f	[FRONTEND] Added xor_sum	2021-12-16 17:55:35 -08:00
Madeleine Thompson	e575ae3443	[FRONTEND] Minor accumulated style and warning fixes (#388 ) - Fix some whitespace. - Make an undeclared dependency on `pytest` explicit. - Fix deprecated `description-file` use. - `#ifdef` out a deprecated `PyEval_InitThreads` call. - Use a slightly different numpy invocation in `test_random.py` to quiet down overflow warnings in tests. - Fix a deprecated cast in `test_core.py`. - Suppress a warning about `visit_Constant` in Python 3.9+; we can't migrate yet because it'd break Python 3.6 and 3.7. - Use chained exceptions for `CompilationError` rather than rolling our own; it makes the error messages nicer. - Add a `__str__` for `tl.dtype` to make debugging kernels easier; it lets you `print` a dtype to see what type was inferred. - Fix a few bad escapes.	2021-12-10 15:19:20 -08:00
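
Commit 5a8a544d10 above mentions that passing is_dense=True, is_causal=True is the best way to achieve triangular attention. As a rough illustration of what a causal (triangular) softmax computes, here is a minimal NumPy reference sketch. It is not the Triton blocksparse API: the function name `causal_softmax` and the assumption of a square score matrix are hypothetical and chosen only for this example.

```python
import numpy as np


def causal_softmax(scores):
    """Row-wise softmax where query i only sees key positions j <= i.

    Hypothetical reference helper; assumes `scores` is a square 2-D array.
    """
    n = scores.shape[-1]
    # Lower-triangular mask: True where a key position is visible to a query.
    visible = np.tril(np.ones((n, n), dtype=bool))
    # Hidden entries become -inf so they contribute exp(-inf) = 0.
    masked = np.where(visible, scores, -np.inf)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


scores = np.random.default_rng(0).standard_normal((4, 4))
probs = causal_softmax(scores)
```

Each row of `probs` sums to one and everything above the diagonal is exactly zero, which is the "triangular" pattern the commit message refers to. The mask is materialized here only for clarity; the commit message itself argues for avoiding materialized masks.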


572 Commits