triton

Author	SHA1	Message	Date
daadaada	539961072c	[FRONTEND] Semantic analysis refactor (#473 ) Moved dispatch.cc to semantic.py. Integer signedness now moved from C++ to Python. Cleaner frontend type. Co-authored-by: Phil Tillet <phil@openai.com>	2022-03-16 21:25:30 -07:00
Philippe Tillet	d4d8eaf6c0	[FRONTEND] improved caching mechanism (#474 ) Co-authored-by: Greg Brockman <gdb@gregbrockman.com> Co-authored-by: Christopher Hesse <christopherhesse@users.noreply.github.com>	2022-03-15 12:20:51 -07:00
daadaada	a9dfdcaaa9	[FRONTEND] Make the performance model work for int8, tf32, and fp32 (#456 )	2022-02-11 22:34:42 -08:00
Philippe Tillet	5a8a544d10	[OPS][BLOCKSPARSE] Improved robustness, clarity and performance (#450 ) * dds layout now internally re-uses dsd code path for increased code * at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where they were needed beyond is_causal. And if there is any, we should probably find a way to get it implemented statically so that users don't have to materialize masks. * fixed bug in blocksparse matmul that caused trouble when layout had a full row/col of zeros * blocksparse softmax no longer modifies any data in-place * blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention. * unit tests now test backward pass	2022-02-06 18:00:45 -08:00
TC	137bb67fad	[LANG] Add fp16 to fp8 conversion (#444 )	2022-02-02 20:42:09 -08:00
Philippe Tillet	807d8a1945	[ALL] Merge master (#447 )	2022-01-30 20:21:20 -08:00
daadaada	59d371c6eb	[BACKEND] Added Int8 mma (#440 )	2022-01-27 09:12:44 -08:00
Philippe Tillet	4c97d1ecd7	[FRONTEND] Bunch of fixes here and there (#436 )	2022-01-20 10:55:59 -08:00
daadaada	2a944ded53	[TESTS] Added bfloat16 tests (#430 )	2022-01-13 23:38:32 -08:00
daadaada	94a2e10fe5	[BACKEND] Add bf16 & tf32 mma supports (on A100) (#426 )	2022-01-11 10:20:31 -08:00
Madeleine Thompson	efdabe6073	[STYLE] check python with flake8 (#424 ) I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.	2022-01-07 15:28:36 -08:00
Madeleine Thompson	a70acfec77	[STYLE] add isort and autopep8 config files and check on CI (#423 ) Also fix a few more style issues from the "aggressive" mode of autopep8.	2022-01-07 13:11:34 -08:00
Madeleine Thompson	9801aa7b56	[DOCS] fix tutorials for v2.0 (#422 ) - Fix meta-parameter usage on tutorials. - Install tutorial dependencies on CI. - Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies. - Make some performance tests deterministic.	2022-01-07 12:34:38 -08:00
Madeleine Thompson	8bf551ae7a	[STYLE] run autopep8 and isort (#421 ) Run `isort ./python` and `autopep8 -i --ignore E501,E701,E731 $(find ./python/ -name '*.py')` with an `.isort.cfg` and then clean up a few warts. This PR should be a no-op; the idea is that this is all boring whitespace changes, and any config file changes will be in a different change to make it easier to review.	2022-01-06 14:34:17 -08:00
Madeleine Thompson	0ab9d67bad	uint8, uint16, uint32, and uint64 in kernels (#413 ) A forthcoming PR will update the RNG to use these types. Also: - Add tests for the `//`, `<<`, and `>>` operators. - Change `TensorWrapper` to unwrap objects when the resulting object would be simpler. - Clean up `throw_unreachable`, since it was triggering compiler warnings.	2022-01-05 15:27:17 -08:00
Madeleine Thompson	d8db0308cb	[TEST] use numpy for reference results in test_core.py (#409 ) Since numpy supports unsigned integers, and pytorch doesn't, this will make it easier to test unsigned integer support. This adds an explicit requirement for numpy in tests, but we already required scipy, so it was already an implicit dependency.	2022-01-04 13:07:29 -08:00
Madeleine Thompson	985798f101	add missing bfloat16 repr and improve assertions (#403 ) - `BF16TyID` was missing a repr implementation. - Throw a better exception on impossible casts. - Add a few assertions. Tested with a debug build. - Add `pointer_dtype.__str__` to aid kernel debugging.	2021-12-23 17:01:17 -08:00
Philippe Tillet	a425f24d54	[FRONTEND] Better cache hook (#400 ) Added an additional `repr` argument to the cache hook, which represents a human-readable string representation of the signature and argument attributes associated with the compiled binary.	2021-12-21 21:29:47 -08:00
daadaada	39d4bfed83	[OPS] Add performance model for gemm/gemv (#397 ) Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.	2021-12-21 09:56:10 -08:00
Madeleine Thompson	5cdb948c05	[FRONTEND] signed-integer math fixes and testing (#395 ) - Promote 16-bit floating-point `/` and `%` to 32-bit; we have to anyway. - Do not force result of integer binary operations to be the LHS type. There used to be a bug in pytorch that did this, which Triton matched, but that bug is fixed now. - When testing signed integer operations, use random numbers from the full range of the type. - Add an optional `seed` argument to `triton.testing.random` so binary operations are not tested with both sides equal when the LHS and RHS have the same type. - Fix a bad `CompilationError` invocation. - Fix a warning suppression that causes tests to fail if you run them with `-W error` on python 3.8.	2021-12-21 09:46:05 -08:00
Madeleine Thompson	e575ae3443	[FRONTEND] Minor accumulated style and warning fixes (#388 ) - Fix some whitespace. - Make an undeclared dependency on `pytest` explicit. - Fix deprecated `description-file` use. - `#ifdef` out a deprecated `PyEval_InitThreads` call. - Use a slightly different numpy invocation in `test_random.py` to quiet down overflow warnings in tests. - Fix a deprecated cast in `test_core.py`. - Suppress a warning about `visit_Constant` in Python 3.9+; we can't migrate yet because it'd break Python 3.6 and 3.7. - Use chained exceptions for `CompilationError` rather than rolling our own; it makes the error messages nicer. - Add a `__str__` for `tl.dtype` to make debugging kernels easier; it lets you `print` a dtype to see what type was inferred. - Fix a few bad escapes.	2021-12-10 15:19:20 -08:00
Philippe Tillet	8ec9f037bb	[BACKEND/CODE_GEN] Fixed float32 matmul problem (#380 )	2021-11-30 22:00:56 -08:00
Philippe Tillet	c86ad9c9ab	[FRONTEND] Added default arguments to non-kernel @triton.jit'd function (#379 )	2021-11-29 19:11:26 -08:00
daadaada	1296eb877b	[RUNTIME] Config hook v2.0 (#373 ) * Add pre_hook to triton.Config * Use argument names in triton.heuristics * Update base perf * Remove meta from heuristics	2021-11-21 11:20:59 -08:00
Philippe Tillet	01cc3d4503	[RUNTIME] Restored `do_not_specialize` (#374 )	2021-11-12 15:06:55 -08:00
Philippe Tillet	e66bf76354	[RUNTIME] Bunch of bugfixes (#372 )	2021-11-12 00:55:00 -08:00
Philippe Tillet	2acaa4d0dd	[LANG] Added support for constexpr (#361 )	2021-10-30 00:32:58 -07:00
daadaada	858dec8372	[CODEGEN] Add cache modifier to tl.load (#351 ) * Add cache modifier to tl.load * Add comment to cache_modifier * Remove force_nc_cache * Update test	2021-10-17 22:14:04 -07:00
Stephen McGroarty	c2e6b90ff1	[CODEGEN] Fixes masked load exception (#342 )	2021-10-13 13:31:52 -07:00
Philippe Tillet	c3c0ff0552	[LANGUAGE] Fixed issue with duplicates in large arrays of random uniform numbers (#338 )	2021-10-10 15:22:34 -07:00
daadaada	9e9d781912	[CODEGEN] Pipeline fixup (#336 )	2021-10-10 01:47:11 -07:00
Philippe Tillet	5123db0b7d	[LANG] Various (relatively minor) improvements (#320 )	2021-10-04 18:39:40 -07:00
Philippe Tillet	2c287544cb	[OPS] Faster and cleaner block-sparse implementation (#311 )	2021-09-27 18:25:16 -07:00
Benjamin Lefaudeux	b53f5f3803	[OPS][BLOCKSPARSE] safeguarding a couple more configurations (#292 )	2021-09-20 17:15:31 -07:00
Philippe Tillet	a12827848d	[FRONTEND] Now using exist_ok=True when creating cache directories (#288 )	2021-09-18 23:44:21 -07:00
Philippe Tillet	313d6488f6	[CODEGEN] Fixed over-aggressive division handling in alignment pass (#280 )	2021-09-15 00:40:17 -07:00
Philippe Tillet	da5063d898	[TEST] Added performance regression tests (#283 )	2021-09-14 01:46:32 -07:00
Philippe Tillet	3e395bc84e	[LANG] Fixed semantics of NaN in float comparisons (#281 )	2021-09-13 15:06:29 -07:00
Philippe Tillet	585e5cd0ec	[TEST] Added test for empty kernel (#271 )	2021-09-09 10:20:37 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268 ) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse 's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00
Szymon Sidor	8bedcce9be	[LANG] Added seeded random number generation - philox (#261 )	2021-09-02 22:02:40 -07:00
Philippe Tillet	4ff3714d61	[CODEGEN] Various bugfixes and stability improvements in compiler backend (#240 )	2021-08-30 11:50:35 -07:00
milesial	5b29da719d	[DRIVER] Add CUDA P2P support (#209 )	2021-08-20 21:00:54 -07:00
Philippe Tillet	a714b6b856	[PYTHON] re-activated auto-tuner configurations for triton.ops.matmul (#212 )	2021-08-16 22:56:21 -07:00
Philippe Tillet	bb1eebb4b4	[CODEGEN] Fixed bug for visit_reduce1d with 64-bit data-types (#207 )	2021-08-14 21:07:01 -07:00
Philippe Tillet	b120d70a0a	[CI] Moved from `assert_allclose` to `assert_almost_equal` (#200 )	2021-08-12 12:00:30 -07:00
Philippe Tillet	2824345065	[LANGUAGE] Added cos/sin (#132 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	8cea583109	[IR] Preliminary support for BF16 (#129 ) This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.	2021-07-27 12:38:49 -07:00
daadaada	d8d6b715c8	[CODEGEN] Performance improvement on A100 (#125 ) Improved codegen for the Ampere GPUs. * Make the layout pass recognize the multistage pipelined pattern. * Now the pipeline pass can automate the multistage pipelining transformation. * Remove extra barriers (from the prefetch pass & WAR) on Ampere. * Update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.	2021-07-27 12:38:49 -07:00
Philippe Tillet	0274429429	[IR] Added IR and Codegen support for atomic_rmw (#120 )	2021-07-27 12:38:49 -07:00

112 Commits