The current way of doing things is probably not thread-safe: the init state is shared between threads, and some threads may never call the LLVMInitialize* functions.
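A minimal sketch of the once-only pattern this needs, in Python for illustration; the guard names and the `_llvm_initialize_all` wrapper are hypothetical, not the actual code:
```
import threading

_init_lock = threading.Lock()
_initialized = False  # shared flag; only read/written under the lock


def _llvm_initialize_all():
    # Hypothetical placeholder for the binding that calls LLVMInitialize*.
    pass


def ensure_llvm_initialized():
    """Safe to call from any thread; runs LLVM initialization exactly once."""
    global _initialized
    with _init_lock:
        if not _initialized:
            _llvm_initialize_all()
            _initialized = True
```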
* dds layout now internally re-uses the dsd code path for increased code reuse
* at_mask- and kp_mask-related arguments are now dropped from the softmax API. I couldn't think of any case where they were needed beyond is_causal; if one exists, we should probably find a way to implement it statically so that users don't have to materialize masks.
* fixed a bug in blocksparse matmul that caused failures when the layout had a full row/column of zeros
* blocksparse softmax now no longer modifies any data in-place
* blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention (see the sketch after this list).
* unit tests now cover the backward pass
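As referenced above, a hypothetical usage sketch for triangular attention. The constructor and call signatures here are assumptions inferred from this changelog, not the verified API:
```
# Assumed API: tob.softmax(layout, block) builds the op; the call takes
# is_causal/is_dense keyword arguments per the notes above.
import torch
import triton.ops.blocksparse as tob

block, n_blk = 16, 8
# one head, lower-triangular block layout
layout = torch.tril(torch.ones((1, n_blk, n_blk), dtype=torch.int64))

softmax = tob.softmax(layout, block)
# input is in the compressed block format: (batch, nonzero_blocks, block, block)
x = torch.randn(2, int(layout.sum()), block, block, device="cuda")
y = softmax(x, is_causal=True, is_dense=True)  # triangular attention path
```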
The unsigned int PR definitely improved our RNG. However, it requires different floating-point arithmetic, which means the results are not bit-wise identical to what they were before. This commit restores backward compatibility, but we should change it back to the "right" way later.
I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.
- Fix meta-parameter usage on tutorials.
- Install tutorial dependencies on CI.
- Switch from `requirements-test.txt` to `extras_require` for test dependencies, and also use it for tutorial dependencies (a sketch follows this list).
- Make some performance tests deterministic.
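A sketch of the `extras_require` shape this moves to; the extra names and package lists are illustrative, not the exact ones:
```
# Illustrative setup.py excerpt; extra names and packages are assumptions.
from setuptools import setup

setup(
    name="triton",
    # ...
    extras_require={
        "tests": ["pytest", "scipy", "numpy"],  # replaces requirements-test.txt
        "tutorials": ["matplotlib", "pandas"],  # installed by the CI tutorial job
    },
)
```
CI can then install either group with, e.g., `pip install -e '.[tests]'`.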
Run:
```
isort ./python
autopep8 -i --ignore E501,E701,E731 $(find ./python/ -name '*.py')
```
with an `.isort.cfg` in place, then clean up a few warts. This PR should be a no-op: it is all boring whitespace changes, and any config-file changes will go in a separate change to make review easier.
A forthcoming PR will update the RNG to use these types.
Also:
- Add tests for the `//`, `<<`, and `>>` operators.
- Change `TensorWrapper` to unwrap objects when the resulting object would be simpler.
- Clean up `throw_unreachable`, since it was triggering compiler warnings.
Since numpy supports unsigned integers and pytorch doesn't, this will make it easier to test unsigned integer support.
It adds an explicit requirement on numpy for tests, but we already required scipy, so numpy was already an implicit dependency.
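For instance, a minimal sketch of using numpy as the reference for unsigned semantics in the new operator tests (values and dtype are illustrative):
```
# numpy provides the unsigned ground truth that torch cannot express.
import numpy as np

x = np.array([7, 8, 9], dtype=np.uint32)
assert np.array_equal(x // np.uint32(2), np.array([3, 4, 4], dtype=np.uint32))
assert np.array_equal(x << np.uint32(1), np.array([14, 16, 18], dtype=np.uint32))
assert np.array_equal(x >> np.uint32(1), np.array([3, 4, 4], dtype=np.uint32))
```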