Philippe Tillet
25f6689508
[FRONTEND] rename current stream monkey patch ( #495 )
2022-04-13 11:45:55 -07:00
Philippe Tillet
76bfac9f15
[FRONTEND] Improved constexpr handling ( #493 )
2022-04-12 00:02:54 -07:00
Philippe Tillet
14b0fd4cfb
[FRONTEND] Added possibility for users to customize current stream query ( #492 )
2022-04-07 12:11:32 -07:00
Philippe Tillet
6424771f55
[CI] Documentation fixup
2022-04-07 09:42:35 -07:00
Philippe Tillet
9f08ecd684
[FRONTEND] Semantic analysis refactor ( #491 )
...
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)
Co-authored-by: Yan Da <dyanab@connect.ust.hk >
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions ( #484 )
2022-04-03 20:58:16 -07:00
apd10
e85c7a7fc7
Bugfix in ptxas path. ( #487 )
...
Bug: the "ret" value is destroyed when a failing "ptxas --version" is run,
overwriting the previous valid "ret" value.
Fix: keep rets only for the runs that succeed, and pick the first
one.
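A minimal Python sketch of the fix described above, not the actual patch: probe each candidate binary with `--version`, record only the runs that succeed, and return the first one. The function name `first_working_ptxas` and the candidate list are hypothetical, and the Python interpreter stands in for a real `ptxas` so the example runs anywhere.

```python
import subprocess
import sys


def first_working_ptxas(candidates):
    """Return (path, version_output) for the first candidate that runs, else None."""
    successes = []
    for path in candidates:
        try:
            result = subprocess.run(
                [path, "--version"], capture_output=True, text=True
            )
        except OSError:
            # Missing or non-executable binary: skip it, keep earlier successes.
            continue
        if result.returncode == 0:
            # Only successful runs are recorded, so a later failure
            # can never overwrite a valid earlier result.
            successes.append((path, result.stdout))
    return successes[0] if successes else None


# Stand-in demo: the first path does not exist, the second is the interpreter.
demo = first_working_ptxas(["/nonexistent/ptxas", sys.executable])
```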
2022-03-30 20:45:41 -07:00
Philippe Tillet
bace26143d
[TUTORIALS] Removed leftover print
2022-03-28 16:53:23 -07:00
Philippe Tillet
e0cc488055
[FRONTEND] Added tl.clock and tl.globaltimer ( #485 )
2022-03-28 16:15:43 -07:00
Philippe Tillet
76a9ee50a8
Revert "[FRONTEND] Semantic analysis refactor ( #473 )" ( #483 )
...
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
Philippe Tillet
ea6d1f1b85
[DRIVER] LLVM driver fixup ( #482 )
...
The current way of doing things is probably not thread safe: init is shared between threads, and some threads may not call the LLVMInitialize* function.
2022-03-23 00:24:45 -07:00
Keren Zhou
a4f68165cd
[FRONTEND] Hot fix for lineno ( #481 )
...
Override __reduce__ to make CompilationError picklable and print out error messages
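To illustrate why the override matters, here is a sketch under stated assumptions. By default an exception is pickled as its class plus `self.args`, so a constructor that takes several arguments but passes only a formatted message to `super().__init__` cannot be rebuilt on unpickling. The constructor signature below (`src`, `lineno`, `err_msg`) is hypothetical and is not taken from the actual Triton class.

```python
import pickle


class CompilationError(Exception):
    # Hypothetical signature, for illustration only.
    def __init__(self, src, lineno, err_msg):
        self.src = src
        self.lineno = lineno
        self.err_msg = err_msg
        self.message = f"at line {lineno}: {err_msg}"
        super().__init__(self.message)

    def __reduce__(self):
        # Tell pickle to rebuild the exception from the original
        # constructor arguments instead of from self.args.
        return (type(self), (self.src, self.lineno, self.err_msg))


original = CompilationError("x = 1", 3, "unsupported expression")
restored = pickle.loads(pickle.dumps(original))
```

Without `__reduce__`, the `pickle.loads` call would fail with a `TypeError`, because the class would be called with the single formatted message as its only argument.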
2022-03-22 22:09:49 -07:00
daadaada
539961072c
[FRONTEND] Semantic analysis refactor ( #473 )
...
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com >
2022-03-16 21:25:30 -07:00
Yongjik Kim
0dd2ec2e3a
[FRONTEND] Add an assert in case we get a CPU tensor. ( #478 )
2022-03-16 14:38:56 -07:00
Philippe Tillet
d4d8eaf6c0
[FRONTEND] improved caching mechanism ( #474 )
...
Co-authored-by: Greg Brockman <gdb@gregbrockman.com >
Co-authored-by: Christopher Hesse <christopherhesse@users.noreply.github.com >
2022-03-15 12:20:51 -07:00
Doğukan Tuna
21f8a0646d
[DOCS] Minor README.md ( #470 )
...
Added binary distribution for quick installation
2022-03-05 00:50:37 -08:00
Philippe Tillet
a50a47a85b
[CODEGEN] Reverted some changes from previous PR; fixed vectorization characteristics of mma layout ( #469 )
2022-03-04 01:53:31 -08:00
Philippe Tillet
bb5765df5c
[CODEGEN] Now padding shared memory for layout conversion ( #468 )
2022-03-03 22:19:05 -08:00
daadaada
d9dd97492f
Use unique_ptr in ir::context_impl ( #462 )
...
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com >
2022-02-24 16:07:10 -08:00
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes ( #463 )
2022-02-24 14:56:24 -08:00
daadaada
a9dfdcaaa9
[FRONTEND] Make the performance model work for int8, tf32, and fp32 ( #456 )
2022-02-11 22:34:42 -08:00
Philippe Tillet
9b100302d3
[FRONTEND] Now using pybind11 to release GIL ( #458 )
2022-02-10 01:57:39 -08:00
Philippe Tillet
40093a9878
[DOCS] Multiple versions are now supported ( #457 )
2022-02-09 01:32:41 -08:00
Philippe Tillet
4941bc7001
[DOCS] Some more fixes ( #455 )
2022-02-08 16:53:56 -08:00
Philippe Tillet
2fdf0a4fe8
[DOCS] changed build command
2022-02-08 11:45:21 -08:00
Philippe Tillet
077d6c8ff0
[DOCS] re-activated tutorials
2022-02-08 11:42:39 -08:00
Philippe Tillet
822ddcd14b
[DOCS] Added versioning ( #453 )
2022-02-08 11:28:18 -08:00
Philippe Tillet
7b48340ffd
[CI] Some fixes for the build ( #451 )
2022-02-06 19:11:33 -08:00
Philippe Tillet
5a8a544d10
[OPS][BLOCKSPARSE] Improved robustness, clarity and performance ( #450 )
...
* dds layout now internally re-uses dsd code path for increased code
* at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where it was needed beyond is_causal. And if there is any, we should probably find a way to get it implemented statically so that users don't have to materialize masks.
* fixed bug in blocksparse matmul that caused troubles when layout had a full row/col of zeros
* blocksparse softmax now no longer modifies any data in-place
* blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention.
* unit tests now test backward pass
2022-02-06 18:00:45 -08:00
Philippe Tillet
69ff52ea1f
[CODEGEN] removed buggy (and mostly useless) optimization in peephole pass ( #449 )
2022-02-05 21:37:23 -08:00
TC
137bb67fad
[LANG] Add fp16 to fp8 conversion ( #444 )
2022-02-02 20:42:09 -08:00
Philippe Tillet
3b20170fa3
Merge pull request #448 from openai/v2.0
...
`v2.0` is now merged into `master`
2022-01-30 20:49:08 -08:00
Philippe Tillet
b0d6e2f322
[STYLE] run autopep
2022-01-30 20:27:44 -08:00
Philippe Tillet
2922dc141c
Merge branch 'master' into v2.0
2022-01-30 20:25:01 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master ( #447 )
2022-01-30 20:21:20 -08:00
Philippe Tillet
bef76b142a
[BACKEND] float division is now approximate by default ( #446 )
2022-01-29 18:29:29 -08:00
Philippe Tillet
bd52e530a0
[OPS][BLOCKSPARSE] Fix padding issue in DSD LUT ( #445 )
2022-01-28 21:40:30 -08:00
daadaada
e68d6a7776
[BACKEND] Making the warp-level tile "more square" to increase data-reuse for tl.dot. ( #442 )
...
* Increase smem data-reuse for some layouts
* tweak
* Keep the original tiling logic for sm < 80
Co-authored-by: Philippe Tillet <phil@openai.com >
2022-01-27 09:59:54 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma ( #440 )
2022-01-27 09:12:44 -08:00
Benjamin Lefaudeux
3a23c1dd33
[BACKEND] minor, hotfix for gcc compilation ( #439 )
2022-01-23 14:24:02 -08:00
Philippe Tillet
ccf9abe0ba
[FRONTEND][RANDOM] Improved backward compatibility of RNG ( #438 )
...
The unsigned int PR definitely improved our RNG. However, it requires
different floating point arithmetic, which means the results are not
bit-wise identical to how they were before. This commit revives backward
compatibility, but we should change it back to the "right" way later.
2022-01-21 18:05:55 -08:00
Philippe Tillet
4c97d1ecd7
[FRONTEND] Bunch of fixes here and there ( #436 )
2022-01-20 10:55:59 -08:00
Philippe Tillet
e0c5709cc8
[FRONTEND] Fixed semantics bug on ptr to bool conversions ( #432 )
2022-01-17 18:00:03 -08:00
daadaada
2a944ded53
[TESTS] Added bfloat16 tests ( #430 )
2022-01-13 23:38:32 -08:00
Philippe Tillet
4c94359199
[FRONTEND] Alignment fix-up ( #428 )
2022-01-11 23:11:58 -08:00
Philippe Tillet
bbc78f6516
[FRONTEND][RANDOM] Make sure offset dtype is always uint32 before calling uint32_to_uniform_float ( #427 )
2022-01-11 11:08:49 -08:00
Botao Yu
bf32205edc
[OPS][BLOCKSPARSE] Remove unnecessary loop and add cuda bool layout support ( #425 )
2022-01-11 11:07:16 -08:00
daadaada
94a2e10fe5
[BACKEND] Add bf16 & tf32 mma supports (on A100) ( #426 )
2022-01-11 10:20:31 -08:00
Madeleine Thompson
efdabe6073
[STYLE] check python with flake8 ( #424 )
...
I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422 .
2022-01-07 15:28:36 -08:00
Madeleine Thompson
a70acfec77
[STYLE] add isort and autopep8 config files and check on CI ( #423 )
...
Also fix a few more style issues from the "aggressive" mode of autopep8.
2022-01-07 13:11:34 -08:00