Philippe Tillet
073be1d2ee
[FRONTEND] check that tensors have power-of-two number of elements ( #499 )
2022-04-14 19:30:02 -07:00
Philippe Tillet
5c7122004c
[TUTORIALS] Tutorial shouldn't expose clock. Just removed it.
2022-04-14 17:33:44 -07:00
Philippe Tillet
dc4d40faec
[FRONTEND] now mangle constexpr float containing "e-"
2022-04-14 10:26:48 -07:00
Philippe Tillet
25f6689508
[FRONTEND] rename current stream monkey patch ( #495 )
2022-04-13 11:45:55 -07:00
Philippe Tillet
76bfac9f15
[FRONTEND] Improved constexpr handling ( #493 )
2022-04-12 00:02:54 -07:00
Philippe Tillet
14b0fd4cfb
[FRONTEND] Added possibility for users to customize current stream query ( #492 )
2022-04-07 12:11:32 -07:00
Philippe Tillet
6424771f55
[CI] Documentation fixup
2022-04-07 09:42:35 -07:00
Philippe Tillet
9f08ecd684
[FRONTEND] Semantic analysis refactor ( #491 )
...
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions ( #484 )
2022-04-03 20:58:16 -07:00
apd10
e85c7a7fc7
Bugfix in ptxas path. ( #487 )
...
Bug: "ret" value is destroyed when a failing "ptxas --version" is run
overwriting the previous valid "ret" value.
Fix: keep rets only for those runs which are successful. Pick the first one.
2022-03-30 20:45:41 -07:00
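The fix described in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: the names `first_working` and `probe_version` are hypothetical, and the sketch only assumes that each candidate binary is probed with `--version`, as the message states. Results are kept only for probes that succeed, so a later failure cannot overwrite an earlier valid result, and the first success is returned.

```python
import subprocess


def probe_version(binary):
    """Run `<binary> --version` and return its output (raises on failure)."""
    return subprocess.check_output([binary, "--version"], stderr=subprocess.STDOUT)


def first_working(candidates, probe=probe_version):
    """Return (candidate, result) for the first successful probe, else None.

    Hypothetical helper: failed probes are skipped rather than stored, so
    they never clobber a result obtained from an earlier candidate.
    """
    successes = []
    for candidate in candidates:
        try:
            result = probe(candidate)
        except (OSError, subprocess.CalledProcessError):
            continue  # a failing run contributes nothing
        successes.append((candidate, result))
    return successes[0] if successes else None
```

Passing the probe as a parameter keeps the selection logic testable without a real `ptxas` installation.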
Philippe Tillet
bace26143d
[TUTORIALS] Removed leftover print
2022-03-28 16:53:23 -07:00
Philippe Tillet
e0cc488055
[FRONTEND] Added tl.clock and tl.globaltimer ( #485 )
2022-03-28 16:15:43 -07:00
Philippe Tillet
76a9ee50a8
Revert "[FRONTEND] Semantic analysis refactor ( #473 )" ( #483 )
...
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
Philippe Tillet
ea6d1f1b85
[DRIVER] LLVM driver fixup ( #482 )
...
The current way of doing things is probably not thread-safe: init is shared between threads, and some threads may not call the LLVMInitialize* functions.
2022-03-23 00:24:45 -07:00
Keren Zhou
a4f68165cd
[FRONTEND] Hot fix for lineno ( #481 )
...
Override __reduce__ to make CompilationError picklable and print out error messages
2022-03-22 22:09:49 -07:00
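Why overriding `__reduce__` matters can be shown with a minimal sketch. The class below is a hypothetical stand-in, not Triton's actual `CompilationError`; its constructor arguments (`src`, `lineno`, `message`) are assumptions made for illustration. By default an exception is pickled using `self.args`, which here holds only the formatted message, so unpickling would call the constructor with too few arguments. Returning the original constructor arguments from `__reduce__` avoids that.

```python
import pickle


class DemoCompilationError(Exception):
    """Hypothetical stand-in for an error carrying source and line number."""

    def __init__(self, src, lineno, message):
        self.src = src
        self.lineno = lineno
        self.message = message
        # Exception.args now holds only the formatted string.
        super().__init__(f"at line {lineno}: {message}")

    def __reduce__(self):
        # Tell pickle to rebuild the object from the original arguments
        # instead of from Exception.args.
        return (type(self), (self.src, self.lineno, self.message))


def round_trip(err):
    """Serialize and deserialize an exception, as a process pool would."""
    return pickle.loads(pickle.dumps(err))
```

This is the situation that arises when an error raised in a worker process has to be sent back to the parent.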
daadaada
539961072c
[FRONTEND] Semantic analysis refactor ( #473 )
...
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com>
2022-03-16 21:25:30 -07:00
Yongjik Kim
0dd2ec2e3a
[FRONTEND] Add an assert in case we get a CPU tensor. ( #478 )
2022-03-16 14:38:56 -07:00
Philippe Tillet
d4d8eaf6c0
[FRONTEND] improved caching mechanism ( #474 )
...
Co-authored-by: Greg Brockman <gdb@gregbrockman.com>
Co-authored-by: Christopher Hesse <christopherhesse@users.noreply.github.com>
2022-03-15 12:20:51 -07:00
Doğukan Tuna
21f8a0646d
[DOCS] Minor README.md ( #470 )
...
Added binary distribution for quick installation
2022-03-05 00:50:37 -08:00
Philippe Tillet
a50a47a85b
[CODEGEN] Reverted some changes from previous PR; fixed vectorization characteristics of mma layout ( #469 )
2022-03-04 01:53:31 -08:00
Philippe Tillet
bb5765df5c
[CODEGEN] Now padding shared memory for layout conversion ( #468 )
2022-03-03 22:19:05 -08:00
daadaada
d9dd97492f
Use unique_ptr in ir::context_impl ( #462 )
...
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
2022-02-24 16:07:10 -08:00
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes ( #463 )
2022-02-24 14:56:24 -08:00
daadaada
a9dfdcaaa9
[FRONTEND] Make the performance model work for int8, tf32, and fp32 ( #456 )
2022-02-11 22:34:42 -08:00
Philippe Tillet
9b100302d3
[FRONTEND] Now using pybind11 to release GIL ( #458 )
2022-02-10 01:57:39 -08:00
Philippe Tillet
40093a9878
[DOCS] Multiple versions are now supported ( #457 )
2022-02-09 01:32:41 -08:00
Philippe Tillet
4941bc7001
[DOCS] Some more fixes ( #455 )
2022-02-08 16:53:56 -08:00
Philippe Tillet
2fdf0a4fe8
[DOCS] changed build command
2022-02-08 11:45:21 -08:00
Philippe Tillet
077d6c8ff0
[DOCS] re-activated tutorials
2022-02-08 11:42:39 -08:00
Philippe Tillet
822ddcd14b
[DOCS] Added versioning ( #453 )
2022-02-08 11:28:18 -08:00
Philippe Tillet
7b48340ffd
[CI] Some fixes for the build ( #451 )
2022-02-06 19:11:33 -08:00
Philippe Tillet
5a8a544d10
[OPS][BLOCKSPARSE] Improved robustness, clarity and performance ( #450 )
...
* dds layout now internally re-uses dsd code path for increased code
* at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where it was needed beyond is_causal. And if there is any, we should probably find a way to get it implemented statically so that users don't have to materialize masks.
* fixed bug in blocksparse matmul that caused troubles when layout had a full row/col of zeros
* blocksparse softmax now no longer modifies any data in-place
* blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention.
* unit tests now test backward pass
2022-02-06 18:00:45 -08:00
Philippe Tillet
69ff52ea1f
[CODEGEN] removed buggy (and mostly useless) optimization in peephole pass ( #449 )
2022-02-05 21:37:23 -08:00
TC
137bb67fad
[LANG] Add fp16 to fp8 conversion ( #444 )
2022-02-02 20:42:09 -08:00
Philippe Tillet
3b20170fa3
Merge pull request #448 from openai/v2.0
...
`v2.0` is now merged into `master`
2022-01-30 20:49:08 -08:00
Philippe Tillet
b0d6e2f322
[STYLE] run autopep
2022-01-30 20:27:44 -08:00
Philippe Tillet
2922dc141c
Merge branch 'master' into v2.0
2022-01-30 20:25:01 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master ( #447 )
2022-01-30 20:21:20 -08:00
Philippe Tillet
bef76b142a
[BACKEND] float division is now approximate by default ( #446 )
2022-01-29 18:29:29 -08:00
Philippe Tillet
bd52e530a0
[OPS][BLOCKSPARSE] Fix padding issue in DSD LUT ( #445 )
2022-01-28 21:40:30 -08:00
daadaada
e68d6a7776
[BACKEND] Making the warp-level tile "more square" to increase data-reuse for tl.dot. ( #442 )
...
* Increase smem data-reuse for some layouts
* tweak
* Keep the original tiling logic for sm < 80
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-01-27 09:59:54 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma ( #440 )
2022-01-27 09:12:44 -08:00
Benjamin Lefaudeux
3a23c1dd33
[BACKEND] minor, hotfix for gcc compilation ( #439 )
2022-01-23 14:24:02 -08:00
Philippe Tillet
ccf9abe0ba
[FRONTEND][RANDOM] Improved backward compatibility of RNG ( #438 )
...
The unsigned int PR definitely improved our RNG. However, it requires
different floating point arithmetic, which means the results are not
bit-wise identical to how they were before. This commit revives backward
compatibility, but we should change it back to the "right" way later.
2022-01-21 18:05:55 -08:00
Philippe Tillet
4c97d1ecd7
[FRONTEND] Bunch of fixes here and there ( #436 )
2022-01-20 10:55:59 -08:00
Philippe Tillet
e0c5709cc8
[FRONTEND] Fixed semantics bug on ptr to bool conversions ( #432 )
2022-01-17 18:00:03 -08:00
daadaada
2a944ded53
[TESTS] Added bfloat16 tests ( #430 )
2022-01-13 23:38:32 -08:00
Philippe Tillet
4c94359199
[FRONTEND] Alignment fix-up ( #428 )
2022-01-11 23:11:58 -08:00
Philippe Tillet
bbc78f6516
[FRONTEND][RANDOM] Make sure offset dtype is always uint32 before calling uint32_to_uniform_float ( #427 )
2022-01-11 11:08:49 -08:00
Botao Yu
bf32205edc
[OPS][BLOCKSPARSE] Remove unnecessary loop and add cuda bool layout support ( #425 )
2022-01-11 11:07:16 -08:00