Jiabao Lei
abea3dc2c6
[FRONTEND] provide device kwargs && fix fstring error for py<3.8 (#515)
...
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-05-14 16:21:46 -07:00
Philippe Tillet
d35617bea1
[BACKEND][CODEGEN] Faster reduction for scanline layout (#516)
2022-05-14 15:26:13 -07:00
Jason Ansel
d954a05989
[FRONTEND] Handle torch.uint8 args (#513)
...
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
2022-05-12 13:07:39 -07:00
Philippe Tillet
0835a4fb05
[TUTORIALS] Removed #noformat in layer norm tutorial
2022-05-12 12:41:25 -07:00
Philippe Tillet
c736ba7c3e
[TUTORIALS] Fixed formatting
2022-05-12 12:31:23 -07:00
Philippe Tillet
cd30a99aa2
[TUTORIALS] fixed formatting
2022-05-12 12:28:22 -07:00
Philippe Tillet
d87435e536
[TUTORIALS] Layer norm tutorial now uses residency control (#510)
2022-05-05 19:53:54 -07:00
Philippe Tillet
95feb10ec9
[FRONTEND] fixup (#505)
2022-04-30 14:25:06 -07:00
Philippe Tillet
11a908655d
[FRONTEND] Fixup
2022-04-29 14:35:09 -07:00
Phil Tillet
cd78ce4888
[FRONTEND] Improved error message when assigning None to non-constexpr
2022-04-29 09:17:54 -07:00
Philippe Tillet
ae2a1ab225
[BACKEND] Alignment pass improvements (#503)
2022-04-25 21:16:00 -07:00
Philippe Tillet
3ca792043f
[TEST] Added test for vectorization
2022-04-24 13:50:48 -07:00
Philippe Tillet
bda209002e
[BACKEND][CODEGEN] vectorization bugfix (#502)
2022-04-23 13:18:33 -07:00
Philippe Tillet
7d6c504e8d
[TESTING] Added testing utilities for fixing clock and using cuda-memcheck (#500)
2022-04-21 22:40:10 -07:00
Philippe Tillet
073be1d2ee
[FRONTEND] check that tensors have power-of-two number of elements (#499)
2022-04-14 19:30:02 -07:00
Philippe Tillet
5c7122004c
[TUTORIALS] Tutorial shouldn't expose clock. Just removed it.
2022-04-14 17:33:44 -07:00
Philippe Tillet
dc4d40faec
[FRONTEND] now mangle constexpr float containing "e-"
2022-04-14 10:26:48 -07:00
Philippe Tillet
25f6689508
[FRONTEND] rename current stream monkey patch (#495)
2022-04-13 11:45:55 -07:00
Philippe Tillet
76bfac9f15
[FRONTEND] Improved constexpr handling (#493)
2022-04-12 00:02:54 -07:00
Philippe Tillet
14b0fd4cfb
[FRONTEND] Added possibility for users to customize current stream query (#492)
2022-04-07 12:11:32 -07:00
Philippe Tillet
9f08ecd684
[FRONTEND] Semantic analysis refactor (#491)
...
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions (#484)
2022-04-03 20:58:16 -07:00
Philippe Tillet
bace26143d
[TUTORIALS] Removed leftover print
2022-03-28 16:53:23 -07:00
Philippe Tillet
e0cc488055
[FRONTEND] Added tl.clock and tl.globaltimer (#485)
2022-03-28 16:15:43 -07:00
Philippe Tillet
76a9ee50a8
Revert "[FRONTEND] Semantic analysis refactor (#473)" (#483)
...
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
Keren Zhou
a4f68165cd
[FRONTEND] Hot fix for lineno (#481)
...
Override __reduce__ to make CompilationError picklable and print out error messages
2022-03-22 22:09:49 -07:00
daadaada
539961072c
[FRONTEND] Semantic analysis refactor (#473)
...
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com>
2022-03-16 21:25:30 -07:00
Yongjik Kim
0dd2ec2e3a
[FRONTEND] Add an assert in case we get a CPU tensor. (#478)
2022-03-16 14:38:56 -07:00
Philippe Tillet
d4d8eaf6c0
[FRONTEND] improved caching mechanism (#474)
...
Co-authored-by: Greg Brockman <gdb@gregbrockman.com>
Co-authored-by: Christopher Hesse <christopherhesse@users.noreply.github.com>
2022-03-15 12:20:51 -07:00
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes (#463)
2022-02-24 14:56:24 -08:00
daadaada
a9dfdcaaa9
[FRONTEND] Make the performance model work for int8, tf32, and fp32 (#456)
2022-02-11 22:34:42 -08:00
Philippe Tillet
9b100302d3
[FRONTEND] Now using pybind11 to release GIL (#458)
2022-02-10 01:57:39 -08:00
Philippe Tillet
7b48340ffd
[CI] Some fixes for the build (#451)
2022-02-06 19:11:33 -08:00
Philippe Tillet
5a8a544d10
[OPS][BLOCKSPARSE] Improved robustness, clarity and performance (#450)
...
* dds layout now internally re-uses dsd code path for increased code
* at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where it was needed beyond is_causal. And if there is any, we should probably find a way to get it implemented statically so that users don't have to materialize masks.
* fixed bug in blocksparse matmul that caused troubles when layout had a full row/col of zeros
* blocksparse softmax now no longer modifies any data in-place
* blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention.
* unit tests now test backward pass
2022-02-06 18:00:45 -08:00
TC
137bb67fad
[LANG] Add fp16 to fp8 conversion (#444)
2022-02-02 20:42:09 -08:00
Philippe Tillet
b0d6e2f322
[STYLE] run autopep
2022-01-30 20:27:44 -08:00
Philippe Tillet
2922dc141c
Merge branch 'master' into v2.0
2022-01-30 20:25:01 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master (#447)
2022-01-30 20:21:20 -08:00
Philippe Tillet
bef76b142a
[BACKEND] float division is now approximate by default (#446)
2022-01-29 18:29:29 -08:00
Philippe Tillet
bd52e530a0
[OPS][BLOCKSPARSE] Fix padding issue in DSD LUT (#445)
2022-01-28 21:40:30 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma (#440)
2022-01-27 09:12:44 -08:00
Philippe Tillet
ccf9abe0ba
[FRONTEND][RANDOM] Improved backward compatibility of RNG (#438)
...
The unsigned int PR definitely improved our RNG. However, it requires
different floating-point arithmetic, which means the results are not
bit-wise identical to how they were before. This commit revives backward
compatibility, but we should change it back to the "right" way later.
2022-01-21 18:05:55 -08:00
Philippe Tillet
4c97d1ecd7
[FRONTEND] Bunch of fixes here and there (#436)
2022-01-20 10:55:59 -08:00
daadaada
2a944ded53
[TESTS] Added bfloat16 tests (#430)
2022-01-13 23:38:32 -08:00
Philippe Tillet
4c94359199
[FRONTEND] Alignment fix-up (#428)
2022-01-11 23:11:58 -08:00
Philippe Tillet
bbc78f6516
[FRONTEND][RANDOM] Make sure offset dtype is always uint32 before calling uint32_to_uniform_float (#427)
2022-01-11 11:08:49 -08:00
Botao Yu
bf32205edc
[OPS][BLOCKSPARSE] Remove unnecessary loop and add cuda bool layout support (#425)
2022-01-11 11:07:16 -08:00
daadaada
94a2e10fe5
[BACKEND] Add bf16 & tf32 mma supports (on A100) (#426)
2022-01-11 10:20:31 -08:00
Madeleine Thompson
efdabe6073
[STYLE] check python with flake8 (#424)
...
I've been using this locally to find errors without running tests, and now that we're using autopep8, it passes with minimal suppressions. This is also what turned up the issues with the tutorials, which were fixed in #422.
2022-01-07 15:28:36 -08:00
Madeleine Thompson
a70acfec77
[STYLE] add isort and autopep8 config files and check on CI (#423)
...
Also fix a few more style issues from the "aggressive" mode of autopep8.
2022-01-07 13:11:34 -08:00