Shintaro Iwasaki
c668d6596e
[DOCS] Fix spelling ( #664 )
...
This PR applies minor spelling fixes in comments and string literals to
`master`. It shouldn't hurt anything.
2022-09-16 12:26:40 -07:00
Da Yan
437ced38c2
fp8 <> bf16 conversion ( #637 )
...
Co-authored-by: Philippe Tillet <phil@openai.com >
2022-08-30 14:20:12 -07:00
Da Yan
210a296699
[BACKEND] bf16 flash-attention ( #636 )
2022-08-26 20:40:55 -07:00
Da Yan
3e2953f357
Allow multiple_of and max_contiguous to accept n-d values ( #617 )
2022-08-10 09:59:32 -07:00
Daniil Fukalov
cc79376222
Fix deprecation warning on CreateGEP(Value *, ArrayRef<Value *>, const Twine &) ( #608 )
...
This variant of CreateGEP() is already removed in LLVM 14.
2022-08-07 17:10:18 -07:00
Philippe Tillet
ab56d310dd
[BACKEND][IR] Fixed up internal dtype size for booleans (1bit -> 8bit) ( #600 )
2022-07-23 20:08:03 -07:00
daadaada
9b2bc88d11
[BACKEND] Better bf16 support ( #588 )
2022-07-19 21:22:37 -07:00
Keren Zhou
4912916c11
[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) ( #562 )
2022-07-13 15:52:21 -07:00
Philippe Tillet
4a399a7e40
[BACKEND] Fix some bugs (atomics, a segfault...) ( #577 )
...
This should fix #558 , #573 and #574
2022-07-06 20:03:04 -07:00
Philippe Tillet
f733327ba4
[BACKEND][CODEGEN] Disabling L2 residency control by default ( #570 )
2022-06-29 17:05:13 -07:00
Philippe Tillet
5b4c8f221e
[BACKEND] Compiler improvements ( #557 )
...
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use things like evict_last
- For A100, mma layout can be directly converted to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col-major.
- Fixed liveness analysis; this was broken.
- Can now load/store mma layout directly without converting. Useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop.
- `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.
2022-06-27 11:49:19 -07:00
Keren Zhou
87413bc925
[BACKEND] Fix layout convert for non-contiguous input ( #564 )
2022-06-25 23:12:03 -07:00
Keren Zhou
b5e728cb14
Add argmin argmax ( #552 )
2022-06-15 13:55:20 -07:00
Jason Ansel
6b9756532f
[BACKEND] Remove print in coalesce.cc ( #551 )
2022-06-15 13:13:20 -07:00
Keren Zhou
93209c07e0
[BACKEND][CODEGEN] Fix reduce uint ( #547 )
2022-06-13 16:43:57 -07:00
Philippe Tillet
58c8889235
[FRONTEND] Fix scanline layout ( #548 )
2022-06-13 16:21:10 -07:00
Mengchi Zhang
2cdc6d35c4
[FRONTEND] Give col_per_thread an initial value to make the compiler happy ( #535 )
...
Signed-off-by: Mengchi Zhang <mengchi@fb.com >
2022-06-06 12:48:23 -07:00
Philippe Tillet
8876e53206
[BACKEND] Restored reduction bugfixes
2022-06-03 11:38:52 -07:00
Philippe Tillet
a60374a597
Revert "[BACKEND] Various bug fixes; making reductions faster ( #533 )".
...
This is a more stable commit that produces bitwise identical code to earlier
versions. Using commits after this one may lead to slightly different numerics.
2022-06-03 11:36:06 -07:00
Philippe Tillet
3e7500dfe6
[BACKEND] Various bug fixes; making reductions faster ( #533 )
2022-05-31 17:14:44 -07:00
Philippe Tillet
0e2883020a
[BACKEND] Fixed typo in alignment analysis ( #528 )
2022-05-25 20:01:19 -07:00
Philippe Tillet
d35617bea1
[BACKEND][CODEGEN] Faster reduction for scanline layout ( #516 )
2022-05-14 15:26:13 -07:00
Sriram Murali
7c9bc5a47b
[CODEGEN] Change return type of generator::packed_type to appease build warnings ( #507 )
2022-05-04 20:03:37 -07:00
Philippe Tillet
ae2a1ab225
[BACKEND] Alignment pass improvements ( #503 )
2022-04-25 21:16:00 -07:00
Philippe Tillet
7d544799a0
[BACKEND] Now disabling L2 eviction policy for sm < 80
2022-04-25 09:35:36 -07:00
Philippe Tillet
bda209002e
[BACKEND][CODEGEN] vectorization bugfix ( #502 )
2022-04-23 13:18:33 -07:00
Philippe Tillet
0cc3b1129b
[BACKEND][CODE_GEN] eviction policies now also apply to L2 ( #501 )
2022-04-21 23:56:01 -07:00
Philippe Tillet
76bfac9f15
[FRONTEND] Improved constexpr handling ( #493 )
2022-04-12 00:02:54 -07:00
Philippe Tillet
9f08ecd684
[FRONTEND] Semantic analysis refactor ( #491 )
...
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)
Co-authored-by: Yan Da <dyanab@connect.ust.hk >
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions ( #484 )
2022-04-03 20:58:16 -07:00
apd10
e85c7a7fc7
Bugfix in ptxas path. ( #487 )
...
Bug: the "ret" value is destroyed when a failing "ptxas --version" is run,
overwriting the previous valid "ret" value.
Fix: keep rets only for the runs that are successful, and pick the first
one.
2022-03-30 20:45:41 -07:00
Philippe Tillet
e0cc488055
[FRONTEND] Added tl.clock and tl.globaltimer ( #485 )
2022-03-28 16:15:43 -07:00
Philippe Tillet
76a9ee50a8
Revert "[FRONTEND] Semantic analysis refactor ( #473 )" ( #483 )
...
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
Philippe Tillet
ea6d1f1b85
[DRIVER] LLVM driver fixup ( #482 )
...
The current way of doing things is probably not thread-safe: init is shared between threads, and some threads may not call the LLVMInitialize* function.
2022-03-23 00:24:45 -07:00
daadaada
539961072c
[FRONTEND] Semantic analysis refactor ( #473 )
...
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com >
2022-03-16 21:25:30 -07:00
Philippe Tillet
a50a47a85b
[CODEGEN] Reverted some changes from previous PR; fixed vectorization characteristics of mma layout ( #469 )
2022-03-04 01:53:31 -08:00
Philippe Tillet
bb5765df5c
[CODEGEN] Now padding shared memory for layout conversion ( #468 )
2022-03-03 22:19:05 -08:00
daadaada
d9dd97492f
Use unique_ptr in ir::context_impl ( #462 )
...
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com >
2022-02-24 16:07:10 -08:00
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes ( #463 )
2022-02-24 14:56:24 -08:00
Philippe Tillet
69ff52ea1f
[CODEGEN] removed buggy (and mostly useless) optimization in peephole pass ( #449 )
2022-02-05 21:37:23 -08:00
TC
137bb67fad
[LANG] Add fp16 to fp8 conversion ( #444 )
2022-02-02 20:42:09 -08:00
Philippe Tillet
2922dc141c
Merge branch 'master' into v2.0
2022-01-30 20:25:01 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master ( #447 )
2022-01-30 20:21:20 -08:00
Philippe Tillet
bef76b142a
[BACKEND] float division is now approximate by default ( #446 )
2022-01-29 18:29:29 -08:00
daadaada
e68d6a7776
[BACKEND] Making the warp-level tile "more square" to increase data-reuse for tl.dot. ( #442 )
...
* Increase smem data-reuse for some layouts
* tweak
* Keep the original tiling logic for sm < 80
Co-authored-by: Philippe Tillet <phil@openai.com >
2022-01-27 09:59:54 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma ( #440 )
2022-01-27 09:12:44 -08:00
Philippe Tillet
4c97d1ecd7
[FRONTEND] Bunch of fixes here and there ( #436 )
2022-01-20 10:55:59 -08:00
Philippe Tillet
e0c5709cc8
[FRONTEND] Fixed semantics bug on ptr to bool conversions ( #432 )
2022-01-17 18:00:03 -08:00
daadaada
94a2e10fe5
[BACKEND] Add bf16 & tf32 mma support (on A100) ( #426 )
2022-01-11 10:20:31 -08:00
Madeleine Thompson
0ab9d67bad
uint8, uint16, uint32, and uint64 in kernels ( #413 )
...
A forthcoming PR will update the RNG to use these types.
Also:
- Add tests for the `//`, `<<`, and `>>` operators.
- Change `TensorWrapper` to unwrap objects when the resulting object would be simpler.
- Clean up `throw_unreachable`, since it was triggering compiler warnings.
2022-01-05 15:27:17 -08:00