Da Yan
3e2953f357
Allow multiple_of and max_contiguous to accept n-d values ( #617 )
2022-08-10 09:59:32 -07:00
Daniil Fukalov
cc79376222
Fix deprecation warning on CreateGEP(Value *, ArrayRef<Value *>, const Twine &) ( #608 )
...
This variant of CreateGEP() is already removed in LLVM 14.
2022-08-07 17:10:18 -07:00
daadaada
9b2bc88d11
[BACKEND] Better bf16 support ( #588 )
2022-07-19 21:22:37 -07:00
Keren Zhou
4912916c11
[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) ( #562 )
2022-07-13 15:52:21 -07:00
Philippe Tillet
4a399a7e40
[BACKEND] Fix some bugs (atomics, a segfault...) ( #577 )
...
This should fix #558, #573 and #574
2022-07-06 20:03:04 -07:00
Philippe Tillet
f733327ba4
[BACKEND][CODEGEN] Disabling L2 residency control by default ( #570 )
2022-06-29 17:05:13 -07:00
Philippe Tillet
5b4c8f221e
[BACKEND] Compiler improvements ( #557 )
...
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, which makes eviction policies such as `evict_last` available
- For A100, mma layout can be directly converted to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used in both row-major and column-major order.
- Fixed liveness analysis; this was broken.
- Can now load/store mma layout directly, without conversion. Useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop.
- `tl.dot` can now take LHS inputs in registers when they come from a previous `tl.dot` instruction. Useful for e.g. fused attention.
2022-06-27 11:49:19 -07:00
Keren Zhou
87413bc925
[BACKEND] Fix layout convert for non-contiguous input ( #564 )
2022-06-25 23:12:03 -07:00
Keren Zhou
b5e728cb14
Add argmin argmax ( #552 )
2022-06-15 13:55:20 -07:00
Jason Ansel
6b9756532f
[BACKEND] Remove print in coalesce.cc ( #551 )
2022-06-15 13:13:20 -07:00
Keren Zhou
93209c07e0
[BACKEND][CODEGEN] Fix reduce uint ( #547 )
2022-06-13 16:43:57 -07:00
Philippe Tillet
58c8889235
[FRONTEND] Fix scanline layout ( #548 )
2022-06-13 16:21:10 -07:00
Mengchi Zhang
2cdc6d35c4
[FRONTEND] Give col_per_thread an initial value to make the compiler happy ( #535 )
...
Signed-off-by: Mengchi Zhang <mengchi@fb.com>
2022-06-06 12:48:23 -07:00
Philippe Tillet
8876e53206
[BACKEND] Restored reduction bugfixes
2022-06-03 11:38:52 -07:00
Philippe Tillet
a60374a597
Revert "[BACKEND] Various bug fixes; making reductions faster ( #533 )".
...
This is a more stable commit that produces bitwise identical code to earlier
versions. Using commits after this one may lead to slightly different numerics.
2022-06-03 11:36:06 -07:00
Philippe Tillet
3e7500dfe6
[BACKEND] Various bug fixes; making reductions faster ( #533 )
2022-05-31 17:14:44 -07:00
Philippe Tillet
0e2883020a
[BACKEND] Fixed typo in alignment analysis ( #528 )
2022-05-25 20:01:19 -07:00
Philippe Tillet
d35617bea1
[BACKEND][CODEGEN] Faster reduction for scanline layout ( #516 )
2022-05-14 15:26:13 -07:00
Sriram Murali
7c9bc5a47b
[CODEGEN] Change return type of generator::packed_type to appease build warnings ( #507 )
2022-05-04 20:03:37 -07:00
Philippe Tillet
ae2a1ab225
[BACKEND] Alignment pass improvements ( #503 )
2022-04-25 21:16:00 -07:00
Philippe Tillet
7d544799a0
[BACKEND] Now disabling L2 eviction policy for sm < 80
2022-04-25 09:35:36 -07:00
Philippe Tillet
bda209002e
[BACKEND][CODEGEN] vectorization bugfix ( #502 )
2022-04-23 13:18:33 -07:00
Philippe Tillet
0cc3b1129b
[BACKEND][CODE_GEN] eviction policies now also apply to L2 ( #501 )
2022-04-21 23:56:01 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions ( #484 )
2022-04-03 20:58:16 -07:00
Philippe Tillet
e0cc488055
[FRONTEND] Added `tl.clock` and `tl.globaltimer` ( #485 )
2022-03-28 16:15:43 -07:00
Philippe Tillet
a50a47a85b
[CODEGEN] Reverted some changes from previous PR; fixed vectorization characteristics of mma layout ( #469 )
2022-03-04 01:53:31 -08:00
Philippe Tillet
bb5765df5c
[CODEGEN] Now padding shared memory for layout conversion ( #468 )
2022-03-03 22:19:05 -08:00
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes ( #463 )
2022-02-24 14:56:24 -08:00
Philippe Tillet
69ff52ea1f
[CODEGEN] removed buggy (and mostly useless) optimization in peephole pass ( #449 )
2022-02-05 21:37:23 -08:00
TC
137bb67fad
[LANG] Add fp16 to fp8 conversion ( #444 )
2022-02-02 20:42:09 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master ( #447 )
2022-01-30 20:21:20 -08:00
Philippe Tillet
bef76b142a
[BACKEND] float division is now approximate by default ( #446 )
2022-01-29 18:29:29 -08:00
daadaada
e68d6a7776
[BACKEND] Making the warp-level tile "more square" to increase data-reuse for tl.dot. ( #442 )
...
* Increase smem data-reuse for some layouts
* tweak
* Keep the original tiling logic for sm < 80
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-01-27 09:59:54 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma ( #440 )
2022-01-27 09:12:44 -08:00
daadaada
94a2e10fe5
[BACKEND] Add bf16 & tf32 mma support (on A100) ( #426 )
2022-01-11 10:20:31 -08:00
Philippe Tillet
03f1256f60
[FRONTEND] Added `volatile` flag for load ( #407 )
2021-12-30 22:33:24 -08:00
daadaada
39d4bfed83
[OPS] Add performance model for gemm/gemv ( #397 )
...
Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.
2021-12-21 09:56:10 -08:00
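The idea in the commit above, using a performance model to narrow a large set of block configs before auto-tuning, can be sketched roughly as follows. This is not the model used by `triton.ops.matmul`: `BlockConfig`, `estimate_time`, `prune_configs`, and the peak throughput numbers are all hypothetical, and the cost estimate is a simple roofline-style bound chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockConfig:
    """Hypothetical tile shape for one program instance of a matmul kernel."""
    block_m: int
    block_n: int


def estimate_time(cfg, m, n, k, peak_flops=100e12, peak_bandwidth=1e12,
                  bytes_per_elem=2):
    """Roofline-style estimate (seconds): max of compute time and memory time.

    The peak numbers are placeholder assumptions, not measured hardware values.
    """
    tiles_m = -(-m // cfg.block_m)  # ceiling division
    tiles_n = -(-n // cfg.block_n)
    # Padded tiles still do full work, so small problems waste compute.
    flops = 2.0 * tiles_m * cfg.block_m * tiles_n * cfg.block_n * k
    # Each tile reads a (block_m x k) panel of A and a (k x block_n) panel of B.
    bytes_moved = tiles_m * tiles_n * (cfg.block_m + cfg.block_n) * k * bytes_per_elem
    return max(flops / peak_flops, bytes_moved / peak_bandwidth)


def prune_configs(configs, m, n, k, top_k=2):
    """Keep the top_k configs with the lowest estimated time."""
    ranked = sorted(configs, key=lambda c: estimate_time(c, m, n, k))
    return ranked[:top_k]


candidates = [BlockConfig(128, 128), BlockConfig(16, 128), BlockConfig(16, 64)]
# A gemv-like, memory-bound shape: a single row times a square matrix.
survivors = prune_configs(candidates, m=1, n=4096, k=4096)
```

Only the surviving configs would then be benchmarked, which keeps auto-tuning time manageable even when the candidate list is long.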
Philippe Tillet
e062812969
[CODEGEN] Disabled peephole for masked load + select -- masked_load
...
doesn't work as expected when vectorized
2021-12-17 12:44:47 -08:00
Philippe Tillet
558555630f
[FRONTEND] Added xor_sum
2021-12-16 17:55:35 -08:00
Madeleine Thompson
e575ae3443
[FRONTEND] Minor accumulated style and warning fixes ( #388 )
...
- Fix some whitespace.
- Make an undeclared dependency on `pytest` explicit.
- Fix deprecated `description-file` use.
- `#ifdef` out a deprecated `PyEval_InitThreads` call.
- Use a slightly different numpy invocation in `test_random.py` to quiet down overflow warnings in tests.
- Fix a deprecated cast in `test_core.py`.
- Suppress a warning about `visit_Constant` in Python 3.9+; we can't migrate yet because it'd break Python 3.6 and 3.7.
- Use chained exceptions for `CompilationError` rather than rolling our own; it makes the error messages nicer.
- Add a `__str__` for `tl.dtype` to make debugging kernels easier; it lets you `print` a dtype to see what type was inferred.
- Fix a few bad escapes.
2021-12-10 15:19:20 -08:00
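The "chained exceptions" item in the commit above refers to Python's standard `raise ... from ...` mechanism. A minimal sketch, assuming a stand-in `CompilationError` class and a toy `visit` function (both hypothetical, not Triton's actual code):

```python
class CompilationError(Exception):
    """Hypothetical stand-in for a compiler front-end error."""

    def __init__(self, source_line: str, lineno: int):
        self.source_line = source_line
        self.lineno = lineno
        super().__init__(f"at line {lineno}: {source_line}")


def visit(source_line: str, lineno: int) -> int:
    """Toy 'compilation' step: parse the line as an integer literal."""
    try:
        return int(source_line)
    except ValueError as err:
        # `raise ... from err` stores the original error in __cause__,
        # so the traceback prints both the low-level and high-level errors.
        raise CompilationError(source_line, lineno) from err


def describe_failure(source_line: str, lineno: int) -> str:
    """Return a one-line summary that includes the chained cause, if any."""
    try:
        visit(source_line, lineno)
    except CompilationError as err:
        return f"{err} (caused by {type(err.__cause__).__name__})"
    return "ok"
```

Because the interpreter tracks `__cause__` itself, there is no need to format the underlying error by hand; an uncaught `CompilationError` prints the original traceback followed by "The above exception was the direct cause of the following exception".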
Philippe Tillet
8ec9f037bb
[BACKEND/CODE_GEN] Fixed float32 matmul problem ( #380 )
2021-11-30 22:00:56 -08:00
Philippe Tillet
e66bf76354
[RUNTIME] Bunch of bugfixes ( #372 )
2021-11-12 00:55:00 -08:00
Philippe Tillet
2acaa4d0dd
[LANG] Added support for constexpr ( #361 )
2021-10-30 00:32:58 -07:00
Philippe Tillet
5ce1b726dc
[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue ( #356 )
2021-10-24 02:30:46 -07:00
daadaada
858dec8372
[CODEGEN] Add cache modifier to tl.load ( #351 )
...
* Add cache modifier to tl.load
* Add comment to cache_modifier
* Remove force_nc_cache
* Update test
2021-10-17 22:14:04 -07:00
Philippe Tillet
9b32075062
[CODEGEN] Some compiler improvements ( #349 )
2021-10-13 17:49:39 -07:00
Stephen McGroarty
c2e6b90ff1
[CODEGEN] Fixes masked load exception ( #342 )
2021-10-13 13:31:52 -07:00
daadaada
9e9d781912
[CODEGEN] Pipeline fixup ( #336 )
2021-10-10 01:47:11 -07:00
Philippe Tillet
5123db0b7d
[LANG] Various (relatively minor) improvements ( #320 )
2021-10-04 18:39:40 -07:00
Philippe Tillet
2c287544cb
[OPS] Faster and cleaner block-sparse implementation ( #311 )
2021-09-27 18:25:16 -07:00