vesuppi
22105bc33b
[FRONTEND] Added type check in semantic arange ( #572 )
2022-07-03 15:25:37 -07:00
Keren Zhou
4bf509889b
[BUILD] Change the default build type to Release ( #571 )
2022-07-01 12:17:22 -07:00
Keren Zhou
a74cce375f
[FRONTEND] Raise broadcast error ( #555 )
2022-06-30 17:32:07 -07:00
Philippe Tillet
f733327ba4
[BACKEND][CODEGEN] Disabling L2 residency control by default ( #570 )
2022-06-29 17:05:13 -07:00
Natalia Gimelshein
1bbb2430d9
[TUTORIALS] adjust heuristics for dwdb kernel ( #565 )
2022-06-29 17:00:22 -07:00
Kashif Rasul
1895ceaa2d
[TUTORIAL] Fix f-string for older python ( #569 )
...
fixes issue #568
2022-06-29 09:39:10 -07:00
Philippe Tillet
feb7a2a0dc
[FRONTEND] Hotfix for store argument order ( #567 )
2022-06-28 00:24:02 -07:00
Philippe Tillet
5b4c8f221e
[BACKEND] Compiler improvements ( #557 )
...
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use things like evict_last
- For A100, mma layout can be directly converted to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used in both row-major and column-major order.
- Fixed liveness analysis; this was broken.
- The mma layout can now be loaded/stored directly without conversion. Useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop.
- `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.
2022-06-27 11:49:19 -07:00
Keren Zhou
87413bc925
[BACKEND] Fix layout convert for non-contiguous input ( #564 )
2022-06-25 23:12:03 -07:00
Keren Zhou
d345ddf837
[DOCS] Separate atomic cas from other atomic operations since operands are very different ( #559 )
2022-06-22 17:51:17 -07:00
Keren Zhou
b02bac41ba
[CI] Change cache dir ( #561 )
2022-06-22 11:44:35 -07:00
Keren Zhou
a428cf0bb2
[FRONTEND] Fix pytorch warning. ( #560 )
...
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc').
2022-06-20 20:12:09 -07:00
Keren Zhou
b5e728cb14
Add argmin argmax ( #552 )
2022-06-15 13:55:20 -07:00
Jason Ansel
6b9756532f
[BACKEND] Remove print in coalesce.cc ( #551 )
2022-06-15 13:13:20 -07:00
Madeleine Thompson
8ce2c12e33
[PYTHON] move ephemeral files to homedir ( #549 )
...
This prevents potential conflicts with other users on shared machines.
2022-06-13 19:37:52 -07:00
Keren Zhou
93209c07e0
[BACKEND][CODEGEN] Fix reduce uint ( #547 )
2022-06-13 16:43:57 -07:00
Philippe Tillet
58c8889235
[FRONTEND] Fix scanline layout ( #548 )
2022-06-13 16:21:10 -07:00
Natalia Gimelshein
7094657aa9
[FRONTEND] fix bool conversion of floating types ( #545 )
2022-06-13 15:52:37 -07:00
Keren Zhou
38573d1261
[FRONTEND] Return allocated registers and spilled registers for users ( #541 )
2022-06-07 18:37:12 -07:00
Mengchi Zhang
2cdc6d35c4
[FRONTEND] Give col_per_thread an initial value to make the compiler happy ( #535 )
...
Signed-off-by: Mengchi Zhang <mengchi@fb.com>
2022-06-06 12:48:23 -07:00
TC
f13cbaab9f
[FRONTEND] assert that num_warps is a power of 2 ( #539 )
2022-06-06 11:37:08 -07:00
Philippe Tillet
751e325d2e
[TUTORIALS] Fixed typo
2022-06-05 13:33:21 -07:00
Philippe Tillet
801c8a4c92
[TUTORIALS] Fixed typo
2022-06-05 12:32:07 -07:00
Philippe Tillet
8876e53206
[BACKEND] Restored reduction bugfixes
2022-06-03 11:38:52 -07:00
Philippe Tillet
a60374a597
Revert "[BACKEND] Various bug fixes; making reductions faster ( #533 )".
...
This is a more stable commit that produces bitwise-identical code to earlier
versions. Using commits after this one may lead to slightly different numerics.
2022-06-03 11:36:06 -07:00
Philippe Tillet
efa04cac1f
[FRONTEND] A couple of bugfixes ( #534 )
2022-06-02 16:57:37 -07:00
Philippe Tillet
3e7500dfe6
[BACKEND] Various bug fixes; making reductions faster ( #533 )
2022-05-31 17:14:44 -07:00
Bert Maher
37037bb3be
[FRONTEND] Default cache dir to /tmp/triton_$USER ( #527 )
2022-05-27 13:51:05 -07:00
Philippe Tillet
c82a206684
[FRONTEND] Better dot error message ( #531 )
2022-05-26 17:41:09 -07:00
Philippe Tillet
0e2883020a
[BACKEND] Fixed typo in alignment analysis ( #528 )
2022-05-25 20:01:19 -07:00
Bert Maher
43fec2adca
[FRONTEND] Add binding for create_int_to_ptr ( #526 )
2022-05-25 15:26:18 -07:00
Philippe Tillet
011bc83c1b
[FRONTEND] For loops now promote initial value ( #524 )
2022-05-24 13:20:10 -07:00
Natalia Gimelshein
96bff90471
[FRONTEND] faster jit function launch ( #523 )
...
With the fast (200 ns) get_stream function soon to be available from PyTorch, this shaves approximately 25-30 us off function launch; even without that function, caching device properties saves ~15-20 us.
2022-05-24 12:08:49 -07:00
daadaada
d5eaa8dfa0
Making the generated Triton IR deterministic & a script to compare cached assembly ( #522 )
2022-05-24 08:56:36 -07:00
Shantanu
80f6a2698b
[FRONTEND] Ensure version_key is called at most once ( #519 )
...
Co-authored-by: hauntsaninja <>
2022-05-23 13:40:08 -07:00
daadaada
205a493b10
[FRONTEND] Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas ( #520 )
...
Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas
2022-05-21 09:45:54 -07:00
Jiabao Lei
abea3dc2c6
[FRONTEND] provide device kwargs && fix fstring error for py<3.8 ( #515 )
...
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-05-14 16:21:46 -07:00
Philippe Tillet
d35617bea1
[BACKEND][CODEGEN] Faster reduction for scanline layout ( #516 )
2022-05-14 15:26:13 -07:00
Mengchi Zhang
d1a22a94e6
[FRONTEND] Add empty return value and remove protect to open the access to contained_tys_vec_t ( #514 )
...
Signed-off-by: Mengchi Zhang <mengchi@fb.com>
2022-05-13 11:46:12 -07:00
Jason Ansel
d954a05989
[FRONTEND] Handle torch.uint8 args ( #513 )
...
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
2022-05-12 13:07:39 -07:00
Philippe Tillet
0835a4fb05
[TUTORIALS] Removed #noformat in layer norm tutorial
2022-05-12 12:41:25 -07:00
Philippe Tillet
c736ba7c3e
[TUTORIALS] Fixed formatting
2022-05-12 12:31:23 -07:00
Philippe Tillet
cd30a99aa2
[TUTORIALS] fixed formatting
2022-05-12 12:28:22 -07:00
Philippe Tillet
d87435e536
[TUTORIALS] Layer norm tutorial now uses residency control ( #510 )
2022-05-05 19:53:54 -07:00
Sriram Murali
7c9bc5a47b
[CODEGEN] Change return type of generator::packed_type to appease build warnings ( #507 )
2022-05-04 20:03:37 -07:00
Philippe Tillet
95feb10ec9
[FRONTEND] fixup ( #505 )
2022-04-30 14:25:06 -07:00
Philippe Tillet
11a908655d
[FRONTEND] Fixup
2022-04-29 14:35:09 -07:00
Phil Tillet
cd78ce4888
[FRONTEND] Improved error message when assigning None to non-constexpr
2022-04-29 09:17:54 -07:00
Philippe Tillet
ae2a1ab225
[BACKEND] Alignment pass improvements ( #503 )
2022-04-25 21:16:00 -07:00
Philippe Tillet
7d544799a0
[BACKEND] Now disabling L2 eviction policy for sm < 80
2022-04-25 09:35:36 -07:00