rsanthanam-amd
531ef18cb6
Fix for binop % (mod) unit test failures. (#13)
If either data type is fp, then fmod should be used for the
reference computation.
2022-10-28 15:06:17 -04:00
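The two remainder conventions disagree whenever the operands have different signs, which is why the reference has to match. Below is a minimal sketch of the rule described above. The helper name `reference_mod` is hypothetical, and it uses plain Python scalars rather than the arrays a real test would use. The integer branch simply falls back to Python's `%`; that choice is an assumption of this sketch and does not come from the commit.

```python
import math


def reference_mod(x, y):
    """Hypothetical reference for `%` in a unit test (illustration only).

    If either operand is floating point, use C-style fmod, whose result
    takes the sign of the dividend x. Otherwise fall back to Python's `%`
    (an assumption of this sketch).
    """
    if isinstance(x, float) or isinstance(y, float):
        return math.fmod(x, y)
    return x % y


# fmod and Python's floor-based % differ when the signs differ.
print(reference_mod(-7.0, 3.0))  # -1.0 (fmod)
print(-7.0 % 3.0)                # 2.0 (Python floor-mod)
```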
Michael Melesse
6e50f8b2c0
print irs
2022-10-28 17:46:52 +00:00
Michael Melesse
ed9638801a
fix for test_cast
2022-10-26 21:34:58 +00:00
Michael Melesse
8ecab462f6
skip segfaults on ROCM
2022-10-26 20:46:47 +00:00
Michael Melesse
648e4cfe89
skip test_atomic_rmw on rocm
2022-10-26 18:22:23 +00:00
Michael Melesse
0cae0168ec
fix bfloat failure
2022-10-26 17:40:28 +00:00
Michael Melesse
9184b5cf65
add prints
2022-10-24 18:28:28 +00:00
Michael Melesse
4d6d4c9431
hip src
2022-10-17 20:18:44 +00:00
Michael Melesse
5c548fb57e
Merge branch 'master' into rcom52_fixes
2022-10-17 17:53:48 +00:00
Daniil Fukalov
406d03bfaf
Improve ROCm support. (#780)
- updates to support ROCm 5.2
- workarounds in tests where NV tools were used unconditionally
- implemented `get_num_blocks()` and `add_memfence()` for AMD GPU
- backported some atomics from history
- added bf16 support
- minor warnings cleanup
- added dockerfile to run on a ROCm enabled machine
Co-authored-by: B1tway <andrew.shukshov@gmail.com>
Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com>
2022-10-14 11:33:42 -07:00
Keren Zhou
bc98aead33
[Backend] Fix for mov.u8 (#766)
Initial potential fix for mov.u8, which PTX does not support for now.
Use mov.u16 instead and cast the result to u8.
2022-10-12 14:32:27 -07:00
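The workaround is lossless because every u8 value fits in 16 bits, so widening, moving, and truncating back to the low 8 bits returns the original value. The sketch below models only that bit-level reasoning in Python. It is not the PTX the backend emits, and the function name is hypothetical.

```python
def emulate_mov_u8_via_u16(value: int) -> int:
    """Hypothetical model of the workaround: widen, move as u16, truncate."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected an unsigned 8-bit value")
    wide = value & 0xFFFF  # zero-extend the u8 into a 16-bit register
    moved = wide           # stands in for `mov.u16`
    return moved & 0xFF    # cast back to u8 by keeping the low 8 bits


# The round trip preserves every possible u8 value.
print(all(emulate_mov_u8_via_u16(v) == v for v in range(256)))  # True
```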
Bin Bao
09cc2d454b
[FRONTEND] Fix a bool tensor storing problem (#746)
2022-10-10 12:11:50 -07:00
Natalia Gimelshein
d3c925db8a
[FRONTEND] properly broadcast scalar where condition (#736)
2022-10-04 12:44:03 -07:00
fdrocha
2b0f877fad
[RUNTIME] Support environments with multiple cudalibs (#733)
2022-10-03 18:36:24 +00:00
Natalia Gimelshein
f55960e773
[FRONTEND] fix broadcasting for where (#729)
Fixes #532: all three inputs to `where` have to be broadcast together.
2022-10-01 13:18:47 -07:00
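Broadcasting all three inputs together means the output shape comes from the condition and both value operands at once, not from one pair at a time. Below is a minimal pure-Python sketch of NumPy-style broadcasting over any number of shapes. `broadcast_shapes` is our own helper, and we are assuming Triton applies the same trailing-dimension rules.

```python
from itertools import zip_longest


def broadcast_shapes(*shapes):
    """Broadcast any number of shapes with NumPy-style rules (sketch)."""
    result = []
    # Align all shapes on their trailing dimensions; missing dims count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"incompatible dimensions: {dims}")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


# condition, x and y of a where() all contribute to the output shape.
print(broadcast_shapes((4, 1), (1, 8), ()))  # (4, 8)
```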
Shintaro Iwasaki
ae59f51c2d
[CODEGEN] Fix an inliner to call a function with a phi-node (#727)
2022-09-29 21:36:40 -07:00
Philippe Tillet
4a77dfb042
[FRONTEND] Complete rewrite of the runtime (#644)
This PR completely rewrites the runtime of Triton to be more lean and
clearly separate the compilation step from the just-in-time caching logic.
This should substantially reduce launch overhead.
2022-09-18 08:51:48 -07:00
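The launch-overhead argument rests on keeping the compile step out of the hot path, so that a warm launch is only a lookup. The sketch below illustrates that split under our own assumptions. The class, the cache key of (argument signature, constexpr values), and `compile_fn` are all hypothetical and are not Triton's actual runtime API.

```python
from typing import Callable, Dict, Hashable, Tuple


class KernelCache:
    """Hypothetical JIT cache kept separate from the compile step."""

    def __init__(self, compile_fn: Callable[[Hashable], str]):
        self._compile = compile_fn  # pure compilation step, knows nothing about caching
        self._cache: Dict[Hashable, str] = {}
        self.misses = 0

    def get(self, signature: Tuple, constants: Tuple) -> str:
        key = (signature, constants)
        if key not in self._cache:  # compile only on a cache miss
            self.misses += 1
            self._cache[key] = self._compile(key)
        return self._cache[key]


cache = KernelCache(lambda key: f"binary for {key}")
cache.get(("*fp32", "i32"), (("BLOCK", 128),))
cache.get(("*fp32", "i32"), (("BLOCK", 128),))  # warm launch: dictionary hit
print(cache.misses)  # 1
```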
Shintaro Iwasaki
c668d6596e
[DOCS] Fix spelling (#664)
This PR applies minor spelling fixes in comments and string literals to
`master`. It shouldn't hurt anything.
2022-09-16 12:26:40 -07:00
Da Yan
437ced38c2
fp8 <> bf16 conversion (#637)
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-08-30 14:20:12 -07:00
Jason Ansel
e02e56dc63
[FRONTEND] Add missing rfloordiv (#598)
* [FRONTEND] Add missing rfloordiv
* fix tests
2022-07-23 21:54:12 -07:00
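In Python, `scalar // tensor` first tries the scalar's `__floordiv__`. For an unknown right-hand type this returns `NotImplemented`, so Python then falls back to the right operand's `__rfloordiv__`. Without that method the expression raises `TypeError`. The toy class below shows the dispatch; it is not Triton's `tl.tensor`.

```python
class Tensor:
    """Toy stand-in for a frontend tensor type (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)

    def __floordiv__(self, other):  # tensor // other
        return Tensor(v // other for v in self.values)

    def __rfloordiv__(self, other):  # other // tensor, reached via fallback
        return Tensor(other // v for v in self.values)


t = Tensor([2, 3, 4])
print((t // 2).values)   # [1, 1, 2]
print((12 // t).values)  # [6, 4, 3]
```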
Da Yan
f28caddbf8
[FRONTEND] Allow tl.where to select pointers (#595)
2022-07-21 09:54:27 -07:00
daadaada
9b2bc88d11
[BACKEND] Better bf16 support (#588)
2022-07-19 21:22:37 -07:00
Keren Zhou
4912916c11
[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) (#562)
2022-07-13 15:52:21 -07:00
Philippe Tillet
4a399a7e40
[BACKEND] Fix some bugs (atomics, a segfault...) (#577)
This should fix #558, #573, and #574.
2022-07-06 20:03:04 -07:00
Keren Zhou
a74cce375f
[FRONTEND] Raise broadcast error (#555)
2022-06-30 17:32:07 -07:00
Philippe Tillet
5b4c8f221e
[BACKEND] Compiler improvements (#557)
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use things like evict_last
- For A100, mma layout can be directly converted to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and column-major.
- Fixed liveness analysis; this was broken.
- Can now load/store the mma layout directly without converting. Useful when the `tl.dot` accumulator is initialized with DRAM data inside an inner loop.
- `tl.dot` can now take LHS inputs in registers when they come from a previous `tl.dot` instruction. Useful for, e.g., fused attention.
2022-06-27 11:49:19 -07:00
Keren Zhou
87413bc925
[BACKEND] Fix layout convert for non-contiguous input (#564)
2022-06-25 23:12:03 -07:00
Keren Zhou
b5e728cb14
Add argmin argmax (#552)
2022-06-15 13:55:20 -07:00
Keren Zhou
93209c07e0
[BACKEND][CODEGEN] Fix reduce uint (#547)
2022-06-13 16:43:57 -07:00
Philippe Tillet
58c8889235
[FRONTEND] Fix scanline layout (#548)
2022-06-13 16:21:10 -07:00
Natalia Gimelshein
7094657aa9
[FRONTEND] fix bool conversion of floating types (#545)
2022-06-13 15:52:37 -07:00
TC
f13cbaab9f
[FRONTEND] assert that num_warps is a power of 2 (#539)
2022-06-06 11:37:08 -07:00
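A positive power of two has exactly one set bit, so `n & (n - 1)` clears it to zero. The sketch below assumes the frontend validates the launch argument before compiling. Both function names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    # A positive power of two has a single set bit; n & (n - 1) clears it.
    return n > 0 and (n & (n - 1)) == 0


def check_num_warps(num_warps: int) -> None:
    """Hypothetical version of the check described in #539."""
    assert is_power_of_two(num_warps), f"num_warps must be a power of 2, got {num_warps}"


check_num_warps(4)      # passes silently
print(is_power_of_two(6))  # False
```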
Philippe Tillet
8876e53206
[BACKEND] Restored reduction bugfixes
2022-06-03 11:38:52 -07:00
Philippe Tillet
a60374a597
Revert "[BACKEND] Various bug fixes; making reductions faster (#533)".
This is a more stable commit that produces bitwise-identical code to earlier
versions. Using commits after this one may lead to slightly different numerics.
2022-06-03 11:36:06 -07:00
Philippe Tillet
3e7500dfe6
[BACKEND] Various bug fixes; making reductions faster (#533)
2022-05-31 17:14:44 -07:00
Philippe Tillet
c82a206684
[FRONTEND] Better dot error message (#531)
2022-05-26 17:41:09 -07:00
daadaada
205a493b10
[FRONTEND] Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas (#520)
2022-05-21 09:45:54 -07:00
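The title indicates the fix concerned how `cmp` and `val` were handled. For reference, the usual compare-and-swap contract stores `val` only when the current value equals `cmp`, and always returns the previous value. The toy model below expresses that contract in Python, using a lock in place of a hardware atomic. `AtomicCell` is hypothetical and is not Triton code.

```python
import threading


class AtomicCell:
    """Toy memory cell modelling compare-and-swap semantics (hypothetical)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self):
        return self._value

    def atomic_cas(self, cmp, val):
        # Store `val` only if the current value equals `cmp`; always return the old value.
        with self._lock:
            old = self._value
            if old == cmp:
                self._value = val
            return old


cell = AtomicCell(0)
print(cell.atomic_cas(0, 5))  # 0 (swap happens, cell now holds 5)
print(cell.atomic_cas(0, 7))  # 5 (no swap, cell still holds 5)
```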
Jiabao Lei
abea3dc2c6
[FRONTEND] provide device kwargs && fix fstring error for py<3.8 (#515)
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-05-14 16:21:46 -07:00
Philippe Tillet
d35617bea1
[BACKEND][CODEGEN] Faster reduction for scanline layout (#516)
2022-05-14 15:26:13 -07:00
Philippe Tillet
ae2a1ab225
[BACKEND] Alignment pass improvements (#503)
2022-04-25 21:16:00 -07:00
Philippe Tillet
3ca792043f
[TEST] Added test for vectorization
2022-04-24 13:50:48 -07:00
Philippe Tillet
76bfac9f15
[FRONTEND] Improved constexpr handling (#493)
2022-04-12 00:02:54 -07:00
Philippe Tillet
9f08ecd684
[FRONTEND] Semantic analysis refactor (#491)
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions (#484)
2022-04-03 20:58:16 -07:00
Philippe Tillet
76a9ee50a8
Revert "[FRONTEND] Semantic analysis refactor (#473)" (#483)
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
daadaada
539961072c
[FRONTEND] Semantic analysis refactor (#473)
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com>
2022-03-16 21:25:30 -07:00
TC
137bb67fad
[LANG] Add fp16 to fp8 conversion (#444)
2022-02-02 20:42:09 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma (#440)
2022-01-27 09:12:44 -08:00
Philippe Tillet
4c97d1ecd7
[FRONTEND] Bunch of fixes here and there (#436)
2022-01-20 10:55:59 -08:00
daadaada
2a944ded53
[TESTS] Added bfloat16 tests (#430)
2022-01-13 23:38:32 -08:00