Philippe Tillet
751e325d2e
[TUTORIALS] Fixed typo
2022-06-05 13:33:21 -07:00
Philippe Tillet
801c8a4c92
[TUTORIALS] Fixed typo
2022-06-05 12:32:07 -07:00
Philippe Tillet
8876e53206
[BACKEND] Restored reduction bugfixes
2022-06-03 11:38:52 -07:00
Philippe Tillet
a60374a597
Revert "[BACKEND] Various bug fixes; making reductions faster ( #533 )".
...
This is a more stable commit that produces bitwise-identical code to earlier
versions. Using commits after this one may lead to slightly different numerics.
2022-06-03 11:36:06 -07:00
Philippe Tillet
efa04cac1f
[FRONTEND] A couple of bugfixes ( #534 )
2022-06-02 16:57:37 -07:00
Philippe Tillet
3e7500dfe6
[BACKEND] Various bug fixes; making reductions faster ( #533 )
2022-05-31 17:14:44 -07:00
Bert Maher
37037bb3be
[FRONTEND] Default cache dir to /tmp/triton_$USER ( #527 )
2022-05-27 13:51:05 -07:00
Philippe Tillet
c82a206684
[FRONTEND] Better dot error message ( #531 )
2022-05-26 17:41:09 -07:00
Philippe Tillet
0e2883020a
[BACKEND] Fixed typo in alignment analysis ( #528 )
2022-05-25 20:01:19 -07:00
Bert Maher
43fec2adca
[FRONTEND] Add binding for create_int_to_ptr ( #526 )
2022-05-25 15:26:18 -07:00
Philippe Tillet
011bc83c1b
[FRONTEND] For loops now promote initial value ( #524 )
2022-05-24 13:20:10 -07:00
Natalia Gimelshein
96bff90471
[FRONTEND] faster jit function launch ( #523 )
...
With the fast (200 ns) get_stream function soon to be available from PyTorch, this shaves approximately 25-30 us off function launch; even without that function, caching device properties saves ~15-20 us.
2022-05-24 12:08:49 -07:00
daadaada
d5eaa8dfa0
Making the generated Triton IR deterministic & a script to compare cached assembly ( #522 )
2022-05-24 08:56:36 -07:00
Shantanu
80f6a2698b
[FRONTEND] Ensure version_key is called at most once ( #519 )
...
Co-authored-by: hauntsaninja <>
2022-05-23 13:40:08 -07:00
daadaada
205a493b10
[FRONTEND] Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas ( #520 )
2022-05-21 09:45:54 -07:00
Jiabao Lei
abea3dc2c6
[FRONTEND] provide device kwargs && fix fstring error for py<3.8 ( #515 )
...
Co-authored-by: Philippe Tillet <phil@openai.com>
2022-05-14 16:21:46 -07:00
Philippe Tillet
d35617bea1
[BACKEND][CODEGEN] Faster reduction for scanline layout ( #516 )
2022-05-14 15:26:13 -07:00
Mengchi Zhang
d1a22a94e6
[FRONTEND] Add empty return value and remove protect to open the access to contained_tys_vec_t ( #514 )
...
Signed-off-by: Mengchi Zhang <mengchi@fb.com>
2022-05-13 11:46:12 -07:00
Jason Ansel
d954a05989
[FRONTEND] Handle torch.uint8 args ( #513 )
...
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
2022-05-12 13:07:39 -07:00
Philippe Tillet
0835a4fb05
[TUTORIALS] Removed #noformat in layer norm tutorial
2022-05-12 12:41:25 -07:00
Philippe Tillet
c736ba7c3e
[TUTORIALS] Fixed formatting
2022-05-12 12:31:23 -07:00
Philippe Tillet
cd30a99aa2
[TUTORIALS] fixed formatting
2022-05-12 12:28:22 -07:00
Philippe Tillet
d87435e536
[TUTORIALS] Layer norm tutorial now uses residency control ( #510 )
2022-05-05 19:53:54 -07:00
Sriram Murali
7c9bc5a47b
[CODEGEN] Change return type of generator::packed_type to appease build warnings ( #507 )
2022-05-04 20:03:37 -07:00
Philippe Tillet
95feb10ec9
[FRONTEND] fixup ( #505 )
2022-04-30 14:25:06 -07:00
Philippe Tillet
11a908655d
[FRONTEND] Fixup
2022-04-29 14:35:09 -07:00
Phil Tillet
cd78ce4888
[FRONTEND] Improved error message when assigning None to non-constexpr
2022-04-29 09:17:54 -07:00
Philippe Tillet
ae2a1ab225
[BACKEND] Alignment pass improvements ( #503 )
2022-04-25 21:16:00 -07:00
Philippe Tillet
7d544799a0
[BACKEND] Now disabling L2 eviction policy for sm < 80
2022-04-25 09:35:36 -07:00
Philippe Tillet
3ca792043f
[TEST] Added test for vectorization
2022-04-24 13:50:48 -07:00
Philippe Tillet
bda209002e
[BACKEND][CODEGEN] vectorization bugfix ( #502 )
2022-04-23 13:18:33 -07:00
Philippe Tillet
0cc3b1129b
[BACKEND][CODE_GEN] eviction policies now also apply to L2 ( #501 )
2022-04-21 23:56:01 -07:00
Philippe Tillet
7d6c504e8d
[TESTING] Added testing utilities for fixing clock and using cuda-memcheck ( #500 )
2022-04-21 22:40:10 -07:00
Philippe Tillet
073be1d2ee
[FRONTEND] check that tensors have power-of-two number of elements ( #499 )
2022-04-14 19:30:02 -07:00
Philippe Tillet
5c7122004c
[TUTORIALS] Tutorial shouldn't expose clock. Just removed it.
2022-04-14 17:33:44 -07:00
Philippe Tillet
dc4d40faec
[FRONTEND] now mangle constexpr float containing "e-"
2022-04-14 10:26:48 -07:00
Philippe Tillet
25f6689508
[FRONTEND] rename current stream monkey patch ( #495 )
2022-04-13 11:45:55 -07:00
Philippe Tillet
76bfac9f15
[FRONTEND] Improved constexpr handling ( #493 )
2022-04-12 00:02:54 -07:00
Philippe Tillet
14b0fd4cfb
[FRONTEND] Added possibility for users to customize current stream query ( #492 )
2022-04-07 12:11:32 -07:00
Philippe Tillet
6424771f55
[CI] Documentation fixup
2022-04-07 09:42:35 -07:00
Philippe Tillet
9f08ecd684
[FRONTEND] Semantic analysis refactor ( #491 )
...
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions ( #484 )
2022-04-03 20:58:16 -07:00
apd10
e85c7a7fc7
Bugfix in ptxas path. ( #487 )
...
Bug: the "ret" value is destroyed when a failing "ptxas --version" run
overwrites the previous valid "ret" value.
Fix: keep rets only for the runs that succeed, and pick the first one.
2022-03-30 20:45:41 -07:00
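A minimal sketch of the fix described in the #487 entry above, assuming the lookup probes several candidate ptxas paths in order. The names `first_working`, `probe_ptxas`, and `fake_probe` are hypothetical and are not taken from the Triton source; the point is only that a failing probe is skipped instead of overwriting an earlier valid result.

```python
import subprocess


def first_working(candidates, probe):
    """Return the first candidate for which probe(candidate) succeeds, else None."""
    working = []
    for path in candidates:
        try:
            probe(path)
        except (OSError, subprocess.CalledProcessError):
            # A failing probe is skipped; it never replaces an earlier valid result.
            continue
        working.append(path)
    return working[0] if working else None


def probe_ptxas(path):
    # Assumption: a zero exit status from "<path> --version" means the binary is usable.
    subprocess.check_output([path, "--version"], stderr=subprocess.STDOUT)


def fake_probe(path):
    # Stand-in probe for illustration: only paths containing "ok" succeed.
    if "ok" not in path:
        raise OSError(f"cannot run {path}")


# The failing second candidate does not clobber the valid first one.
chosen = first_working(["/ok/a/ptxas", "/missing/ptxas", "/ok/b/ptxas"], fake_probe)
```

In real use, `probe_ptxas` would be passed in place of `fake_probe`.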
Philippe Tillet
bace26143d
[TUTORIALS] Removed leftover print
2022-03-28 16:53:23 -07:00
Philippe Tillet
e0cc488055
[FRONTEND] Added tl.clock and tl.globaltimer ( #485 )
2022-03-28 16:15:43 -07:00
Philippe Tillet
76a9ee50a8
Revert "[FRONTEND] Semantic analysis refactor ( #473 )" ( #483 )
...
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
Philippe Tillet
ea6d1f1b85
[DRIVER] LLVM driver fixup ( #482 )
...
The current way of doing things is probably not very thread-safe: init is shared between threads, and some threads may not call the LLVMInitialize* functions.
2022-03-23 00:24:45 -07:00
Keren Zhou
a4f68165cd
[FRONTEND] Hot fix for lineno ( #481 )
...
Override __reduce__ to make CompilationError picklable and print out error messages
2022-03-22 22:09:49 -07:00
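A minimal sketch of why overriding `__reduce__` matters for the #481 entry above, assuming an exception whose constructor takes several arguments. The class below is a hypothetical stand-in, not Triton's actual `CompilationError`; its fields (`src`, `lineno`, `detail`) are chosen for illustration only. By default an exception is pickled using `self.args`, which here holds only the formatted message, so unpickling would call the constructor with too few arguments.

```python
import pickle


class CompilationError(Exception):
    """Hypothetical stand-in for an error carrying source and line number."""

    def __init__(self, src, lineno, detail):
        self.src = src
        self.lineno = lineno
        self.detail = detail
        # Only the formatted message ends up in self.args.
        super().__init__(f"at line {lineno}: {detail}")

    def __reduce__(self):
        # Tell pickle to rebuild the object from the original constructor arguments.
        return (type(self), (self.src, self.lineno, self.detail))


# Round-trip through pickle, as happens when an error crosses a process boundary.
err = pickle.loads(pickle.dumps(CompilationError("x = 1", 3, "bad assignment")))
```

Without the `__reduce__` override, the `pickle.loads` call above would raise a `TypeError` about missing constructor arguments.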
daadaada
539961072c
[FRONTEND] Semantic analysis refactor ( #473 )
...
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com >
2022-03-16 21:25:30 -07:00
Yongjik Kim
0dd2ec2e3a
[FRONTEND] Add an assert in case we get a CPU tensor. ( #478 )
2022-03-16 14:38:56 -07:00