triton

Author	SHA1	Message	Date
Michael Melesse	4d6d4c9431	hip src	2022-10-17 20:18:44 +00:00
Michael Melesse	5c548fb57e	Merge branch 'master' into rcom52_fixes	2022-10-17 17:53:48 +00:00
Daniil Fukalov	406d03bfaf	Improve ROCm support. (#780) - updates to support ROCm 5.2 - workarounds in tests where NV tools were used unconditionally - implemented `get_num_blocks()` and `add_memfence()` for AMD GPU - backported from history some atomics - added bf16 support - minor warnings cleanup - added dockerfile to run on a ROCm enabled machine Co-authored-by: B1tway <andrew.shukshov@gmail.com> Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com>	2022-10-14 11:33:42 -07:00
Keren Zhou	bc98aead33	[Backend] Fix for mov.u8 (#766) Init a potential fix for mov.u8 which is not supported by ptx for now. Use mov.u16 instead and cast it to u8.	2022-10-12 14:32:27 -07:00
Yu Guo	71b46acc42	[IR] Added special-purpose `dequantize` instruction (#759) It is currently necessary for optimal performance in quantized workloads to add a special-purpose instruction in the IR. Backward compatibility with this instruction is NOT guaranteed.	2022-10-12 14:14:45 -07:00
Bin Bao	09cc2d454b	[FRONTEND] Fix a bool tensor storing problem (#746)	2022-10-10 12:11:50 -07:00
Natalia Gimelshein	d3c925db8a	[FRONTEND] properly broadcast scalar where condition (#736)	2022-10-04 12:44:03 -07:00
fdrocha	2b0f877fad	[RUNTIME] Support environments with multiple cudalibs (#733)	2022-10-03 18:36:24 +00:00
Natalia Gimelshein	f55960e773	[FRONTEND] fix broadcasting for where (#729) Fixes #532, all 3 inputs to where have to be broadcast together.	2022-10-01 13:18:47 -07:00
Shintaro Iwasaki	ae59f51c2d	[CODEGEN] Fix an inliner to call a function with a phi-node (#727)	2022-09-29 21:36:40 -07:00
Jason Ansel	998fd5f9af	[FRONTEND] Make triton.compile work without a cuda context (#708) This allows compiling in a subprocess. I'm not seeing a ton of speedup from this, but figure it is a good change anyway.	2022-09-24 13:41:47 -07:00
Philippe Tillet	677ddae618	[FRONTEND] Add warmup for triton.jit() (#684) This revives #671, removing the static functions that may unnecessarily hold a reference to the grid and the JITFunction object Co-authored-by: Jason Ansel <jansel@jansel.net>	2022-09-21 19:13:20 +00:00
Philippe Tillet	7dc2a70edb	Revert "Add .warmup() for triton.jit()" (#682) Reverts openai/triton#671 It seems like for some reason this caused out-of-memory errors on some of our internal workloads. I'm reverting this so that HEAD can be used in production at OpenAI, and I will work on digging into this issue asynchronously.	2022-09-20 16:05:14 -07:00
Jason Ansel	93b1adc53b	[FRONTEND] Add .warmup() for triton.jit() (#671)	2022-09-18 23:09:34 -07:00
Philippe Tillet	4a77dfb042	[FRONTEND] Complete rewrite of the runtime (#644) This PR completely rewrites the runtime of Triton to be more lean and clearly separate the compilation step from the just-in-time caching logic. This should substantially reduce launch overhead.	2022-09-18 08:51:48 -07:00
Shintaro Iwasaki	c668d6596e	[DOCS] Fix spelling (#664) This PR applies minor spelling fix in comments and string literals to `master`. It shouldn't hurt anything.	2022-09-16 12:26:40 -07:00
Da Yan	437ced38c2	fp8 <> bf16 conversion (#637) Co-authored-by: Philippe Tillet <phil@openai.com>	2022-08-30 14:20:12 -07:00
Jason Ansel	027321cdcf	[FRONTEND] Make tl.rand() 1-exclusive (#601)	2022-07-24 17:47:23 -07:00
Jason Ansel	e02e56dc63	[FRONTEND] Add missing rfloordiv (#598) * [FRONTEND] Add missing rfloordiv * fix tests	2022-07-23 21:54:12 -07:00
Da Yan	f28caddbf8	[FRONTEND] Allow tl.where to select pointers (#595)	2022-07-21 09:54:27 -07:00
Keren Zhou	af85f5fa46	[FRONTEND] Refresh cache when the source code of outlined functions are changed (#590)	2022-07-20 17:34:07 -07:00
daadaada	9b2bc88d11	[BACKEND] Better bf16 support (#588)	2022-07-19 21:22:37 -07:00
Keren Zhou	4912916c11	[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) (#562)	2022-07-13 15:52:21 -07:00
Philippe Tillet	4a399a7e40	[BACKEND] Fix some bugs (atomics, a segfault...) (#577) This should fix #558, #573 and #574	2022-07-06 20:03:04 -07:00
Keren Zhou	a74cce375f	[FRONTEND] Raise broadcast error (#555)	2022-06-30 17:32:07 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00
Keren Zhou	87413bc925	[BACKEND] Fix layout convert for non-contiguous input (#564)	2022-06-25 23:12:03 -07:00
Keren Zhou	b5e728cb14	Add argmin argmax (#552)	2022-06-15 13:55:20 -07:00
Keren Zhou	93209c07e0	[BACKEND][CODEGEN] Fix reduce uint (#547)	2022-06-13 16:43:57 -07:00
Philippe Tillet	58c8889235	[FRONTEND] Fix scanline layout (#548)	2022-06-13 16:21:10 -07:00
Natalia Gimelshein	7094657aa9	[FRONTEND] fix bool conversion of floating types (#545)	2022-06-13 15:52:37 -07:00
TC	f13cbaab9f	[FRONTEND] assert that num_warps is a power of 2 (#539)	2022-06-06 11:37:08 -07:00
Philippe Tillet	8876e53206	[BACKEND] Restored reduction bugfixes	2022-06-03 11:38:52 -07:00
Philippe Tillet	a60374a597	Revert "[BACKEND] Various bug fixes; making reductions faster (#533)". This is a more stable commit that produce bitwise identical code to earlier versions. Using commits after this one may lead to slightly different numerics	2022-06-03 11:36:06 -07:00
Philippe Tillet	3e7500dfe6	[BACKEND] Various bug fixes; making reductions faster (#533)	2022-05-31 17:14:44 -07:00
Philippe Tillet	c82a206684	[FRONTEND] Better dot error message (#531)	2022-05-26 17:41:09 -07:00
daadaada	205a493b10	[FRONTEND] Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas (#520) Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas	2022-05-21 09:45:54 -07:00
Jiabao Lei	abea3dc2c6	[FRONTEND] provide device kwargs && fix fstring error for py<3.8 (#515) Co-authored-by: Philippe Tillet <phil@openai.com>	2022-05-14 16:21:46 -07:00
Philippe Tillet	d35617bea1	[BACKEND][CODEGEN] Faster reduction for scanline layout (#516)	2022-05-14 15:26:13 -07:00
Philippe Tillet	ae2a1ab225	[BACKEND] Alignment pass improvements (#503)	2022-04-25 21:16:00 -07:00
Philippe Tillet	3ca792043f	[TEST] Added test for vectorization	2022-04-24 13:50:48 -07:00
Philippe Tillet	bda209002e	[BACKEND][CODEGEN] vectorization bugfix (#502)	2022-04-23 13:18:33 -07:00
Philippe Tillet	76bfac9f15	[FRONTEND] Improved constexpr handling (#493)	2022-04-12 00:02:54 -07:00
Philippe Tillet	9f08ecd684	[FRONTEND] Semantic analysis refactor (#491) Moved dispatch.cc to semantic.py (@ptillet) Integer signedness analysis was moved from C++ to python (@daadaada) Cleaner frontend types (@daadaada) Moved SSA construction to a separate object (@ptillet) Co-authored-by: Yan Da <dyanab@connect.ust.hk>	2022-04-06 16:13:53 -07:00
Philippe Tillet	2bed6fc850	[LANG] Added support for device functions (#484)	2022-04-03 20:58:16 -07:00
Philippe Tillet	76a9ee50a8	Revert "[FRONTEND] Semantic analysis refactor (#473)" (#483) This reverts commit `539961072c`.	2022-03-24 17:16:50 -07:00
daadaada	539961072c	[FRONTEND] Semantic analysis refactor (#473) Moved dispatch.cc to semantic.py Integer signedness now moved from C++ to python Cleaner frontend type Co-authored-by: Phil Tillet <phil@openai.com>	2022-03-16 21:25:30 -07:00
Philippe Tillet	d4d8eaf6c0	[FRONTEND] improved caching mechanism (#474) Co-authored-by: Greg Brockman <gdb@gregbrockman.com> Co-authored-by: Christopher Hesse <christopherhesse@users.noreply.github.com>	2022-03-15 12:20:51 -07:00
Philippe Tillet	5a8a544d10	[OPS][BLOCKSPARSE] Improved robustness, clarity and performance (#450) * dds layout now internally re-uses dsd code path for increased code * at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where it was needed beyond is_causal. And if there is any, we should probably find a way to get it implemented statically so that users don't have to materialize masks. * fixed bug in blocksparse matmul that caused troubles when layout had a full row/col of zeros * blocksparse softmax now no longer modifies any data in-place * blocksparse softmax now takes an is_dense arguments that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention. * unit tests now test backward pass	2022-02-06 18:00:45 -08:00
TC	137bb67fad	[LANG] Add fp16 to fp8 conversion (#444)	2022-02-02 20:42:09 -08:00

79 Commits