triton

Author	SHA1	Message	Date
Michael Melesse	d024f0cfb8	update test_dot to use float 32	2022-10-31 18:58:10 +00:00
Michael Melesse	9b3f2487b5	fix minor bug	2022-10-31 18:33:47 +00:00
Michael Melesse	15683986cd	unskip most bfloat tests	2022-10-31 18:04:54 +00:00
Michael Melesse	8d9572bc63	add similar fixes two addition tests	2022-10-28 20:34:58 +00:00
Michael Melesse	ffb30cdc52	skip ptx assert	2022-10-28 20:23:11 +00:00
rsanthanam-amd	531ef18cb6	Fix for binop % (mod) unit test failures. (#13 ) If either data type is fp, then fmod should be used for the reference computation.	2022-10-28 15:06:17 -04:00
Michael Melesse	6e50f8b2c0	print irs	2022-10-28 17:46:52 +00:00
Michael Melesse	ed9638801a	fix for test_cast	2022-10-26 21:34:58 +00:00
Michael Melesse	8ecab462f6	skip segfaults on ROCM	2022-10-26 20:46:47 +00:00
Michael Melesse	648e4cfe89	skip test_atomic_rmw on rocm	2022-10-26 18:22:23 +00:00
Michael Melesse	0cae0168ec	fix bfloat failure	2022-10-26 17:40:28 +00:00
Michael Melesse	9184b5cf65	add prints	2022-10-24 18:28:28 +00:00
Michael Melesse	4d6d4c9431	hip src	2022-10-17 20:18:44 +00:00
Michael Melesse	5c548fb57e	Merge branch 'master' into rcom52_fixes	2022-10-17 17:53:48 +00:00
Daniil Fukalov	406d03bfaf	Improve ROCm support. (#780 ) - updates to support ROCm 5.2 - workarounds in tests where NV tools were used unconditionally - implemented `get_num_blocks()` and `add_memfence()` for AMD GPU - backported from history some atomics - added bf16 support - minor warnings cleanup - added dockerfile to run on a ROCm enabled machine Co-authored-by: B1tway <andrew.shukshov@gmail.com> Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com>	2022-10-14 11:33:42 -07:00
Keren Zhou	bc98aead33	[Backend] Fix for mov.u8 (#766 ) Init a potential fix for mov.u8 which is not supported by ptx for now. Use mov.u16 instead and cast it to u8.	2022-10-12 14:32:27 -07:00
Yu Guo	71b46acc42	[IR] Added special-purpose `dequantize` instruction (#759 ) It is currently necessary for optimal performance in quantized workloads to add a special-purpose instruction in the IR. Backward compatibility with this instruction is NOT guaranteed.	2022-10-12 14:14:45 -07:00
Bin Bao	09cc2d454b	[FRONTEND] Fix a bool tensor storing problem (#746 )	2022-10-10 12:11:50 -07:00
Natalia Gimelshein	d3c925db8a	[FRONTEND] properly broadcast scalar where condition (#736 )	2022-10-04 12:44:03 -07:00
fdrocha	2b0f877fad	[RUNTIME] Support environments with multiple cudalibs (#733 )	2022-10-03 18:36:24 +00:00
Natalia Gimelshein	f55960e773	[FRONTEND] fix broadcasting for where (#729 ) Fixes #532, all 3 inputs to where have to be broadcast together.	2022-10-01 13:18:47 -07:00
Shintaro Iwasaki	ae59f51c2d	[CODEGEN] Fix an inliner to call a function with a phi-node (#727 )	2022-09-29 21:36:40 -07:00
Philippe Tillet	4a77dfb042	[FRONTEND] Complete rewrite of the runtime (#644 ) This PR completely rewrites the runtime of Triton to be more lean and clearly separate the compilation step from the just-in-time caching logic. This should substantially reduce launch overhead.	2022-09-18 08:51:48 -07:00
Shintaro Iwasaki	c668d6596e	[DOCS] Fix spelling (#664 ) This PR applies minor spelling fix in comments and string literals to `master`. It shouldn't hurt anything.	2022-09-16 12:26:40 -07:00
Da Yan	437ced38c2	fp8 <> bf16 conversion (#637 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2022-08-30 14:20:12 -07:00
Jason Ansel	027321cdcf	[FRONTEND] Make tl.rand() 1-exclusive (#601 )	2022-07-24 17:47:23 -07:00
Jason Ansel	e02e56dc63	[FRONTEND] Add missing rfloordiv (#598 ) * [FRONTEND] Add missing rfloordiv * fix tests	2022-07-23 21:54:12 -07:00
Da Yan	f28caddbf8	[FRONTEND] Allow tl.where to select pointers (#595 )	2022-07-21 09:54:27 -07:00
daadaada	9b2bc88d11	[BACKEND] Better bf16 support (#588 )	2022-07-19 21:22:37 -07:00
Keren Zhou	4912916c11	[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) (#562 )	2022-07-13 15:52:21 -07:00
Philippe Tillet	4a399a7e40	[BACKEND] Fix some bugs (atomics, a segfault...) (#577 ) This should fix #558 , #573 and #574	2022-07-06 20:03:04 -07:00
Keren Zhou	a74cce375f	[FRONTEND] Raise broadcast error (#555 )	2022-06-30 17:32:07 -07:00
Philippe Tillet	5b4c8f221e	[BACKEND] Compiler improvements (#557 ) This PR adds several optimization capabilities in the compiler backend: - Now using inline PTX for `tl.store`, making it possible to use things like evict_last - For A100, mma layout can be directly converted to shared memory - For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major. - Fixed liveness analysis; this was broken. - Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop. - `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.	2022-06-27 11:49:19 -07:00
Keren Zhou	87413bc925	[BACKEND] Fix layout convert for non-contiguous input (#564 )	2022-06-25 23:12:03 -07:00
Keren Zhou	b5e728cb14	Add argmin argmax (#552 )	2022-06-15 13:55:20 -07:00
Keren Zhou	93209c07e0	[BACKEND][CODEGEN] Fix reduce uint (#547 )	2022-06-13 16:43:57 -07:00
Philippe Tillet	58c8889235	[FRONTEND] Fix scanline layout (#548 )	2022-06-13 16:21:10 -07:00
Natalia Gimelshein	7094657aa9	[FRONTEND] fix bool conversion of floating types (#545 )	2022-06-13 15:52:37 -07:00
TC	f13cbaab9f	[FRONTEND] assert that num_warps is a power of 2 (#539 )	2022-06-06 11:37:08 -07:00
Philippe Tillet	8876e53206	[BACKEND] Restored reduction bugfixes	2022-06-03 11:38:52 -07:00
Philippe Tillet	a60374a597	Revert "[BACKEND] Various bug fixes; making reductions faster (#533 )". This is a more stable commit that produces bitwise identical code to earlier versions. Using commits after this one may lead to slightly different numerics.	2022-06-03 11:36:06 -07:00
Philippe Tillet	3e7500dfe6	[BACKEND] Various bug fixes; making reductions faster (#533 )	2022-05-31 17:14:44 -07:00
Philippe Tillet	c82a206684	[FRONTEND] Better dot error message (#531 )	2022-05-26 17:41:09 -07:00
daadaada	205a493b10	[FRONTEND] Fix a bug in atomic_cas (correct cmp to val) & more tests on atomic_cas (#520 )	2022-05-21 09:45:54 -07:00
Jiabao Lei	abea3dc2c6	[FRONTEND] provide device kwargs && fix fstring error for py<3.8 (#515 ) Co-authored-by: Philippe Tillet <phil@openai.com>	2022-05-14 16:21:46 -07:00
Philippe Tillet	d35617bea1	[BACKEND][CODEGEN] Faster reduction for scanline layout (#516 )	2022-05-14 15:26:13 -07:00
Philippe Tillet	ae2a1ab225	[BACKEND] Alignment pass improvements (#503 )	2022-04-25 21:16:00 -07:00
Philippe Tillet	3ca792043f	[TEST] Added test for vectorization	2022-04-24 13:50:48 -07:00
Philippe Tillet	76bfac9f15	[FRONTEND] Improved constexpr handling (#493 )	2022-04-12 00:02:54 -07:00
Philippe Tillet	9f08ecd684	[FRONTEND] Semantic analysis refactor (#491 ) Moved dispatch.cc to semantic.py (@ptillet) Integer signedness analysis was moved from C++ to python (@daadaada) Cleaner frontend types (@daadaada) Moved SSA construction to a separate object (@ptillet) Co-authored-by: Yan Da <dyanab@connect.ust.hk>	2022-04-06 16:13:53 -07:00

75 Commits