triton

Author	SHA1	Message	Date
Yan Da	9dafa0e2e3	Update trtion dependencies	2022-04-01 20:16:07 +08:00
Yan Da	2041b67fbf	Now vecadd works	2022-03-30 20:21:47 +08:00
Yan Da	38e67b4293	Add more Ops	2022-03-28 19:50:23 +08:00
Yan Da	0d139ec460	Introducing SCF	2022-03-26 17:02:32 +08:00
Yan Da	5e117966d0	CatOp	2022-03-25 14:17:17 +08:00
Yan Da	f2ab318614	New python binding	2022-03-22 21:53:22 +08:00
Yan Da	419bbe0f6e	Reverts back to MLIR 14 & updates CMakeLists	2022-03-20 16:41:48 +08:00
Yan Da	a2c31ff434	Init commit	2022-03-17 20:40:55 +08:00
daadaada	539961072c	[FRONTEND] Semantic analysis refactor (#473 ) - Moved dispatch.cc to semantic.py - Integer signedness now moved from C++ to python - Cleaner frontend type Co-authored-by: Phil Tillet <phil@openai.com>	2022-03-16 21:25:30 -07:00
Philippe Tillet	bb5765df5c	[CODEGEN] Now padding shared memory for layout conversion (#468 )	2022-03-03 22:19:05 -08:00
daadaada	d9dd97492f	Use unique_ptr in ir::context_impl (#462 ) Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>	2022-02-24 16:07:10 -08:00
Philippe Tillet	98ed7db8c1	[CODEGEN] Improvements and bugfixes (#463 )	2022-02-24 14:56:24 -08:00
Philippe Tillet	807d8a1945	[ALL] Merge master (#447 )	2022-01-30 20:21:20 -08:00
Philippe Tillet	bef76b142a	[BACKEND] float division is now approximate by default (#446 )	2022-01-29 18:29:29 -08:00
daadaada	59d371c6eb	[BACKEND] Added Int8 mma (#440 )	2022-01-27 09:12:44 -08:00
Benjamin Lefaudeux	3a23c1dd33	[BACKEND] minor, hotfix for gcc compilation (#439 )	2022-01-23 14:24:02 -08:00
Philippe Tillet	4c94359199	[FRONTEND] Alignment fix-up (#428 )	2022-01-11 23:11:58 -08:00
daadaada	94a2e10fe5	[BACKEND] Add bf16 & tf32 mma supports (on A100) (#426 )	2022-01-11 10:20:31 -08:00
Madeleine Thompson	0ab9d67bad	uint8, uint16, uint32, and uint64 in kernels (#413 ) A forthcoming PR will update the RNG to use these types. Also: - Add tests for the `//`, `<<`, and `>>` operators. - Change `TensorWrapper` to unwrap objects when the resulting object would be simpler. - Clean up `throw_unreachable`, since it was triggering compiler warnings.	2022-01-05 15:27:17 -08:00
Philippe Tillet	03f1256f60	[FRONTEND] Added `volatile` flag for load (#407 )	2021-12-30 22:33:24 -08:00
Madeleine Thompson	985798f101	add missing bfloat16 repr and improve assertions (#403 ) - `BF16TyID` was missing a repr implementation. - Throw a better exception on impossible casts. - Add a few assertions. Tested with a debug build. - Add `pointer_dtype.__str__` to aid kernel debugging.	2021-12-23 17:01:17 -08:00
daadaada	39d4bfed83	[OPS] Add performance model for gemm/gemv (#397 ) Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.	2021-12-21 09:56:10 -08:00
Madeleine Thompson	fa62b4a8f6	[FRONTEND] better stringification (#394 ) - Don't override `self.args` in `CompilationError`, and show the line number and column in error messages. This causes it to generate an easier-to-read backtrace. - Better `__str__` on `TensorWrapper`, `dtype`, and `block`.	2021-12-17 20:11:45 -08:00
Philippe Tillet	558555630f	[FRONTEND] Added xor_sum	2021-12-16 17:55:35 -08:00
Philippe Tillet	5ce1b726dc	[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356 )	2021-10-24 02:30:46 -07:00
daadaada	858dec8372	[CODEGEN] Add cache modifier to tl.load (#351 ) * Add cache modifier to tl.load * Add comment to cache_modifier * Remove force_nc_cache * Update test	2021-10-17 22:14:04 -07:00
Stephen McGroarty	c2e6b90ff1	[CODEGEN] Fixes masked load exception (#342 )	2021-10-13 13:31:52 -07:00
Philippe Tillet	6e5b0b4301	[FRONTEND] Added on-disk cache for compiled kernels (#287 )	2021-09-18 22:48:26 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268 ) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse 's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00
daadaada	274d613488	[IR] Better printer (#256 )	2021-09-01 09:55:12 -07:00
Philippe Tillet	4ff3714d61	[CODEGEN] Various bugfixes and stability improvements in compiler backend (#240 )	2021-08-30 11:50:35 -07:00
daadaada	85426dbaf7	[DOCS] Add comments in layout.h (#249 )	2021-08-28 18:07:32 -07:00
milesial	5b29da719d	[DRIVER] Add CUDA P2P support (#209 )	2021-08-20 21:00:54 -07:00
Philippe Tillet	226fde6ea1	[CODEGEN] Now using atomic_rmw code path for atomic_xchg (#222 )	2021-08-17 16:33:23 -07:00
Philippe Tillet	bb1eebb4b4	[CODEGEN] Fixed bug for visit_reduce1d with 64-bit data-types (#207 )	2021-08-14 21:07:01 -07:00
Philippe Tillet	83da7065da	[DRIVER] Portability fixup (#195 )	2021-08-07 18:53:11 -07:00
Philippe Tillet	298da78058	[CODEGEN/DRIVER] Tweaks for performance optimization (#193 )	2021-08-07 16:41:44 -07:00
Philippe Tillet	76c6f24fb6	[CI] Made build-wheels compatible with system LLVM setup (#138 ) This speeds up wheelhouse build time by ~10x	2021-07-27 12:38:49 -07:00
Philippe Tillet	01276b5153	[FRONTEND] Added compilation flag to force use of `.nc` cache modifier (#134 ) in DRAM loads. /!\ USE CAREFULLY - THIS CAN BREAK CORRECTNESS IF MISUSED /!\	2021-07-27 12:38:49 -07:00
Philippe Tillet	2824345065	[LANGUAGE] Added cos/sin (#132 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	8cea583109	[IR] Preliminary support for BF16 (#129 ) This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.	2021-07-27 12:38:49 -07:00
daadaada	d8d6b715c8	[CODEGEN] Performance improvement on A100 (#125 ) Improved codegen for the Ampere GPUs. * Make the layout pass recognize the multistage pipelined pattern. * Now the pipeline pass can automate the multistage pipelining transformation. * Remove extra barriers (from the prefetch pass & WAR) on Ampere. * Update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.	2021-07-27 12:38:49 -07:00
Philippe Tillet	5a51f3e529	[CODEGEN] Bugfix in membar pass (#124 ) Membar pass on top of master is buggy with asynchronous copy. For example, it doesn't wait for asynchronous copies to complete before recoalescing accumulator in GEMM, which leads to undefined behavior when the program doesn't enter the loop. This PR proposes	2021-07-27 12:38:49 -07:00
Philippe Tillet	b7b05a560e	[DRIVER] Now giving the option to use system ptxas through environment variable (#123 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	80c86ecf4a	[LANG] Minor semantic changes (#121 ) * Now using unordered instead of ordered float (fixes NaN issues) * Bool -> int32 now converts to 1 rather than -1 * Reduce extend arguments to 32-bits if possible	2021-07-27 12:38:49 -07:00
Philippe Tillet	0274429429	[IR] Added IR and Codegen support for atomic_rmw (#120 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	59b0ac672a	[LANGUAGE] Added support for bitcast (#119 )	2021-07-27 12:38:49 -07:00
Philippe Tillet	f81012a8cf	[CODEGEN] Fixed atomic_add issue (#112 ) * [CODEGEN] Fixed atomic_add issue * [CODEGEN] Fixed liveness analysis bug for instructions that are not DCE'd but have no users (e.g., atomic_cas)	2021-07-27 12:38:49 -07:00
Philippe Tillet	325ee38581	[PYTHON] Fixed bug in scoping mechanism (#111 ) Inline functions didn't restore scope of parents. Also some control flow structure still had the scoping semantics of C++	2021-07-27 12:38:49 -07:00
Philippe Tillet	288b4f7f58	[PYTHON] Added frontend to print sass using turingas disasm.py (#109 )	2021-07-27 12:38:49 -07:00

195 Commits