triton

Author	SHA1	Message	Date
Yan Da	2239ac1998	more progress on TritonGPU	2022-04-28 18:51:31 +08:00
Philippe Tillet	012e8c5b2b	fixup	2022-04-27 16:39:27 -07:00
Philippe Tillet	513bcaee50	Added some ASCII art for encoding documentation	2022-04-27 16:28:27 -07:00
Yan Da	38d13ae618	Some progress on TritonGPU	2022-04-27 21:16:45 +08:00
Yan Da	edca91bf8f	Update traits (NoSideEffect)	2022-04-27 19:41:07 +08:00
Yan Da	8dfe78f6cf	Add TritonCombineOps	2022-04-27 19:28:21 +08:00
Yan Da	c70f6b666e	Merge previous changes	2022-04-27 14:06:55 +08:00
Yan Da	74585fb970	Add Triton CombineOps	2022-04-27 13:45:56 +08:00
Philippe Tillet	81001d318c	Putting Triton dialect in its own folder	2022-04-26 14:39:27 -07:00
Philippe Tillet	62a64ff29b	Fixed Python link bug in CMakeLists	2022-04-26 14:39:18 -07:00
Yan Da	fcbbb3c10e	Fix visit_While issues	2022-04-10 16:16:13 +08:00
Yan Da	f1cc67bbc3	triton -> tt	2022-04-10 12:07:19 +08:00
Yan Da	28e96bbfd1	Remove the dependency on TensorDialect	2022-04-08 19:43:09 +08:00
Yan Da	62f7609612	More on type inference & assembly format	2022-04-08 19:37:57 +08:00
Yan Da	13aead4808	Use TableGen to define new types	2022-04-08 16:32:46 +08:00
Yan Da	6002340456	Better textual representation	2022-04-07 20:44:41 +08:00
Yan Da	62f772123c	now kernel functions return nothing (instead of none)	2022-04-07 20:22:17 +08:00
Yan Da	040a2b6c75	Fix OpBuilder	2022-04-07 20:01:31 +08:00
Yan Da	6b4da6f016	Documentation	2022-04-07 16:00:53 +08:00
Yan Da	16d44e5c4c	Verify power-of-2	2022-04-07 15:28:02 +08:00
Yan Da	9cf4107990	Add TensorSizeTrait	2022-04-07 15:18:43 +08:00
Yan Da	9dafa0e2e3	Update triton dependencies	2022-04-01 20:16:07 +08:00
Yan Da	2041b67fbf	Now vecadd works	2022-03-30 20:21:47 +08:00
Yan Da	38e67b4293	Add more Ops	2022-03-28 19:50:23 +08:00
Yan Da	0d139ec460	Introducing SCF	2022-03-26 17:02:32 +08:00
Yan Da	5e117966d0	CatOp	2022-03-25 14:17:17 +08:00
Yan Da	f2ab318614	New python binding	2022-03-22 21:53:22 +08:00
Yan Da	419bbe0f6e	Reverts back to MLIR 14 & updates CMakeLists	2022-03-20 16:41:48 +08:00
Yan Da	a2c31ff434	Init commit	2022-03-17 20:40:55 +08:00
daadaada	539961072c	[FRONTEND] Semantic analysis refactor (#473) Moved dispatch.cc to semantic.py Integer signedness now moved from C++ to python Cleaner frontend type Co-authored-by: Phil Tillet <phil@openai.com>	2022-03-16 21:25:30 -07:00
Philippe Tillet	bb5765df5c	[CODEGEN] Now padding shared memory for layout conversion (#468)	2022-03-03 22:19:05 -08:00
daadaada	d9dd97492f	Use unique_ptr in ir::context_impl (#462) Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>	2022-02-24 16:07:10 -08:00
Philippe Tillet	98ed7db8c1	[CODEGEN] Improvements and bugfixes (#463)	2022-02-24 14:56:24 -08:00
Philippe Tillet	807d8a1945	[ALL] Merge master (#447)	2022-01-30 20:21:20 -08:00
Philippe Tillet	bef76b142a	[BACKEND] float division is now approximate by default (#446)	2022-01-29 18:29:29 -08:00
daadaada	59d371c6eb	[BACKEND] Added Int8 mma (#440)	2022-01-27 09:12:44 -08:00
Benjamin Lefaudeux	3a23c1dd33	[BACKEND] minor, hotfix for gcc compilation (#439)	2022-01-23 14:24:02 -08:00
Philippe Tillet	4c94359199	[FRONTEND] Alignment fix-up (#428)	2022-01-11 23:11:58 -08:00
daadaada	94a2e10fe5	[BACKEND] Add bf16 & tf32 mma supports (on A100) (#426)	2022-01-11 10:20:31 -08:00
Madeleine Thompson	0ab9d67bad	uint8, uint16, uint32, and uint64 in kernels (#413) A forthcoming PR will update the RNG to use these types. Also: - Add tests for the `//`, `<<`, and `>>` operators. - Change `TensorWrapper` to unwrap objects when the resulting object would be simpler. - Clean up `throw_unreachable`, since it was triggering compiler warnings.	2022-01-05 15:27:17 -08:00
Philippe Tillet	03f1256f60	[FRONTEND] Added `volatile` flag for load (#407)	2021-12-30 22:33:24 -08:00
Madeleine Thompson	985798f101	add missing bfloat16 repr and improve assertions (#403) - `BF16TyID` was missing a repr implementation. - Throw a better exception on impossible casts. - Add a few assertions. Tested with a debug build. - Add `pointer_dtype.__str__` to aid kernel debugging.	2021-12-23 17:01:17 -08:00
daadaada	39d4bfed83	[OPS] Add performance model for gemm/gemv (#397) Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.	2021-12-21 09:56:10 -08:00
Madeleine Thompson	fa62b4a8f6	[FRONTEND] better stringification (#394) - Don't override `self.args` in `CompilationError`, and show the line number and column in error messages. This causes it to generate an easier-to-read backtrace. - Better `__str__` on `TensorWrapper`, `dtype`, and `block`.	2021-12-17 20:11:45 -08:00
Philippe Tillet	558555630f	[FRONTEND] Added xor_sum	2021-12-16 17:55:35 -08:00
Philippe Tillet	5ce1b726dc	[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356)	2021-10-24 02:30:46 -07:00
daadaada	858dec8372	[CODEGEN] Add cache modifier to tl.load (#351) * Add cache modifier to tl.load * Add comment to cache_modifier * Remove force_nc_cache * Update test	2021-10-17 22:14:04 -07:00
Stephen McGroarty	c2e6b90ff1	[CODEGEN] Fixes masked load exception (#342)	2021-10-13 13:31:52 -07:00
Philippe Tillet	6e5b0b4301	[FRONTEND] Added on-disk cache for compiled kernels (#287)	2021-09-18 22:48:26 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00

216 Commits