triton

Author	SHA1	Message	Date
Yan Da	2041b67fbf	Now vecadd works	2022-03-30 20:21:47 +08:00
Yan Da	e381dc72c5	Use mlir::Block to replace MlirBlock	2022-03-30 16:31:03 +08:00
Yan Da	e95d98a886	bindings for ModuleOp	2022-03-30 13:32:52 +08:00
Yan Da	38e67b4293	Add more Ops	2022-03-28 19:50:23 +08:00
Yan Da	0d139ec460	Introducing SCF	2022-03-26 17:02:32 +08:00
Yan Da	c53f3486e4	create shr	2022-03-26 16:41:49 +08:00
Yan Da	ba16116f96	Let python manage created objects	2022-03-26 16:31:01 +08:00
Yan Da	a17fba86b1	Logic Op creation	2022-03-26 16:16:20 +08:00
Yan Da	d5612333c0	More fcmp ops	2022-03-25 14:12:20 +08:00
Yan Da	07881b4d41	Update includes	2022-03-24 13:46:35 +08:00
Yan Da	cf7fc8d642	Update includes	2022-03-24 13:33:54 +08:00
Yan Da	14a71dcb6f	Replace MlirOperation with MlirValue	2022-03-23 13:31:14 +08:00
Yan Da	f2ab318614	New python binding	2022-03-22 21:53:22 +08:00
Yan Da	419bbe0f6e	Reverts back to MLIR 14 & updates CMakeLists	2022-03-20 16:41:48 +08:00
Yan Da	a2c31ff434	Init commit	2022-03-17 20:40:55 +08:00
daadaada	539961072c	[FRONTEND] Semantic analysis refactor (#473). Moved dispatch.cc to semantic.py; integer signedness now moved from C++ to Python; cleaner frontend type. Co-authored-by: Phil Tillet <phil@openai.com>	2022-03-16 21:25:30 -07:00
Philippe Tillet	d4d8eaf6c0	[FRONTEND] improved caching mechanism (#474). Co-authored-by: Greg Brockman <gdb@gregbrockman.com>; Co-authored-by: Christopher Hesse <christopherhesse@users.noreply.github.com>	2022-03-15 12:20:51 -07:00
Philippe Tillet	98ed7db8c1	[CODEGEN] Improvements and bugfixes (#463)	2022-02-24 14:56:24 -08:00
Philippe Tillet	9b100302d3	[FRONTEND] Now using pybind11 to release GIL (#458)	2022-02-10 01:57:39 -08:00
Philippe Tillet	7b48340ffd	[CI] Some fixes for the build (#451)	2022-02-06 19:11:33 -08:00
Philippe Tillet	807d8a1945	[ALL] Merge master (#447)	2022-01-30 20:21:20 -08:00
Philippe Tillet	bef76b142a	[BACKEND] float division is now approximate by default (#446)	2022-01-29 18:29:29 -08:00
Philippe Tillet	4c97d1ecd7	[FRONTEND] Bunch of fixes here and there (#436)	2022-01-20 10:55:59 -08:00
Philippe Tillet	4c94359199	[FRONTEND] Alignment fix-up (#428)	2022-01-11 23:11:58 -08:00
Madeleine Thompson	0ab9d67bad	uint8, uint16, uint32, and uint64 in kernels (#413). A forthcoming PR will update the RNG to use these types. Also: add tests for the `//`, `<<`, and `>>` operators; change `TensorWrapper` to unwrap objects when the resulting object would be simpler; clean up `throw_unreachable`, since it was triggering compiler warnings.	2022-01-05 15:27:17 -08:00
Philippe Tillet	03f1256f60	[FRONTEND] Added `volatile` flag for load (#407)	2021-12-30 22:33:24 -08:00
Madeleine Thompson	985798f101	add missing bfloat16 repr and improve assertions (#403). `BF16TyID` was missing a repr implementation. Throw a better exception on impossible casts. Add a few assertions. Tested with a debug build. Add `pointer_dtype.__str__` to aid kernel debugging.	2021-12-23 17:01:17 -08:00
Philippe Tillet	a425f24d54	[FRONTEND] Better cache hook (#400). Added an additional `repr` argument to the cache hook, which represents a human-readable string representation of the signature and argument attributes associated with the compiled binary.	2021-12-21 21:29:47 -08:00
daadaada	39d4bfed83	[OPS] Add performance model for gemm/gemv (#397). Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.	2021-12-21 09:56:10 -08:00
daadaada	4a8953efa3	[FRONTEND] Replace the legacy print call in triton.cc with the SlotTracker-based one (#396). The legacy print call will assign names (e.g., %10) to values, which can be undesirable in some cases.	2021-12-18 18:03:22 -08:00
Philippe Tillet	558555630f	[FRONTEND] Added xor_sum	2021-12-16 17:55:35 -08:00
Philippe Tillet	e31b9b4e66	[RUNTIME] Better support for `None` (#387). Regression test fails but it doesn't make sense to me.	2021-12-09 13:21:22 -08:00
Philippe Tillet	f23bf55f15	[RUNTIME] release the gil on launch (#383)	2021-12-03 13:01:01 -08:00
Philippe Tillet	c86ad9c9ab	[FRONTEND] Added default arguments to non-kernel @triton.jit'd function (#379)	2021-11-29 19:11:26 -08:00
Philippe Tillet	5693b582ea	[RUNTIME] Now using pybind11 to avoid memory leaks (#377)	2021-11-21 02:30:22 -08:00
Philippe Tillet	01cc3d4503	[RUNTIME] Restored `do_not_specialize` (#374)	2021-11-12 15:06:55 -08:00
Philippe Tillet	5d54352164	[FRONTEND] Significantly reduce kernel launch time (#367)	2021-11-04 13:25:24 -07:00
Philippe Tillet	5ce1b726dc	[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356)	2021-10-24 02:30:46 -07:00
daadaada	858dec8372	[CODEGEN] Add cache modifier to tl.load (#351): add cache modifier to tl.load; add comment to cache_modifier; remove force_nc_cache; update test	2021-10-17 22:14:04 -07:00
Philippe Tillet	5123db0b7d	[LANG] Various (relatively minor) improvements (#320)	2021-10-04 18:39:40 -07:00
Philippe Tillet	6e5b0b4301	[FRONTEND] Added on-disk cache for compiled kernels (#287)	2021-09-18 22:48:26 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268). Removed driver module -- accelerator runtime is handled by pytorch; added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes; now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k, otherwise there can be poor L1 performance for broadcast tensors.	2021-09-09 00:04:28 -07:00
daadaada	85426dbaf7	[DOCS] Add comments in layout.h (#249)	2021-08-28 18:07:32 -07:00
milesial	5b29da719d	[DRIVER] Add CUDA P2P support (#209)	2021-08-20 21:00:54 -07:00
Philippe Tillet	298da78058	[CODEGEN/DRIVER] Tweaks for performance optimization (#193)	2021-08-07 16:41:44 -07:00
Philippe Tillet	01276b5153	[FRONTEND] Added compilation flag to force use of `.nc` cache modifier in DRAM loads (#134). /!\ USE CAREFULLY - THIS CAN BREAK CORRECTNESS IF MISUSED /!\	2021-07-27 12:38:49 -07:00
Philippe Tillet	2824345065	[LANGUAGE] Added cos/sin (#132)	2021-07-27 12:38:49 -07:00
Philippe Tillet	8cea583109	[IR] Preliminary support for BF16 (#129). This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.	2021-07-27 12:38:49 -07:00
daadaada	d8d6b715c8	[CODEGEN] Performance improvement on A100 (#125). Improved codegen for the Ampere GPUs: make the layout pass recognize the multistage pipelined pattern; now the pipeline pass can automate the multistage pipelining transformation; remove extra barriers (from the prefetch pass & WAR) on Ampere; update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.	2021-07-27 12:38:49 -07:00
Philippe Tillet	0274429429	[IR] Added IR and Codegen support for atomic_rmw (#120)	2021-07-27 12:38:49 -07:00

63 Commits