triton

Author	SHA1	Message	Date
Philippe Tillet	061ef3920e	[CODEGEN] Fixed bug that caused conditional operator to not always properly mask load operations Also includes minor improvement to benchmarking infrastructure	2021-03-08 20:04:26 -05:00
Philippe Tillet	db6bf71564	[CODEGEN] Bugfixes with FP32 async copy	2021-02-24 14:44:09 -05:00
Philippe Tillet	bfe92a5d91	[CODEGEN] Now initializing cp.async to zero when predicate is false WARNING: case for non-zero initialization is still not handled. Will require manual copy to shared	2021-02-22 17:14:33 -05:00
Philippe Tillet	15f8e8c3b7	[CODEGEN] Major performance improvements on A100 (#70 ) Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks	2021-02-21 18:19:39 -05:00
Philippe Tillet	79d098450f	[PYTHON][TESTS][DOC] Various improvement of the API and code quality: * Simplified `triton.kernel` API to achieve lower latency: > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments) * C++ tests moved to `python/tests/` * C++ tutorial created in `tutorials/` * Python tutorial created in python/tutorials/ * Version changed to 1.0alpha * No longer copying C++ headers into the Python package * added python/triton/ops/ package for pre-written Triton ops	2021-01-29 17:27:16 -05:00
Philippe Tillet	aef1b2b3c9	[CODEGEN] Fixed bug in recoalesce_inst LLVM codegen	2021-01-19 19:19:51 -05:00
Philippe Tillet	299cfe743f	[CODEGEN] Fixed issue in traversal order for atomic_add and store_inst	2021-01-14 17:41:26 -05:00
Philippe Tillet	af080740f2	[GENERAL] Merged v1.0alpha into master. Added features are: - A100 support via mma.16816 - Thread swizzling for conflict-free shared memory accesses without padding - Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering - Added debugging capabilities in the Python binding - Compilation error for kernels that spill	2021-01-11 19:23:24 -05:00
Philippe Tillet	75131b4622	[LANG] Added some more atomic_add support	2020-12-01 22:31:32 -05:00
Philippe Tillet	bcc5745ea0	[CODEGEN] Fixed bug in atomic_add	2020-11-19 18:19:55 -05:00
Philippe Tillet	fa066b531c	[PYTHON] Compiling Triton in Release mode now...	2020-11-13 01:44:52 -05:00
Philippe Tillet	d73de44070	tmp	2020-11-12 16:56:36 -05:00
Philippe Tillet	28e19443d0	[CODEGEN] Progress on atom.add.f16x2	2020-11-12 16:48:04 -05:00
Philippe Tillet	6c5284ed3b	[GENERAL] Various bugfixes	2020-11-11 14:44:56 -05:00
Philippe Tillet	a2d54b5ad3	[General] LLVM-9 -> LLVM-10	2020-11-07 22:46:18 -05:00
Philippe Tillet	e2c1ac8d24	[LANG] Added log intrinsic	2020-11-03 15:50:11 -05:00
Philippe Tillet	ef122ca9cf	[CODEGEN] Fixed bug in 2D reductions	2020-11-02 15:05:41 -05:00
Philippe Tillet	9be1d5afc2	[GENERAL] Various improvements: * Sparse einsum in triton.ops.einsum * Hacky support for fixed-tile-size atomic-add * Various bugfixes in parser	2020-10-25 12:16:40 -07:00
Philippe Tillet	0cbee3ec56	[CODEGEN] More work on the CPU backend	2020-09-14 10:59:40 -04:00
Philippe Tillet	30ac1359b9	[RUNTIME] Lower-level interface for executing functions	2020-08-12 18:33:35 -04:00
Philippe Tillet	2d6484482f	[CODEGEN][ANALYSIS] Fixed issue in layout inference	2020-08-10 11:53:11 -04:00
Philippe Tillet	7c519e09c5	[ANALYSIS] Replaced min by gcd in layout inference	2020-06-06 17:42:36 -04:00
Philippe Tillet	8e9d793d11	[CODEGEN] Fixed various issues in alignment inference pass	2020-06-06 11:28:43 -04:00
Philippe Tillet	547434d7f0	[CODEGEN] Fixed bug in alignment inference that prevented vectorization in some cases	2020-06-06 01:13:38 -04:00
Philippe Tillet	52eca7676b	[TRITON] Fixed misaligned address issue	2020-06-05 12:00:42 -04:00
Philippe Tillet	a92031b214	[CODEGEN] Removed unnecessary coalescing rematerialization	2020-05-30 15:03:48 -04:00
Philippe Tillet	05214d22e3	[CODEGEN] Bugfix in Disassociate pass; Added fp32 atomic_add support	2020-05-13 23:21:21 -04:00
Philippe Tillet	cd21151b98	[GENERAL] Fixed some undefined behavior with GCC-9	2020-05-11 11:07:21 -04:00
Philippe Tillet	9da8fe11ed	[CODEGEN] Fixed bug that caused missing recoalescing for some transpose operations	2020-05-11 00:26:03 -04:00
Philippe Tillet	57a0b0a132	[CODEGEN] Fixed bug for phi nodes with constant incoming value	2020-05-02 17:30:15 -04:00
Philippe Tillet	a9efb27fde	[CODEGEN][ANALYSIS] bugfix in alignment analysis	2020-05-01 17:38:23 -04:00
Philippe Tillet	e04efc1c85	[GENERAL] Error messages now no longer make terminal color green	2020-04-03 23:25:29 -04:00
Philippe Tillet	7c09ff80eb	[CORE] Fixed several issues that arose in the development of the torch-blocksparse package: * Now using warp shuffle in reductions when possible * Various bugfixes in layout inference * Added INFINITY, exponential and select * Better error messages for unimplemented constructs	2020-03-31 18:57:28 -04:00
Philippe Tillet	7621aeda3f	[CODEGEN][TRANSFORM][PEEPHOLE] Fixed bug in *1 multiplication	2020-02-19 00:18:55 -05:00
Philippe Tillet	d11d2db6ee	[PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of TK	2020-02-17 13:52:58 -05:00
Philippe Tillet	2fcf5cec5b	[TRITON][CODEGEN] Fixed flawed assert()	2020-01-24 15:25:00 -05:00
Philippe Tillet	78b98fb7cf	[GENERAL] Cleaned polymorphic structure of layouts analysis pass	2020-01-21 11:38:39 -05:00
Philippe Tillet	382ca2c745	[CODEGEN][ANALYSIS] cleaning: moving towards better polymorphism for tile layouts	2020-01-20 12:43:04 -05:00
Philippe Tillet	fbf2a3f56f	[CODEGEN][TRANSFORM] some bug-fixes for FP32 einsum	2020-01-20 12:42:53 -05:00
Philippe Tillet	f278d9741a	[GENERAL] Merged einsum feature branch. Various feature, performance improvements and bugfixes: * Added preliminary support for extended Einstein summation in PyTriton * Significant performance improvement on FP32 kernels containing matrix multiplication * Added re-coalescing pass for FP16 kernels containing matrix multiplication * Various bugfixes	2020-01-20 12:42:48 -05:00
Philippe Tillet	f4bbbbe5e4	[PYTHON][OPS] Bugfix in conv fprop	2019-11-01 00:43:02 -04:00
Philippe Tillet	d65a94c768	[PYTHON][OPS] Added batch normalization op	2019-10-29 17:29:11 -04:00
Philippe Tillet	e9c787ef05	[PYTHON][EINSUM] Added support for FP16	2019-10-28 14:07:17 -04:00
Philippe Tillet	0ec213547c	[PYTHON][KERNEL] Added benchmarking functionalities for kernels	2019-10-28 00:30:04 -04:00
Philippe Tillet	e11557855f	[PYTHON] [OPS] Added einsum implementation	2019-10-26 22:14:50 -04:00
Philippe Tillet	655f43fb5b	more work	2019-10-26 15:10:19 -04:00
Philippe Tillet	b615af2e7e	[codegen] [generator] fixed issue when tile size is 1 along one or more dimensions	2019-10-25 14:22:28 -04:00
Philippe Tillet	0770ccf537	[codegen] [selection] disassociation prototype	2019-10-25 09:39:46 -04:00
Philippe Tillet	b81734553b	[lang] added support for batched matrix multiplication	2019-10-21 15:41:50 -04:00
Philippe Tillet	de6fdd5625	[general] removed useless files and includes	2019-10-20 19:29:48 -04:00


291 Commits