Commit Graph

67 Commits

Author SHA1 Message Date
Philippe Tillet
b79bcbaee8 [auto-tuning] now not compiling kernels that use too much shared memory 2019-09-05 21:03:09 -04:00
Philippe Tillet
1f8fd525b5 [python] fixed warnings for pybind11 and pytorch 2019-09-05 20:28:00 -04:00
Philippe Tillet
18848cbb71 [driver] now passing std::unique_ptr<> instead of cloning LLVM module when compiling it 2019-09-05 17:25:58 -04:00
Philippe Tillet
2d6c8311e8 [python] upgraded pybind11; forcing torch tensors to be contiguous() 2019-09-05 12:30:51 -04:00
Philippe Tillet
f6e9c24fe8 [python] more progress towards tensorflow/pytorch unification 2019-09-04 19:45:50 -04:00
Philippe Tillet
a842d337c5 [general] various cleaning and bugfix:
* added copy1d and copy2d benchmark
* fixed issue in reassociation pass
2019-09-02 23:00:49 -04:00
Philippe Tillet
90d80c3b2e [codegen][selection] bugfix in scanline dot lowering 2019-09-01 16:30:53 -04:00
Philippe Tillet
5db3a7adfe [python][examples] some more cleaning of dot product example 2019-08-30 17:05:03 -07:00
Philippe Tillet
7e0af2118c [codegen] worked around a bug seemingly from nvptx/ptxas by simplifying multiplications by 1:
- Generated LLVM-IR looked correct
- Illegal addressing disappeared when running cuda-memcheck
- Illegal addressing disappeared when using nvptx-short-pointer
2019-08-30 16:45:14 -07:00
Philippe Tillet
d457482539 [codegen] fixed issue in double buffering pointer update 2019-08-28 17:50:45 -07:00
Philippe Tillet
37cbcfabd0 [examples] back to 96 TFLOPS on V100 2019-08-26 22:49:14 -07:00
Philippe Tillet
b4ae06a714 tracking down performance regression 2019-08-26 20:38:39 -07:00
Philippe Tillet
4075949f80 [python] basic tensorflow wrapper working 2019-08-26 16:53:49 -07:00
Philippe Tillet
321d268a4a more progress 2019-08-25 21:26:09 -07:00
Philippe Tillet
96b4d5e411 [examples] multiple transposition schemes now supported 2019-08-24 13:08:38 -07:00
Philippe Tillet
732156b942 [general] rename *.cpp -> *.cc 2019-08-23 19:06:39 -07:00
Philippe Tillet
f98b0b8e2a [general] deleted the old compiler frontend 2019-08-23 17:28:02 -07:00
Philippe Tillet
8798d240dc matmul test passes 2019-08-23 17:13:30 -07:00
Philippe Tillet
845c0e5b93 adding tunable parameters 2019-08-22 19:21:01 -07:00
Philippe Tillet
87072203c1 [codegen] triton-ir code generation does not crash 2019-08-22 17:27:10 -07:00
Philippe Tillet
a6ec807223 more debugging 2019-08-21 21:53:41 -07:00
Philippe Tillet
bc11e31419 [lang] more progress on parser 2019-08-19 20:56:39 -07:00
Philippe Tillet
0970fe12dd [general] cleaned tensorflow source code generation 2019-08-18 15:39:36 -07:00
Philippe Tillet
457c330f15 more cleaning 2019-08-18 14:20:42 -07:00
Philippe Tillet
81571246cf [general] fixed some warnings 2019-08-18 14:08:57 -07:00
Philippe Tillet
c05445d001 [general] removed dnn/ module and runtime/jit.cpp 2019-08-18 00:41:05 -07:00
Philippe Tillet
4de22df930 [python] added skeleton for python interface 2019-08-15 20:50:10 -07:00
Philippe Tillet
3ece461ce2 added tensorflow code generator 2019-08-15 15:59:53 -07:00
Philippe Tillet
38a8b0ab19 [runtime] overhaul of the run-time API 2019-08-14 20:26:11 -07:00
Philippe Tillet
1400d960a6 [auto-tuning] much smaller parameter space 2019-08-12 21:15:21 -07:00
Philippe Tillet
fd49cdc92b [dnn][blocksparse] added dw code 2019-08-08 19:15:35 -07:00
Philippe Tillet
7578c27d3d [general][filesystem] added structure and namespace to code generation files 2019-08-07 21:17:17 -07:00
Philippe Tillet
392b55280d [codegen] some cleaning for batched matmul 2019-08-07 21:17:17 -07:00
Philippe Tillet
7b75b68edc dirty but working warp-splitting 2019-08-06 21:07:13 -07:00
Philippe Tillet
cf256a636c fixup 2019-08-06 16:44:16 -07:00
Philippe Tillet
5efdb7978e more improvements and regressions 2019-08-06 16:21:20 -07:00
Philippe Tillet
d869d9a924 [codegen][selection] more flexible instruction selection for reduce_inst 2019-08-04 16:34:36 -07:00
Philippe Tillet
6be532c6a2 [codegen][selection] adding support for reduction along arbitrary axis 2019-08-02 21:29:36 -07:00
Philippe Tillet
d9945692a9 [dnn] better specification of recompilation key 2019-08-02 17:42:48 -07:00
Philippe Tillet
f7bd976fc7 [dnn/blocksparse] added heuristics for block-sparse dot 2019-07-31 17:12:36 -07:00
Philippe Tillet
5af7e5adac Made sure it works for FP16 2019-07-30 20:02:16 -07:00
Philippe Tillet
17cb2db356 [dnn/blocksparse/dot] prototype version seems to pass basic test 2019-07-27 21:21:36 -07:00
Philippe Tillet
c448876178 better benchmarking 2019-07-22 19:26:12 -07:00
Philippe Tillet
ead368d1ed [general] a bunch of fixes in anticipation of proper triton vs cudnn benchmarks

* DNN: Added partial auto-tuning mode and skeleton for heuristics
* Examples: Modularized benchmarking and now evaluating ResNet-18 shapes
2019-07-21 20:17:56 -07:00
Philippe Tillet
b1d81a5802 more work on heuristics 2019-07-21 18:11:54 -07:00
Philippe Tillet
28c250216c [dnn/gemm] added some bounds checking 2019-07-19 21:32:55 -07:00
Philippe Tillet
5215fb0424 [codegen] some more optimizations 2019-07-19 20:29:03 -07:00
Philippe Tillet
f0d8306437 [codegen/alignment_info] better handling of constants 2019-07-18 16:12:06 -07:00
Philippe Tillet
bfa39b8992 preparing the field for tensor core transposes 2019-07-17 13:20:33 -07:00
Philippe Tillet
a55b098e88 [dnn/shift] now using constant divisions 2019-07-16 21:05:21 -07:00
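Two of the commits above describe small but recurring techniques. The sketches below are illustrative only, written in Python against hypothetical helper names; they are not the repository's actual code.

Commit b79bcbaee8 skips auto-tuning configurations whose shared-memory footprint would exceed the hardware limit, instead of compiling kernels that cannot run. A minimal sketch of such a filtering step, assuming double-buffered tiles and a 96 KB per-SM budget (V100):

    import itertools

    SHARED_MEM_LIMIT = 96 * 1024  # bytes per SM, assumed V100 budget

    def shared_mem_bytes(tile_m, tile_n, tile_k, dtype_size=2):
        # Double-buffered A (tile_m x tile_k) and B (tile_k x tile_n) tiles
        # staged in shared memory, fp16 elements by default.
        return 2 * (tile_m + tile_n) * tile_k * dtype_size

    def viable_configs(tile_ms, tile_ns, tile_ks):
        # Only yield configurations that fit; the rest are never compiled.
        for tm, tn, tk in itertools.product(tile_ms, tile_ns, tile_ks):
            if shared_mem_bytes(tm, tn, tk) <= SHARED_MEM_LIMIT:
                yield tm, tn, tk

Commit 2d6c8311e8 forces torch tensors to be contiguous() before they reach a kernel, since a kernel handed raw pointers generally assumes a dense row-major layout. A minimal sketch of that guard, using the public PyTorch API around a hypothetical launch wrapper:

    import torch

    def launch_dot(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Transposed or sliced views are not contiguous; materialize a dense
        # copy so that pointer arithmetic inside the kernel stays valid.
        if not a.is_contiguous():
            a = a.contiguous()
        if not b.is_contiguous():
            b = b.contiguous()
        c = torch.empty(a.shape[0], b.shape[1], device=a.device, dtype=a.dtype)
        # ... the actual kernel launch would go here, passing a.data_ptr(),
        # b.data_ptr() and c.data_ptr() along with the problem shape.
        return c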