triton

Author	SHA1	Message	Date
Philippe Tillet	d3e584d4ba	Revert "[DRIVER] Fixed CUDA 10.1 bug (#357)" (#358). This reverts commit `d35014ba47`.	2021-10-26 15:04:49 -07:00
Philippe Tillet	d35014ba47	[DRIVER] Fixed CUDA 10.1 bug (#357)	2021-10-26 11:17:06 -07:00
Philippe Tillet	5ce1b726dc	[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356)	2021-10-24 02:30:46 -07:00
daadaada	858dec8372	[CODEGEN] Add cache modifier to tl.load (#351). Add cache modifier to tl.load; add comment to cache_modifier; remove force_nc_cache; update test.	2021-10-17 22:14:04 -07:00
Philippe Tillet	90ded16c32	[DOCS] Added placeholder docstring for layernorm tutorial	2021-10-15 19:04:01 -07:00
Philippe Tillet	abbc554838	[VERSION] Bumped version to 1.1.1 (#350) v1.1.1	2021-10-14 18:09:39 -07:00
Philippe Tillet	9b32075062	[CODEGEN] Some compiler improvements (#349)	2021-10-13 17:49:39 -07:00
Stephen McGroarty	c2e6b90ff1	[CODEGEN] Fixes masked load exception (#342)	2021-10-13 13:31:52 -07:00
Philippe Tillet	bfacc191b3	[FRONTEND] Now cache re-compiles when `language` changes (#348)	2021-10-13 12:29:57 -07:00
Shantanu	f5ad168686	[PYTHON] Fix up __version__ (#345). Co-authored-by: hauntsaninja <>	2021-10-13 00:09:00 -07:00
Philippe Tillet	c3c0ff0552	[LANGUAGE] Fixed issue with duplicates in large arrays of random uniform numbers (#338)	2021-10-10 15:22:34 -07:00
daadaada	9e9d781912	[CODEGEN] Pipeline fixup (#336)	2021-10-10 01:47:11 -07:00
daadaada	d5f20dbce0	[IR] Fix error when building in debug mode (#331)	2021-10-08 21:40:20 -07:00
Philippe Tillet	d4baad426d	[DOCS] Added layer norm example (#326)	2021-10-08 11:02:10 -07:00
Philippe Tillet	5123db0b7d	[LANG] Various (relatively minor) improvements (#320)	2021-10-04 18:39:40 -07:00
Min Xu	12b6158c5c	[DOCS] Minor fix (#317). Co-authored-by: Min Xu <min.xu.public@gmail.com>	2021-09-30 17:33:08 -07:00
Philippe Tillet	b352b16567	[DOCS] Installation documentation now doesn't suggest to run regression tests	2021-09-29 18:32:33 -07:00
Philippe Tillet	d132b7442b	[DOCS] Minor README edits	2021-09-28 00:39:33 -07:00
Philippe Tillet	44442db96e	[VERSION] Bumped to 1.1 (#313) v1.1	2021-09-28 00:25:42 -07:00
Philippe Tillet	bfcfad7abe	[FRONTEND] Disable P2P (#312)	2021-09-27 21:18:27 -07:00
Philippe Tillet	2c287544cb	[OPS] Faster and cleaner block-sparse implementation (#311)	2021-09-27 18:25:16 -07:00
Philippe Tillet	c3756d1c33	[FRONTEND] Add `do_not_specialize` to triton.jit to prevent specialization of kernel argument (#309)	2021-09-24 20:27:10 -07:00
Philippe Tillet	83da3febf2	[FRONTEND] Added simple hook for when something is written to the cache (#308)	2021-09-23 22:23:17 -07:00
Shantanu	0735061fce	[FRONTEND] fix for unpickleable keys (#307). In #306, I added the key to the cache data, so we can introspect to investigate cache misses. Unfortunately, the key isn't pickleable, so just add the str version instead. Co-authored-by: hauntsaninja <>	2021-09-23 21:23:59 -07:00
Shantanu	2066ccd87e	[FRONTEND] single file caches (#306). Co-authored-by: hauntsaninja <>	2021-09-23 20:21:19 -07:00
Philippe Tillet	e22d92c63c	[RUNTIME] removed obsolete putenv call (#305)	2021-09-23 17:51:58 -07:00
Shantanu	87f8d9f163	[PYTHON] Fix up __version__ (#304). This should match setup.py. Co-authored-by: hauntsaninja <>; Co-authored-by: Philippe Tillet <phil@openai.com>	2021-09-23 17:36:33 -07:00
Philippe Tillet	ec2e7b8f48	[CODEGEN] Fixed nasty bug in coalesce pass (#303)	2021-09-23 17:05:11 -07:00
Shantanu	d253eb8719	[FRONTEND] Add cache_version to triton.jit (#301)	2021-09-23 16:45:54 -07:00
Philippe Tillet	5211f23a63	[FRONTEND] updated TensorWrapper (#299)	2021-09-22 13:53:27 -07:00
Philippe Tillet	2849e7a773	[CODEGEN] now re-coalescing before atomics (#298)	2021-09-22 13:35:53 -07:00
Philippe Tillet	41dbaf3b3f	[FRONTEND] Fixed typo in cache for .dumb db (#296)	2021-09-21 17:03:41 -07:00
Philippe Tillet	c151e0f6aa	[FRONTEND] Simplified detection of corrupted cache (#295)	2021-09-21 16:36:24 -07:00
Philippe Tillet	e96edc16ff	[FRONTEND] Compute cache now supports atomic writes (#294). Note that killing a Triton process while it updates the cache will result in the cache being wiped out. This is because copying a whole `db` to a temporary file can be quite expensive on some systems.	2021-09-21 14:10:02 -07:00
Benjamin Lefaudeux	b53f5f3803	[OPS][BLOCKSPARSE] safeguarding a couple more configurations (#292)	2021-09-20 17:15:31 -07:00
Philippe Tillet	a12827848d	[FRONTEND] Now using exist_ok=True when creating cache directories (#288)	2021-09-18 23:44:21 -07:00
Philippe Tillet	6e5b0b4301	[FRONTEND] Added on-disk cache for compiled kernels (#287)	2021-09-18 22:48:26 -07:00
Benjamin Lefaudeux	bd855ac13d	[DOCS] Adding some doc on the benchmarks + requirements file (#285)	2021-09-18 16:37:30 -07:00
Philippe Tillet	313d6488f6	[CODEGEN] Fixed over-aggressive division handling in alignment pass (#280)	2021-09-15 00:40:17 -07:00
Philippe Tillet	da5063d898	[TEST] Added performance regression tests (#283)	2021-09-14 01:46:32 -07:00
Philippe Tillet	8fdd7e7ed6	[LANG] Fixed semantics of boolean load/store (#282)	2021-09-13 17:39:06 -07:00
Philippe Tillet	3e395bc84e	[LANG] Fixed semantics of NaN in float comparisons (#281)	2021-09-13 15:06:29 -07:00
Min Xu	cecca90bea	[DOCS] update installation doc and add gitignore (#279). Co-authored-by: Min Xu <min.xu.public@gmail.com>	2021-09-12 21:11:45 -07:00
Philippe Tillet	4163d32c49	[DOCS] Fixed leftover exit() in 01-vector-add tutorial	2021-09-10 15:52:26 -07:00
Philippe Tillet	34369906b4	[PYTHON] Fix-up the previous commit	2021-09-10 11:13:25 -07:00
Philippe Tillet	ac10551d55	[PYTHON] Now providing triton.next_power_of_2 (#273)	2021-09-10 11:05:44 -07:00
Philippe Tillet	43723ccb95	[FRONTEND] Removed circular import that broke Python 3.6 support (#272)	2021-09-09 13:46:55 -07:00
Philippe Tillet	585e5cd0ec	[TEST] Added test for empty kernel (#271)	2021-09-09 10:20:37 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268). Removed driver module (accelerator runtime is handled by pytorch); added basic support for ROCM based on @micmelesse's PR (now can execute empty kernel on AMD devices without any compile-time changes); now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k, otherwise there can be poor L1 performance for broadcast tensors.	2021-09-09 00:04:28 -07:00
Szymon Sidor	8bedcce9be	[LANG] Added seeded random number generation - philox (#261)	2021-09-02 22:02:40 -07:00

498 Commits