Philippe Tillet
d3e584d4ba
Revert "[DRIVER] Fixed CUDA 10.1 bug ( #357 )" ( #358 )
This reverts commit d35014ba47.
2021-10-26 15:04:49 -07:00
Philippe Tillet
d35014ba47
[DRIVER] Fixed CUDA 10.1 bug ( #357 )
2021-10-26 11:17:06 -07:00
Philippe Tillet
5ce1b726dc
[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue ( #356 )
2021-10-24 02:30:46 -07:00
daadaada
858dec8372
[CODEGEN] Add cache modifier to tl.load ( #351 )
* Add cache modifier to tl.load
* Add comment to cache_modifier
* Remove force_nc_cache
* Update test
2021-10-17 22:14:04 -07:00
Philippe Tillet
90ded16c32
[DOCS] Added placeholder docstring for layernorm tutorial
2021-10-15 19:04:01 -07:00
Philippe Tillet
abbc554838
[VERSION] Bumped version to 1.1.1 ( #350 )
v1.1.1
2021-10-14 18:09:39 -07:00
Philippe Tillet
9b32075062
[CODEGEN] Some compiler improvements ( #349 )
2021-10-13 17:49:39 -07:00
Stephen McGroarty
c2e6b90ff1
[CODEGEN] Fixes masked load exception ( #342 )
2021-10-13 13:31:52 -07:00
Philippe Tillet
bfacc191b3
[FRONTEND] Now cache re-compiles when language changes ( #348 )
2021-10-13 12:29:57 -07:00
Shantanu
f5ad168686
[PYTHON] Fix up __version__ ( #345 )
Co-authored-by: hauntsaninja <>
2021-10-13 00:09:00 -07:00
Philippe Tillet
c3c0ff0552
[LANGUAGE] Fixed issue with duplicates in large arrays of random uniform numbers ( #338 )
2021-10-10 15:22:34 -07:00
daadaada
9e9d781912
[CODEGEN] Pipeline fixup ( #336 )
2021-10-10 01:47:11 -07:00
daadaada
d5f20dbce0
[IR] Fix error when building in debug mode ( #331 )
2021-10-08 21:40:20 -07:00
Philippe Tillet
d4baad426d
[DOCS] Added layer norm example ( #326 )
2021-10-08 11:02:10 -07:00
Philippe Tillet
5123db0b7d
[LANG] Various (relatively minor) improvements ( #320 )
2021-10-04 18:39:40 -07:00
Min Xu
12b6158c5c
[DOCS] Minor fix ( #317 )
Co-authored-by: Min Xu <min.xu.public@gmail.com>
2021-09-30 17:33:08 -07:00
Philippe Tillet
b352b16567
[DOCS] Installation documentation now doesn't suggest to run regression tests
2021-09-29 18:32:33 -07:00
Philippe Tillet
d132b7442b
[DOCS] Minor README edits
2021-09-28 00:39:33 -07:00
Philippe Tillet
44442db96e
[VERSION] Bumped to 1.1 ( #313 )
v1.1
2021-09-28 00:25:42 -07:00
Philippe Tillet
bfcfad7abe
[FRONTEND] Disable P2P ( #312 )
2021-09-27 21:18:27 -07:00
Philippe Tillet
2c287544cb
[OPS] Faster and cleaner block-sparse implementation ( #311 )
2021-09-27 18:25:16 -07:00
Philippe Tillet
c3756d1c33
[FRONTEND] Add do_not_specialize to triton.jit to prevent specialization of kernel argument ( #309 )
2021-09-24 20:27:10 -07:00
Philippe Tillet
83da3febf2
[FRONTEND] Added simple hook for when something is written to the cache ( #308 )
2021-09-23 22:23:17 -07:00
Shantanu
0735061fce
[FRONTEND] fix for unpickleable keys ( #307 )
In #306 , I added the key to the cache data, so we can introspect to
investigate cache misses. Unfortunately, the key isn't pickleable,
so just add the str version instead.
Co-authored-by: hauntsaninja <>
2021-09-23 21:23:59 -07:00
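The fix described in #307 can be illustrated with a minimal sketch. This is not Triton's actual cache code: `UnpickleableKey` and `make_cache_entry` are hypothetical names invented here, and the entry layout is an assumption. The sketch only shows the general idea of storing `str(key)` alongside the cached data so the entry stays pickleable while the key remains available for inspecting cache misses.

```python
import pickle


class UnpickleableKey:
    """Hypothetical stand-in for a cache key that cannot be pickled."""

    def __init__(self, name):
        self.name = name

    def __reduce__(self):
        # Simulate a key holding state that pickle cannot serialize.
        raise TypeError("this key is not pickleable")

    def __str__(self):
        return f"key({self.name})"


def make_cache_entry(key, binary):
    """Build a cache entry that stores the string form of the key.

    Hypothetical helper: the string is kept only for introspection,
    so losing the original key object is acceptable.
    """
    return {"binary": binary, "key": str(key)}


key = UnpickleableKey("matmul")
entry = make_cache_entry(key, b"\x00\x01")

# The entry round-trips through pickle because it holds only bytes and str.
restored = pickle.loads(pickle.dumps(entry))
print(restored["key"])
```

The trade-off is that the stored string can be read when investigating a cache miss but cannot be turned back into the original key object.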
Shantanu
2066ccd87e
[FRONTEND] single file caches ( #306 )
Co-authored-by: hauntsaninja <>
2021-09-23 20:21:19 -07:00
Philippe Tillet
e22d92c63c
[RUNTIME] removed obsolete putenv call ( #305 )
2021-09-23 17:51:58 -07:00
Shantanu
87f8d9f163
[PYTHON] Fix up __version__ ( #304 )
This should match setup.py
Co-authored-by: hauntsaninja <>
Co-authored-by: Philippe Tillet <phil@openai.com>
2021-09-23 17:36:33 -07:00
Philippe Tillet
ec2e7b8f48
[CODEGEN] Fixed nasty bug in coalesce pass ( #303 )
2021-09-23 17:05:11 -07:00
Shantanu
d253eb8719
[FRONTEND] Add cache_version to triton.jit ( #301 )
2021-09-23 16:45:54 -07:00
Philippe Tillet
5211f23a63
[FRONTEND] updated TensorWrapper ( #299 )
2021-09-22 13:53:27 -07:00
Philippe Tillet
2849e7a773
[CODEGEN] now re-coalescing before atomics ( #298 )
2021-09-22 13:35:53 -07:00
Philippe Tillet
41dbaf3b3f
[FRONTEND] Fixed typo in cache for .dumb db ( #296 )
2021-09-21 17:03:41 -07:00
Philippe Tillet
c151e0f6aa
[FRONTEND] Simplified detection of corrupted cache ( #295 )
2021-09-21 16:36:24 -07:00
Philippe Tillet
e96edc16ff
[FRONTEND] Compute cache now supports atomic writes ( #294 )
Note that killing a Triton process while it updates the cache will result in the cache being wiped out. This is because copying a whole `db` to a temporary file can be quite expensive on some systems.
2021-09-21 14:10:02 -07:00
Benjamin Lefaudeux
b53f5f3803
[OPS][BLOCKSPARSE] safeguarding a couple more configurations ( #292 )
2021-09-20 17:15:31 -07:00
Philippe Tillet
a12827848d
[FRONTEND] Now using exist_ok=True when creating cache directories ( #288 )
2021-09-18 23:44:21 -07:00
Philippe Tillet
6e5b0b4301
[FRONTEND] Added on-disk cache for compiled kernels ( #287 )
2021-09-18 22:48:26 -07:00
Benjamin Lefaudeux
bd855ac13d
[DOCS] Adding some doc on the benchmarks + requirements file ( #285 )
2021-09-18 16:37:30 -07:00
Philippe Tillet
313d6488f6
[CODEGEN] Fixed over-aggressive division handling in alignment pass ( #280 )
2021-09-15 00:40:17 -07:00
Philippe Tillet
da5063d898
[TEST] Added performance regression tests ( #283 )
2021-09-14 01:46:32 -07:00
Philippe Tillet
8fdd7e7ed6
[LANG] Fixed semantics of boolean load/store ( #282 )
2021-09-13 17:39:06 -07:00
Philippe Tillet
3e395bc84e
[LANG] Fixed semantics of NaN in float comparisons ( #281 )
2021-09-13 15:06:29 -07:00
Min Xu
cecca90bea
[DOCS] update installation doc and add gitignore ( #279 )
Co-authored-by: Min Xu <min.xu.public@gmail.com>
2021-09-12 21:11:45 -07:00
Philippe Tillet
4163d32c49
[DOCS] Fixed leftover exit() in 01-vector-add tutorial
2021-09-10 15:52:26 -07:00
Philippe Tillet
34369906b4
[PYTHON] Fix-up the previous commit
2021-09-10 11:13:25 -07:00
Philippe Tillet
ac10551d55
[PYTHON] Now providing triton.next_power_of_2 ( #273 )
2021-09-10 11:05:44 -07:00
Philippe Tillet
43723ccb95
[FRONTEND] Removed circular import that broke Python 3.6 support ( #272 )
2021-09-09 13:46:55 -07:00
Philippe Tillet
585e5cd0ec
[TEST] Added test for empty kernel ( #271 )
2021-09-09 10:20:37 -07:00
Philippe Tillet
94c83d30ce
[GENERAL] Removed deprecated driver files and added basic compatibility with rocm ( #268 )
- Removed driver module -- accelerator runtime is handled by pytorch
- Added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes
- Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors
2021-09-09 00:04:28 -07:00
Szymon Sidor
8bedcce9be
[LANG] Added seeded random number generation - philox ( #261 )
2021-09-02 22:02:40 -07:00