Philippe Tillet
33e6f0df7f
[DRIVER] Bumped CUDA requirement to 11.4+. This is to avoid bad performance surprises, as older ptxas versions are much slower. ( #769 )
...
This also makes codegen simpler by avoiding special handling of eviction policies
2022-10-12 12:02:30 -07:00
Jason Ansel
998fd5f9af
[FRONTEND] Make triton.compile work without a cuda context ( #708 )
...
This allows compiling in a subprocess. I'm not seeing a ton of speedup from this, but figure it is a good change anyway.
2022-09-24 13:41:47 -07:00
Shintaro Iwasaki
c668d6596e
[DOCS] Fix spelling ( #664 )
...
This PR applies minor spelling fixes in comments and string literals to `master`. It shouldn't hurt anything.
2022-09-16 12:26:40 -07:00
Keren Zhou
4912916c11
[FRONTEND] Added support for element-wise function defined in external LLVM bitcode (e.g., libdevice) ( #562 )
2022-07-13 15:52:21 -07:00
Philippe Tillet
2bed6fc850
[LANG] Added support for device functions ( #484 )
2022-04-03 20:58:16 -07:00
apd10
e85c7a7fc7
Bugfix in ptxas path. ( #487 )
...
Bug: the "ret" value is destroyed when a failing "ptxas --version" run overwrites the previous valid "ret" value.
Fix: keep rets only for the runs that succeed, and pick the first one.
2022-03-30 20:45:41 -07:00
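The fix described in #487 (keep only the results of successful `ptxas --version` runs and pick the first) can be illustrated with a minimal Python sketch. This is not the code from the commit, which lives in the driver: the names `run_version` and `first_working_ptxas`, and the injectable `probe` argument, are hypothetical and exist only for illustration.

```python
import subprocess
from typing import Callable, Iterable, Optional, Tuple


def run_version(path: str) -> Tuple[int, str]:
    """Run `<path> --version` and return (exit code, stdout).

    A binary that cannot be launched is reported as a failure
    (exit code 1) instead of raising.
    """
    try:
        proc = subprocess.run([path, "--version"], capture_output=True, text=True)
        return proc.returncode, proc.stdout
    except OSError:
        return 1, ""


def first_working_ptxas(
    candidates: Iterable[str],
    probe: Callable[[str], Tuple[int, str]] = run_version,
) -> Optional[Tuple[str, str]]:
    """Return (path, version output) for the first candidate that succeeds.

    Results are recorded only for successful runs, so a later failing
    probe can never overwrite an earlier valid result.
    """
    working = []
    for path in candidates:
        code, output = probe(path)
        if code == 0:
            working.append((path, output))
    return working[0] if working else None
```

Collecting successes into a list and selecting afterwards keeps the "which result wins" decision in one place, rather than relying on the last assignment to a shared variable.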
Philippe Tillet
e0cc488055
[FRONTEND] Added tl.clock and tl.globaltimer ( #485 )
2022-03-28 16:15:43 -07:00
Philippe Tillet
ea6d1f1b85
[DRIVER] LLVM driver fixup ( #482 )
...
The current way of doing things is probably not super thread safe. init is shared between threads and some threads may not call the LLVMInitialize* function.
2022-03-23 00:24:45 -07:00
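The concern raised in #482 is one-time initialization shared across threads. One common way to guard such a step is a lock with a double check, sketched below in Python. This shows the general pattern only and is not necessarily what the commit does; `ensure_initialized` and `init_fn` are hypothetical names, with `init_fn` standing in for the LLVMInitialize* calls.

```python
import threading

_init_lock = threading.Lock()
_initialized = False


def ensure_initialized(init_fn) -> None:
    """Run init_fn exactly once, even when called from many threads."""
    global _initialized
    if _initialized:  # fast path once initialization has completed
        return
    with _init_lock:
        # Re-check under the lock: another thread may have finished
        # initialization while this one was waiting.
        if not _initialized:
            init_fn()
            _initialized = True
```

The flag is set only after `init_fn` returns, so a thread that sees `_initialized` as true can rely on the initialization having completed.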
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes ( #463 )
2022-02-24 14:56:24 -08:00
Philippe Tillet
2922dc141c
Merge branch 'master' into v2.0
2022-01-30 20:25:01 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master ( #447 )
2022-01-30 20:21:20 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma ( #440 )
2022-01-27 09:12:44 -08:00
daadaada
94a2e10fe5
[BACKEND] Add bf16 & tf32 mma support (on A100) ( #426 )
2022-01-11 10:20:31 -08:00
Philippe Tillet
2509124dd0
[DRIVER] Fixed some issue with how ptxas is used ( #399 )
...
Now using tmpnam and properly deleting temporaries when an exception is raised
2021-12-21 14:31:51 -08:00
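The cleanup behavior described in #399 (temporaries are deleted even when an exception is raised) can be sketched in Python as follows. The commit itself uses `tmpnam` in the driver; this analogue uses `tempfile.mkstemp` instead, and `with_temp_ptx` and the `assemble` callback are hypothetical names introduced for illustration.

```python
import os
import tempfile


def with_temp_ptx(ptx: str, assemble):
    """Write ptx to a temporary file, call assemble(path), then clean up.

    The temporary file is removed whether assemble returns normally
    or raises.
    """
    fd, path = tempfile.mkstemp(suffix=".ptx")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(ptx)
        return assemble(path)
    finally:
        # Runs on both the success path and the exception path.
        if os.path.exists(path):
            os.remove(path)
```

Any exception from `assemble` still propagates to the caller; the `finally` block only guarantees that the file does not outlive the call.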
Philippe Tillet
4e93b41c52
[GENERAL] Some minor fixups ( #393 )
...
* [RUNTIME] Now displaying error message when generated PTX is invalid
* [CODEGEN] Now converting `if` condition to bool implicitly
2021-12-17 18:06:21 -08:00
Victor
73b04d71b2
Fixes for building on Windows ( #382 )
...
* make C++ code compatible with Windows + MSVC
* added dlfcn-win32 for cross-platform dlopen
* fixed building and pip install on Windows
* fixed shared library file name under Windows
2021-12-07 14:10:58 -08:00
Philippe Tillet
2acaa4d0dd
[LANG] Added support for constexpr ( #361 )
2021-10-30 00:32:58 -07:00
Philippe Tillet
b7f0e87dc2
[DRIVER] Removed std::cout log message
2021-10-29 10:42:10 -07:00
Philippe Tillet
d3e584d4ba
Revert "[DRIVER] Fixed CUDA 10.1 bug ( #357 )" ( #358 )
...
This reverts commit d35014ba47.
2021-10-26 15:04:49 -07:00
Philippe Tillet
d35014ba47
[DRIVER] Fixed CUDA 10.1 bug ( #357 )
2021-10-26 11:17:06 -07:00
Philippe Tillet
5ce1b726dc
[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue ( #356 )
2021-10-24 02:30:46 -07:00
Philippe Tillet
6e5b0b4301
[FRONTEND] Added on-disk cache for compiled kernels ( #287 )
2021-09-18 22:48:26 -07:00
Philippe Tillet
94c83d30ce
[GENERAL] Removed deprecated driver files and added basic compatibility with rocm ( #268 )
...
- Removed driver module -- accelerator runtime is handled by pytorch
- Added basic support for ROCM based on @micmelesse's PR -- can now execute an empty kernel on AMD devices without any compile-time changes
- Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors
2021-09-09 00:04:28 -07:00