triton

Author	SHA1	Message	Date
Philippe Tillet	2922dc141c	Merge branch 'master' into v2.0	2022-01-30 20:25:01 -08:00
Philippe Tillet	807d8a1945	[ALL] Merge master (#447)	2022-01-30 20:21:20 -08:00
daadaada	59d371c6eb	[BACKEND] Added Int8 mma (#440)	2022-01-27 09:12:44 -08:00
daadaada	94a2e10fe5	[BACKEND] Add bf16 & tf32 mma supports (on A100) (#426)	2022-01-11 10:20:31 -08:00
Philippe Tillet	2509124dd0	[DRIVER] Fixed some issue with how ptxas is used (#399) Now using tmpnam and properly deleting temporaries when an exception is raised	2021-12-21 14:31:51 -08:00
Philippe Tillet	4e93b41c52	[GENERAL] Some minor fixups (#393) * [RUNTIME] Now displaying error message when generated PTX is invalid * [CODEGEN] Now converting `if` condition to bool implicitly	2021-12-17 18:06:21 -08:00
Victor	73b04d71b2	Fixes for building on Windows (#382) * make C++ code compatible with Windows + MSVC * added dlfcn-win32 for cross-platform dlopen * fixed building and pip install on Windows * fixed shared library file name under Windows	2021-12-07 14:10:58 -08:00
Philippe Tillet	2acaa4d0dd	[LANG] Added support for constexpr (#361)	2021-10-30 00:32:58 -07:00
Philippe Tillet	b7f0e87dc2	[DRIVER] Removed std::cout log message	2021-10-29 10:42:10 -07:00
Philippe Tillet	d3e584d4ba	Revert "[DRIVER] Fixed CUDA 10.1 bug (#357)" (#358) This reverts commit `d35014ba47`.	2021-10-26 15:04:49 -07:00
Philippe Tillet	d35014ba47	[DRIVER] Fixed CUDA 10.1 bug (#357)	2021-10-26 11:17:06 -07:00
Philippe Tillet	5ce1b726dc	[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356)	2021-10-24 02:30:46 -07:00
Philippe Tillet	6e5b0b4301	[FRONTEND] Added on-disk cache for compiled kernels (#287)	2021-09-18 22:48:26 -07:00
Philippe Tillet	94c83d30ce	[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268) - Removed driver module -- accelerator runtime is handled by pytorch - Added basic support for ROCM based on @micmelesse's PR -- now can execute empty kernel on AMD devices without any compile-time changes - Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors	2021-09-09 00:04:28 -07:00

14 Commits