Commit Graph

16 Commits

Author SHA1 Message Date
Philippe Tillet
083bbd1e8d [GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
5e8f4c934c [DRIVER] Better exception handling of invalid ptx 2021-07-27 12:38:48 -07:00
Philippe Tillet
4f08d87fed [DRIVER] Simplified Driver API by substantially removing reliance on driver::context 2021-07-27 12:38:48 -07:00
Philippe Tillet
f42b04d925 [DRIVER] Added (slow) support for CUDA11 and Ampere 2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4 [GENERAL] Various bugfixes 2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b [General] LLVM-9 -> LLVM-10 2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5 [GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
664d3cae89 [DRIVER] Removed OpenCL support
There is no plan to support OpenCL anytime soon (Vulkan would be preferred). Removing the adequate portion of the driver code
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d [CODEGEN] More work on the CPU backend 2021-07-27 12:38:48 -07:00
Philippe Tillet
29a0ad6c4d [DRIVER] Now always using PTXv6.4 2021-07-27 12:38:48 -07:00
Philippe Tillet
4bb0311f60 [TRITON] Fixed misaligned address issue 2021-07-27 12:38:48 -07:00
Philippe Tillet
ddd89e1b22 [GENERAL] Fixed some undefined behavior with GCC-9 2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9 [CORE] Fixed several issues that arose in the development of the
torch-blocksparse package:

* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
9cb3fd899a [CORE][DRIVER] Now only using PTX6.4 if CUDA10.1+ is detected 2021-07-27 12:38:48 -07:00
Philippe Tillet
dfb844bf41 [GENERAL] Improved caching mechanism:
* Now computing hash in libtriton
* Now only compiling a single pytorch hook per function signature
2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123 History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space 2021-07-27 12:38:38 -07:00