Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
...
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without padding
- Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c0bc7ed8b0
[PYTHON] Added TRITON_DEBUG_MODE, which reallocates input tensors outside of the PyTorch memory pool to spot out-of-bounds accesses more easily
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24
[PYTHON] Added option to show PTX source code in Python
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
...
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39
[CODEGEN][ANALYSIS] Fixed issue in layout inference
2021-07-27 12:38:48 -07:00
Philippe Tillet
89e456107b
[EXAMPLES] Improved mat_mul example
2021-07-27 12:38:48 -07:00
Philippe Tillet
68c18238a9
[EXAMPLES] Added conv2d example
2021-07-27 12:38:48 -07:00
Philippe Tillet
4ccd78f1a6
[EXAMPLES][TUTORIAL] Changed to new triton.kernel API
2021-07-27 12:38:48 -07:00
jack-willturner
180ed26b61
[DOCS] Transposition fix
2021-07-27 12:38:48 -07:00
jack-willturner
a98a2db2c2
[DOCS] Matrix copy and transpose
2021-07-27 12:38:48 -07:00
jack-willturner
32819dea51
[DOCS] Matmul and vecadd working examples
2021-07-27 12:38:48 -07:00