Philippe Tillet
af080740f2
[GENERAL] Merged v1.0alpha into master. Added features are:
...
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-01-11 19:23:24 -05:00
Philippe Tillet
e01e623333
[codegen][auto-coalesce] more debugging
2019-09-16 20:34:08 -04:00
Philippe Tillet
32234c2612
ugh
2019-09-08 17:35:24 -04:00
Philippe Tillet
a842d337c5
[general] various cleanups and bug fixes:
...
* added copy1d and copy2d benchmark
* fixed issue in reassociation pass
2019-09-02 23:00:49 -04:00
Philippe Tillet
2d4ddab4d0
[ir][print] improved pretty-printing of constants and instructions
2019-08-30 18:02:33 -07:00
Philippe Tillet
7e0af2118c
[codegen] worked around a bug, seemingly in nvptx/ptxas, by simplifying multiplications by 1:
...
- Generated LLVM-IR looked correct
- Illegal addressing disappeared when running cuda-memcheck
- Illegal addressing disappeared when using nvptx-short-pointer
2019-08-30 16:45:14 -07:00
Philippe Tillet
732156b942
[general] rename *.cpp -> *.cc
2019-08-23 19:06:39 -07:00