[GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without padding
- Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
@@ -111,7 +111,7 @@ setup(
     author_email='ptillet@g.harvard.edu',
     description='A language and compiler for custom Deep Learning operations',
     long_description='',
-    packages=['triton', 'triton/_C', 'triton/ops'],
+    packages=['triton', 'triton/_C'],
     install_requires=['numpy', 'torch', 'sympy'],
     package_data={'': data},
     ext_modules=[CMakeExtension('triton', 'triton/_C/')],