Commit Graph

286 Commits

Author SHA1 Message Date
Philippe Tillet
376c876eb8 [RUNTIME] Disable error on spills 2021-07-27 12:38:48 -07:00
Philippe Tillet
3b36a1e60c [CODEGEN] Fixed issue in traversal order for atomic_add and store_inst 2021-07-27 12:38:48 -07:00
Philippe Tillet
fdc8e8ef61 [TESTS] Fixed bug in how test arguments are enqueued 2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d [GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without padding
- Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c0bc7ed8b0 [PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily 2021-07-27 12:38:48 -07:00
Philippe Tillet
c4fceeea49 [LANG] Added hacky min/max 2021-07-27 12:38:48 -07:00
Philippe Tillet
d70f54fd6a Merge pull request #45 from daadaada/master
[LANG] Add support for PREFIX_INC, PREFIX_DEC, POSTFIX_INC and POSTFIX_DEC
2021-07-27 12:38:48 -07:00
Philippe Tillet
547a99a5d4 [VERSION] 0.2.3 -> 0.3.0 2021-07-27 12:38:48 -07:00
Yan Da
27dc780871 [IR] Check constant_int type 2021-07-27 12:38:48 -07:00
Philippe Tillet
fd5c72d6a0 [LANG] Added some more atomic_add support 2021-07-27 12:38:48 -07:00
Yan Da
01ef691b84 [LANG] Fix gep bug in INC 2021-07-27 12:38:48 -07:00
Philippe Tillet
5e8f4c934c [DRIVER] Better exception handling of invalid ptx 2021-07-27 12:38:48 -07:00
Yan Da
e9b2335224 [LANG] Add support for POSTFIX_INC and POSTFIX_DEC, and pointer type 2021-07-27 12:38:48 -07:00
Philippe Tillet
44ca2c0cb8 [DRIVER] Removed deprecated files and functions 2021-07-27 12:38:48 -07:00
Yan Da
05b95b7fa6 [LANG] Add support for PREFIX_INC and PREFIX_DEC. 2021-07-27 12:38:48 -07:00
Philippe Tillet
7ab2c2a356 [DRIVER] Removed obsolete SetArg 2021-07-27 12:38:48 -07:00
Philippe Tillet
8ab62803db [PYTHON] Context switching logic moved to PyTorch 2021-07-27 12:38:48 -07:00
Philippe Tillet
4f08d87fed [DRIVER] Simplified Driver API by substantially removing reliance on driver::context 2021-07-27 12:38:48 -07:00
Philippe Tillet
f42b04d925 [DRIVER] Added (slow) support for CUDA11 and Ampere 2021-07-27 12:38:48 -07:00
Philippe Tillet
baa858aa74 [CODEGEN] Fixed bug in atomic_add 2021-07-27 12:38:48 -07:00
Philippe Tillet
7d095ec686 [LANG] Added sqrtf support 2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1 [PYTHON] Compiling Triton in Release mode now... 2021-07-27 12:38:48 -07:00
Philippe Tillet
5d84fde733 tmp 2021-07-27 12:38:48 -07:00
Philippe Tillet
da287bb710 [CODEGEN] Progress on atom.add.f16x2 2021-07-27 12:38:48 -07:00
Philippe Tillet
a77c925dfd [DRIVER] Improved performance of Host driver code 2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4 [GENERAL] Various bugfixes 2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b [General] LLVM-9 -> LLVM-10 2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24 [PYTHON] Added option to show PTX source code in Python 2021-07-27 12:38:48 -07:00
Philippe Tillet
cf80ccc798 [PYTHON] Fixed torch ABI issue 2021-07-27 12:38:48 -07:00
Philippe Tillet
06abc8cb40 [GENERAL] Fix compatibility issue with older Torch versions 2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d [LANG] Added log intrinsic 2021-07-27 12:38:48 -07:00
Philippe Tillet
02a6e81b88 [PYTHON] Cleaning C++ bindings 2021-07-27 12:38:48 -07:00
Philippe Tillet
34f1d5e565 [CODEGEN] Fixed bug in 2D reductions 2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5 [GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
444907589d [GENERAL] Fixed MacOS compilation issues 2021-07-27 12:38:48 -07:00
Philippe Tillet
664d3cae89 [DRIVER] Removed OpenCL support
There is no plan to support OpenCL anytime soon (Vulkan would be preferred), so the corresponding portion of the driver code has been removed.
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d [CODEGEN] More work on the CPU backend 2021-07-27 12:38:48 -07:00
Philippe Tillet
64eaec016f [Version] Now version 0.2.3 2021-07-27 12:38:48 -07:00
Philippe Tillet
a7d285a480 Merge pull request #41 from jeffra/fix-conda-build
fix llvm build inside conda environment
2021-07-27 12:38:48 -07:00
Philippe Tillet
db4e4b9dbf [VERSION] Now version 0.2.2 2021-07-27 12:38:48 -07:00
Jeff Rasley
7fdf2e378c fix llvm build inside conda environment (see link for similar issue)
https://github.com/tensorflow/tensorflow/issues/12998
2021-07-27 12:38:48 -07:00
Philippe Tillet
7af9d812cf [PYTHON] Added credits to Scott Gray for the idea used in launch.cc 2021-07-27 12:38:48 -07:00
Philippe Tillet
150ba0c70b [TESTS] Updated the test to be compatible with the new runtime API 2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05 [RUNTIME] Lower-level interface for executing functions 2021-07-27 12:38:48 -07:00
Philippe Tillet
f4f216b88a [EXAMPLES] Added C++ example for Conv2d 2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39 [CODEGEN][ANALYSIS] Fixed issue in layout inference 2021-07-27 12:38:48 -07:00
Philippe Tillet
89e456107b [EXAMPLES] Improved mat_mul example 2021-07-27 12:38:48 -07:00
Philippe Tillet
68c18238a9 [EXAMPLES] Added conv2d example 2021-07-27 12:38:48 -07:00
Philippe Tillet
46297a949f [PACKAGING] Now version 0.2.1 2021-07-27 12:38:48 -07:00
Philippe Tillet
29a0ad6c4d [DRIVER] Now always using PTXv6.4 2021-07-27 12:38:48 -07:00