Philippe Tillet
840140bf26
[CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel ( #101 )
This massively simplifies the implementation of `reassociate` and also fixes
a bunch of bugs. The pass could still be improved, but can already be used
to generate constant pointer offsets in, e.g., the matmul epilogue
2021-07-27 12:38:49 -07:00
Philippe Tillet
6a9810ccf2
[codegen] small bugfix: ( #97 )
* Added fp32 -> fp8 for ConstantFP = 0
* Added more robust semantic checks for atomic_add
2021-07-27 12:38:49 -07:00
Philippe Tillet
7355efa745
[LANG] Preliminary FP8 support ( #96 )
2021-07-27 12:38:49 -07:00
daadaada
f6688372db
[PYTHON] Allow triton.code_gen.Binary to print Triton-IR asm. ( #89 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
39f4730305
Deprecation of Triton-C and Replacement by decorated Python functions ( #86 )
This PR implements a major overhaul of the Triton frontend and replaces Triton-C with a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.
See the documentation for more information on the new API.
2021-07-27 12:38:49 -07:00
Philippe Tillet
5ba5a77561
[BUILD] Remove compilation warnings
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b9afaa688
[CODEGEN] Fixed bug that caused conditional operator to not always properly mask load operations
Also includes minor improvement to benchmarking infrastructure
2021-07-27 12:38:49 -07:00
Philippe Tillet
567a1a3d17
[CODEGEN] Bugfixes with FP32 async copy
2021-07-27 12:38:49 -07:00
Philippe Tillet
11215f0f03
[CODEGEN] Now initializing cp.async to zero when predicate is false
WARNING: the case of non-zero initialization is still not handled. It will
require a manual copy to shared memory
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b83259592
[CODEGEN] Major performance improvements on A100 ( #70 )
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-07-27 12:38:49 -07:00
Philippe Tillet
a5a477c36b
[CODEGEN] Fixed bug in recoalesce_inst LLVM codegen
2021-07-27 12:38:48 -07:00
Philippe Tillet
3b36a1e60c
[CODEGEN] Fixed issue in traversal order for atomic_add and store_inst
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without padding
- Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
fd5c72d6a0
[LANG] Added some more atomic_add support
2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1
[PYTHON] Compiling Triton in Release mode now...
2021-07-27 12:38:48 -07:00
Philippe Tillet
5d84fde733
tmp
2021-07-27 12:38:48 -07:00
Philippe Tillet
da287bb710
[CODEGEN] Progress on atom.add.f16x2
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b
[General] LLVM-9 -> LLVM-10
2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d
[LANG] Added log intrinsic
2021-07-27 12:38:48 -07:00
Philippe Tillet
34f1d5e565
[CODEGEN] Fixed bug in 2D reductions
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d
[CODEGEN] More work on the CPU backend
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
4bb0311f60
[TRITON] Fixed misaligned address issue
2021-07-27 12:38:48 -07:00
Philippe Tillet
e7461a862b
[CODEGEN] Bugfix in Disassociate pass; Added fp32 atomic_add support
2021-07-27 12:38:48 -07:00
Philippe Tillet
ddd89e1b22
[GENERAL] Fixed some undefined behavior with GCC-9
2021-07-27 12:38:48 -07:00
Philippe Tillet
c73dee080c
[CODEGEN] Fixed bug for phi nodes with constant incoming value
2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9
[CORE] Fixed several issues that arose in the development of the torch-blocksparse package:
* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
3816f2f259
[PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of TK
2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123
History prior to this date belonged to the now-deprecated ISAAC project and was deleted to save space
2021-07-27 12:38:38 -07:00