Commit Graph

46 Commits

Author SHA1 Message Date
daadaada
840d65d8c6 [CODEGEN] Clean up visit_mma884 (#107) 2021-07-27 12:38:49 -07:00
daadaada
967e629c0c [CODEGEN] Add a pass to prefetch operands of dot if applicable. (#105)
* update membar pass when data is double buffered

* Add instruction prefetch_s

* prefetch tests pass (except the 1 warp case)

* Fix the 1-warp bug

* Add back prefetch files

* Disable prefetch on a100

* Always add war barrier on sm>=80
2021-07-27 12:38:49 -07:00
Philippe Tillet
d10265f054 [CODEGEN] Bugfix for immediate offsets in inline PTX (#104) 2021-07-27 12:38:49 -07:00
Philippe Tillet
1e844ba78d [CODEGEN] Switching to predicated inline PTX for LDGs (#103) 2021-07-27 12:38:49 -07:00
Philippe Tillet
840140bf26 [CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel (#101)
This massively simplifies implementation of `reassociate` and also fixes
a bunch of bug. The pass could still be improved, but can already be used
to generate constant pointer offsets in eg the matmul epilogue
2021-07-27 12:38:49 -07:00
Philippe Tillet
6a9810ccf2 [codegen] small bugfix: (#97)
* Added fp32 -> fp8 for ConstantFP = 0
 * Added some more robust semantic check for atomic_add
2021-07-27 12:38:49 -07:00
Philippe Tillet
7355efa745 [LANG] Preliminary FP8 support (#96) 2021-07-27 12:38:49 -07:00
daadaada
f6688372db [PYTHON] Allow triton.code_gen.Binary to print Triton-IR asm. (#89) 2021-07-27 12:38:49 -07:00
Philippe Tillet
39f4730305 Deprecation of Triton-C and Replacement by decorated Python functions (#86)
This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.

See documentations for more information on the new API
2021-07-27 12:38:49 -07:00
Philippe Tillet
5ba5a77561 [BUILD] Remove compilation warnings 2021-07-27 12:38:49 -07:00
Philippe Tillet
5b9afaa688 [CODEGEN] Fixed bug that caused conditional operator to not always
properly mask load operations

Also includes minor improvement to benchmarking infrastructure
2021-07-27 12:38:49 -07:00
Philippe Tillet
567a1a3d17 [CODEGEN] Bugfixes with FP32 async copy 2021-07-27 12:38:49 -07:00
Philippe Tillet
11215f0f03 [CODEGEN] Now initializing cp.async to zero when predicate is false
WARNING: case for non-zero initialization is still not handled. Will
require manual copy to shared
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b83259592 [CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-07-27 12:38:49 -07:00
Philippe Tillet
269ebc12e5 [PYTHON][TESTS][DOC] Various improvement of the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
  > .data_ptr() must now be passed as kernel argument. No more implicit
conversion from torch.tensor
  > compilation options are now constant attributes, i.e., opt.d('VAR')
becomes opt.VAR
  > torch.device must now be passed explicitly to triton.kernel (no
longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* added python/triton/ops/ package for pre-written Triton ops
2021-07-27 12:38:48 -07:00
Philippe Tillet
a5a477c36b [CODEGEN] Fixed bug in recoalesce_inst LLVM codegen 2021-07-27 12:38:48 -07:00
Philippe Tillet
3b36a1e60c [CODEGEN] Fixed issue in traversal order for atomic_add and store_inst 2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d [GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
fd5c72d6a0 [LANG] Added some more atomic_add support 2021-07-27 12:38:48 -07:00
Philippe Tillet
baa858aa74 [CODEGEN] Fixed bug in atomic_add 2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1 [PYTHON] Compiling Triton in Release mode now... 2021-07-27 12:38:48 -07:00
Philippe Tillet
5d84fde733 tmp 2021-07-27 12:38:48 -07:00
Philippe Tillet
da287bb710 [CODEGEN] Progress on atom.add.f16x2 2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4 [GENERAL] Various bugfixes 2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b [General] LLVM-9 -> LLVM-10 2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d [LANG] Added log intrinsic 2021-07-27 12:38:48 -07:00
Philippe Tillet
34f1d5e565 [CODEGEN] Fixed bug in 2D reductions 2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5 [GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d [CODEGEN] More work on the CPU backend 2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05 [RUNTIME] Lower-level interface for executing functions 2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39 [CODEGEN][ANALYSIS] Fixed issue in layout inference 2021-07-27 12:38:48 -07:00
Philippe Tillet
8bdfbe2514 [ANALYSIS] Replaced min by gcd in layout inference 2021-07-27 12:38:48 -07:00
Philippe Tillet
e18f169a39 [CODEGEN] Fixed various issues in alignment inference pass 2021-07-27 12:38:48 -07:00
Philippe Tillet
da6008128e [CODEGEN] Fixed bug in alignment inference that prevented vectorization
in some cases
2021-07-27 12:38:48 -07:00
Philippe Tillet
4bb0311f60 [TRITON] Fixed misaligned address issue 2021-07-27 12:38:48 -07:00
Philippe Tillet
a8f1b85c5f [CODEGEN] Removed unnecessary coalescing rematerialization 2021-07-27 12:38:48 -07:00
Philippe Tillet
e7461a862b [CODEGEN] Bugfix in Disassociate pass; Added fp32 atomic_add support 2021-07-27 12:38:48 -07:00
Philippe Tillet
ddd89e1b22 [GENERAL] Fixed some undefined behavior with GCC-9 2021-07-27 12:38:48 -07:00
Philippe Tillet
0516ea96d0 [CODEGEN] Fixed bug that caused missing recoalescing for some transpose
operations
2021-07-27 12:38:48 -07:00
Philippe Tillet
c73dee080c [CODEGEN] Fixed bug for phi nodes with constant incoming value 2021-07-27 12:38:48 -07:00
Philippe Tillet
54805596f5 [CODEGEN][ANALYSIS] bugfix in alignment analysis 2021-07-27 12:38:48 -07:00
Philippe Tillet
5943baa53f [GENERAL] Error messages now no longer make terminal color green 2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9 [CORE] Fixed several issues that arose in the development of the
torch-blocksparse package:

* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
4181f9f2af [CODEGEN][TRANSFORM][PEEPHOLE] Fixed bug in *1 multiplication 2021-07-27 12:38:48 -07:00
Philippe Tillet
3816f2f259 [PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of
TK
2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123 History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space 2021-07-27 12:38:38 -07:00