Philippe Tillet
e66bf76354
[RUNTIME] Bunch of bugfixes ( #372 )
2021-11-12 00:55:00 -08:00
Philippe Tillet
5ce1b726dc
[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue ( #356 )
2021-10-24 02:30:46 -07:00
daadaada
858dec8372
[CODEGEN] Add cache modifier to tl.load ( #351 )
...
* Add cache modifier to tl.load
* Add comment to cache_modifier
* Remove force_nc_cache
* Update test
2021-10-17 22:14:04 -07:00
Philippe Tillet
5123db0b7d
[LANG] Various (relatively minor) improvements ( #320 )
2021-10-04 18:39:40 -07:00
Philippe Tillet
2c287544cb
[OPS] Faster and cleaner block-sparse implementation ( #311 )
2021-09-27 18:25:16 -07:00
Philippe Tillet
8a882b215f
[CODEGEN] Fixed performance regression on vectorized loads ( #259 )
2021-09-02 01:07:31 -07:00
Philippe Tillet
4ff3714d61
[CODEGEN] Various bugfixes and stability improvements in compiler backend ( #240 )
2021-08-30 11:50:35 -07:00
daadaada
85426dbaf7
[DOCS] Add comments in layout.h ( #249 )
2021-08-28 18:07:32 -07:00
Philippe Tillet
226fde6ea1
[CODEGEN] Now using atomic_rmw code path for atomic_xchg ( #222 )
2021-08-17 16:33:23 -07:00
Philippe Tillet
bb1eebb4b4
[CODEGEN] Fixed bug for visit_reduce1d with 64-bit data-types ( #207 )
2021-08-14 21:07:01 -07:00
Philippe Tillet
01276b5153
[FRONTEND] Added compilation flag to force use of .nc
cache modifier ( #134 )
...
in DRAM loads. /!\ USE CAREFULLY - THIS CAN BREAK CORRECTNESS IF MISUSED
/!\
2021-07-27 12:38:49 -07:00
Philippe Tillet
2824345065
[LANGUAGE] Added cos/sin ( #132 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
8cea583109
[IR] Preliminary support for BF16 ( #129 )
...
This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.
2021-07-27 12:38:49 -07:00
daadaada
d8d6b715c8
[CODEGEN] Performance improvement on A100 ( #125 )
...
Improved codegen for the Ampere GPUs.
* Make the layout pass recognize the multistage pipelined pattern.
* Now the pipeline pass can automate the multistage pipelining transformation.
* Remove extra barriers (from the prefetch pass & WAR) on Ampere.
* Update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.
2021-07-27 12:38:49 -07:00
Philippe Tillet
80c86ecf4a
[LANG] Minor semantic changes ( #121 )
...
* Now using unordered instead of ordered float (fixes NaN issues)
* Bool -> int32 now converts to 1 rather than -1
* Reduce extend arguments to 32-bits if possible
2021-07-27 12:38:49 -07:00
Philippe Tillet
0274429429
[IR] Added IR and Codegen support for atomic_rmw ( #120 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
38ab4e955a
[CODEGEN] Bugfix in prefetch pass ( #118 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
b5dcac484d
[CODEGEN] Small bugfix in atomic-add ( #114 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
f81012a8cf
[CODEGEN] Fixed atomic_add issue ( #112 )
...
* [CODEGEN] Fixed atomic_add issue
* [CODEGEN] Fixed liveness analysis bug for instructions that are not
DCE'd but have no users (e.g., atomic_cas)
2021-07-27 12:38:49 -07:00
daadaada
840d65d8c6
[CODEGEN] Clean up visit_mma884 ( #107 )
2021-07-27 12:38:49 -07:00
daadaada
967e629c0c
[CODEGEN] Add a pass to prefetch operands of dot if applicable. ( #105 )
...
* update membar pass when data is double buffered
* Add instruction prefetch_s
* prefetch tests pass (except the 1 warp case)
* Fix the 1-warp bug
* Add back prefetch files
* Disable prefetch on a100
* Always add war barrier on sm>=80
2021-07-27 12:38:49 -07:00
Philippe Tillet
d10265f054
[CODEGEN] Bugfix for immediate offsets in inline PTX ( #104 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
1e844ba78d
[CODEGEN] Switching to predicated inline PTX for LDGs ( #103 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
840140bf26
[CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel ( #101 )
...
This massively simplifies implementation of `reassociate` and also fixes
a bunch of bug. The pass could still be improved, but can already be used
to generate constant pointer offsets in eg the matmul epilogue
2021-07-27 12:38:49 -07:00
Philippe Tillet
6a9810ccf2
[codegen] small bugfix: ( #97 )
...
* Added fp32 -> fp8 for ConstantFP = 0
* Added some more robust semantic check for atomic_add
2021-07-27 12:38:49 -07:00
Philippe Tillet
7355efa745
[LANG] Preliminary FP8 support ( #96 )
2021-07-27 12:38:49 -07:00
daadaada
f6688372db
[PYTHON] Allow triton.code_gen.Binary to print Triton-IR asm. ( #89 )
2021-07-27 12:38:49 -07:00
Philippe Tillet
39f4730305
Deprecation of Triton-C and Replacement by decorated Python functions ( #86 )
...
This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.
See documentations for more information on the new API
2021-07-27 12:38:49 -07:00
Philippe Tillet
5ba5a77561
[BUILD] Remove compilation warnings
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b9afaa688
[CODEGEN] Fixed bug that caused conditional operator to not always
...
properly mask load operations
Also includes minor improvement to benchmarking infrastructure
2021-07-27 12:38:49 -07:00
Philippe Tillet
567a1a3d17
[CODEGEN] Bugfixes with FP32 async copy
2021-07-27 12:38:49 -07:00
Philippe Tillet
11215f0f03
[CODEGEN] Now initializing cp.async to zero when predicate is false
...
WARNING: case for non-zero initialization is still not handled. Will
require manual copy to shared
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b83259592
[CODEGEN] Major performance improvements on A100 ( #70 )
...
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-07-27 12:38:49 -07:00
Philippe Tillet
a5a477c36b
[CODEGEN] Fixed bug in recoalesce_inst LLVM codegen
2021-07-27 12:38:48 -07:00
Philippe Tillet
3b36a1e60c
[CODEGEN] Fixed issue in traversal order for atomic_add and store_inst
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
...
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
fd5c72d6a0
[LANG] Added some more atomic_add support
2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1
[PYTHON] Compiling Triton in Release mode now...
2021-07-27 12:38:48 -07:00
Philippe Tillet
5d84fde733
tmp
2021-07-27 12:38:48 -07:00
Philippe Tillet
da287bb710
[CODEGEN] Progress on atom.add.f16x2
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b
[General] LLVM-9 -> LLVM-10
2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d
[LANG] Added log intrinsic
2021-07-27 12:38:48 -07:00
Philippe Tillet
34f1d5e565
[CODEGEN] Fixed bug in 2D reductions
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
...
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d
[CODEGEN] More work on the CPU backend
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
4bb0311f60
[TRITON] Fixed misaligned address issue
2021-07-27 12:38:48 -07:00
Philippe Tillet
e7461a862b
[CODEGEN] Bugfix in Disassociate pass; Added fp32 atomic_add support
2021-07-27 12:38:48 -07:00
Philippe Tillet
ddd89e1b22
[GENERAL] Fixed some undefined behavior with GCC-9
2021-07-27 12:38:48 -07:00