20 Commits

Author SHA1 Message Date
Philippe Tillet
5b4c8f221e [BACKEND] Compiler improvements (#557)
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use things like evict_last
- For A100, mma layout can be directly converted to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major.
- Fixed liveness analysis; this was broken.
- Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop.
- `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.
2022-06-27 11:49:19 -07:00
Philippe Tillet
2bed6fc850 [LANG] Added support for device functions (#484) 2022-04-03 20:58:16 -07:00
Philippe Tillet
98ed7db8c1 [CODEGEN] Improvements and bugfixes (#463) 2022-02-24 14:56:24 -08:00
Philippe Tillet
69ff52ea1f [CODEGEN] removed buggy (and mostly useless) optimization in peephole pass (#449) 2022-02-05 21:37:23 -08:00
daadaada
59d371c6eb [BACKEND] Added Int8 mma (#440) 2022-01-27 09:12:44 -08:00
daadaada
94a2e10fe5 [BACKEND] Add bf16 & tf32 mma supports (on A100) (#426) 2022-01-11 10:20:31 -08:00
Philippe Tillet
03f1256f60 [FRONTEND] Added volatile flag for load (#407) 2021-12-30 22:33:24 -08:00
Philippe Tillet
e062812969 [CODEGEN] Disabled peephole for masked load + select -- masked_load
doesn't work as expected when vectorized
2021-12-17 12:44:47 -08:00
daadaada
858dec8372 [CODEGEN] Add cache modifier to tl.load (#351)
* Add cache modifier to tl.load
* Add comment to cache_modifier
* Remove force_nc_cache
* Update test
2021-10-17 22:14:04 -07:00
Philippe Tillet
43723ccb95 [FRONTEND] Removed circular import that broke Python 3.6 support (#272) 2021-09-09 13:46:55 -07:00
Philippe Tillet
4ff3714d61 [CODEGEN] Various bugfixes and stability improvements in compiler backend (#240) 2021-08-30 11:50:35 -07:00
Philippe Tillet
39f4730305 Deprecation of Triton-C and Replacement by decorated Python functions (#86)
This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.

See documentations for more information on the new API
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b9afaa688 [CODEGEN] Fixed bug that caused conditional operator to not always
properly mask load operations

Also includes minor improvement to benchmarking infrastructure
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b83259592 [CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-07-27 12:38:49 -07:00
Philippe Tillet
083bbd1e8d [GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
ddd89e1b22 [GENERAL] Fixed some undefined behavior with GCC-9 2021-07-27 12:38:48 -07:00
Philippe Tillet
0516ea96d0 [CODEGEN] Fixed bug that caused missing recoalescing for some transpose
operations
2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9 [CORE] Fixed several issues that arose in the development of the
torch-blocksparse package:

* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
4181f9f2af [CODEGEN][TRANSFORM][PEEPHOLE] Fixed bug in *1 multiplication 2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123 History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space 2021-07-27 12:38:38 -07:00