Commit Graph

29 Commits

Author SHA1 Message Date
Philippe Tillet
5b4c8f221e [BACKEND] Compiler improvements (#557)
This PR adds several optimization capabilities in the compiler backend:
- Now using inline PTX for `tl.store`, making it possible to use things like evict_last
- For A100, mma layout can be directly converted to shared memory
- For A100, an additional "transpose" argument in `dot` allows tensors to be loaded once and used both row- and col- major.
- Fixed liveness analysis; this was broken.
- Now can load/store directly mma layout without converting. Useful for when tl.dot accumulator is initialized with DRAM data inside of an inner loop.
- `tl.dot` can now take LHS inputs in registers when it comes from a previous `tl.dot` instruction. Useful for e.g. fused attention.
2022-06-27 11:49:19 -07:00
Philippe Tillet
9f08ecd684 [FRONTEND] Semantic analysis refactor (#491)
Moved dispatch.cc to semantic.py (@ptillet)
Integer signedness analysis was moved from C++ to python (@daadaada)
Cleaner frontend types (@daadaada)
Moved SSA construction to a separate object (@ptillet)


Co-authored-by: Yan Da <dyanab@connect.ust.hk>
2022-04-06 16:13:53 -07:00
Philippe Tillet
2bed6fc850 [LANG] Added support for device functions (#484) 2022-04-03 20:58:16 -07:00
Philippe Tillet
e0cc488055 [FRONTEND] Added tl.clock and tl.globaltimer (#485) 2022-03-28 16:15:43 -07:00
Philippe Tillet
76a9ee50a8 Revert "[FRONTEND] Semantic analysis refactor (#473)" (#483)
This reverts commit 539961072c.
2022-03-24 17:16:50 -07:00
daadaada
539961072c [FRONTEND] Semantic analysis refactor (#473)
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type

Co-authored-by: Phil Tillet <phil@openai.com>
2022-03-16 21:25:30 -07:00
Philippe Tillet
98ed7db8c1 [CODEGEN] Improvements and bugfixes (#463) 2022-02-24 14:56:24 -08:00
Philippe Tillet
bef76b142a [BACKEND] float division is now approximate by default (#446) 2022-01-29 18:29:29 -08:00
daadaada
94a2e10fe5 [BACKEND] Add bf16 & tf32 mma supports (on A100) (#426) 2022-01-11 10:20:31 -08:00
Philippe Tillet
03f1256f60 [FRONTEND] Added volatile flag for load (#407) 2021-12-30 22:33:24 -08:00
Madeleine Thompson
985798f101 add missing bfloat16 repr and improve assertions (#403)
- `BF16TyID` was missing a repr implementation.
- Throw a better exception on impossible casts.
- Add a few assertions. Tested with a debug build.
- Add `pointer_dtype.__str__` to aid kernel debugging.
2021-12-23 17:01:17 -08:00
Philippe Tillet
5ce1b726dc [CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356) 2021-10-24 02:30:46 -07:00
daadaada
858dec8372 [CODEGEN] Add cache modifier to tl.load (#351)
* Add cache modifier to tl.load
* Add comment to cache_modifier
* Remove force_nc_cache
* Update test
2021-10-17 22:14:04 -07:00
Philippe Tillet
5123db0b7d [LANG] Various (relatively minor) improvements (#320) 2021-10-04 18:39:40 -07:00
Philippe Tillet
4ff3714d61 [CODEGEN] Various bugfixes and stability improvements in compiler backend (#240) 2021-08-30 11:50:35 -07:00
Philippe Tillet
226fde6ea1 [CODEGEN] Now using atomic_rmw code path for atomic_xchg (#222) 2021-08-17 16:33:23 -07:00
Philippe Tillet
2824345065 [LANGUAGE] Added cos/sin (#132) 2021-07-27 12:38:49 -07:00
Philippe Tillet
0274429429 [IR] Added IR and Codegen support for atomic_rmw (#120) 2021-07-27 12:38:49 -07:00
Philippe Tillet
f81012a8cf [CODEGEN] Fixed atomic_add issue (#112)
* [CODEGEN] Fixed atomic_add issue

* [CODEGEN] Fixed liveness analysis bug for instructions that are not
DCE'd but have no users (e.g., atomic_cas)
2021-07-27 12:38:49 -07:00
daadaada
967e629c0c [CODEGEN] Add a pass to prefetch operands of dot if applicable. (#105)
* update membar pass when data is double buffered

* Add instruction prefetch_s

* prefetch tests pass (except the 1 warp case)

* Fix the 1-warp bug

* Add back prefetch files

* Disable prefetch on a100

* Always add war barrier on sm>=80
2021-07-27 12:38:49 -07:00
Philippe Tillet
840140bf26 [CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel (#101)
This massively simplifies implementation of `reassociate` and also fixes
a bunch of bug. The pass could still be improved, but can already be used
to generate constant pointer offsets in eg the matmul epilogue
2021-07-27 12:38:49 -07:00
Philippe Tillet
39f4730305 Deprecation of Triton-C and Replacement by decorated Python functions (#86)
This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.

See documentations for more information on the new API
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b83259592 [CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-07-27 12:38:49 -07:00
Philippe Tillet
083bbd1e8d [GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
baa858aa74 [CODEGEN] Fixed bug in atomic_add 2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d [LANG] Added log intrinsic 2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5 [GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9 [CORE] Fixed several issues that arose in the development of the
torch-blocksparse package:

* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123 History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space 2021-07-27 12:38:38 -07:00