triton

Author	SHA1	Message	Date
Keren Zhou	153aecb339	[Triton-MLIR][BACKEND] insert_slice_async on GPUs < sm80 (#908) `insert_slice_async` is decomposed into `load + insert_slice` in the backend. It is unclear whether V100 performance can match the master branch with this approach. Performance might improve if the instructions are arranged in the following form: ``` %0 = load %1 = load %2 = load ... insert_slice %0 insert_slice %1 insert_slice %2 ``` Tested on A100 by manually enabling this decomposition. Tests on V100 haven't been integrated yet; the tests can be divided into two phases: 1. Test only load, insert_slice, and insert_slice_async, given TritonGPU IRs in `test_backend.py`. 2. End-to-end GEMM tests on V100.	2022-11-24 14:05:54 -08:00
Philippe Tillet	23f71daa27	[OPTIMIZER] Fixed up order of shared layouts (#881)	2022-11-21 06:25:02 +01:00
Philippe Tillet	f40c63fb03	[Triton-MLIR][OPTIMIZER] Cleaned up swizzling (#869) Swizzling is no longer implemented as a separate pass. It is instead done in a specialized constructor of SharedEncodingAttr, and tested via google tests instead of triton-opt + filecheck. In the future we may want to implement it as a pass again once we have an additional dialect between TritonGPU and LLVM.	2022-11-10 12:05:46 -08:00
Chenggang Zhao	57fd1864a7	[Triton-MLIR] Support FP8 (#864) Co-authored-by: Superjomn <yanchunwei@outlook.com>	2022-11-10 15:53:06 +08:00
Yan Chunwei	8832e32683	[Triton-MLIR][BACKEND] Refine ptxbuilder (#867) This PR: 1. Adds an `onlyBindMLIRArgs` argument to the `PTXInstrCommon::call` method to support passing in a whole PTX code snippet. 2. Refines the APIs and simplifies code usage.	2022-11-10 13:41:52 +08:00
Qingyi Liu	e517b58d59	[Triton-MLIR] Minor fixes to enable fused-softmax and layer-norm tutorials (#835)	2022-11-09 02:18:56 +00:00
Keren Zhou	289ff293cc	[Triton-MLIR] Generate LLVM/PTX code for async ops (#735)	2022-10-04 09:37:00 -07:00
Yan Chunwei	3a84278530	[Triton-MLIR][BACKEND] Refine dot conversion (#710) This PR: 1. Refines the dot conversion. 2. Makes some other small code refinements.	2022-09-27 14:38:34 +08:00
Yan Chunwei	2a852044d9	[BACKEND] Add C++ tests for PTXFormat and some tiny refinement (#109) This PR: 1. Adds some C++ tests for `PTXFormat`. 2. Enhances `PTXFormat` so that a `PTXInstr` instance can be called multiple times, similar to a C function.	2022-09-09 09:15:07 -07:00
Jun Yang	ea175f689e	[CI] Added initial framework of CXX unittest (#98) Based on the discussion in #53, I added the initial flow of CXX unit tests for this repo, providing two dummy UTs as placeholders to show the usage; feel free to add your own CXX unit tests. @Superjomn @ptillet, in this PR I also configured integration-tests.yml to add the unit tests to the GitHub CI check. Thanks	2022-09-04 12:50:27 +08:00

10 Commits