Gregory Axler
2193bee94e
[Example] Fix the compile function in copy_strided.py (#1029)
2023-01-05 10:37:41 -08:00
Philippe Tillet
20100a7254
Merge `triton-mlir` branch - Complete rewrite of the backend from scratch (#1004)
This PR merges the `triton-mlir` branch, in which we have been quietly
rewriting the Triton backend from scratch to increase maintainability,
stability and ultimately performance. Changes to the runtime are
minimal, and this new version aims to remain backward-compatible with
the previous commit. The legacy backend is now officially deprecated,
but can still be accessed via the `legacy-backend` tag.
Co-authored-by: Keren Zhou <kerenzhou@openai.com>
Co-authored-by: Yan Chunwei <yanchunwei@outlook.com>
Co-authored-by: goostavz <109190422+goostavz@users.noreply.github.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: Yan Da <dyanab@connect.ust.hk>
Co-authored-by: Jun Yang <yangjunpro@gmail.com>
Co-authored-by: Ian Bearman <ianb@microsoft.com>
Co-authored-by: Jason Ansel <jansel@jansel.net>
Co-authored-by: Qingyi Liu <qingyil@nvidia.com>
Co-authored-by: ben-zhang-609 <110140741+ben-zhang-609@users.noreply.github.com>
Co-authored-by: Chenggang Zhao <lyricz@yeah.net>
Co-authored-by: ben-zhang-609 <benzh609@gmail.com>
Co-authored-by: dongdongl <dongdongl@nvidia.com>
2022-12-21 01:30:50 -08:00
Philippe Tillet
269ebc12e5
[PYTHON][TESTS][DOC] Various improvements to the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
  > .data_ptr() must now be passed as kernel argument. No more implicit
    conversion from torch.tensor
  > compilation options are now constant attributes, i.e., opt.d('VAR')
    becomes opt.VAR
  > torch.device must now be passed explicitly to triton.kernel (no
    longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* Added python/triton/ops/ package for pre-written Triton ops
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c0bc7ed8b0
[PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24
[PYTHON] Added option to show PTX source code in Python
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39
[CODEGEN][ANALYSIS] Fixed issue in layout inference
2021-07-27 12:38:48 -07:00
Philippe Tillet
89e456107b
[EXAMPLES] Improved mat_mul example
2021-07-27 12:38:48 -07:00
Philippe Tillet
68c18238a9
[EXAMPLES] Added conv2d example
2021-07-27 12:38:48 -07:00
Philippe Tillet
4ccd78f1a6
[EXAMPLES][TUTORIAL] Changed to new triton.kernel API
2021-07-27 12:38:48 -07:00
jack-willturner
180ed26b61
[DOCS] Transposition fix
2021-07-27 12:38:48 -07:00
jack-willturner
a98a2db2c2
[DOCS] Matrix copy and transpose
2021-07-27 12:38:48 -07:00
jack-willturner
32819dea51
[DOCS] Matmul and vecadd working examples
2021-07-27 12:38:48 -07:00
Philippe Tillet
c36ad6bf8a
[PYTHON][EXAMPLES][EINSUM] Updated configs for matmul
2021-07-27 12:38:48 -07:00
Philippe Tillet
7924642b78
[PYTHON][EXAMPLES][EINSUM] Added stride in CONV2D example
2021-07-27 12:38:48 -07:00
Philippe Tillet
f22ad0064c
[PYTHON][EXAMPLES][EINSUM] Added group-convolution test/benchmark
2021-07-27 12:38:48 -07:00
Philippe Tillet
5bb977173f
[PYTHON][EINSUM] re-established auto-tuning
2021-07-27 12:38:48 -07:00
Philippe Tillet
3304629de9
[CORE] Fixed several issues that arose in the development of the torch-blocksparse package:
* Now using warp shuffle in reductions when possible
* Various bugfixes in layout inference
* Added INFINITY, exponential and select
* Better error messages for unimplemented constructs
2021-07-27 12:38:48 -07:00
Philippe Tillet
9fda39f64c
[PYTHON][EXAMPLES] Removed BlockSparse examples; see https://github.com/ptillet/torch-blocksparse.git
2021-07-27 12:38:48 -07:00
Philippe Tillet
268894a5ce
[PYTHON] Merged blocksparse branch:
* Example for blocksparse matrix multiplication
* Simplified Triton kernel API
* Revived auto-tuning in einsum
2021-07-27 12:38:48 -07:00
Philippe Tillet
dfb844bf41
[GENERAL] Improved caching mechanism:
* Now computing hash in libtriton
* Now only compiling a single pytorch hook per function signature
2021-07-27 12:38:48 -07:00
Philippe Tillet
9e54a03006
[PYTHON][EXAMPLES] Removed obsolete files
2021-07-27 12:38:48 -07:00
Philippe Tillet
3816f2f259
[PYTHON][EINSUM] Now handling reduction sizes that are not a multiple of TK
2021-07-27 12:38:48 -07:00
Philippe Tillet
404dd18333
[PYTHON][CORE] Deprecating Tensorflow support
2021-07-27 12:38:48 -07:00
Philippe Tillet
558422c18a
[PYTHON][EXAMPLES] Changed shape of einsum examples
2021-07-27 12:38:48 -07:00
Philippe Tillet
6d7cf35123
History prior to this date belonged to the now deprecated ISAAC project, and was deleted to save space
2021-07-27 12:38:38 -07:00