Philippe Tillet 5a8a544d10 [OPS][BLOCKSPARSE] Improved robustness, clarity and performance (#450)

* dds layout now internally re-uses dsd code path for increased code
* at_mask and kp_mask related things are now dropped from the softmax API. I couldn't think of any case where they were needed beyond is_causal. If there is one, we should probably find a way to implement it statically so that users don't have to materialize masks.
* fixed bug in blocksparse matmul that caused trouble when the layout had a full row/col of zeros
* blocksparse softmax no longer modifies any data in-place
* blocksparse softmax now takes an is_dense argument that provides better performance. Passing is_dense=True, is_causal=True is the best way to achieve triangular attention.
* unit tests now test backward pass

2022-02-06 18:00:45 -08:00
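
To make the `is_causal` and "triangular attention" wording in the commit message above concrete, here is a minimal pure-Python sketch of a causal row-wise softmax. It is not Triton's blocksparse kernel and does not use Triton's API; the function name `causal_softmax` is hypothetical and exists only for illustration. It shows the two properties the commit message describes: the causal rule is applied by index, so no mask is materialized, and the input is left unmodified.

```python
import math


def causal_softmax(scores):
    """Row-wise softmax in which row i only sees columns 0..i.

    Illustrative sketch only (hypothetical helper, not Triton code).
    `scores` is a square list of lists of floats.
    """
    out = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # causal rule applied by index, no mask tensor
        m = max(visible)        # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in visible]
        total = sum(exps)
        # Columns j > i receive probability 0; a new row is built,
        # so the input is never modified in place.
        out.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return out


scores = [[1.0, 2.0, 3.0],
          [1.0, 1.0, 5.0],
          [0.0, 0.0, 0.0]]
probs = causal_softmax(scores)
```

Each output row sums to 1 over its visible (lower-triangular) entries, which is the pattern a causal attention layer needs.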

| Path | Last commit | Date |
| --- | --- | --- |
| .github/workflows | [STYLE] check python with flake8 (#424) | 2022-01-07 15:28:36 -08:00 |
| cmake | [CI] Made build-wheels compatible with system LLVM setup (#138) | 2021-07-27 12:38:49 -07:00 |
| deps | [ALL] Merge master (#447) | 2022-01-30 20:21:20 -08:00 |
| docs | [DOCS] fix tutorials for v2.0 (#422) | 2022-01-07 12:34:38 -08:00 |
| include/triton | [ALL] Merge master (#447) | 2022-01-30 20:21:20 -08:00 |
| lib | [CODEGEN] removed buggy (and mostly useless) optimization in peephole pass (#449) | 2022-02-05 21:37:23 -08:00 |
| python | [OPS][BLOCKSPARSE] Improved robustness, clarity and performance (#450) | 2022-02-06 18:00:45 -08:00 |
| .gitignore | [ALL] Merge master (#447) | 2022-01-30 20:21:20 -08:00 |
| .gitmodules | [ALL] Merge master (#447) | 2022-01-30 20:21:20 -08:00 |
| .isort.cfg | [STYLE] add isort and autopep8 config files and check on CI (#423) | 2022-01-07 13:11:34 -08:00 |
| CMakeLists.txt | [ALL] Merge master (#447) | 2022-01-30 20:21:20 -08:00 |
| LICENSE | [LICENSE] Edit copyright notice (#219) | 2021-08-17 09:25:19 -07:00 |
| README.md | [DOCS] Minor README edits | 2021-09-28 00:39:33 -07:00 |

Triton

This is the development repository of Triton, a language and compiler for writing highly efficient custom Deep-Learning primitives. The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA, but also with higher flexibility than other existing DSLs.

The foundations of this project are described in the following MAPL 2019 publication: "Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations". Please consider citing this work if you use Triton!

The official documentation contains installation instructions and tutorials.

Changelog

Version 1.1 is out! New features include:

Many, many bugfixes
More documentation
Automatic on-disk caching of compiled binary objects
Random Number Generation
Faster (up to 2x on A100), cleaner blocksparse ops

Contributing

Community contributions are more than welcome, whether it be to fix bugs or to add new features. Feel free to open GitHub issues about your contribution ideas, and we will review them. A contributor's guide containing general guidelines is coming soon!

If you’re interested in joining our team and working on Triton & GPU kernels, we’re hiring!

Compatibility

Supported Platforms:

Linux

Supported Hardware:

NVIDIA GPUs (Compute Capability 7.0+)
Under development: AMD GPUs, CPUs

Disclaimer

Triton is a fairly recent project, and it is under active development. We expect it to be pretty useful in a wide variety of cases, but don't be surprised if it's a bit rough around the edges :)
