Commit Graph

92 Commits

Author SHA1 Message Date
Philippe Tillet
66c94f21d7 [PYTHON] Removed .softmax from ops/__init__.py following previous commit 2021-07-27 12:38:48 -07:00
Philippe Tillet
b0647cfd52 [PYTHON] Removed support for dense softmax
Interest seems limited now that it is fused in cross_entropy. Will
likely re-add once it's easier to share code between ops
2021-07-27 12:38:48 -07:00
Jared Kaplan
682ac4c60e Added a Softmax Xent Op (#53)
Also includes a bugfix in kernel.py to set the device before registering the c++ function object
2021-07-27 12:38:48 -07:00
Philippe Tillet
dffd66bc83 [PYTHON] Made codebase pep8 compliant 2021-07-27 12:38:48 -07:00
Philippe Tillet
2a02fabdac [PYTHON] Some cleaning of the PyBind11 wrappers (#62) 2021-07-27 12:38:48 -07:00
Philippe Tillet
80e8a2f1f2 [PYTHON][OPS][BLOCKSPARSE] Now rounding softmax tile sizes to next power
of 2
2021-07-27 12:38:48 -07:00
Philippe Tillet
cc84a476a3 [TESTS] test_matmul.py now plots benchmarks 2021-07-27 12:38:48 -07:00
Philippe Tillet
fedbe6f439 [PYTHON] Added triton.__version__ string 2021-07-27 12:38:48 -07:00
Philippe Tillet
6fb4800f57 Improvements w/ Auto-Tuning and standard benchmarks (#57)
[PYTHON] Bug-fixes in the auto-tuning module and improvement of the existing API for it
2021-07-27 12:38:48 -07:00
Philippe Tillet
ad005d49ac [PYTHON] Added benchmark code for CUTLASS 2021-07-27 12:38:48 -07:00
Philippe Tillet
3fde4b8f5b [RUNTIME] Auto-tuning now works as expected when the values of
autotune_key change
2021-07-27 12:38:48 -07:00
Philippe Tillet
52af8cda34 [PYTHON] Fixed issue with IS_TK_DIV_K 2021-07-27 12:38:48 -07:00
Philippe Tillet
7cf358a352 [TUTORIALS] Fixed TYPO in CMakeLists.txt 2021-07-27 12:38:48 -07:00
Philippe Tillet
9b31244897 [PYTHON] Added benchmarking code 2021-07-27 12:38:48 -07:00
Philippe Tillet
7ba242fcce [PYTHON][OPS] Added block-sparse softmax 2021-07-27 12:38:48 -07:00
Philippe Tillet
f81da73b6a [PYTHON] Added utility to read single Triton kernel from provided file
in triton.read
2021-07-27 12:38:48 -07:00
Philippe Tillet
269ebc12e5 [PYTHON][TESTS][DOC] Various improvement of the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
  > .data_ptr() must now be passed as kernel argument. No more implicit
conversion from torch.tensor
  > compilation options are now constant attributes, i.e., opt.d('VAR')
becomes opt.VAR
  > torch.device must now be passed explicitly to triton.kernel (no
longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* added python/triton/ops/ package for pre-written Triton ops
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d [GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c0bc7ed8b0 [PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily 2021-07-27 12:38:48 -07:00
Philippe Tillet
547a99a5d4 [VERSION] 0.2.3 -> 0.3.0 2021-07-27 12:38:48 -07:00
Philippe Tillet
8ab62803db [PYTHON] Context switching logic moved to PyTorch 2021-07-27 12:38:48 -07:00
Philippe Tillet
4f08d87fed [DRIVER] Simplified Driver API by substantially removing reliance on driver::context 2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1 [PYTHON] Compiling Triton in Release mode now... 2021-07-27 12:38:48 -07:00
Philippe Tillet
a77c925dfd [DRIVER] Improved performance of Host driver code 2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4 [GENERAL] Various bugfixes 2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b [General] LLVM-9 -> LLVM-10 2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24 [PYTHON] Added option to show PTX source code in Python 2021-07-27 12:38:48 -07:00
Philippe Tillet
cf80ccc798 [PYTHON] Fixed torch ABI issue 2021-07-27 12:38:48 -07:00
Philippe Tillet
06abc8cb40 [GENERAL] Fix compatibility issue with older Torch versions 2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d [LANG] Added log intrinsic 2021-07-27 12:38:48 -07:00
Philippe Tillet
02a6e81b88 [PYTHON] Cleaning C++ bindings 2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5 [GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d [CODEGEN] More work on the CPU backend 2021-07-27 12:38:48 -07:00
Philippe Tillet
64eaec016f [Version] Now version 0.2.3 2021-07-27 12:38:48 -07:00
Philippe Tillet
db4e4b9dbf [VERSION] Now version 0.2.2 2021-07-27 12:38:48 -07:00
Philippe Tillet
7af9d812cf [PYTHON] Added credits to Scott Gray for the idea used in launch.cc 2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05 [RUNTIME] Lower-level interface for executing functions 2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39 [CODEGEN][ANALYSIS] Fixed issue in layout inference 2021-07-27 12:38:48 -07:00
Philippe Tillet
89e456107b [EXAMPLES] Improved mat_mul example 2021-07-27 12:38:48 -07:00
Philippe Tillet
68c18238a9 [EXAMPLES] Added conv2d example 2021-07-27 12:38:48 -07:00
Philippe Tillet
46297a949f [PACKAGING] Now version 0.2.1 2021-07-27 12:38:48 -07:00
Philippe Tillet
c251dc50f3 [PACKAGING] Now version 0.2.0 2021-07-27 12:38:48 -07:00
Philippe Tillet
4ccd78f1a6 [EXAMPLES][TUTORIAL] Changed to new triton.kernel API 2021-07-27 12:38:48 -07:00
Philippe Tillet
c33d6d15f5 [TRITON][PYTHON] Reverted back to distutils 2021-07-27 12:38:48 -07:00
Philippe Tillet
955b027103 [TRITON][KERNEL] Fixed issue for concurrent compilation of torch
extensions
2021-07-27 12:38:48 -07:00
Philippe Tillet
d85141182d [PACKAGING] Now version 0.1.3 2021-07-27 12:38:48 -07:00
Philippe Tillet
5995cbff8e [CORE] Auto-tuning now copies scalar buffers. Still needs to copy all buffers that are both read from and written to. 2021-07-27 12:38:48 -07:00
Philippe Tillet
78cd54b0c8 [PYTHON] Added support for FP16 scalar kernel arguments 2021-07-27 12:38:48 -07:00
Philippe Tillet
694bfbddf9 [PACKAGING] Now version 0.1.2 2021-07-27 12:38:48 -07:00
Philippe Tillet
13ff6472e0 [LANG] Fixed undefined behavior in replace_all_uses_with() 2021-07-27 12:38:48 -07:00