Philippe Tillet
66c94f21d7
[PYTHON] Removed .softmax from ops/__init__.py following previous commit
2021-07-27 12:38:48 -07:00
Philippe Tillet
b0647cfd52
[PYTHON] Removed support for dense softmax
Interest seems limited now that it is fused in cross_entropy. Will
likely re-add once it's easier to share code between ops
2021-07-27 12:38:48 -07:00
Jared Kaplan
682ac4c60e
Added a Softmax Xent Op (#53)
Also includes a bugfix in kernel.py to set the device before registering the C++ function object
2021-07-27 12:38:48 -07:00
Philippe Tillet
dffd66bc83
[PYTHON] Made codebase pep8 compliant
2021-07-27 12:38:48 -07:00
Philippe Tillet
2a02fabdac
[PYTHON] Some cleaning of the PyBind11 wrappers (#62)
2021-07-27 12:38:48 -07:00
Philippe Tillet
80e8a2f1f2
[PYTHON][OPS][BLOCKSPARSE] Now rounding softmax tile sizes to next power of 2
2021-07-27 12:38:48 -07:00
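The commit above does not show how the rounding is done. As a minimal sketch, assuming tile sizes are rounded up with a helper of this kind (the name `next_power_of_2` is hypothetical here, not taken from this commit), the usual bit-length trick in Python is:

```python
def next_power_of_2(n: int) -> int:
    """Hypothetical helper: smallest power of two that is >= n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed for n - 1, so shifting
    # 1 left by that amount gives the first power of two that is >= n.
    return 1 << (n - 1).bit_length()
```

For example, a softmax row of 1000 elements would map to a 1024-wide tile, assuming the 24 padding lanes are masked out inside the kernel.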
Philippe Tillet
cc84a476a3
[TESTS] test_matmul.py now plots benchmarks
2021-07-27 12:38:48 -07:00
Philippe Tillet
fedbe6f439
[PYTHON] Added triton.__version__ string
2021-07-27 12:38:48 -07:00
Philippe Tillet
6fb4800f57
Improvements w/ Auto-Tuning and standard benchmarks (#57)
[PYTHON] Bug-fixes in the auto-tuning module and improvement of the existing API for it
2021-07-27 12:38:48 -07:00
Philippe Tillet
ad005d49ac
[PYTHON] Added benchmark code for CUTLASS
2021-07-27 12:38:48 -07:00
Philippe Tillet
3fde4b8f5b
[RUNTIME] Auto-tuning now works as expected when the values of autotune_key change
2021-07-27 12:38:48 -07:00
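One way to obtain the behavior described in the commit above, sketched under assumptions: the tuner caches the best configuration per tuple of `autotune_key` argument values and re-runs the search only when that tuple has not been seen before. `AutotuneCache`, `select` and the `bench` callback are hypothetical names for illustration, not Triton's runtime API.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


class AutotuneCache:
    """Hypothetical sketch: one tuning search per distinct tuple of key values."""

    def __init__(self, configs: Sequence[Hashable], key_names: List[str],
                 bench: Callable[[Hashable, dict], float]):
        self.configs = list(configs)
        self.key_names = key_names  # names of the arguments acting as autotune_key
        self.bench = bench          # returns a cost (e.g. runtime) for a config
        self.best: Dict[Tuple, Hashable] = {}
        self.num_searches = 0

    def select(self, **kwargs) -> Hashable:
        # Only the key arguments decide whether a new search is needed.
        key = tuple(kwargs[name] for name in self.key_names)
        if key not in self.best:
            self.num_searches += 1
            self.best[key] = min(self.configs,
                                 key=lambda cfg: self.bench(cfg, kwargs))
        return self.best[key]
```

With a toy cost function such as `lambda tile, args: abs(tile - args["N"] / 8)` and configs `[32, 64, 128]`, calling `select(N=256)`, then `select(N=1024)`, then `select(N=256)` again triggers only two searches, because the third call reuses the cached result for `N=256`.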
Philippe Tillet
52af8cda34
[PYTHON] Fixed issue with IS_TK_DIV_K
2021-07-27 12:38:48 -07:00
Philippe Tillet
7cf358a352
[TUTORIALS] Fixed typo in CMakeLists.txt
2021-07-27 12:38:48 -07:00
Philippe Tillet
9b31244897
[PYTHON] Added benchmarking code
2021-07-27 12:38:48 -07:00
Philippe Tillet
7ba242fcce
[PYTHON][OPS] Added block-sparse softmax
2021-07-27 12:38:48 -07:00
Philippe Tillet
f81da73b6a
[PYTHON] Added utility to read a single Triton kernel from a provided file in triton.read
2021-07-27 12:38:48 -07:00
Philippe Tillet
269ebc12e5
[PYTHON][TESTS][DOC] Various improvements to the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
> .data_ptr() must now be passed as a kernel argument. No more implicit
conversion from torch.tensor
> compilation options are now constant attributes, i.e., opt.d('VAR')
becomes opt.VAR
> torch.device must now be passed explicitly to triton.kernel (no
longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* added python/triton/ops/ package for pre-written Triton ops
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without
padding
- Complete overhaul of the LLVM code generation in
codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c0bc7ed8b0
[PYTHON] Added TRITON_DEBUG_MODE which reallocates input tensors outside of the pytorch memory pool to spot out-of-bounds accesses more easily
2021-07-27 12:38:48 -07:00
Philippe Tillet
547a99a5d4
[VERSION] 0.2.3 -> 0.3.0
2021-07-27 12:38:48 -07:00
Philippe Tillet
8ab62803db
[PYTHON] Context switching logic moved to PyTorch
2021-07-27 12:38:48 -07:00
Philippe Tillet
4f08d87fed
[DRIVER] Simplified Driver API by substantially removing reliance on driver::context
2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1
[PYTHON] Compiling Triton in Release mode now...
2021-07-27 12:38:48 -07:00
Philippe Tillet
a77c925dfd
[DRIVER] Improved performance of Host driver code
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b
[General] LLVM-9 -> LLVM-10
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24
[PYTHON] Added option to show PTX source code in Python
2021-07-27 12:38:48 -07:00
Philippe Tillet
cf80ccc798
[PYTHON] Fixed torch ABI issue
2021-07-27 12:38:48 -07:00
Philippe Tillet
06abc8cb40
[GENERAL] Fix compatibility issue with older Torch versions
2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d
[LANG] Added log intrinsic
2021-07-27 12:38:48 -07:00
Philippe Tillet
02a6e81b88
[PYTHON] Cleaning C++ bindings
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d
[CODEGEN] More work on the CPU backend
2021-07-27 12:38:48 -07:00
Philippe Tillet
64eaec016f
[Version] Now version 0.2.3
2021-07-27 12:38:48 -07:00
Philippe Tillet
db4e4b9dbf
[VERSION] Now version 0.2.2
2021-07-27 12:38:48 -07:00
Philippe Tillet
7af9d812cf
[PYTHON] Added credits to Scott Gray for the idea used in launch.cc
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39
[CODEGEN][ANALYSIS] Fixed issue in layout inference
2021-07-27 12:38:48 -07:00
Philippe Tillet
89e456107b
[EXAMPLES] Improved mat_mul example
2021-07-27 12:38:48 -07:00
Philippe Tillet
68c18238a9
[EXAMPLES] Added conv2d example
2021-07-27 12:38:48 -07:00
Philippe Tillet
46297a949f
[PACKAGING] Now version 0.2.1
2021-07-27 12:38:48 -07:00
Philippe Tillet
c251dc50f3
[PACKAGING] Now version 0.2.0
2021-07-27 12:38:48 -07:00
Philippe Tillet
4ccd78f1a6
[EXAMPLES][TUTORIAL] Changed to new triton.kernel API
2021-07-27 12:38:48 -07:00
Philippe Tillet
c33d6d15f5
[TRITON][PYTHON] Reverted back to distutils
2021-07-27 12:38:48 -07:00
Philippe Tillet
955b027103
[TRITON][KERNEL] Fixed issue for concurrent compilation of torch extensions
2021-07-27 12:38:48 -07:00
Philippe Tillet
d85141182d
[PACKAGING] Now version 0.1.3
2021-07-27 12:38:48 -07:00
Philippe Tillet
5995cbff8e
[CORE] Auto-tuning now copies scalar buffers. Still needs to copy all buffers that are both read from and written to.
2021-07-27 12:38:48 -07:00
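The remaining work noted in the commit above matters because a kernel that both reads and writes the same buffer (for example `y += x`) accumulates its updates when it is benchmarked several times, so later trials and the caller see corrupted data. A minimal sketch of the fix, assuming plain Python lists as stand-ins for device buffers and a hypothetical `autotune_with_restore` helper (not Triton's code), restores every argument from a snapshot before each trial:

```python
import copy
import time
from typing import Callable, Hashable, List, Sequence


def autotune_with_restore(kernel: Callable[..., None], configs: Sequence[Hashable],
                          args: List[list], n_trials: int = 3) -> Hashable:
    """Hypothetical sketch: time kernel(config, *args) for each config while
    restoring every buffer before each run, so read-write buffers stay intact."""
    snapshot = copy.deepcopy(args)
    timings = {}
    for config in configs:
        best = float("inf")
        for _ in range(n_trials):
            # Restore in place so the caller's buffer objects are reused.
            for buf, saved in zip(args, snapshot):
                buf[:] = saved
            start = time.perf_counter()
            kernel(config, *args)
            best = min(best, time.perf_counter() - start)
        timings[config] = best
    # Leave the buffers exactly as the caller passed them in.
    for buf, saved in zip(args, snapshot):
        buf[:] = saved
    return min(timings, key=timings.get)
```

Copying every argument is the simple, conservative choice; a real implementation would presumably copy only buffers that are written to, as the commit message suggests, to limit the memory overhead.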
Philippe Tillet
78cd54b0c8
[PYTHON] Added support for FP16 scalar kernel arguments
2021-07-27 12:38:48 -07:00
Philippe Tillet
694bfbddf9
[PACKAGING] Now version 0.1.2
2021-07-27 12:38:48 -07:00
Philippe Tillet
13ff6472e0
[LANG] Fixed undefined behavior in replace_all_uses_with()
2021-07-27 12:38:48 -07:00