Philippe Tillet
11215f0f03
[CODEGEN] Now initializing cp.async to zero when predicate is false
WARNING: the case of non-zero initialization is still not handled. It will require a manual copy to shared memory.
2021-07-27 12:38:49 -07:00
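As a rough illustration of the behaviour this commit describes (elements whose predicate is false are written as zero instead of being copied), here is a minimal host-side sketch in Python. It is not the generated PTX or Triton's code generator; the function name `predicated_copy` and its `fill` parameter are hypothetical and exist only for this example.

```python
def predicated_copy(src, pred, fill=0):
    """Emulate a predicated copy: keep src[i] where pred[i] is true, else write `fill`.

    Hypothetical helper for illustration only; it models the semantics, not the hardware.
    """
    if len(src) != len(pred):
        raise ValueError("src and pred must have the same length")
    return [value if keep else fill for value, keep in zip(src, pred)]


# Elements outside the valid region (predicate false) end up as zero.
tile = predicated_copy([1.0, 2.0, 3.0, 4.0], [True, True, False, False])
print(tile)
```

A non-zero `fill` is the case the warning above says is not yet handled by the asynchronous path.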
Philippe Tillet
5b83259592
[CODEGEN] Major performance improvements on A100 ( #70 )
Improved handling of asynchronous copy, scheduling and synchronization for A100. Now achieving CUTLASS-like performance on large square dense matrix multiplication tasks
2021-07-27 12:38:49 -07:00
Philippe Tillet
3ca40b05cf
[DRIVER] Added options for developers to cache PTX file so that it can be manually modified
2021-07-27 12:38:49 -07:00
Philippe Tillet
b8a52c70c9
[LANG] Now requiring tiles have power of 2 number of elements
2021-07-27 12:38:48 -07:00
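The power-of-two requirement can be checked with a standard bit trick. The sketch below is an assumption about what such a check could look like, not the check Triton itself performs; `has_power_of_two_elements` is a hypothetical name.

```python
from math import prod


def has_power_of_two_elements(shape):
    """Return True if the total number of elements in `shape` is a power of two.

    Hypothetical helper: a positive integer n is a power of two
    exactly when n & (n - 1) == 0.
    """
    n = prod(shape)
    return n > 0 and (n & (n - 1)) == 0


print(has_power_of_two_elements((16, 32)))  # 512 elements
print(has_power_of_two_elements((3, 16)))   # 48 elements
```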
Philippe Tillet
6fb4800f57
Improvements w/ Auto-Tuning and standard benchmarks ( #57 )
[PYTHON] Bug-fixes in the auto-tuning module and improvement of the existing API for it
2021-07-27 12:38:48 -07:00
Philippe Tillet
ad5a30bae1
[LANG] Added __debug_barrier() call to force insertion of a CUDA __syncthreads
2021-07-27 12:38:48 -07:00
Philippe Tillet
3fde4b8f5b
[RUNTIME] Auto-tuning now works as expected when the values of autotune_key change
2021-07-27 12:38:48 -07:00
Philippe Tillet
0b025db2ee
[RUNTIME] Added option to print LLVM-IR
Also includes appropriate driver code change for that
2021-07-27 12:38:48 -07:00
Philippe Tillet
9f9d7b8840
[LANG] Fixed parsing error for built-in functions exp/log/sqrtf
2021-07-27 12:38:48 -07:00
Philippe Tillet
269ebc12e5
[PYTHON][TESTS][DOC] Various improvements to the API and code quality:
* Simplified `triton.kernel` API to achieve lower latency:
> .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor
> Compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR
> torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in `python/tutorials/`
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* Added `python/triton/ops/` package for pre-written Triton ops
2021-07-27 12:38:48 -07:00
Philippe Tillet
a5a477c36b
[CODEGEN] Fixed bug in recoalesce_inst LLVM codegen
2021-07-27 12:38:48 -07:00
Philippe Tillet
376c876eb8
[RUNTIME] Disable error on spills
2021-07-27 12:38:48 -07:00
Philippe Tillet
3b36a1e60c
[CODEGEN] Fixed issue in traversal order for atomic_add and store_inst
2021-07-27 12:38:48 -07:00
Philippe Tillet
083bbd1e8d
[GENERAL] Merged v1.0alpha into master. Added features are:
- A100 support via mma.16816
- Thread swizzling for conflict-free shared memory accesses without padding
- Complete overhaul of the LLVM code generation in codegen/selection/generator.cc to remove overengineering
- Added debugging capabilities in the Python binding
- Compilation error for kernels that spill
2021-07-27 12:38:48 -07:00
Philippe Tillet
c4fceeea49
[LANG] Added hacky min/max
2021-07-27 12:38:48 -07:00
Yan Da
27dc780871
[IR] Check constant_int type
2021-07-27 12:38:48 -07:00
Yan Da
01ef691b84
[LANG] Fix gep bug in INC
2021-07-27 12:38:48 -07:00
Yan Da
e9b2335224
[LANG] Add support for POSTFIX_INC and POSTFIX_DEC, and pointer type
2021-07-27 12:38:48 -07:00
Yan Da
05b95b7fa6
[LANG] Add support for PREFIX_INC and PREFIX_DEC.
2021-07-27 12:38:48 -07:00
Philippe Tillet
fd5c72d6a0
[LANG] Added some more atomic_add support
2021-07-27 12:38:48 -07:00
Philippe Tillet
5e8f4c934c
[DRIVER] Better exception handling of invalid ptx
2021-07-27 12:38:48 -07:00
Philippe Tillet
44ca2c0cb8
[DRIVER] Removed deprecated files and functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
7ab2c2a356
[DRIVER] Removed obsolete SetArg
2021-07-27 12:38:48 -07:00
Philippe Tillet
4f08d87fed
[DRIVER] Simplified Driver API by substantially removing reliance on driver::context
2021-07-27 12:38:48 -07:00
Philippe Tillet
f42b04d925
[DRIVER] Added (slow) support for CUDA11 and Ampere
2021-07-27 12:38:48 -07:00
Philippe Tillet
baa858aa74
[CODEGEN] Fixed bug in atomic_add
2021-07-27 12:38:48 -07:00
Philippe Tillet
7d095ec686
[LANG] Added sqrtf support
2021-07-27 12:38:48 -07:00
Philippe Tillet
073fddffc1
[PYTHON] Compiling Triton in Release mode now...
2021-07-27 12:38:48 -07:00
Philippe Tillet
5d84fde733
tmp
2021-07-27 12:38:48 -07:00
Philippe Tillet
da287bb710
[CODEGEN] Progress on atom.add.f16x2
2021-07-27 12:38:48 -07:00
Philippe Tillet
a77c925dfd
[DRIVER] Improved performance of Host driver code
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f8d36c7a4
[GENERAL] Various bugfixes
2021-07-27 12:38:48 -07:00
Philippe Tillet
50587bbf4b
[General] LLVM-9 -> LLVM-10
2021-07-27 12:38:48 -07:00
Philippe Tillet
8f3ee53f24
[PYTHON] Added option to show PTX source code in Python
2021-07-27 12:38:48 -07:00
Philippe Tillet
f152150e7d
[LANG] Added log intrinsic
2021-07-27 12:38:48 -07:00
Philippe Tillet
02a6e81b88
[PYTHON] Cleaning C++ bindings
2021-07-27 12:38:48 -07:00
Philippe Tillet
34f1d5e565
[CODEGEN] Fixed bug in 2D reductions
2021-07-27 12:38:48 -07:00
Philippe Tillet
049ab989b5
[GENERAL] Various improvements:
* Sparse einsum in triton.ops.einsum
* Hacky support for fixed-tile-size atomic-add
* Various bugfixes in parser
2021-07-27 12:38:48 -07:00
Philippe Tillet
664d3cae89
[DRIVER] Removed OpenCL support
There is no plan to support OpenCL anytime soon (Vulkan would be preferred). Removing the corresponding portion of the driver code.
2021-07-27 12:38:48 -07:00
Philippe Tillet
840308ab5d
[CODEGEN] More work on the CPU backend
2021-07-27 12:38:48 -07:00
Philippe Tillet
acff1b5e05
[RUNTIME] Lower-level interface for executing functions
2021-07-27 12:38:48 -07:00
Philippe Tillet
ba9955ae39
[CODEGEN][ANALYSIS] Fixed issue in layout inference
2021-07-27 12:38:48 -07:00
Philippe Tillet
29a0ad6c4d
[DRIVER] Now always using PTXv6.4
2021-07-27 12:38:48 -07:00
Philippe Tillet
8bdfbe2514
[ANALYSIS] Replaced min by gcd in layout inference
2021-07-27 12:38:48 -07:00
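The commit message does not say which quantities are combined in layout inference, so the following is only a generic illustration of how `gcd` differs from `min` when a value must evenly divide two others (for example, a hypothetical vector width that must divide both a contiguity and an alignment). The names `candidate_widths`, `contiguity`, and `alignment` are assumptions made for this sketch.

```python
from math import gcd


def candidate_widths(contiguity, alignment):
    """Return (min-based, gcd-based) widths for two positive integers.

    Hypothetical illustration: only the gcd is guaranteed to divide both inputs.
    """
    return min(contiguity, alignment), gcd(contiguity, alignment)


by_min, by_gcd = candidate_widths(6, 4)
# min gives 4, which does not divide 6; gcd gives 2, which divides both.
print(by_min, 6 % by_min, 4 % by_min)
print(by_gcd, 6 % by_gcd, 4 % by_gcd)
```

When both inputs are powers of two, `min` and `gcd` agree, so the difference only shows up for other values.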
Philippe Tillet
e18f169a39
[CODEGEN] Fixed various issues in alignment inference pass
2021-07-27 12:38:48 -07:00
Philippe Tillet
da6008128e
[CODEGEN] Fixed bug in alignment inference that prevented vectorization in some cases
2021-07-27 12:38:48 -07:00
Philippe Tillet
4bb0311f60
[TRITON] Fixed misaligned address issue
2021-07-27 12:38:48 -07:00
Philippe Tillet
a8f1b85c5f
[CODEGEN] Removed unnecessary coalescing rematerialization
2021-07-27 12:38:48 -07:00
Philippe Tillet
5995cbff8e
[CORE] Auto-tuning now copies scalar buffers. Still needs to copy all buffers that are both read from and written to.
2021-07-27 12:38:48 -07:00
Philippe Tillet
e7461a862b
[CODEGEN] Bugfix in Disassociate pass; Added fp32 atomic_add support
2021-07-27 12:38:48 -07:00