Commit Graph

193 Commits

Author SHA1 Message Date
Benjamin Lefaudeux
b53f5f3803 [OPS][BLOCKSPARSE] safeguarding a couple more configurations (#292) 2021-09-20 17:15:31 -07:00
Philippe Tillet
a12827848d [FRONTEND] Now using exist_ok=True when creating cache directories (#288) 2021-09-18 23:44:21 -07:00
Philippe Tillet
6e5b0b4301 [FRONTEND] Added on-disk cache for compiled kernels (#287) 2021-09-18 22:48:26 -07:00
Benjamin Lefaudeux
bd855ac13d [DOCS] Adding some doc on the benchmarks + requirements file (#285) 2021-09-18 16:37:30 -07:00
Philippe Tillet
313d6488f6 [CODEGEN] Fixed over-aggressive division handling in alignment pass (#280) 2021-09-15 00:40:17 -07:00
Philippe Tillet
da5063d898 [TEST] Added performance regression tests (#283) 2021-09-14 01:46:32 -07:00
Philippe Tillet
3e395bc84e [LANG] Fixed semantics of NaN in float comparisons (#281) 2021-09-13 15:06:29 -07:00
Min Xu
cecca90bea [DOCS] update installation doc and add gitignore (#279)
Co-authored-by: Min Xu <min.xu.public@gmail.com>
2021-09-12 21:11:45 -07:00
Philippe Tillet
4163d32c49 [DOCS] Fixed leftover exit() in 01-vector-add tutorial 2021-09-10 15:52:26 -07:00
Philippe Tillet
34369906b4 [PYTHON] Fix-up the previous commit 2021-09-10 11:13:25 -07:00
Philippe Tillet
ac10551d55 [PYTHON] Now providing triton.next_power_of_2 (#273) 2021-09-10 11:05:44 -07:00
Philippe Tillet
43723ccb95 [FRONTEND] Removed circular import that broke Python 3.6 support (#272) 2021-09-09 13:46:55 -07:00
Philippe Tillet
585e5cd0ec [TEST] Added test for empty kernel (#271) 2021-09-09 10:20:37 -07:00
Philippe Tillet
94c83d30ce [GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268)
- Removed driver module -- accelerator runtime is handled by PyTorch
- Added basic support for ROCm based on @micmelesse's PR -- can now execute an empty kernel on AMD devices without any compile-time changes
- Now only using PREFER_SHARED for kernels whose shared memory usage is greater than 49k; otherwise L1 performance can be poor for broadcast tensors
2021-09-09 00:04:28 -07:00
Szymon Sidor
8bedcce9be [LANG] Added seeded random number generation - philox (#261) 2021-09-02 22:02:40 -07:00
Philippe Tillet
c069ef907e [PYTHON] triton.language is now a submodule rather than a single file (#260) 2021-09-02 13:30:14 -07:00
Rohit Dwivedula
c0daffc625 [DOCS] @heuristics -> @triton.heuristics in some snippets (#253) 2021-09-01 18:50:17 -07:00
Philippe Tillet
4ff3714d61 [CODEGEN] Various bugfixes and stability improvements in compiler backend (#240) 2021-08-30 11:50:35 -07:00
daadaada
85426dbaf7 [DOCS] Add comments in layout.h (#249) 2021-08-28 18:07:32 -07:00
milesial
5b29da719d [DRIVER] Add CUDA P2P support (#209) 2021-08-20 21:00:54 -07:00
Sasank Chilamkurthy
6aa5720d75 [DOCS] use numel for num_elements in elementwise tutorial (#228) 2021-08-19 19:35:12 -07:00
Philippe Tillet
f26a48a3b4 [DOCS] Various improvements (#224)
- Added docstr for autotune, Config, heuristics
- Added docstr for atomics
- Hiding internal _builder argument used for built-in language primitives
- Re-factor docstr to use common templates between similar functions.
2021-08-18 11:15:53 -07:00
Philippe Tillet
226fde6ea1 [CODEGEN] Now using atomic_rmw code path for atomic_xchg (#222) 2021-08-17 16:33:23 -07:00
Philippe Tillet
a714b6b856 [PYTHON] re-activated auto-tuner configurations for triton.ops.matmul (#212) 2021-08-16 22:56:21 -07:00
Philippe Tillet
bb1eebb4b4 [CODEGEN] Fixed bug for visit_reduce1d with 64-bit data-types (#207) 2021-08-14 21:07:01 -07:00
Philippe Tillet
6e7593b446 added reset_to_zero in vector addition (#205) 2021-08-14 10:58:38 -07:00
Philippe Tillet
c7a272cb91 [FRONTEND] Added default arguments for range (#203) 2021-08-14 10:11:18 -07:00
Philippe Tillet
b120d70a0a [CI] Moved from assert_allclose to assert_almost_equal (#200) 2021-08-12 12:00:30 -07:00
Philippe Tillet
70e28ff380 [DOCS] Minor modifications of the matmul tutorial (#199)
Making the code more compact and fixing inconsistencies between the variable names in the text and the final Python program.
2021-08-11 18:59:15 -07:00
Philippe Tillet
398d4b4aeb [DOCS] softmax tutorial fixup (#198) 2021-08-11 17:35:00 -07:00
Philippe Tillet
298da78058 [CODEGEN/DRIVER] Tweaks for performance optimization (#193) 2021-08-07 16:41:44 -07:00
Nicholas Joseph
6cd1ec3955 [DOCS] Fix formatting mistakes (#192) 2021-08-06 12:58:43 -07:00
Nicholas Joseph
68f7eeba92 [DOCS] Improve matmul tutorial readability (#188) 2021-08-05 16:05:56 -07:00
Nicholas Joseph
4e6f667c2f [DOCS] Improve readability of 02-fused-softmax.py (#186) 2021-08-05 09:39:07 -07:00
Nicholas Joseph
23c71538fc [DOCS] Improve tutorial readability (#185) 2021-08-05 09:27:06 -07:00
Xiangru Lian
9967e9d4b4 [DOCS] Fix fused softmax example script naive softmax implementation (#178) 2021-08-02 09:37:31 -07:00
Philippe Tillet
a34c57402f [PYTHON] Improved error message for CPU (#167) 2021-07-30 09:47:27 -07:00
Reid Draper
2322d6df2a [CI] Update ptillet to openai (#152) 2021-07-29 11:39:50 -07:00
Philippe Tillet
4b9df06568 [CI] Bumped dev version to 1.0.1 and fixed permissions in documentation.yml (#149) 2021-07-28 04:35:14 -07:00
Philippe Tillet
acd5e44611 [GENERAL] Some minor improvements here and there to build systems and docs (#148) 2021-07-28 01:51:17 -07:00
Philippe Tillet
57c1fd3366 [BUILD] Now downloading LLVM from web if system does not have llvm-config-11 (#142) 2021-07-28 01:02:31 -07:00
Philippe Tillet
b253b77c71 [DOCS] Improved documentation and integration in CI (#139) 2021-07-27 12:38:49 -07:00
Philippe Tillet
76c6f24fb6 [CI] Made build-wheels compatible with system LLVM setup (#138)
This speeds up wheelhouse build time by ~10x
2021-07-27 12:38:49 -07:00
Philippe Tillet
8eb63bcb01 [CI] Various improvements to CI (#137)
Add clean-up before CI runs. Now using static LLVM-11 libraries from the system rather than recompiling them. Still no run-time LLVM dependencies.
2021-07-27 12:38:49 -07:00
Philippe Tillet
298aead378 [FRONTEND] Fixed bugs in global symbols resolution of @triton.jit'd functions (#136) 2021-07-27 12:38:49 -07:00
Philippe Tillet
01276b5153 [FRONTEND] Added compilation flag to force use of .nc cache modifier (#134)
in DRAM loads. /!\ USE CAREFULLY - THIS CAN BREAK CORRECTNESS IF MISUSED /!\
2021-07-27 12:38:49 -07:00
Philippe Tillet
2824345065 [LANGUAGE] Added cos/sin (#132) 2021-07-27 12:38:49 -07:00
Philippe Tillet
3169e4355c [PYTHON] Bugfix in trans_c for dsd in blocksparse matmul (#131) 2021-07-27 12:38:49 -07:00
Philippe Tillet
8cea583109 [IR] Preliminary support for BF16 (#129)
This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.
2021-07-27 12:38:49 -07:00
Philippe Tillet
9b4e2cae2d [PYTHON] Added isinstance to list of builtin symbols exposed (#128) 2021-07-27 12:38:49 -07:00