419 Commits

Author SHA1 Message Date
Philippe Tillet
bb1eebb4b4 [CODEGEN] Fixed bug for visit_reduce1d with 64-bit data-types (#207) 2021-08-14 21:07:01 -07:00
Philippe Tillet
6e7593b446 added reset_to_zero in vector addition (#205) 2021-08-14 10:58:38 -07:00
Philippe Tillet
c7a272cb91 [FRONTEND] Added default arguments for range (#203) 2021-08-14 10:11:18 -07:00
Philippe Tillet
b120d70a0a [CI] Moved from assert_allclose to assert_almost_equal (#200) 2021-08-12 12:00:30 -07:00
Philippe Tillet
70e28ff380 [DOCS] Minor modifications of the matmul tutorial (#199)
Making the code more compact and fixing inconsistencies between text variable names and final python program.
2021-08-11 18:59:15 -07:00
Philippe Tillet
398d4b4aeb [DOCS] softmax tutorial fixup (#198) 2021-08-11 17:35:00 -07:00
Philippe Tillet
298da78058 [CODEGEN/DRIVER] Tweaks for performance optimization (#193) 2021-08-07 16:41:44 -07:00
Nicholas Joseph
6cd1ec3955 [DOCS] Fix formatting mistakes (#192) 2021-08-06 12:58:43 -07:00
Nicholas Joseph
68f7eeba92 [DOCS] Improve matmul tutorial readability (#188) 2021-08-05 16:05:56 -07:00
Nicholas Joseph
4e6f667c2f [DOCS] Improve readability of 02-fused-softmax.py (#186) 2021-08-05 09:39:07 -07:00
Nicholas Joseph
23c71538fc [DOCS] Improve tutorial readability (#185) 2021-08-05 09:27:06 -07:00
Xiangru Lian
9967e9d4b4 [DOCS] Fix fused softmax example script naive softmax implementation (#178) 2021-08-02 09:37:31 -07:00
Philippe Tillet
a34c57402f [PYTHON] Improved error message for CPU (#167) 2021-07-30 09:47:27 -07:00
Reid Draper
2322d6df2a [CI] Update ptillet to openai (#152) 2021-07-29 11:39:50 -07:00
Philippe Tillet
4b9df06568 [CI] Bumped dev version to 1.0.1 and fixed permissions in documentation.yml (#149) 2021-07-28 04:35:14 -07:00
Philippe Tillet
acd5e44611 [GENERAL] Some minor improvements here and there to build systems and docs (#148) 2021-07-28 01:51:17 -07:00
Philippe Tillet
57c1fd3366 [BUILD] Now downloading LLVM from web if system does not have llvm-config-11 (#142) 2021-07-28 01:02:31 -07:00
Philippe Tillet
b253b77c71 [DOCS] Improved documentation and integration in CI (#139) 2021-07-27 12:38:49 -07:00
Philippe Tillet
76c6f24fb6 [CI] Made build-wheels compatible with system LLVM setup (#138)
This speeds up wheelhouse build time by ~10x
2021-07-27 12:38:49 -07:00
Philippe Tillet
8eb63bcb01 [CI] Various improvements to CI (#137)
Add clean-up before CI runs. Now using static LLVM-11 libraries from system rather than recompilation. Still no run-time LLVM dependencies
2021-07-27 12:38:49 -07:00
Philippe Tillet
298aead378 [FRONTEND] Fixed bugs in global symbols resolution of @triton.jit'd functions (#136) 2021-07-27 12:38:49 -07:00
Philippe Tillet
01276b5153 [FRONTEND] Added compilation flag to force use of .nc cache modifier (#134)
in DRAM loads. /!\ USE CAREFULLY - THIS CAN BREAK CORRECTNESS IF MISUSED
/!\
2021-07-27 12:38:49 -07:00
Philippe Tillet
2824345065 [LANGUAGE] Added cos/sin (#132) 2021-07-27 12:38:49 -07:00
Philippe Tillet
3169e4355c [PYTHON] Bugfix in trans_c for dsd in blocksparse matmul (#131) 2021-07-27 12:38:49 -07:00
Philippe Tillet
8cea583109 [IR] Preliminary support for BF16 (#129)
This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.
2021-07-27 12:38:49 -07:00
Philippe Tillet
9b4e2cae2d [PYTHON] Added isinstance to list of builtin symbols exposed (#128) 2021-07-27 12:38:49 -07:00
daadaada
d8d6b715c8 [CODEGEN] Performance improvement on A100 (#125)
Improved codegen for the Ampere GPUs.

    * Make the layout pass recognize the multistage pipelined pattern.
    * Now the pipeline pass can automate the multistage pipelining transformation.
    * Remove extra barriers (from the prefetch pass & WAR) on Ampere.
    * Update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.
2021-07-27 12:38:49 -07:00
Philippe Tillet
b7b05a560e [DRIVER] Now giving the option to use system ptxas through environment variable (#123) 2021-07-27 12:38:49 -07:00
Philippe Tillet
0274429429 [IR] Added IR and Codegen support for atomic_rmw (#120) 2021-07-27 12:38:49 -07:00
Philippe Tillet
59b0ac672a [LANGUAGE] Added support for bitcast (#119) 2021-07-27 12:38:49 -07:00
Philippe Tillet
3ab121dbdb [PYTHON] Added support for tuples (#116) 2021-07-27 12:38:49 -07:00
Philippe Tillet
f81012a8cf [CODEGEN] Fixed atomic_add issue (#112)
* [CODEGEN] Fixed atomic_add issue

* [CODEGEN] Fixed liveness analysis bug for instructions that are not
DCE'd but have no users (e.g., atomic_cas)
2021-07-27 12:38:49 -07:00
Philippe Tillet
325ee38581 [PYTHON] Fixed bug in scoping mechanism (#111)
Inline functions didn't restore scope of parents. Also some control flow
structure still had the scoping semantics of C++
2021-07-27 12:38:49 -07:00
Philippe Tillet
9f30af76fb [GENERAL] Minor improvements: (#110)
* Load libcuda.so.1 if libcuda.so is not there. Error if both aren't
there.
* Support for multiple grad_to_none in triton.testing.do_bench
* Benchmark dataframe printed along with name
2021-07-27 12:38:49 -07:00
Philippe Tillet
288b4f7f58 [PYTHON] Added frontend to print sass using turingas disasm.py (#109) 2021-07-27 12:38:49 -07:00
Philippe Tillet
c91dd56a92 [CI] Made setup.py more backwards-compatible (#108) 2021-07-27 12:38:49 -07:00
Philippe Tillet
147675923e [triton-ops] Minor build improvements (#106) 2021-07-27 12:38:49 -07:00
Szymon Sidor
ac57812bdc [PYTHON] Update TensorWrapper with device attribute (#102)
bugfix `triton.reinterpret`
2021-07-27 12:38:49 -07:00
Philippe Tillet
840140bf26 [CODEGEN] Removed dedicated reassociate pass to merge it into LLVM isel (#101)
This massively simplifies implementation of `reassociate` and also fixes
a bunch of bug. The pass could still be improved, but can already be used
to generate constant pointer offsets in eg the matmul epilogue
2021-07-27 12:38:49 -07:00
Philippe Tillet
7355efa745 [LANG] Preliminary FP8 support (#96) 2021-07-27 12:38:49 -07:00
Philippe Tillet
4290be1ae8 [PYTHON] Various minor codegen fixes (#95) 2021-07-27 12:38:49 -07:00
Philippe Tillet
2b75158426 [PYTHON] Added atomic_add (#94) 2021-07-27 12:38:49 -07:00
Nora Belrose
d7f87929fa [PYTHON] Added shape & device checks for inputs to sparse matmul op (#93) 2021-07-27 12:38:49 -07:00
Philippe Tillet
bfc0a7587d [PYTHON] Renamed triton.core -> triton.language (#92) 2021-07-27 12:38:49 -07:00
Nora Belrose
41410012e8 [PYTHON] Fix crash in sparse matmul for tensors w/ less than 4 dims (#91) 2021-07-27 12:38:49 -07:00
daadaada
f6688372db [PYTHON] Allow triton.code_gen.Binary to print Triton-IR asm. (#89) 2021-07-27 12:38:49 -07:00
Nora Belrose
1112e2526e [PYTHON] Fix variable referenced before assignment error in blocksparse matmul backward (#90) 2021-07-27 12:38:49 -07:00
Philippe Tillet
d9112144b4 [PYTHON] Now triton.code_gen.Binary can print PTX and LLIR (#88) 2021-07-27 12:38:49 -07:00
Philippe Tillet
29e33e50b7 [DOCS] Updates and improvements (#87) 2021-07-27 12:38:49 -07:00
Philippe Tillet
39f4730305 Deprecation of Triton-C and Replacement by decorated Python functions (#86)
This PR implements a major overhaul of the frontend for Triton, and replaces Triton-C by a pure Python API in which kernels are defined as @triton.jit decorated functions. The documentation and tutorials have also been updated to accommodate these changes.

See documentations for more information on the new API
2021-07-27 12:38:49 -07:00