472 Commits

Author SHA1 Message Date
Philippe Tillet
3cb77aa126 [README] Added "we're hiring!" with link to some of our blog posts (#180) 2021-08-02 16:46:26 -07:00
Xiangru Lian
9967e9d4b4 [DOCS] Fix fused softmax example script naive softmax implementation (#178) 2021-08-02 09:37:31 -07:00
Philippe Tillet
e8031fe61f [DRIVER] More robust support of unsupported CUDA version (#179) 2021-08-02 09:06:55 -07:00
milesial
b7cdf670c3 [DOCS] Fix related work (#172) 2021-08-01 11:06:37 -07:00
daadaada
c7060eadb2 [CODEGEN] Fix bug in auto-pipeline pass when a value depends on multiple phis (#164) 2021-07-31 23:40:36 -07:00
Philippe Tillet
c0bb895d9d [BUILD] More portable detection of terminfo (#173) 2021-07-31 17:09:49 -07:00
Philippe Tillet
a34c57402f [PYTHON] Improved error message for CPU (#167) 2021-07-30 09:47:27 -07:00
Ikko Ashimine
2293afece7 [README] GitHub format (#165)
Github -> GitHub
2021-07-30 09:47:08 -07:00
Philippe Tillet
cb5c280691 [DOCS] Added contributions section to README.md 2021-07-29 11:40:34 -07:00
Reid Draper
2322d6df2a [CI] Update ptillet to openai (#152) 2021-07-29 11:39:50 -07:00
Philippe Tillet
2f0f51be50 [DRIVER] No longer crashing when encountering CUDA version >11.4 2021-07-29 11:27:55 -07:00
Philippe Tillet
41ecd96300 [DOCS] minor grammar improvements 2021-07-28 14:18:31 -07:00
Avi Radinsky
d3851d8989 [DOCS] Typo fix (#151) 2021-07-28 12:07:12 -07:00
Philippe Tillet
4b9df06568 [CI] Bumped dev version to 1.0.1 and fixed permissions in documentation.yml (#149) 2021-07-28 04:35:14 -07:00
Philippe Tillet
046160b7f4 [README] Update Wheels badge URL v1.0 2021-07-28 02:04:41 -07:00
Philippe Tillet
acd5e44611 [GENERAL] Some minor improvements here and there to build systems and docs (#148) 2021-07-28 01:51:17 -07:00
Philippe Tillet
57c1fd3366 [BUILD] Now downloading LLVM from web if system does not have llvm-config-11 (#142) 2021-07-28 01:02:31 -07:00
Philippe Tillet
1365e96330 [CI] Fixup website build (#147) 2021-07-28 00:29:07 -07:00
Justin Jay Wang
8ddf909093 Add logo to README (#146) 2021-07-27 23:38:04 -07:00
Philippe Tillet
b736fdc740 [CI] More fixups (#145) 2021-07-27 22:14:51 -07:00
Philippe Tillet
1c48bd623e [CI] More bugfixes (#144) 2021-07-27 18:35:22 -07:00
Philippe Tillet
84521a5c82 [CI] Switch to Github Actions (#143) 2021-07-27 17:57:02 -07:00
Philippe Tillet
52d311f302 [CI] Updated build-website.yml (#141) 2021-07-27 12:38:49 -07:00
Philippe Tillet
bd70f10668 [CI] Added name to "Build Website" pipeline (#140) 2021-07-27 12:38:49 -07:00
Philippe Tillet
b253b77c71 [DOCS] Improved documentation and integration in CI (#139) 2021-07-27 12:38:49 -07:00
Philippe Tillet
76c6f24fb6 [CI] Made build-wheels compatible with system LLVM setup (#138)
This speeds up wheelhouse build time by ~10x
2021-07-27 12:38:49 -07:00
Philippe Tillet
8eb63bcb01 [CI] Various improvements to CI (#137)
Add clean-up before CI runs. Now using static LLVM-11 libraries from system rather than recompilation. Still no run-time LLVM dependencies
2021-07-27 12:38:49 -07:00
Philippe Tillet
298aead378 [FRONTEND] Fixed bugs in global symbols resolution of @triton.jit'd functions (#136) 2021-07-27 12:38:49 -07:00
Philippe Tillet
94ce6aa80f [DRIVER] Added support for CUDA 11.4 (#135) 2021-07-27 12:38:49 -07:00
Philippe Tillet
01276b5153 [FRONTEND] Added compilation flag to force use of .nc cache modifier (#134)
in DRAM loads. /!\ USE CAREFULLY - THIS CAN BREAK CORRECTNESS IF MISUSED /!\
2021-07-27 12:38:49 -07:00
Philippe Tillet
2824345065 [LANGUAGE] Added cos/sin (#132) 2021-07-27 12:38:49 -07:00
Philippe Tillet
3169e4355c [PYTHON] Bugfix in trans_c for dsd in blocksparse matmul (#131) 2021-07-27 12:38:49 -07:00
Philippe Tillet
8cea583109 [IR] Preliminary support for BF16 (#129)
This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.
2021-07-27 12:38:49 -07:00
Philippe Tillet
9b4e2cae2d [PYTHON] Added isinstance to list of builtin symbols exposed (#128) 2021-07-27 12:38:49 -07:00
daadaada
0b05e06c0d cu_device::max_shared_memory() now returns max dynamic shared memory size (#127) 2021-07-27 12:38:49 -07:00
daadaada
d8d6b715c8 [CODEGEN] Performance improvement on A100 (#125)
Improved codegen for the Ampere GPUs.

    * Make the layout pass recognize the multistage pipelined pattern.
    * Now the pipeline pass can automate the multistage pipelining transformation.
    * Remove extra barriers (from the prefetch pass & WAR) on Ampere.
    * Update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.
2021-07-27 12:38:49 -07:00
Philippe Tillet
5a51f3e529 [CODEGEN] Bugfix in membar pass (#124)
Membar pass on top of master is buggy with asynchronous copy. For example, it doesn't wait for asynchronous copies to complete before recoalescing accumulator in GEMM, which leads to undefined behavior when the program doesn't enter the loop. This PR proposes
2021-07-27 12:38:49 -07:00
Philippe Tillet
b7b05a560e [DRIVER] Now giving the option to use system ptxas through environment variable (#123) 2021-07-27 12:38:49 -07:00
Philippe Tillet
80c86ecf4a [LANG] Minor semantic changes (#121)
* Now using unordered instead of ordered float (fixes NaN issues)
* Bool -> int32 now converts to 1 rather than -1
* Reduce extend arguments to 32-bits if possible
2021-07-27 12:38:49 -07:00
Philippe Tillet
0274429429 [IR] Added IR and Codegen support for atomic_rmw (#120) 2021-07-27 12:38:49 -07:00
Philippe Tillet
59b0ac672a [LANGUAGE] Added support for bitcast (#119) 2021-07-27 12:38:49 -07:00
Philippe Tillet
38ab4e955a [CODEGEN] Bugfix in prefetch pass (#118) 2021-07-27 12:38:49 -07:00
Philippe Tillet
3ab121dbdb [PYTHON] Added support for tuples (#116) 2021-07-27 12:38:49 -07:00
Philippe Tillet
b5dcac484d [CODEGEN] Small bugfix in atomic-add (#114) 2021-07-27 12:38:49 -07:00
Philippe Tillet
f81012a8cf [CODEGEN] Fixed atomic_add issue (#112)
* [CODEGEN] Fixed atomic_add issue

* [CODEGEN] Fixed liveness analysis bug for instructions that are not DCE'd but have no users (e.g., atomic_cas)
2021-07-27 12:38:49 -07:00
Philippe Tillet
325ee38581 [PYTHON] Fixed bug in scoping mechanism (#111)
Inline functions didn't restore the scope of their parents. Also, some control flow structures still had the scoping semantics of C++.
2021-07-27 12:38:49 -07:00
Philippe Tillet
9f30af76fb [GENERAL] Minor improvements: (#110)
* Load libcuda.so.1 if libcuda.so is not there. Error if both aren't there.
* Support for multiple grad_to_none in triton.testing.do_bench
* Benchmark dataframe printed along with name
2021-07-27 12:38:49 -07:00
Philippe Tillet
288b4f7f58 [PYTHON] Added frontend to print sass using turingas disasm.py (#109) 2021-07-27 12:38:49 -07:00
Philippe Tillet
c91dd56a92 [CI] Made setup.py more backwards-compatible (#108) 2021-07-27 12:38:49 -07:00
daadaada
840d65d8c6 [CODEGEN] Clean up visit_mma884 (#107) 2021-07-27 12:38:49 -07:00