Commit Graph

5 Commits

Author SHA1 Message Date
Philippe Tillet
43723ccb95 [FRONTEND] Removed circular import that broke Python 3.6 support (#272) 2021-09-09 13:46:55 -07:00
Philippe Tillet
8cea583109 [IR] Preliminary support for BF16 (#129)
This PR adds a BF16 data-type, along with FP32 <-> BF16 conversion instructions in the LLVM codegen. Other kinds of ops on bfloat16 are not yet supported.
2021-07-27 12:38:49 -07:00
daadaada
d8d6b715c8 [CODEGEN] Performance improvement on A100 (#125)
Improved codegen for the Ampere GPUs.

    * Make the layout pass recognize the multistage pipelined pattern.
    * Now the pipeline pass can automate the multistage pipelining transformation.
    * Remove extra barriers (from the prefetch pass & WAR) on Ampere.
    * Update the code generator (generator.cc) to make Triton generate n-buffered shared memory loads/stores.
2021-07-27 12:38:49 -07:00
Philippe Tillet
38ab4e955a [CODEGEN] Bugfix in prefetch pass (#118) 2021-07-27 12:38:49 -07:00
daadaada
967e629c0c [CODEGEN] Add a pass to prefetch operands of dot if applicable. (#105)
* update membar pass when data is double buffered

* Add instruction prefetch_s

* prefetch tests pass (except the 1 warp case)

* Fix the 1-warp bug

* Add back prefetch files

* Disable prefetch on a100

* Always add war barrier on sm>=80
2021-07-27 12:38:49 -07:00