Commit Graph

4 Commits

Author SHA1 Message Date
Philippe Tillet
db6bf71564 [CODEGEN] Bugfixes with FP32 async copy 2021-02-24 14:44:09 -05:00
Philippe Tillet
15f8e8c3b7 [CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copy, scheduling, and synchronization for A100. Now achieves CUTLASS-like performance on large square dense matrix multiplication tasks.
2021-02-21 18:19:39 -05:00
Jared Kaplan
b10e9b89e9 [PYTHON] Add Blocksparse Attention Fwd/Bwd Test (#69)
Also includes a small bugfix for block-sparse softmax.
2021-02-19 17:46:05 -05:00
Philippe Tillet
53fd9631ef [PYTHON] Added automated benchmark script (#63)
This adds a bench command to setup.py that runs the benchmark suite and generates a set of CSV files (and optionally plots):

python setup.py bench
python setup.py bench --with-plots
python setup.py bench --filter=cross_entropy
2021-02-08 15:16:41 -05:00
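
The commit log does not show how the bench command in 53fd9631ef handles its flags. As a rough illustration only, the sketch below parses the two flags from the commit message (--with-plots and --filter) with the standard library and writes one CSV file per selected benchmark. The BENCHMARKS registry, the function names, the substring-match meaning of --filter, and the placeholder rows are all assumptions made for this example; they are not taken from the repository, and the real setup.py may be structured quite differently.

```python
import argparse
import csv
import os
import tempfile

# Hypothetical registry: benchmark name -> callable returning (size, value) rows.
# The rows are placeholders, not real measurements.
BENCHMARKS = {
    "cross_entropy": lambda: [(256, 1.0), (512, 2.0)],
    "matmul": lambda: [(256, 3.0), (512, 4.0)],
}


def parse_bench_args(argv):
    """Parse the two flags that appear in the commit message."""
    parser = argparse.ArgumentParser(prog="bench")
    parser.add_argument("--with-plots", action="store_true",
                        help="also produce plots (left out of this sketch)")
    parser.add_argument("--filter", default="",
                        help="assumed meaning: run benchmarks whose name contains this string")
    return parser.parse_args(argv)


def run_bench(argv, out_dir):
    """Run the selected benchmarks and write one CSV file per benchmark."""
    args = parse_bench_args(argv)
    written = []
    for name in sorted(BENCHMARKS):
        # An empty filter matches every name.
        if args.filter not in name:
            continue
        path = os.path.join(out_dir, name + ".csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["size", "value"])
            writer.writerows(BENCHMARKS[name]())
        written.append(path)
    return written


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves the working tree clean.
    with tempfile.TemporaryDirectory() as tmp:
        for path in run_bench(["--filter=cross_entropy"], tmp):
            print(os.path.basename(path))
```

Keeping argument parsing separate from the run loop makes the flag handling easy to check on its own; plotting is intentionally omitted here because the commit message gives no detail about it.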