Commit Graph

4 Commits

Author SHA1 Message Date
Philippe Tillet
567a1a3d17 [CODEGEN] Bugfixes with FP32 async copy
2021-07-27 12:38:49 -07:00
Philippe Tillet
5b83259592 [CODEGEN] Major performance improvements on A100 (#70)
Improved handling of asynchronous copies, scheduling, and synchronization on A100. Performance on large square dense matrix multiplications is now comparable to CUTLASS.
2021-07-27 12:38:49 -07:00
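The commit above mentions asynchronous copies, scheduling, and synchronization on A100. On that architecture, asynchronous global-to-shared copies are exposed in PTX as cp.async on sm_80. They are typically used to software-pipeline a loop: with S shared buffers, the loop keeps up to S - 1 tile copies in flight while it computes on an earlier tile.

The sketch below is a generic, hypothetical illustration of that multi-stage schedule. The function names and event labels are assumptions, and this is not the code generator's actual implementation. Real GPU code also needs a barrier before a buffer is reused.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (kind, tile index); kind is "issue", "wait" or "compute"


def pipelined_schedule(num_tiles: int, num_stages: int) -> List[Event]:
    """Hypothetical schedule for a loop that uses num_stages shared buffers.

    "issue"   starts an asynchronous copy of tile k into buffer k % num_stages
    "wait"    blocks until the copy of tile k has landed
    "compute" consumes tile k (e.g. one step of a GEMM main loop)
    """
    if num_stages < 2:
        raise ValueError("pipelining needs at least two buffers")
    events: List[Event] = []
    # Prologue: fill the pipeline with num_stages - 1 outstanding copies.
    for k in range(min(num_stages - 1, num_tiles)):
        events.append(("issue", k))
    # Steady state: wait for tile k, prefetch a later tile, then compute tile k.
    for k in range(num_tiles):
        events.append(("wait", k))
        nxt = k + num_stages - 1
        if nxt < num_tiles:
            # Tile nxt reuses the buffer of tile k - 1, whose compute step
            # was emitted in the previous iteration.
            events.append(("issue", nxt))
        events.append(("compute", k))
    return events


def max_in_flight(events: List[Event]) -> int:
    """Peak number of copies issued but not yet waited on."""
    outstanding = peak = 0
    for kind, _ in events:
        if kind == "issue":
            outstanding += 1
            peak = max(peak, outstanding)
        elif kind == "wait":
            outstanding -= 1
    return peak


if __name__ == "__main__":
    for kind, tile in pipelined_schedule(num_tiles=4, num_stages=3):
        print(kind, tile)
```

With three stages, at most two copies are outstanding at any time. The third buffer holds the tile currently being computed, so the memory latency of later tiles is overlapped with arithmetic on earlier ones.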
Philippe Tillet
ce8aa2a41a [CI] Added benchmarking to CI script (#65)
2021-07-27 12:38:49 -07:00
Philippe Tillet
5e3c7f5a60 [PYTHON] Added automated benchmark script (#63)
This adds a bench command to setup.py that runs the benchmark suite and generates a set of CSV files (and, optionally, plots). Example usage:

python setup.py bench
python setup.py bench --with-plots
python setup.py bench --filter=cross_entropy
2021-07-27 12:38:48 -07:00
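The bench command above runs a benchmark suite, writes CSV files, and accepts --with-plots and --filter. The stand-alone Python sketch below shows one way such a runner could work, under several assumptions:

- Benchmarks live in a hypothetical name-to-callable registry.
- Each callable returns a list of result rows.
- --filter is a substring match on the benchmark name.

The names run_benchmarks and parse_args are illustrative and are not the project's setup.py code. This sketch parses --with-plots but does not plot.

```python
import argparse
import csv
import os
from typing import Callable, Dict, List, Optional

# Hypothetical registry type: benchmark name -> callable returning result rows.
Benchmark = Callable[[], List[Dict[str, float]]]


def run_benchmarks(benchmarks: Dict[str, Benchmark], out_dir: str,
                   name_filter: Optional[str] = None) -> List[str]:
    """Run each benchmark whose name contains name_filter (all if None)
    and write one CSV file per benchmark. Returns the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written: List[str] = []
    for name, bench in sorted(benchmarks.items()):
        if name_filter is not None and name_filter not in name:
            continue
        rows = bench()
        if not rows:
            continue  # nothing to write for an empty result set
        path = os.path.join(out_dir, f"{name}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Mirror the two flags from the commit message (assumed semantics)."""
    parser = argparse.ArgumentParser(description="Run the benchmark suite")
    parser.add_argument("--with-plots", action="store_true",
                        help="also generate plots (ignored in this sketch)")
    parser.add_argument("--filter", default=None,
                        help="only run benchmarks whose name contains this string")
    return parser.parse_args(argv)
```

For example, parse_args(["--filter=cross_entropy"]) yields args.filter == "cross_entropy", and passing that value as name_filter restricts the run to the matching benchmark. Substring matching keeps the filter forgiving, so a shared prefix selects a whole family of benchmarks.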