triton

Files

Natalia Gimelshein 0d7e753227 [TESTING] use torch.int for autotuning cache (#840)

For stupid reasons, ops on `int8` are 3 times slower than on `int`, and for
another set of stupid reasons we are not using cudaMemset for `zero_`,
so using an `int8` buffer in `do_bench` makes it slow.
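
To illustrate the point above, here is a minimal sketch of a scratch buffer sized in bytes rather than in elements, so that switching the dtype from `int8` to `int` keeps the same memory footprint while zeroing a quarter as many elements. The helper name `make_flush_buffer` and the 256 MiB default are assumptions made for illustration; this is not the actual `do_bench` code.

```python
import torch

# Assumed size for illustration: large enough to evict a typical GPU L2 cache.
FLUSH_BYTES = 256 * 1024 * 1024


def make_flush_buffer(nbytes=FLUSH_BYTES, dtype=torch.int, device=None):
    """Allocate a scratch buffer covering `nbytes` bytes using `dtype` elements.

    Hypothetical helper, not part of Triton.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Bytes per element for the requested dtype (1 for int8, 4 for int).
    itemsize = torch.empty((), dtype=dtype).element_size()
    return torch.empty(nbytes // itemsize, dtype=dtype, device=device)


# Same footprint in bytes, but the int32 buffer has a quarter of the elements.
buf_int8 = make_flush_buffer(1 << 20, dtype=torch.int8, device="cpu")
buf_int32 = make_flush_buffer(1 << 20, dtype=torch.int, device="cpu")

# The fill that a benchmark loop would run before each timed call.
buf_int32.zero_()
```

The small 1 MiB buffers on the CPU keep the example cheap to run; on a GPU, the same helper can be called with `device="cuda"` and the default size.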

Co-authored-by: Philippe Tillet <phil@openai.com>

2022-11-04 18:05:16 -07:00

test_performance.py

[TESTING] use torch.int for autotuning cache (#840)

2022-11-04 18:05:16 -07:00