Philippe Tillet
|
feeb1e9862
|
Feature: Merged kernel-fusion branch
* Fuses multiple AXPY kernels
* Possibility to add thread-wise for loops in AXPY-like kernels
|
2015-09-30 15:31:41 -04:00 |
|
Philippe Tillet
|
f5f2b78089
|
Backend: fixed nasty issue with int_t being int rather than long long
|
2015-08-27 19:08:54 -04:00 |
|
Philippe Tillet
|
db090d7942
|
Code quality: Large clean-up of the codebase and especially of the include/ folder
|
2015-08-06 12:05:12 -07:00 |
|
Philippe Tillet
|
0ef6654c5f
|
Code quality: removed dependencies on the C++ OpenCL wrapper
|
2015-07-26 10:05:16 -07:00 |
|
Philippe Tillet
|
155554f5cf
|
Code quality: added clBLAS.def and some ISAACAPI
|
2015-07-21 23:48:50 -07:00 |
|
Philippe Tillet
|
cd155cb9e3
|
Code quality: Improved compliance with MSVC
|
2015-07-21 17:18:50 -04:00 |
|
Philippe Tillet
|
cfa6ea812d
|
Cleaning: Largely renamed templates to BLAS-like names
|
2015-07-11 11:21:15 -04:00 |
|
Philippe Tillet
|
0e207e7ca4
|
Backend: No longer creates a temporary for C = alpha*dot(op(A), op(B)) + beta*C
|
2015-06-27 17:55:01 -07:00 |
|
Philippe Tillet
|
278109eef8
|
C++: Now using standard C++ types instead of stdint
|
2015-05-04 21:23:05 -04:00 |
|
Philippe Tillet
|
cf5028d55b
|
Squashed feature branch:
* Added CUDA support
* Performance improvements
* API improvements
* Added "depth" parameter to GEMM
* Android cross-compilation
|
2015-04-29 15:52:21 -04:00 |
|