Commit Graph

10 Commits

Author SHA1 Message Date
Philippe Tillet
feeb1e9862 Feature: Merged kernel-fusion branch
* Fuses multiple AXPY kernels
* Possibility to add thread-wise for loops in AXPY-like kernels
2015-09-30 15:31:41 -04:00
Philippe Tillet
f5f2b78089 Backend: fixed nasty issue with int_t being int rather than long long 2015-08-27 19:08:54 -04:00
Philippe Tillet
db090d7942 Code quality: Large clean-up of the codebase and especially of the include/ folder 2015-08-06 12:05:12 -07:00
Philippe Tillet
0ef6654c5f Code quality: removed dependencies on the C++ OpenCL wrapper 2015-07-26 10:05:16 -07:00
Philippe Tillet
155554f5cf Code quality: added clBLAS.def and some ISAACAPI 2015-07-21 23:48:50 -07:00
Philippe Tillet
cd155cb9e3 Code quality: Improved compliance with MSVC 2015-07-21 17:18:50 -04:00
Philippe Tillet
cfa6ea812d Cleaning: Largely renamed templates to BLAS-like names 2015-07-11 11:21:15 -04:00
Philippe Tillet
0e207e7ca4 Backend: Now not creating a temporary upon C = alpha*dot(op(A), op(B)) + beta*C 2015-06-27 17:55:01 -07:00
Philippe Tillet
278109eef8 C++: Now using standard C++ types instead of stdint 2015-05-04 21:23:05 -04:00
Philippe Tillet
cf5028d55b Squashed feature branch:
* Added CUDA support
* Performance improvements
* API improvements
* Added "depth" parameter to GEMM
* Android cross-compilation
2015-04-29 15:52:21 -04:00