Commit Graph

431 Commits

Author SHA1 Message Date
Philippe Tillet
dfbe52c20a Driver: now ignore CUDA_ERROR_DEINITIALIZED in the destructor of CUDA C++ object.
This should be harmless. ISAAC deinitializes CUDA at the very end, but external libraries may deinitialize it beforehand.
2015-11-27 02:09:15 -05:00
Philippe Tillet
c0b9bbee43 cuBLAS: fixed CUDA context import 2015-11-26 21:09:34 -05:00
Philippe Tillet
6fc94c0c0b Kernels: Fixed various corner cases for the kernel templates and BLAS 2015-11-26 19:49:44 -05:00
Philippe Tillet
6be5929b0d Core: fixed handle wrapping for CUcontext 2015-11-21 13:57:05 -05:00
Philippe Tillet
f653625aa9 C API: added symbols for cublas_v2 2015-11-20 22:46:52 -05:00
Philippe Tillet
c6333c993a API: adding cuBLAS interface 2015-11-20 12:46:42 -05:00
Philippe Tillet
da1b0a9571 GEMM: performance regression fix 2015-11-19 20:49:38 -05:00
Philippe Tillet
bc20cc1ed7 Python: updated wrapper to match C++ API 2015-11-19 19:22:11 -05:00
Philippe Tillet
a843477438 CMake: removed legacy debug code 2015-11-19 18:33:38 -05:00
Philippe Tillet
e2cdb88338 Core: included bugfixes from the SVD branch 2015-11-19 12:37:18 -05:00
Philippe Tillet
ce07e490f6 Examples: polished tutorial 2015-10-08 20:43:04 -04:00
Philippe Tillet
714e0f5634 API: Fixed single-element indexing 2015-10-07 01:13:55 -04:00
Philippe Tillet
2648724217 API: diag() now usable as lvalue 2015-10-07 00:50:49 -04:00
Philippe Tillet
07b8ba20de API: some fixes with 1D slices 2015-10-06 16:34:47 -04:00
Philippe Tillet
8daf13da2e Code quality: some renaming here and there 2015-10-05 14:35:46 -04:00
Philippe Tillet
3e4f147fbc Code quality: removed ambiguous overload 2015-10-04 17:31:39 -04:00
Philippe Tillet
d97250bce5 API: removed explicit constructors for math expressions 2015-10-04 17:08:44 -04:00
Philippe Tillet
07e7bd862c API: added diag(matrix) 2015-10-04 17:05:06 -04:00
Philippe Tillet
740f5def49 API: polished slice construction 2015-10-03 19:30:50 -04:00
Philippe Tillet
b5100f9d9a API: Added shallow-copiable view object for viewing slices of arrays. 2015-10-03 18:51:02 -04:00
Philippe Tillet
1e076c131b API: clearer interface for transposition 2015-10-01 21:58:59 -04:00
Philippe Tillet
feeb1e9862 Feature: Merged kernel-fusion branch
* Fuses multiple AXPY kernels
* Possibility to add thread-wise for loops in AXPY-like kernels
2015-09-30 15:31:41 -04:00
Philippe Tillet
149441b9e2 Bench: improved output formatting 2015-08-31 13:35:29 -04:00
Philippe Tillet
836a955663 GEMV: bugfix with CUDA 2015-08-30 02:35:55 -04:00
Philippe Tillet
b8f3e08c68 Tune: no longer pruning Y, profiles at each iteration 2015-08-28 22:34:44 -04:00
Philippe Tillet
caf711a71c Tuner: added check for android presence 2015-08-28 22:31:55 -04:00
Philippe Tillet
b5a468a40a Tuner: more bugfixes 2015-08-28 15:38:21 -04:00
Philippe Tillet
c4788ec925 Tune: now pruning unnecessary data at each iteration 2015-08-28 15:02:54 -04:00
Philippe Tillet
3b9b80309c Tune: fixed problem with linebreaks 2015-08-28 14:43:34 -04:00
Philippe Tillet
beb32f8412 Tune: better formatting 2015-08-28 14:36:09 -04:00
Philippe Tillet
1e77703f7f Android: various fixes 2015-08-28 13:48:54 -04:00
Philippe Tillet
3fa8f3a480 Tuner: better formatting 2015-08-28 12:16:22 -04:00
Philippe Tillet
922ae52846 Tuner: added DOT and GER in CLI 2015-08-28 09:59:47 -04:00
Philippe Tillet
f5d3d71d94 Tune: added progress bar on android 2015-08-28 02:05:53 -04:00
Philippe Tillet
222ea4aecf Tune: misc. cleaning 2015-08-27 22:56:05 -04:00
Philippe Tillet
53dcbfa1e0 Kernels [GEMM]: restored vector types on CUDA 2015-08-27 22:55:38 -04:00
Philippe Tillet
8dcf062342 Benchmarks: added consistency between CUDA and the rest 2015-08-27 22:55:20 -04:00
Philippe Tillet
426ba27d8b Python: now ships vector.cu's string-header 2015-08-27 20:28:30 -04:00
Philippe Tillet
de159ca829 Python: fixed minor error in kernels.cpp 2015-08-27 20:27:14 -04:00
Philippe Tillet
c3c5b48b24 Tune: more pretty-printing 2015-08-27 20:25:03 -04:00
Philippe Tillet
b6333c3a6e Tuner: Now pretty-printing progress bar on command line 2015-08-27 20:25:02 -04:00
Philippe Tillet
6676b94d00 Bench: no longer reallocating memory for CUDA. 2015-08-27 19:09:22 -04:00
Philippe Tillet
f5f2b78089 Backend: fixed nasty issue with int_t being int rather than long long 2015-08-27 19:08:54 -04:00
Philippe Tillet
eb330cad3a Benchmark: no longer using nvcc for CUDA benchmark.
Don't know why I ever felt the need to use it in the first place...
2015-08-26 22:16:21 -04:00
Philippe Tillet
f06a3bdf53 Bugfix: fixed bug in dynamic kernel selection 2015-08-26 19:11:09 -04:00
Philippe Tillet
ffb3c01b77 Code quality: fixed typo 2015-08-26 14:24:12 -04:00
Philippe Tillet
69c11d16cc Code quality: bugfix in bench/test to not call clBLAS on CUDA backend 2015-08-26 14:12:50 -04:00
Philippe Tillet
9da87bee51 Driver: fixed up invalid option for nvrtc 2015-08-26 13:44:40 -04:00
Philippe Tillet
0d3fcb18dc Driver: now using proper compute capability option in nvrtc; added missing file. 2015-08-26 13:31:58 -04:00
Philippe Tillet
0ce345f14a Driver: more standard conforming way of casting symbol to function 2015-08-26 11:40:24 -04:00