Philippe Tillet
|
dfbe52c20a
|
Driver: now ignore CUDA_ERROR_DEINITIALIZED in the destructor of CUDA C++ object.
This should be harmless. ISAAC deinitializes CUDA at the very end, but external libraries may deinitialize it beforehand.
|
2015-11-27 02:09:15 -05:00 |
|
Philippe Tillet
|
c0b9bbee43
|
cuBLAS: fixed CUDA context import
|
2015-11-26 21:09:34 -05:00 |
|
Philippe Tillet
|
6fc94c0c0b
|
Kernels: Fixed various corner cases for the kernel templates and BLAS
|
2015-11-26 19:49:44 -05:00 |
|
Philippe Tillet
|
6be5929b0d
|
Core: fixed handle wrapping for CUcontext
|
2015-11-21 13:57:05 -05:00 |
|
Philippe Tillet
|
f653625aa9
|
C API: added symbols for cublas_v2
|
2015-11-20 22:46:52 -05:00 |
|
Philippe Tillet
|
c6333c993a
|
API: adding cuBLAS interface
|
2015-11-20 12:46:42 -05:00 |
|
Philippe Tillet
|
da1b0a9571
|
GEMM: performance regression fix
|
2015-11-19 20:49:38 -05:00 |
|
Philippe Tillet
|
bc20cc1ed7
|
Python: updated wrapper to match C++ API
|
2015-11-19 19:22:11 -05:00 |
|
Philippe Tillet
|
a843477438
|
CMake: removed legacy debug code
|
2015-11-19 18:33:38 -05:00 |
|
Philippe Tillet
|
e2cdb88338
|
Core: included bugfixes from the SVD branch
|
2015-11-19 12:37:18 -05:00 |
|
Philippe Tillet
|
ce07e490f6
|
Examples: polished tutorial
|
2015-10-08 20:43:04 -04:00 |
|
Philippe Tillet
|
714e0f5634
|
API: Fixed single-element indexing
|
2015-10-07 01:13:55 -04:00 |
|
Philippe Tillet
|
2648724217
|
API: diag() now usable as lvalue
|
2015-10-07 00:50:49 -04:00 |
|
Philippe Tillet
|
07b8ba20de
|
API: some fixes with 1D slices
|
2015-10-06 16:34:47 -04:00 |
|
Philippe Tillet
|
8daf13da2e
|
Code quality: some renaming here and there
|
2015-10-05 14:35:46 -04:00 |
|
Philippe Tillet
|
3e4f147fbc
|
Code quality: removed ambiguous overload
|
2015-10-04 17:31:39 -04:00 |
|
Philippe Tillet
|
d97250bce5
|
API: removed explicit constructors for math expressions
|
2015-10-04 17:08:44 -04:00 |
|
Philippe Tillet
|
07e7bd862c
|
API: added diag(matrix)
|
2015-10-04 17:05:06 -04:00 |
|
Philippe Tillet
|
740f5def49
|
API: polished slice construction
|
2015-10-03 19:30:50 -04:00 |
|
Philippe Tillet
|
b5100f9d9a
|
API: Added shallow-copiable view object for viewing slices of arrays.
|
2015-10-03 18:51:02 -04:00 |
|
Philippe Tillet
|
1e076c131b
|
API: clearer interface for transposition
|
2015-10-01 21:58:59 -04:00 |
|
Philippe Tillet
|
feeb1e9862
|
Feature: Merged kernel-fusion branch
* Fuses multiple AXPY kernels
* Possibility to add thread-wise for loops in AXPY-like kernels
|
2015-09-30 15:31:41 -04:00 |
|
Philippe Tillet
|
149441b9e2
|
Bench: improved output formatting
|
2015-08-31 13:35:29 -04:00 |
|
Philippe Tillet
|
836a955663
|
GEMV: bugfix with CUDA
|
2015-08-30 02:35:55 -04:00 |
|
Philippe Tillet
|
b8f3e08c68
|
Tune: no longer pruning Y, profiles at each iteration
|
2015-08-28 22:34:44 -04:00 |
|
Philippe Tillet
|
caf711a71c
|
Tuner: added check for Android presence
|
2015-08-28 22:31:55 -04:00 |
|
Philippe Tillet
|
b5a468a40a
|
Tuner: more bugfixes
|
2015-08-28 15:38:21 -04:00 |
|
Philippe Tillet
|
c4788ec925
|
Tune: now pruning unnecessary data at each iteration
|
2015-08-28 15:02:54 -04:00 |
|
Philippe Tillet
|
3b9b80309c
|
Tune: fixed problem with linebreaks
|
2015-08-28 14:43:34 -04:00 |
|
Philippe Tillet
|
beb32f8412
|
Tune: better formatting
|
2015-08-28 14:36:09 -04:00 |
|
Philippe Tillet
|
1e77703f7f
|
Android: various fixes
|
2015-08-28 13:48:54 -04:00 |
|
Philippe Tillet
|
3fa8f3a480
|
Tuner: better formatting
|
2015-08-28 12:16:22 -04:00 |
|
Philippe Tillet
|
922ae52846
|
Tuner: added DOT and GER in CLI
|
2015-08-28 09:59:47 -04:00 |
|
Philippe Tillet
|
f5d3d71d94
|
Tune: added progress bar on Android
|
2015-08-28 02:05:53 -04:00 |
|
Philippe Tillet
|
222ea4aecf
|
Tune: misc. cleaning
|
2015-08-27 22:56:05 -04:00 |
|
Philippe Tillet
|
53dcbfa1e0
|
Kernels [GEMM]: restored vector types on CUDA
|
2015-08-27 22:55:38 -04:00 |
|
Philippe Tillet
|
8dcf062342
|
Benchmarks: added consistency between CUDA and the rest
|
2015-08-27 22:55:20 -04:00 |
|
Philippe Tillet
|
426ba27d8b
|
Python: now ships vector.cu's string-header
|
2015-08-27 20:28:30 -04:00 |
|
Philippe Tillet
|
de159ca829
|
Python: fixed minor error in kernels.cpp
|
2015-08-27 20:27:14 -04:00 |
|
Philippe Tillet
|
c3c5b48b24
|
Tune: more pretty-printing
|
2015-08-27 20:25:03 -04:00 |
|
Philippe Tillet
|
b6333c3a6e
|
Tuner: Now pretty-printing progress bar on command line
|
2015-08-27 20:25:02 -04:00 |
|
Philippe Tillet
|
6676b94d00
|
Bench: no longer reallocating memory for CUDA.
|
2015-08-27 19:09:22 -04:00 |
|
Philippe Tillet
|
f5f2b78089
|
Backend: fixed nasty issue with int_t being int rather than long long
|
2015-08-27 19:08:54 -04:00 |
|
Philippe Tillet
|
eb330cad3a
|
Benchmark: no longer using nvcc for CUDA benchmark.
Don't know why I ever felt the need to use it in the first place...
|
2015-08-26 22:16:21 -04:00 |
|
Philippe Tillet
|
f06a3bdf53
|
Bugfix: fixed bug in dynamic kernel selection
|
2015-08-26 19:11:09 -04:00 |
|
Philippe Tillet
|
ffb3c01b77
|
Code quality: fixed typo
|
2015-08-26 14:24:12 -04:00 |
|
Philippe Tillet
|
69c11d16cc
|
Code quality: bugfix in bench/test to not call clBLAS on CUDA backend
|
2015-08-26 14:12:50 -04:00 |
|
Philippe Tillet
|
9da87bee51
|
Driver: fixed up invalid option for nvrtc
|
2015-08-26 13:44:40 -04:00 |
|
Philippe Tillet
|
0d3fcb18dc
|
Driver: now using proper compute capability option in nvrtc; added missing file.
|
2015-08-26 13:31:58 -04:00 |
|
Philippe Tillet
|
0ce345f14a
|
Driver: more standard-conforming way of casting symbol to function
|
2015-08-26 11:40:24 -04:00 |
|