Philippe Tillet
9de87da993
GEMM: swapped loops in rC[m][n]*=alpha
2015-07-16 10:40:38 -04:00
Philippe Tillet
4b004e1cd5
GEMM: Added pointers declaration to the beginning of the kernel
2015-07-14 20:48:52 -07:00
Philippe Tillet
6a74eb3340
GEMM: moved declaration of shared memory to the beginning of the kernel
2015-07-14 20:41:34 -07:00
Philippe Tillet
8be02a50c3
GEMM: Cleaned generated GEMM code a little bit
2015-07-14 20:40:29 -07:00
Philippe Tillet
1257dda310
GEMM: Fixed typo
2015-07-10 23:16:21 -07:00
Philippe Tillet
2f106a9186
GEMM: Improved performance for cases other than NT
2015-07-10 21:15:36 -07:00
Philippe Tillet
cfa6ea812d
Cleaning: Largely renamed templates to BLAS-like names
2015-07-11 11:21:15 -04:00
Philippe Tillet
2b10363668
GEMM: More bugfixes
2015-07-10 16:05:28 -04:00
Philippe Tillet
e25dcf97ea
Bugfix in SIMD handling for other layouts
2015-07-10 16:05:28 -04:00
Philippe Tillet
47406a5e50
Implementing vector for other layouts
2015-07-09 20:07:44 -04:00
Philippe Tillet
347f4025f2
Cleaned up GEMM
2015-07-09 15:03:55 -04:00
Philippe Tillet
4ec061ceeb
More...
2015-07-09 13:32:32 -04:00
Philippe Tillet
931a403d81
More fix
2015-07-09 13:09:01 -04:00
Philippe Tillet
a676b15448
Fixup
2015-07-09 11:40:26 -04:00
Philippe Tillet
4e25e20206
More bounds checking
2015-07-09 10:52:54 -04:00
Philippe Tillet
b18442c220
Fixup
2015-07-07 23:39:17 -07:00
Philippe Tillet
bdd4ea05fd
Trying to further improve bounds checking
2015-07-08 22:37:57 -04:00
Philippe Tillet
4c123c4b38
Backend: GEMM - Improved bounds checking
2015-07-02 16:44:02 -04:00
Philippe Tillet
5c720a5b54
Backend: Fixed AXPY for shape=(1,x>1)
2015-07-01 11:48:01 -04:00
Philippe Tillet
9d0d50ba05
Backend: Fixed alpha, beta in GEMM.
2015-06-29 21:52:50 -07:00
Philippe Tillet
cf2dba43ef
Backend: A lot of bugfixes in dot() for handling shapes better
2015-06-30 17:55:57 -04:00
Philippe Tillet
e7cabf65ac
Tuning: Merged tune branch.
...
- Much cleaner and more concise source
- Better exception handling
- Checks local minima to see if retuning is needed.
Resolved conflicts:
bench/blas.cpp
include/isaac/backend/templates/mproduct.h
include/isaac/driver/buffer.h
lib/array.cpp
lib/backend/templates/mproduct.cpp
lib/driver/buffer.cpp
python/setup.py
tune/pysrc/autotune.py
tune/pysrc/dataset.py
tune/pysrc/misc_tools.py
2015-06-28 17:53:16 -07:00
Philippe Tillet
0e207e7ca4
Backend: No longer creates a temporary for C = alpha*dot(op(A), op(B)) + beta*C
2015-06-27 17:55:01 -07:00
Philippe
743a559f76
Backend: Bugfix in GEMM bounds checking
2015-06-27 13:14:46 -04:00
Philippe Tillet
80bcbd095f
C++: Some renaming; added possibility to pass buffers when constructing arrays
2015-06-23 09:38:34 -07:00
Philippe Tillet
05e730f06e
CUDA: Many fixes in the backend
2015-05-13 02:26:38 -04:00
Philippe Tillet
3b983cf32f
CMake: some cleaning
2015-05-03 17:41:23 -04:00
Philippe Tillet
cf5028d55b
Squashed feature branch:
...
* Added CUDA support
* Performance improvements
* API improvements
* Added "depth" parameter to GEMM
* Android cross-compilation
2015-04-29 15:52:21 -04:00
Philippe Tillet
e453031094
More efficient access pattern in the GEMV kernel
2015-02-11 02:06:16 -05:00
Philippe Tillet
85b7eb8b5e
Added another parameter to GEMV
2015-02-10 16:33:38 -05:00
Philippe Tillet
37fc98c532
Fixed bug in matrix-vector products with vectorization
2015-02-10 03:09:41 -05:00
Philippe Tillet
a6d7671831
removing C++11 interface
2015-02-08 23:19:38 -05:00
Philippe Tillet
e214927b16
Better control flow through options
2015-02-05 04:43:50 -05:00
Philippe Tillet
bbf2f0188e
Ported to C++11
2015-02-05 04:43:40 -05:00
Philippe Tillet
2afc574724
Implemented simple operation cache
2015-02-01 23:56:05 -05:00
Philippe Tillet
535706f35a
Some renaming; lower overhead in benchmark
2015-02-01 22:28:49 -05:00
Philippe Tillet
d29f1252ad
Clearer array_expression with hopefully lower overhead.
...
Also removed .pyc files
2015-01-31 22:01:48 -05:00
Philippe Tillet
f488274269
Fixed relational operators tests
2015-01-29 16:01:46 -05:00
Philippe Tillet
d4629ba018
Bugfix in cast and relational operators
2015-01-29 02:50:51 -05:00
Philippe Tillet
c7665021d1
reducing overhead; reverted custom CL/ header because CL/cl.hpp was buggy
2015-01-28 23:04:19 -05:00
Philippe Tillet
0dcf4d3617
Better exception handling; lowered CMake requirement; blas-bench now benchmarks square matrices
2015-01-28 17:08:39 -05:00
Philippe Tillet
c37d8a2a81
Now using system CL include
2015-01-27 16:19:50 -05:00
Philippe Tillet
a96c897cb3
Various fixes
2015-01-27 02:41:27 -05:00
Philippe Tillet
4a9e16fefd
various bugfixes
2015-01-25 01:08:18 -05:00
Philippe Tillet
e74563070a
API enhancement
2015-01-20 11:17:42 -05:00
Philippe Tillet
4f73fb384f
More flexibility in scalars
2015-01-19 21:29:47 -05:00
Philippe Tillet
6dd3b20ace
Bugfix in bounds checking for fallback GEMM
2015-01-19 14:37:33 -05:00
Philippe Tillet
edaa821d93
low level representation of array
2015-01-18 16:53:34 -05:00
Philippe Tillet
16648f18e0
various changes
2015-01-17 15:47:52 -05:00
Philippe Tillet
0068560bc6
Some cleaning + outer product
2015-01-17 10:49:36 -05:00