Philippe Tillet
|
ffb9548b6a
|
Runtime: More progress towards cuBLAS integration
|
2016-10-04 01:02:43 -04:00 |
|
Philippe Tillet
|
fb9669a34d
|
Python: fixed compilation error
|
2016-10-03 22:22:38 -04:00 |
|
Philippe Tillet
|
f1a636f83f
|
GEMM: Added skeleton for cuBLAS GEMM calls
|
2016-10-03 21:26:05 -04:00 |
|
Philippe Tillet
|
889b4cffdf
|
Code Quality: renamed base_impl -> parameterized_base
|
2016-10-03 14:12:57 -04:00 |
|
Philippe Tillet
|
1852ddef72
|
Further fixes
|
2016-10-03 03:24:49 -04:00 |
|
Philippe Tillet
|
31849794e8
|
Python: Fixed wrapper issues induced after cleaning
|
2016-10-03 02:23:20 -04:00 |
|
Philippe Tillet
|
a26582d34b
|
More cleaning
|
2016-10-02 20:21:38 -04:00 |
|
Philippe Tillet
|
e1baf85707
|
Code quality: removed obsolete/dead code
|
2016-10-01 19:27:42 -04:00 |
|
Philippe Tillet
|
b514638d86
|
Bench: re-order GEMM-bench order (Putting DeepBench on top as it's most relevant)
|
2016-09-30 01:21:41 -04:00 |
|
Philippe Tillet
|
5d0e29db1f
|
Bench: Fixed CUDA synchronization issue
|
2016-09-30 01:21:24 -04:00 |
|
Philippe Tillet
|
fa4cb6866d
|
Bench: Now displaying results in a table
|
2016-09-29 14:50:42 -04:00 |
|
Philippe Tillet
|
5178ba06f9
|
Python: fixed compilation issues
|
2016-08-13 09:41:04 -07:00 |
|
Philippe Tillet
|
258dd76eda
|
Python: fixed compilation errors for the binding
|
2016-05-19 01:14:51 -04:00 |
|
Philippe Tillet
|
1e439ad5bc
|
JIT: No longer using fallbacks for stride[0] > 1
It was pretty messy.
|
2016-04-10 16:31:29 -04:00 |
|
Philippe Tillet
|
6bc5d9e1cb
|
Python: fixed compilation issues
|
2016-04-10 15:41:55 -04:00 |
|
Philippe Tillet
|
97a0d65a4d
|
Code quality: reorganized files structure
|
2016-04-10 13:13:16 -04:00 |
|
Philippe Tillet
|
509c496b2e
|
Bugfix: Typo fix in dot() API function
|
2016-04-08 01:12:13 -04:00 |
|
Philippe Tillet
|
7f77fba4d4
|
General: Internal code generator overhaul
|
2016-04-02 18:19:33 -04:00 |
|
Philippe Tillet
|
b322fe3942
|
Python: minor bugfix in vector conversion
|
2015-12-22 16:44:23 -05:00 |
|
Philippe Tillet
|
6623116372
|
Licensing: added blank line after license text
|
2015-12-21 17:04:09 -05:00 |
|
Philippe Tillet
|
0d09b0518f
|
API: more consistent zeros() initializer
|
2015-12-21 03:33:13 -05:00 |
|
Philippe Tillet
|
da43f89ea4
|
Profiles: reorganized database
|
2015-12-21 02:43:04 -05:00 |
|
Philippe Tillet
|
b5fc058b21
|
Profiles: added hawaii (GCN 1.1)
|
2015-12-20 05:23:44 +01:00 |
|
Philippe Tillet
|
ebbb6dd18e
|
Licensing: added license headers; polished file hierarchy
|
2015-12-19 21:43:05 -05:00 |
|
Philippe Tillet
|
d9eb51d04a
|
Code quality: renamed math_expression -> expression_tree
|
2015-12-19 03:29:51 -05:00 |
|
Philippe Tillet
|
b6d596d26d
|
Code quality: renamed expression types
|
2015-12-19 01:37:58 -05:00 |
|
Philippe Tillet
|
acd460402d
|
Kernels/REDUCE: added temporary workspace information
|
2015-12-18 18:14:29 -05:00 |
|
Philippe Tillet
|
373771c796
|
Tuner: more polishing of intermediate BLAS3 sizes
|
2015-12-18 03:38:01 -05:00 |
|
Philippe Tillet
|
b89dc9a9ea
|
Code Quality: More renaming
|
2015-12-16 18:19:33 -05:00 |
|
Philippe Tillet
|
83feed534c
|
Code quality: more renaming
|
2015-12-16 16:34:36 -05:00 |
|
Philippe Tillet
|
042aa070bb
|
Code Quality: More sensible names
|
2015-12-12 21:19:59 -05:00 |
|
Philippe Tillet
|
46dad59e10
|
Tests: Fixed typos and polished test names
|
2015-12-12 13:31:14 -05:00 |
|
Philippe Tillet
|
b3c5251f91
|
CMake: Fixed clBLAS handling
|
2015-12-12 01:29:08 -05:00 |
|
Philippe Tillet
|
386963a6cc
|
Core: added queue-wise temporary workspace. WARNING: breaks the fused computation of multiple DOT/GEMV operations
|
2015-11-27 18:43:46 -05:00 |
|
Philippe Tillet
|
c6333c993a
|
API: adding cuBLAS interface
|
2015-11-20 12:46:42 -05:00 |
|
Philippe Tillet
|
bc20cc1ed7
|
Python: updated wrapper to match C++ API
|
2015-11-19 19:22:11 -05:00 |
|
Philippe Tillet
|
a843477438
|
CMake: removed legacy debug code
|
2015-11-19 18:33:38 -05:00 |
|
Philippe Tillet
|
e2cdb88338
|
Core: included bugfixes from the SVD branch
|
2015-11-19 12:37:18 -05:00 |
|
Philippe Tillet
|
ce07e490f6
|
Examples: polished tutorial
|
2015-10-08 20:43:04 -04:00 |
|
Philippe Tillet
|
1e076c131b
|
API: clearer interface for transposition
|
2015-10-01 21:58:59 -04:00 |
|
Philippe Tillet
|
feeb1e9862
|
Feature: Merged kernel-fusion branch
* Fuses multiple AXPY kernels
* Possibility to add thread-wise for loops in AXPY-like kernels
|
2015-09-30 15:31:41 -04:00 |
|
Philippe Tillet
|
1e77703f7f
|
Android: various fixes
|
2015-08-28 13:48:54 -04:00 |
|
Philippe Tillet
|
de159ca829
|
Python: fixed minor error in kernels.cpp
|
2015-08-27 20:27:14 -04:00 |
|
Philippe Tillet
|
69c11d16cc
|
Code quality: bugfix in bench/test to not call clBLAS on CUDA backend
|
2015-08-26 14:12:50 -04:00 |
|
Philippe Tillet
|
5d8a092ed8
|
Code quality: removed dead code related to obsolete static backend selection
|
2015-08-25 23:51:54 -04:00 |
|
Philippe Tillet
|
7b77d5ae4b
|
Driver: bugfixes in CUDA dynamic loading
|
2015-08-25 19:12:02 -04:00 |
|
Philippe Tillet
|
67a35a62bd
|
Driver: now loading the backend dynamically on Linux
|
2015-08-25 17:06:51 -04:00 |
|
Philippe Tillet
|
95f2564c1a
|
Tuning: Android UI improvement
|
2015-08-24 23:03:37 -04:00 |
|
Philippe Tillet
|
10524ebdee
|
CUDA: various improvements
|
2015-08-24 17:03:31 -04:00 |
|
Philippe Tillet
|
33dac6b05a
|
Code quality: fixed compilation errors with CUDA
|
2015-08-20 21:24:41 -04:00 |
|