* Added CUDA support
* Performance improvements
* API improvements
* Added "depth" parameter to GEMM
* Android cross-compilation