* Simplified `triton.kernel` API to achieve lower latency:
  > `.data_ptr()` must now be passed as a kernel argument; `torch.Tensor`
    arguments are no longer converted implicitly
  > compilation options are now constant attributes, i.e., `opt.d('VAR')`
    becomes `opt.VAR`
  > `torch.device` must now be passed explicitly to `triton.kernel` (it is
    no longer inferred from `torch.Tensor` arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in `python/tutorials/`
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* Added `python/triton/ops/` package for pre-written Triton ops
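
The switch from `opt.d('VAR')` to `opt.VAR` in the first item mostly affects code that computes a launch grid from compilation options. The sketch below illustrates only that access pattern and does not call Triton: `SimpleNamespace` is a stand-in for the options object, `cdiv` is a local helper, and the option names `TM` and `TN` and the sizes `M` and `N` are hypothetical.

```python
from types import SimpleNamespace


def cdiv(a, b):
    """Ceiling division: number of size-b tiles needed to cover a elements."""
    return (a + b - 1) // b


# Stand-in for the compilation options object (NOT Triton's actual class).
# TM and TN are hypothetical tile-size options.
opt = SimpleNamespace(TM=64, TN=32)

# Hypothetical problem sizes.
M, N = 1000, 500


def grid(opt):
    # Options are read as plain attributes (opt.TM) rather than opt.d('TM').
    return [cdiv(M, opt.TM), cdiv(N, opt.TN)]


launch_grid = grid(opt)
```

Reading options as attributes avoids a method call and a string lookup per option, which fits the stated goal of lower launch latency; whether the real options object behaves exactly like this stand-in is an assumption.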
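# Build one executable per tutorial source file (currently only 01-matmul.cc).
foreach(PROG 01-matmul)
    set(TARGET ${PROG})
    add_executable(${TARGET} ${PROG}.cc)
    # Name the output binary after the target.
    set_target_properties(${TARGET} PROPERTIES OUTPUT_NAME ${TARGET})
    # Link against the triton library and libdl.
    target_link_libraries(${TARGET} triton dl)
endforeach(PROG)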