* Simplified `triton.kernel` API to achieve lower latency:
  > .data_ptr() must now be passed as kernel argument. No more implicit conversion from torch.tensor
  > compilation options are now constant attributes, i.e., opt.d('VAR') becomes opt.VAR
  > torch.device must now be passed explicitly to triton.kernel (no longer inferred from torch.tensor arguments)
* C++ tests moved to `python/tests/`
* C++ tutorial created in `tutorials/`
* Python tutorial created in python/tutorials/
* Version changed to 1.0alpha
* No longer copying C++ headers into the Python package
* Added python/triton/ops/ package for pre-written Triton ops
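
The change from `opt.d('VAR')` to `opt.VAR` noted above can be pictured with a small, self-contained sketch. It does not import Triton: the `opt` object below is a hypothetical stand-in built with `SimpleNamespace`, and the `grid` callback, the `cdiv` helper, and the option name `BLOCK` are illustrative assumptions rather than anything this commit confirms.

```python
from types import SimpleNamespace


def cdiv(a, b):
    """Ceiling division: how many blocks of size b are needed to cover a."""
    return (a + b - 1) // b


# Hypothetical stand-in for the compilation-options object. In real use the
# object would come from triton.kernel, not from user code.
opt = SimpleNamespace(BLOCK=1024)

N = 4097  # assumed problem size, chosen so the last block is partial

# Old style, per the changelog (shown only as a comment): opt.d('BLOCK')
# New style: the option is read as a constant attribute.
grid = lambda opt: (cdiv(N, opt.BLOCK),)

print(grid(opt))  # → (5,)
```

Reading options as plain attributes avoids a method call and a string lookup on each access, which fits the stated goal of lower latency, though the commit message does not spell out the mechanism.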
7 lines
227 B
CMake
foreach(PROG 01-matmul)
    set(TARGET ${PROG})
    add_executable(${TARGET} ${PROG}.cc)
    set_target_properties(${TARGET} PROPERTIES OUTPUT_NAME ${TARGET})
    target_link_libraries(${TARGET} triton dl)
endforeach(PROG)