So far there are
- directive-based (OpenMP offloading),
- language-based (Python with Numba, in progress), and
- "portable" kernel-based (SYCL)
implementations, but no CUDA or HIP ones, so a CUDA version would be useful. If someone is enthusiastic about Kokkos, that could be done as well.