Today's algorithms (e.g., for image/video processing or hyperspectral sensor data) process huge amounts of data. For many of these algorithms, good computational performance is indispensable for use in practical applications. Such applications often target a wide variety of devices, such as desktop PCs, tablets, smartphones, and mini PCs.
To achieve this performance, modern GPUs offer speedups of 10x-200x for highly parallel processing tasks. Their main disadvantage, however, is the difficulty of programming them: properly programming a GPU requires extensive in-depth knowledge of the hardware, and the development effort is usually high. This makes GPUs hard to use for research purposes, e.g., for devising and testing new algorithms. When CPUs and GPUs of different types and models are combined, the development and debugging complexity increases further.
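As a minimal illustration of this effort (a generic sketch in plain CUDA, not tied to any particular framework), even a trivial element-wise vector addition already requires explicit device-memory management, a hand-chosen kernel launch configuration, and host/device synchronization; the single line of actual arithmetic is surrounded almost entirely by boilerplate:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each GPU thread handles exactly one array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host-side buffers.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Device buffers must be allocated and filled explicitly.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Choosing the block/grid sizes is the programmer's responsibility
    // and affects performance depending on the concrete GPU model.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    cudaDeviceSynchronize();

    // Copy the result back and release all resources.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

Tuning such code further (shared memory, occupancy, memory coalescing) requires exactly the kind of device-specific knowledge described above, and the effort multiplies when several CPU and GPU models must be supported at once.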
One of our concerns is that training a developer (in academia, sometimes a.k.a. "Ph.