CUDA - Device Functions Across Modules

Device functions make it possible to implement common CPU and GPU routines once and then call them from other kernel or device functions (see also the function overview).

An example is the sinc function in system.q:

sinc = __device__ (x : scalar) -> (x == 0.0) ? 1.0 : sin(pi*x)/(pi*x)

With this definition, the sinc function can be applied to scalar numbers from host functions as well as from kernel and device functions.
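The following is a minimal sketch of both kinds of call inside a single module. The module name demo.q, the local copy my_sinc (renamed to avoid clashing with sinc from system.q), the kernel sinc_kernel and the test vector are hypothetical. The kernel signature and the parallel_do call follow the usual Quasar conventions, but they are assumptions rather than code from the original page.

```quasar
% demo.q (hypothetical): a __device__ function and the kernel that uses it
% live in the SAME module, so no cross-module device call is needed
my_sinc = __device__ (x : scalar) -> (x == 0.0) ? 1.0 : sin(pi*x)/(pi*x)

% Hypothetical kernel: applies my_sinc element-wise on the GPU
function [] = __kernel__ sinc_kernel(x : vec, y : vec, pos : int)
    y[pos] = my_sinc(x[pos])
end

function [] = main()
    a = my_sinc(0.5)                          % call from host code on a scalar
    x = [0.0, 0.5, 1.0, 1.5]
    y = zeros(size(x))
    parallel_do(numel(x), x, y, sinc_kernel)  % call from a kernel
    print(a)
    print(y)
end
```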

However, when a device function is defined in one module and used in another module, the CUDA engine runs into a problem, and the compiler reports the following error:

Cannot currently access device function 'sinc' defined in 
'system.q' from 'foo.q'. The reason is that CUDA 4.2 does not 
support static linking so device functions must be defined in the
same compilation unit.

By default, __device__ functions are statically linked (in C/C++ linker terminology). However, each CUDA module is compiled as a standalone unit, so a device function in one module cannot refer to a device function in another module.

There are, however, two work-arounds:

1. Define the function using a lambda expression, and make sure that function inlining is enabled (the compiler setting COMPILER_LAMBDAEXPRESSION_INLINING should have the value OnlySuitable or Always). The Quasar compiler then expands the function inline at each call site, which avoids the problem.

2. Use function pointers to prevent static linking. Because function pointers have a performance cost, the compiler does not use them by default (the default value of the compiler setting COMPILER_USEFUNCTIONPOINTERS is SmartlyAvoid); to enable this work-around, set COMPILER_USEFUNCTIONPOINTERS to Always.
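A minimal sketch of work-around #1, assuming that "lambda expression" here means the same definition of sinc without the __device__ modifier, and that COMPILER_LAMBDAEXPRESSION_INLINING is set to OnlySuitable or Always. The kernel name sinc_kernel is hypothetical, and the explicit import line is an assumption (system.q may already be available without it).

```quasar
% system.q: sinc written as a plain lambda expression (no __device__),
% so that the Quasar compiler can expand it inline at each call site
sinc = (x : scalar) -> (x == 0.0) ? 1.0 : sin(pi*x)/(pi*x)

% foo.q: uses sinc from a kernel in a DIFFERENT module; with lambda
% inlining enabled, the body of sinc is expanded inside this kernel,
% so no call to a device function in another CUDA module is generated
import "system.q"

function [] = __kernel__ sinc_kernel(x : vec, y : vec, pos : int)
    y[pos] = sinc(x[pos])
end
```

Work-around #2 requires no source change: the same cross-module call compiles once COMPILER_USEFUNCTIONPOINTERS is set to Always, at the cost of an indirect call.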

If possible, define device functions so that they are only referenced from the module in which they are defined. If this is not possible or not desirable, use work-around #1 (i.e., define the function as a lambda expression and enable automatic function inlining).