Prev Chapter 14: Built-in function quick reference Up Main page Chapter 16: The Quasar runtime system Next

15 Functional image processing in Quasar

In this section we will highlight some functional image processing abilities of Quasar. Through the combination of generics, automatic function inlining and avoidance of function pointers, Quasar permits practical functional programming of image processing algorithms, while at the same time exploiting the parallelism and generating efficient CPU/GPU code. As a starting point, we will consider all values (e.g. scalars, integers, vectors) to be application of functions. These functions can be constant functions, functions with no parameters etc. Rather than passing values to functions, functions themselves are used as parameters for other functions. This allows the caller to choose whether a constant or a non-constant input is used. Moreover, intermediate results can be calculated only when needed, rather than precalculated. Additionally, in some cases, it is more efficient to evaluate functions more than once (e.g., because the memory cost involved with storing the intermediate results is much higher than the calculation time of these results).

The concept of a (2D) image can be made explicit by a function type definition:

type image : [__device__ ivec2’const -> vec3]

Here we say that an image is a function that maps from a two-dimensional integer (ivec2) domain to a three-dimensional (say, RGB) domain. Note that the types ivec2 and vec3 are short-hand notation for vec[int](2) and vec[scalar](3).
Similarly, the concept of an algorithm can be defined as:

type algorithm : [__device__ (image, ivec2’const) -> vec3]

The algorithm takes an image and a 2D position as an input and calculates the components of the output at that position. Note that it would be more easy to define type algorithm : [__device__ (image) -> image], which is also viable. However, the algorithm definition incorporating a position vector allows to separate the computational approach (e.g., pixel-based parallel, block-based parallel, etc.) from the algorithm definition. We can then design additional infrastructure for this task, by the following abstraction:

type application : [(image, algorithm) -> image]

This can be read as: an application uses an algorithm to convert an image to another image.

type color_function : [__device__ vec3 -> vec3]

A rasterizer samples an image and converts it back to a cube:

type rasterizer : [(image, ivec3) -> cube]

The function make_image converts an image (of type cube) into a function variable (of type image):

make_image : [cube -> image] =
im_data -> __device__ (pos) -> im_data[pos[0], pos[1], 0..2]

By declaring the resulting function as a device function (__device__) we can ensure that native device-specific code is generated for the function. But there is more: the Quasar compiler performs automatic function inlining and function pointer reduction, so that there is often no performance loss experienced due to the use of function variables. This allows algorithms to be specified in a flexible and generic way.

We can define a ‘casting’ operator to call the function make_image directly.

reduction (x : cube) -> image(x) = make_image(x)

Similarly, we can write a function to perform a point-wise operation on an image:

pointwise_op : [color_function -> algorithm] =
fn -> __device__ (x, pos) -> fn(x(pos))

The pointwise operation will apply a given color processing function (color_function) to every pixel in the image.

Similarly, we can define an operation that translates the image:

translate : [vec2 -> algorithm] =
shift -> __device__ (x, pos) -> x(pos + int(shift))

The translate function takes a 2-dimensional offset vector and returns an algorithm (which can then be applied to the image). In functional programming, lazy evaluation is preferred over eager evaluation, and therefore we will “concatenate” several image processing algorithms in a chain, and at the very end perform the computation. In this way, the Quasar compiler will obtain several degrees of freedom in optimizing the code. For example, it is not necessary to store intermediate results in memory.

As a next algorithm, we define a horizontal mean filter (with as input parameter the size of the local window):

function algo : algorithm = mean_filter_hor(N : int)
norm_factor = 1.0 / N % normalization factor
function y : vec3 = __device__ algo(input : image, pos : ivec2)
y = [0., 0., 0.]
for n=-int(N/2)..int(N/2)
y = y + input(pos + [0,n])
endfor
y = y * norm_factor
endfunction
endfunction

Together with the corresponding vertical mean filter:

function algo : algorithm = mean_filter_ver(N : int)
norm_factor = 1.0 / N % normalization factor
function y : vec3 = __device__ algo(input : image, pos : ivec2)
y = [0., 0., 0.]
for n=-int(N/2)..int(N/2)
y = y + input(pos + [n,0])
endfor
y = y * norm_factor
endfunction
endfunction

we can then easily calculate mean filters with N x N-masks, because the mean filters are separable. Next, the compute function converts the image back to its pixel representation (also called rasterization):

compute : rasterizer =
(x, sz) -> (y = zeros(sz); parallel_do(sz, __kernel__ (pos : ivec2) -> y[pos[0],pos[1],0..2] = x(pos)); y)

Essentially, the function evaluates the input function for all positions within the domain.

Next, we define two operations that apply a specified processing algorithm to an image. In the first algorithm, we “defer” the processing, which means that the processing is only applied when the compute function is applied.

apply_deferred : application =
(x, p) -> __device__ (pos) -> p(x, pos)

In a second operation, we calculate the intermediate result:

apply_immediate : [vec3 -> application] =
sz -> (x, p) -> make_image(compute(apply_deferred(x, p), sz)

Then, our goal is to be able to subsequently apply several algorithms. To simplify the calculation of the end result, we define the definition-assignment operator :=

reduction (y : cube, x : image) -> (y := x) = y = compute(x, size(y))

This assignment will lead to the result being evaluated, resulting in a variable of type cube. Finally, to concatenate several independent operations, we define the pipelining operator |>:

symbolic immediate, deferred
reduction (x : cube, y) -> (x |> y) = make_image(x) |> y
reduction (x : image, a : algorithm, sz : ivec3) -> (x |> immediate(sz, a)) = apply_immediate(sz)(x, a)
reduction (x : image, a : algorithm) -> (x |> deferred(a)) = apply_deferred(x, a)

Here, symbolic introduces some symbols that will be defined later using reductions.

15.1 Example: tranlation and filtering

As an example, we demonstrate the functional image processing framework in applying a translation and a mean filter to an input image. The algorithm specification is then:

im = imread("image.tif")
im_out := im |> deferred(translate([64, 64])) _
|> deferred(mean_filter_ver(7)) _
|> mean_filter_hor(7)
imshow (im_out)

This approach allows to easily describe image processing pipelines, in which different filtering/processing building blocks are connected to each other. Additionally, it allows the Quasar compiler to optimize the entire pipeline at once.

Prev Chapter 14: Built-in function quick reference Up Main page Chapter 16: The Quasar runtime system Next