3 Type system
3.1 Type definitions
Although variable types in Quasar often do not need to be specified (the types are either determined at compile time by type inference, or at runtime), it is always recommended to use strong typing. Strong typing has the immediate advantage that the compiler can generate some more optimal code for the typed variables. In this section, a more detailed overview of the Quasar type system is provided.
The following type categories exist in Quasar:
-
Class #1: Primitive types (scalar, cscalar, int, intx,uintx,string)
-
Class #2: vector/matrix/cube types (vec, mat, cube, cube{:})
-
Class #3: Classes / user-defined types (class)
-
Class #4: Function types ([?? -> ??], [(??, ??) -> (??, ??)], [__device__ scalar -> scalar], [__kernel__ () -> ()] ...)
-
Class #5: Type references (type).
Class #2 types are also container types and can embed all other types. For example, mat[scalar] represents a matrix type for scalar numbers, cube[[??->??]] represents a 3D array of functions with one input argument and one output argument. The parameters can be nested: cube[cube[^T]] denotes a 3D array of 3D arrays of pointers to objects of type T. By default, the default type arguments of vec, mat and cube is scalar. The ^-prefix cannot be used on vectors/matrices: these objects are already passed by reference. In fact, ^ is only used for classes.
There also some derived types, which can be expressed directly in terms of the above types. Note that the following definitions are already defined as shorthand, so you do not need to define them yourself:
% Complex-valued matrices
type cvec : vec[cscalar]
type cmat : mat[cscalar]
type ccube : cube[cscalar]
% Integer matrices
type ivec : vec[int]
type imat : mat[int]
type icube : cube[int]
% By default: the argument type of vec[.],mat[.],cube[.] is scalar:
type vec : vec[scalar]
type mat : mat[scalar]
type cube : cube[scalar]
% Cell matrices
type cellvec : vec[??]
type cellmat : mat[??]
type cellcube : cube[??]
Class #3 types are passed by value. It is possible to declare pointers to these types (e.g. in order to pass them by reference). For this purpose, Pascal-style pointers can by used (e.g. ^T). There is only one level of indirection possible (in contrast to C/C++) and also pointer values need to be explicitly initialized. It is not possible to declare pointers to types other than classes.
Class #2 types can contain parameters of class #3: for example mat[T], cube[T], vec[^T]. Especially vectors/matrices/cubes of UDTs containing only primitive types are very efficient, because they use a sequential layout scheme (i.e. they are stored contiguously in memory, with appropriate alignment depending on the machine/GPU).
Recall that cell matrix types containing unspecified sub-types ??, such as vec[vec[??]], mat[??], cannot be passed to kernel or device functions. This is mainly for performance reasons: when the types of all variables are specified, the compiler can generate more optimal code. On the other hand, not specifying types can be an advantage for rapid prototyping.
3.2 Variable construction
The construction of variables of a specified type depends on the type class:
-
Class #1: variables of class #1 are constructed using symbols: a number containing a decimal point (e.g. 1.4e3) will have type scalar. When the symbol contains the imaginary unit (1i or 1j) it will be a complex scalar cscalar (e.g. 1+3i). Strings (string) can be defined using “quotation marks”. Note that it is not possible currently to construct variable of type intx or uintx: these types are mainly intended to be used for storage, and because for most computation engines default integer type (int) offer a better performance, the types cannot be used for calculations.
-
Class #2: variables of vector, matrix, cube types can be created using the [] constructor. The type of the result depends of the types of the operands (which should be the same for all operands, otherwise a compiler error is generated). For example, [1,2,3,4] has type ivec, [[1.0,2.0],[3.0,4.0]] has type mat. If a,b,c have a user-defined type T, then [a,b,c] will have type vec[T]. Similarly [[a],[b]] has type mat[T]. Real-valued vectors, cubes and matrices of arbitrary dimenions can be constructed using the functions uninit, zeros, and ones:
A = uninit(2)
B = zeros(3,4)
C = ones(1,2,3)
Here, the function uninit allocates a vector of length 2, without initializing the data. zeros creates a matrix of 3 rows and 4 columns, and initializes each element to 0. ones creates a cube of dimensions 1 × 2 × 3 and initializes each element to 1. Complex-valued versions can be obtained using the function complex, combined with uninit, zeros or ones:
A = complex(uninit(2))
B = complex(zeros(3,4))
C = complex(ones(1,2,3))
Variables of parametric vectors, matrices and cubes can also be constructed, however they require a type alias:
type my_cell : mat[cube]
A = my_cell(1,2)
A[0,0] = uninit(4,2)
A[0,1] = uninit(4,2)
-
Class #3: user-defined types are constructed either using the type name followed by (), or by explicitly assigning values to all fields, as shown below:
type point : class
x : scalar
y : scalar
endtype
p = point()
q = point(x:=1,y:=2)
-
Class #4: variables of these types are created using either a lambda expression, or a function definition (see Section 4.6↓).
-
Class #5: use the type keyword to define types.
3.3 Size constraints
Often, more efficient code can be generated when the array (e.g., vector, matrix, cube, ...) dimensions are known at compile-time. For example, when dimensions are known, certain types of for-loops can be efficiently parallelized, without using dynamic kernel memory. For this purpose, array types can be annotated with their dimensions, such as in the following table:\begin_inset Separator latexpar\end_inset
Data type
|
Example
|
Purpose
|
vecX or vec(X)
|
vec4
|
A vector of length X
|
matXxY or mat(X,Y)
|
mat2x3
|
A matrix of size X × Y
|
cubeXxYxW or cube(X,Y,W)
|
cube2x2x3
|
A cube of size X × Y × W
|
cubeXxYxWxZ or cube(X,Y,W,Z)
|
cube2x2x2x2
|
A hypercube of size X × Y × W × Z
|
There are two equivalent ways to annotate the size: either
vecX or
vec(X). The former notation can only be used when X (or Y or Z) are numeric. A type parameter can be used as well as long as the dimensions can be determined statically (e.g., as part of a generic function, see
Section 6.7↓).
There are several advantages of annotating types with size constraints:
-
A known dimension becomes a constant rather than a parameter that must be passed to a kernel/device function. Constants can be propagated and simplify indexing expressions.
-
By specifying the array dimensions through the type, the compiler and runtime can automatically check the dimensions. Out-of-bounds array accesses and dimensioning problems can easily be detected at compile-time.
-
Additionally, the code generator may map the types onto stack memory or even registers, which also avoids dynamic allocation overhead.
-
In generic functions, generic size parameters allows coupling arguments of different matrices. For example:
function C = generic_matrix_vector_mult[M,N](A : mat(M,N), B : vec(N))
...
endfunction
Here, the size constraints will be checked by the compiler (and/or runtime) and an error will be reported when the dimensions of the input matrix/vector are not correct.
Variables of fixed-size array types are always passed by reference, just like array types without dimension constraint. Hence, the dimension constraint acts as a contract for the type, but omitting the contract will not change the program behavior.
The type inference will automatically determine that the type of an expression zeros(2,2), [[0,1],[1,0]] is mat(2,2).
Also, incompatibilities in matrix/vector operations can be detected at compile-time in some cases, for example:
y = [1,2,3] * eye(4)
The use of array size constraints is currently the best approach to avoid the use of dynamic kernel memory (which incurs a performance penalty), see
Section 8.3↓.
When array dimensions are only partially known, this can be annotated using :, as in cube(:,:,3). This is useful for example to indicate the number of color channels of an image. Both the width and height are then unspecified (this may depend on an input file). To make code more easily extendible and also to facilitate compiler analysis (for loop parallelization, type inference), partially parametrized array types can be used.
Example:
img : cube(:,:,3) = imread("lena_big.tif")
indicates that A is a cube with three dimensions, for which the first two dimensions have unspecified length (:) and the last dimension has length 3. When a for-loop is written over the third dimension, either explicitly or implicitly via matrix slices:
img[m,n,:]
the compiler can determine the length numel(img[m,n,:])==3 so that the vector type vec3 can be assigned.
Implementation note:
for arrays with number of elements <= 64, the stack memory (or registers) will be used, while for >64, the arrays will be allocated in dynamic kernel memory, which has some computational overhead (see Section 8.3↓). To maximize performance, it is best to keep the arrays short. In future versions of Quasar, the constant 64 may be increased or be made more adaptable.
3.4 Dimension constraints
In some cases it is useful to indicate the dimensionality of an array type directly through a parameter. This can be achieved using dimension constraints, using the notation cube{n}. The following table gives an overview of some common types:
Data type
|
Equivalent to
|
Meaning
|
cube{1}
|
vec
|
vector
|
cube{2}
|
mat
|
matrix
|
cube{3}
|
cube
|
3 dimensional array (cube)
|
cube{4}
|
-
|
4 dimensional array (hypercube)
|
cube{N}
|
-
|
N dimensional array (N=constant)
|
cube{:}
|
-
|
Infinite dimensional array
|
The type cube{:} is useful in function definitions. It indicates that the function accepts arrays of arbitrary dimension. It is currently not possible to construct infinite dimensional arrays (in fact the maximum dimensionality is currently set internally to 16, which should be enough for most practical purposes).
Dimension-parametrized arrays such as
cube{N} are useful for describing algorithms that do not depend on a certain dimensionality of the input data. Because the dimensionality is known at compile-time, the compiler can still generate efficient indexing code. For more information, see
Section 6.8↓.
3.5 Cell array types
Cell arrays are structured but unnamed array types. In Quasar, each element of an array can have a different type. For example, a 2 × 2 cell matrix type with vectors in the first row and scalar values in the second row is given by:
mat[{{vec, vec},
{scalar, scalar}}]
Cell array types (CATs) can therefore be seen as a generalization of tuple types to multiple dimensions. Arbitrary numbers of dimensions can be defined by nesting
{}. CATs mostly occur during type inference and are not often used in user-code. In particular, CATs allow the type inference to succeed (and correspondingly efficient parallelization and other optimizations) in places where inhomogeneous data is stored in a cell matrix.
As a multidimensional generalization of tuples, cell array types can also be used to define lambda expressions (see
Section 4.6↓) with multiple return values:
get_string_and_value : [scalar -> vec[{string,scalar}]] =
value -> {"String”, value}
[str, val] = get_string_and_value(1.234)
Note that the right handed side
{"
String"
, value} constructs a cell vector.
3.6 Type constructors and the typename function
By calling the type as a function, for many types such as scalars and vectors/matrices, ..., an instance of the type can be constructed. This is particularly useful for generic code (see
Chapter 6↓) but also for constructing values of types for which no default creation function exists (for example
zeros can be used to create a vector value of type
vec[scalar], but the function cannot create an integer vector of type
vec[int]). Below are a number of examples of type constructors:
integer_vec = vec[int](200) % integer vector of length 200
uint8_vec4 = vec[uint8](4) % a 32-bit value containing four 8-bit unsigned integers
integer_val = typename(scalar)(1) % a scalar with value 1
typed_cell = typename(mat[{{vec,vec},{scalar,scalar}}])(2,2) % typed cell matrix
hyper_cube = typename(cube{4}[int16])(2,2,2,2) % 4-D array of int16
half_vec = typename(vec[scalar’half])(4) % half precision float vector of length 4
Quasar types are parsed using a type parser which is separate from the main parser. Therefore, with exception of the type annotation operator
:, type names cannot be directly placed inside regular statements. As an alternative Quasar now provides the builtin function
typename, so that type names can be given inline and so that instances of these types can be constructed more easily (in the past a type alias was required for this purpose).
For example,
typename(vec[scalar’half])(4) creates a half-precision vector value of length 4. This of practical relevance for SIMD processing (see further in
Chapter 12↓).
3.7 Type classes
Type classes allow the type range of the input parameters to be narrowed. For example:
[int|cube]
denotes a type for a variable that can either be int or cube. The compiler will always attempt to simplify the type class definition. For example, [int’const|int] becomes int, because the const modifier does not make much sense here.
When such simplifications are not possible, the variables of these types are automatically polymorphic (see
Section 4.1↓). Nevertheless, type classes allow restricting the possible types that a function parameter can have (which is better than not specifying any type at all!), allowing the compiler to still proceed with the type inference. For example:
function y = diag(x : [mat|cmat]) This construction only allows variables of the type ‘mat’ and ‘cmat’ to be passed to the function. This is useful when it is already known in advance which types are relevant (in this case a real-valued or complex-valued matrix). Equivalently, type class aliases can be defined. The type:
type AllInt : [int|int8|int16|int32|uint8|uint32|uint64]groups all integer types. Several functions of the Quasar standard library use type classes to allow the functions to be applicable to a wide range of input types.
3.8 Class / user defined type (UDT) definitions
User-defined types are used for storing structured data. Although Quasar currently does not support class member functions, inheritance or interfaces, there are some ways to simulate this behavior and obtain the same functionality.
To define class member functions, the special
reduction keyword (see
Section 4.9↓) can be used. First, a general function should be defined using a lambda expression or function definition (e.g.
point_distance in the example below). Next, a reduction can be used to redirect the member function
p.distanceto to the
point_distance function.
type point : class
x : scalar
y : scalar
endtype
reduction (p : point, q : point) -> p.distanceto(q) = point_distance(p, q)
p = point(x:=4, y:=5);
q = point(x:=2, y:=1)
print p.distanceto(q)
Defining an interface can be achieved by defining a user-defined class of function variables:
type my_interface : class
times2_function : [scalar -> scalar]
sum_function : [?? -> ??]
endtype
obj = my_interface(
times2_function := (x : scalar) -> 2*x,
sum_function := (x) -> sum(x)
)
print obj.times2_function(2)
print obj.sum_function([1,2,3])
Interface initialization requires all fields to be set. This enforces that are interface methods are implemented.
3.9 Function types
Function types have the following form:
[__modifier__ (inArgType1,...,inArgTypeN)->(outArgType1,...,outArgTypeN)]
where __modifier__ is either __kernel__ (indicating a kernel function type), __device__ (indicating a device function type) or entirely omitted (host function types). Below is an example of a function taking a matrix and a vector, with no output parameters:
[(mat,vec)->()]
A device function with no input arguments and one output argument (with unspecified type) has the following type:
[__device__()->??]
A variadic function taking a cube followed by an arbitrary number of scalar values can have the type:
[(cube,...scalar)->()]
Only kernel/device function types can be passed to kernel/device functions. If function parameters have a function type set, the compiler and/or runtime system will check the function variable that is passed, and generate an error if there is a type mismatch.
3.10 Enumerations
Enumeration types, which contain an ordering of named values, are defined as in the following example:
type color : enum
Red
Green
Blue = 2
Cyan
Magenta
Yellow
endtype
Enumerations are treated as integer types. To each named value (key) an integer value is assigned, that starts counting from 0, unless a value is explicitly assigned (like 2 in the above example). The individual values can be accessed with the dot-syntax:
my_color = color.Magenta
One exception are matches, which do not require the enum class name to be specified, because it is clear from the context:
match a with
| Red -> print ("The color is red")
| Green -> print ("The color is green")
| _ -> print ("Unknown color")
endmatch
3.11 Passed by reference / Passed by value
The semantics of whether a given variable is passed to a function by reference or by value, generally depends on the type of the variable. Scalar types are passed by value, while vectors, matrices and cubes are passed by reference. In the first case (pass by value), a function can modify the variable locally, but the effects of the change are not visible outside the function. In the second case (pass by reference), the function can modify the vector or matrix elements of the variable, and these modifications are visible to the calling function. This way, some run-time overhead involved with copying matrices can be avoided.
An overview of the variable passing conventions is given in
Table 3.1↓. Some dynamic types (
string,
object,
type) are not supported by kernel/device functions, therefore the table lists not applicable (N/A) for these types.
Table 3.1 Overview of the variable passing conventions.
Type
|
host function
|
device function
|
kernel function
|
device function
|
|
|
(host call)
|
|
(device call)
|
scalar, cscalar
|
value
|
value
|
value
|
value
|
vecx, ivecx, cvecx
|
reference
|
reference
|
reference
|
reference
|
vec, mat, cube
|
reference
|
reference
|
reference
|
reference
|
string
|
reference
|
reference
|
N/A
|
N/A
|
function
|
reference
|
reference
|
reference
|
reference
|
class
|
value
|
value
|
value
|
value
|
^class
|
reference
|
reference
|
reference
|
reference
|
object, type
|
reference
|
reference
|
N/A
|
N/A
|
??
|
reference*
|
reference*
|
N/A
|
N/A
|
*: unless the underlying value is scalar
3.12 Constants
To enforce that a by reference variable cannot be modified by the function, it is possible to add the
’const modifier to the type definition (see
Section 16.6↓), as in the following example:
function [] = compute(A : mat’const)
...
endfunction
Explicitly declaring ’const is mostly useful for preventing accidental writes to the A matrix. For other purposes, it is not needed to manually annotate types with the ’const modifier because the compiler will add this modifier automatically.
In general, constant variable types have the following benefits:
-
By using constant values, redundant memory transfer operations between the different computation devices can be avoided.
-
Constant values may propagate to functions, avoiding use of closures and function pointers, leading to a better execution performance.
-
Quasar may use GPU features (e.g., non-coherent texture cache) to accelerate access to constant memory.
-
Constant values may be used as size and dimension parameters of types. For example cube{N}.
Below an example is given of such constant value propagation. Here, by declaring the variable half_wnd_size as a constant, the value of half_wnd_size can be substituted into the two for-loops (with iterators k and l), leading to additional optimizations such as loop unrolling.
half_wnd_size : ’const = 4
function y = mean_filter(x : mat’circular)
y = zeros(size(x))
tmp : mat’circular = zeros(size(x))
for m=0..size(y,0)-1
for n=0..size(y,1)-1
sum = 0.0
for k=-half_wnd_size..half_wnd_size
sum += x[m,n+k]
endfor
tmp[m,n] = sum
endfor
endfor
for m=0..size(y,0)-1
for n=0..size(y,1)-1
sum = 0.0
for l=-half_wnd_size..half_wnd_size
sum += tmp[m+l,n]
endfor
y[m,n] = sum
endfor
endfor
endfunction