
1 Introduction

The Quasar Computation System is optimized to deal with “astronomical” numbers of data values or operations, massively performed in parallel and/or distributed along several processors, hence its name. In the first place, the system is intended to be used for processing of 2D or 3D images, and excels in iterative algorithms that allow for a lot of parallelism. The system consists of three major components:

Quasar compiler: compiles input code (.q files written in the Quasar scripting language) to an intermediate format, which can either be directly interpreted or translated to Common Intermediate Language (CIL) code (managed executable files). These managed executable files can then be run under Windows (.Net or MONO), Linux (MONO) or Mac (MONO).
Quasar interpreter: mostly used for debugging code.
Quasar computation engine: a computation engine performs general (high-level) computations, such as multiplication of real-valued matrices, taking the imaginary part of a complex number, performing FFTs and various built-in functions. Computation engines are substitutable, which means that one engine can take over the work of another engine. (Note: for the GPU computation engines versus the generic CPU computation engine (see Section 1.1), this is done automatically and at any time. For other computation engines, this is only possible by specifying command-line flags; in future versions this may be possible at runtime as well.)

1.1 Computation Engines

Different computation engines exist, each taking advantage of a particular technology present on the system.

Generic CPU computation engine: makes use of an optimizing C++ compiler (such as GCC, Intel Compiler, ...) in the background and automatically uses OpenMP for multi-threading. This gives a speed up of typically 2x-8x compared to sequential execution.
CUDA computation engine: uses the CPU for small numbers of computations (e.g. operations on small matrices) and dynamically switches to GPU computation for larger amounts of data, also depending on whether the data currently resides in GPU or CPU memory.
Hyperion computation engine: provides multi-GPU support (see Chapter 11) and allows using OpenCL devices.
Helios computation engine: a light computation engine, developed in C++, intended for embedded platforms.
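The CPU/GPU switching described for the CUDA computation engine can be pictured with a small Python sketch. This is not Quasar's actual scheduling logic: the threshold SMALL_WORKLOAD_ELEMENTS, the Operand class and the function choose_device are hypothetical names introduced only for illustration, and the decision rule is one plausible reading of "small computations on the CPU, large ones on the GPU, taking into account where the data currently resides".

```python
from dataclasses import dataclass

# Hypothetical cutoff (an assumption, not a Quasar setting): below this many
# elements we assume GPU launch/transfer overhead outweighs the parallel gain.
SMALL_WORKLOAD_ELEMENTS = 4096


@dataclass
class Operand:
    num_elements: int
    on_gpu: bool  # True if the data currently resides in GPU memory


def choose_device(operands):
    """Return "cpu" or "gpu" for one operation (illustrative rule only)."""
    total = sum(op.num_elements for op in operands)
    any_on_gpu = any(op.on_gpu for op in operands)
    if total < SMALL_WORKLOAD_ELEMENTS and not any_on_gpu:
        # Small workload whose data is already in CPU memory: stay on the CPU.
        return "cpu"
    # Large workload, or data already resident on the GPU: use the GPU.
    return "gpu"


small_cpu = choose_device([Operand(16, on_gpu=False)])
small_gpu = choose_device([Operand(16, on_gpu=True)])
large_cpu = choose_device([Operand(1_000_000, on_gpu=False)])
```

In this sketch a small matrix that already lives in GPU memory stays on the GPU, to avoid copying it back; a real engine would weigh transfer cost against compute cost in a more refined way.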

The specific details and implementation of the computation engine are completely transparent to the user. More concretely, the user can specify by command line which computation engine to use. For example -cpu specifies to use the generic CPU engine, -gpu will give the “best” GPU engine for the given system (at least if CUDA/OpenCL is installed). The computation engines perform automatic memory management, i.e. the user is relieved from allocating/freeing memory, and copying memory from/to the GPU. The CPU computation engine (currently) uses a garbage collector, while the CUDA computation engine has a custom fast memory allocator.

The Quasar compiler automatically invokes the NVidia CUDA compiler (CUDA computation engine) or the configured C/C++ compiler (CPU computation engine) for compiling critical parts of the code (so-called device and kernel functions, see further).

1.2 How to use?

One single executable program performs all the work (both compiling and running the code). The usage is as follows:

./Quasar.exe [-debug] [-cpu|-gpu] [-profile] [-double] [-nogl] [-make_exe] [-make_lib] program.q

where the parameters have the following meaning:

-debug: use the interpreter for running the code. In case of failure, exact information on the lines which triggered the error will be given (useful for debugging).
-cpu: uses the generic CPU computation engine for running the code (the default is -gpu).
-gpu: uses a GPU computation engine (the default choice).
-profile: runs the code in interpreted mode, and collects profiling information. The profiling information is then printed to the console at the end of the program.
-double: instructs the computation engine to use double-precision floating point by default (see Subsection 2.2.1).
-make_exe: builds a managed executable (.exe). The executable can be run using .Net/MONO.
-make_lib: builds a managed library (.qlib), that can be used in other Quasar programs.
-nogl: disables OpenGL support (used for visualization, e.g. the function imshow).
program.q: a source code file written in the Quasar programming language, containing the program to run.

When the -debug switch is not specified, the compiler produces an executable binary (.exe), which allows the program to be run directly without compiling it again. The compiler is relatively fast: most (simple) algorithms take a couple of milliseconds to compile.
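When Quasar is launched from a script, the command line can be assembled programmatically. The following Python sketch uses only the switches documented in the parameter list above; the helper name quasar_command and its keyword arguments are hypothetical, and the executable path ./Quasar.exe is taken from the usage line and may differ on a given installation.

```python
def quasar_command(program, engine="gpu", debug=False, profile=False,
                   double=False, nogl=False, make_exe=False, make_lib=False,
                   executable="./Quasar.exe"):
    """Build the argument list for one Quasar invocation (hypothetical helper)."""
    if engine not in ("cpu", "gpu"):
        raise ValueError("engine must be 'cpu' or 'gpu'")
    args = [executable]
    if debug:
        args.append("-debug")      # run in the interpreter
    args.append("-" + engine)      # -cpu or -gpu
    if profile:
        args.append("-profile")    # collect profiling information
    if double:
        args.append("-double")     # double precision by default
    if nogl:
        args.append("-nogl")       # disable OpenGL support
    if make_exe:
        args.append("-make_exe")   # build a managed executable
    if make_lib:
        args.append("-make_lib")   # build a managed library (.qlib)
    args.append(program)           # the .q source file comes last
    return args


cmd = quasar_command("program.q", engine="cpu", debug=True)
print(" ".join(cmd))  # prints: ./Quasar.exe -debug -cpu program.q
```

The returned list can be handed to Python's subprocess.run(cmd) to start the program. Building the same command once with engine="cpu" and once with engine="gpu" is a simple way to compare the results of both engines.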

Note that the GPU computation engine is often 10x to 100x faster than the CPU computation engine. Nevertheless, it is useful to occasionally run the program on the CPU as well, to check the numerical accuracy/precision of the results.

Architecture: 32-bit/64-bit CPU or GPU

Quasar has been designed to operate correctly in the following conditions:

32-bit CPU (x86) - the CPU uses a 32-bit address space.
64-bit CPU (x64) - the CPU uses a 64-bit address space (useful for addressing more than 2GB of RAM).
32-bit GPU - the GPU uses a 32-bit address space.
64-bit GPU - the GPU uses a 64-bit address space (needed when the GPU has more than 2GB of RAM, although devices with less than 1GB of RAM also support it).

By default, the choice of 32-bit/64-bit CPU depends on the OS. If a 64-bit OS is installed, the 64-bit CPU version of Quasar will be used. The mode in which the GPU is run depends on the installed version of the GPU runtime (e.g., 64-bit or 32-bit CUDA Runtime). The normal practice is to run the GPU in the same mode as the CPU. However, some GPU devices do not support 64-bit mode. For CUDA, this can be solved by using a special 32-bit version of the CUDA interoperability DLL (CUDA.Net.dll), instead of the default cross-architecture DLL.

Important note: since CUDA 7.0 (released in 2015), the 32-bit mode is not supported anymore. Quasar still supports 32-bit modes for backward compatibility (e.g., in combination with CUDA 6.5). However, it is highly recommended to switch to 64-bit versions of Quasar whenever possible.

Supported libraries

A number of libraries have builtin support. These include: OpenGL, FFTW, cuFFT, cuBLAS, cuSolver and cuDNN. It suffices to use the Quasar functions designed to use these libraries in a user-friendly way. For more information, see the CUDA guide.

Distributing Quasar programs

Quasar programs need to be distributed together with the Quasar runtime library. For this purpose, portable Quasar runtime installers are available for Windows and Linux. The portable Quasar runtime installer can e.g. be integrated in your product installer.

1.3 Quasar Programming Language

Motivation for a new programming language for heterogeneous computing

Following the principle of using the right tool for the right job, Quasar aims at simplicity (a low barrier to entry) while delivering performance similar to that of handwritten C++/CUDA/OpenCL code.

Additionally, the Quasar language unifies CPU and GPU programming: one single code path is sufficient to generate optimized versions for both CPU and GPU. This considerably reduces programming effort. In fact, the Quasar compiler can recognize and optimize sophisticated programming patterns (such as parallel reductions, prefix sums, stencil operations etc.). To be able to do so, higher-level information is extracted from the Quasar program. In other programming languages this information is often lost (e.g., because there are no built-in dynamically sized multidimensional arrays, the presence of pointers and aliasing conditions hampers compiler analysis, array/vector sizes cannot be statically determined etc.). Quasar then uses target-specific source-to-source optimizations to generate efficient C++/CUDA or OpenCL code. For the final translation to binary, commercial or open-source compilers are used in the background. This also allows benefitting from the low-level optimizations in these compilers.

Furthermore, Quasar offers the low-level flexibility and optimization possibilities of C/C++ together with the high-level rapid testing/development of Octave/Matlab. Additionally, Quasar is user/programmer-friendly and is easy to learn. Of course, there are always certain compromises to be made (e.g., flexibility of programming versus computational cost), but this is where the compiler research and the various tools kick in.

Syntax features

The emphasis of the Quasar programming language is on simplicity and practical usefulness. The syntax is similar to MATLAB/Octave (mainly to keep the transition from MATLAB to Quasar easy), although there are a number of differences which encourage efficient programming:

Objects (such as matrices, cell matrices etc.) are passed by reference rather than by value. This means that a simple assignment a=b has negligible computational cost, since it only involves copying pointers. However, one has to be careful with function calls: when passing a matrix as an input argument, the function is allowed to modify the input parameter. This is mainly for efficiency reasons. (If the intention is to copy the values of objects, the function copy(.) can be used to perform a deep copy of objects.) On the other hand, scalar numbers (real or complex) are always passed by value.
Zero-based indexing. All indices start with 0, similar to C/C++, Java, C#, ...
Some improved syntax (similar to GNU Octave): lambda expressions, indexing of the results of a function call (like imread(file)[0..100,0..100]), ...
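The reference semantics listed above can be illustrated by analogy in Python, whose lists also behave as references. The snippet below is Python, not Quasar code: copy.deepcopy stands in for the role that Quasar's copy(.) plays in the text, and the function scale_in_place is a hypothetical example.

```python
import copy


def scale_in_place(matrix, factor):
    """Multiply every element by factor, modifying the caller's object."""
    for row in matrix:
        for j in range(len(row)):
            row[j] *= factor


a = [[1.0, 2.0], [3.0, 4.0]]
b = a                        # assignment copies only the reference
backup = copy.deepcopy(a)    # independent deep copy of the values

scale_in_place(b, 10.0)      # the function modifies its input argument

first = a[0][0]              # zero-based indexing: first row, first column
print(a is b)                # prints: True
print(first)                 # prints: 10.0
print(backup[0][0])          # prints: 1.0
```

Because a and b refer to the same object, the change made through b is visible through a, while the deep copy keeps the original values. Scalars behave differently: as stated above, they are passed by value.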

Advantages

In general, Quasar has the following advantages:

Uniform programming model for CPU and GPU. Unlike some other programming models, in Quasar it is not necessary to implement separate functions for different target devices (for example, a CPU implementation and a GPU implementation). In fact, the same code is targeted at heterogeneous compute devices. For this purpose, compiler transformations specialize the code for the target architecture. When desired, it is possible to write target-specific code, but in practice this is rarely needed.
Compact and easy to learn programming language. Quasar code is simple to develop using the Quasar Redshift IDE. An easy learning curve, together with various integrated debugging and visualization tools, allows a novice to get started really quickly. Compiler errors and warnings have been optimized to be as informative and helpful as possible.
Access to low-level parallelization primitives (through kernel and device functions). Kernel and device functions are compiled natively using existing C++/CUDA compilers (e.g., CUDA NVCC, GCC, MSVC or any other C++ compiler). Quasar code can therefore be seen as a thin layer on top of C++ or CUDA.
High-level programming. Loop parallelization and kernel generation convert the code to low-level kernels. The high-level programming approach not only increases productivity, but it also stimulates writing concise and readable code. This simultaneously reduces the chance for bugs.
Transparent use of CPU / GPU resources. Essentially, no knowledge of GPU programming is required. However, knowledge of parallel programming is a must!
Automatic concurrent kernel execution. Manually assigning CUDA streams is generally a tedious task. The runtime system automates this task; as a result, kernel launches and memory copies can overlap whenever the compute device resources and data dependencies allow it.
Lightweight runtime system with minimal runtime overhead. As long as the bulk of the computations is done within kernel/device functions, the runtime overhead is negligible. The execution time is often similar to handwritten C++ or CUDA code.
Dynamic runtime scheduling. The runtime system offloads computations to the most suitable (and available) device.
Automatic memory management and memory transfers: there is no need to worry about deallocation or dangling pointers. Additionally, the runtime system makes sure that the memory is transferred to the right device at the right time.
Hardware agnostic programming. In general, the Quasar programming model is hardware-agnostic, so that the code does not depend much on the features of the (GPU) hardware. When desired, low level primitives (e.g., shared memory, thread synchronization, textures, ...) can be accessed.
Easy access to low-level CUDA features, such as textures, surfaces, cooperative threading, warp shuffling, shared memory and thread synchronization.
Builtin OpenGL interoperability and visualization features. OpenGL interoperability allows access to data allocated in CUDA, which is useful for efficient visualization. There is the possibility of generating both texture and vertex data from Quasar, which allows creating e.g., advanced 3D plots.
Future-proofness: older Quasar programs automatically use the GPU features of newer GPU architectures. You only need to update to the latest Quasar version.

1.4 Integration with foreign programming languages

Because many code bases already exist, Quasar can be seamlessly integrated with several programming languages. In this section, we provide a quick overview of integration techniques. For more details, see the external interface reference.

.Net languages (C#, F#, IronPython, ...): the Quasar .Net host API can be used to access Quasar features (including running Quasar libraries or binaries generated using Quasar).
Java: a Java bridge has been developed by the Flemish Institute for Biotechnology (VIB) and will soon be available open source.
C++: the Quasar C++ host and DSL APIs offer an extensive set of runtime features to allow either Quasar programs to be used as libraries in C++ projects, or C++ libraries to be called from Quasar. Furthermore, the Helios system allows transpiling Quasar code to C++ which can be linked with existing C++ modules.
Python: the pyQuasar Python-Quasar bridge acts as both a library to Quasar and a class extension to Python, allowing Quasar functions to be called from Python, and vice versa. pyQuasar also maps NumPy arrays onto Quasar arrays.
