The Quasar Computation System: Quick Reference Manual
Bart Goossens
Table of Contents
Chapter 1: Introduction
Section 1.1: Computation Engines
Section 1.2: How to use Quasar
Subsection: Architecture: 32-bit/64-bit CPU or GPU
Subsection: Supported libraries
Subsection: Distributing Quasar programs
Section 1.3: Quasar Programming Language
Section 1.4: Integration with foreign programming languages
Chapter 2: Getting started
Section 2.1: Quasar high-level programming concepts
Section 2.2: A brief introduction to the type system
Subsection 2.2.1: Floating-point representation
Subsection 2.2.2: Mixed-precision floating-point computations
Subsection 2.2.3: Integer types
Subsubsection: Important note
Subsection 2.2.4: Fixed-size data types
Subsection 2.2.5: Higher dimensional matrices
Subsection 2.2.6: User-defined types, type definitions and pointers
Section 2.3: Automatic parallelization
Section 2.4: Writing parallel code using kernel functions
Subsection 2.4.1: Basic usage: kernel functions
Subsection 2.4.2: Device functions
Subsection 2.4.3: Memory usage inside kernel or device functions
Subsection 2.4.4: Advanced usage: shared memory and synchronization
Chapter 3: Type system
Section 3.1: Type definitions
Section 3.2: Variable construction
Section 3.3: Size constraints
Section 3.4: Dimension constraints
Section 3.5: Cell array types
Section 3.6: Type constructors and the typename function
Section 3.7: Type classes
Section 3.8: Class / user defined type (UDT) definitions
Section 3.9: Function types
Section 3.10: Enumerations
Section 3.11: Passed by reference / Passed by value
Section 3.12: Constants
Chapter 4: Programming concepts
Section 4.1: Polymorphic variables
Section 4.2: Closures
Section 4.3: Device functions, kernel functions, host functions
Section 4.4: Nested parallelism
Section 4.5: Function overloading
Subsection 4.5.1: Device function overloading
Subsection 4.5.2: Optional function parameters
Section 4.6: Functions versus lambda expressions
Subsection 4.6.1: Explicitly typed lambda expressions
Section 4.7: Kernel function output arguments
Section 4.8: Variadic functions
Subsection 4.8.1: Variadic device functions
Subsection 4.8.2: Variadic function types
Subsection 4.8.3: The spread operator
Subsection 4.8.4: Variadic output parameters
Section 4.9: Reductions
Subsection 4.9.1: Symbolic variables and reductions
Subsection 4.9.2: Reduction resolution
Subsection 4.9.3: Ensuring safe reductions
Subsection 4.9.4: Reduction where clauses
Subsection 4.9.5: Variadic reductions
Section 4.10: Partial evaluation
Section 4.11: Code attributes
Section 4.12: Macros
Section 4.13: Exception handling
Section 4.14: Documentation conventions
Chapter 5: The logic system
Section 5.1: Kernel function assertions
Section 5.2: Built-in compiler functions
Section 5.3: Assertion types recognized by the compiler
Subsection 5.3.1: Equalities
Subsection 5.3.2: Inequalities
Subsection 5.3.3: Type assertions
Section 5.4: User-defined properties
Section 5.5: Unassert
Section 5.6: The role of assertions
Chapter 6: Generic programming
Section 6.1: Parametrized functions
Section 6.2: Parametrized reductions
Section 6.3: Parametrized types
Section 6.4: Generic memory allocation functions and casting
Section 6.5: Explicit specialization through meta-functions
Section 6.6: Implicit specialization
Section 6.7: Generic size-parametrized arrays
Section 6.8: Generic dimension-parametrized arrays
Section 6.9: Example of generic programming: linear filtering
Chapter 7: Object-oriented programming
Section 7.1: Mutable/non-mutable classes
Section 7.2: Constructors
Section 7.3: Destructors
Subsection 7.3.1: Methods
Subsection 7.3.2: Properties
Subsection 7.3.3: Operators
Section 7.4: Dynamic classes
Section 7.5: Parametric types
Section 7.6: Inheritance
Section 7.7: Virtual functions, interfaces, abstract classes
Chapter 8: Special programming patterns
Section 8.1: Matrix/vector expressions
Section 8.2: Loop parallelization/serialization
Subsection 8.2.1: While-loop serialization
Subsection 8.2.2: Example: gamma correction
Section 8.3: Dynamic kernel memory
Subsection 8.3.1: Examples
Subsubsection: Kernel version
Subsubsection: Loop version
Subsection 8.3.2: Memory models
Subsection 8.3.3: Features
Subsection 8.3.4: Performance considerations
Section 8.4: Map and Reduce pattern
Section 8.5: Cumulative maps (prefix sum)
Section 8.6: Meta-functions
Subsubsection: Example: copying the type and assumptions from one variable to another
Chapter 9: GPU hardware features
Section 9.1: Constant memory and texture memory
Section 9.2: Shared memory designators
Subsection 9.2.1: How to use
Subsection 9.2.2: Virtual blocks and overriding the dependency analysis
Subsection 9.2.3: Examples
Subsubsection 9.2.3.1: Histogram
Subsubsection 9.2.3.2: Separable linear filtering
Subsubsection 9.2.3.3: Parallel reduction (sum of NxN matrices)
Section 9.3: Speeding up spatial data access using Hardware Texturing Units
Section 9.4: 16-bit (half-precision) floating-point textures
Section 9.5: Multi-component Hardware Textures
Section 9.6: Texture/surface writes
Section 9.7: Maximizing occupancy through shared memory assertions
Section 9.8: Cooperative groups and warp shuffling functions
Subsection 9.8.1: Fine synchronization granularity
Subsection 9.8.2: Optimizing block count for grid synchronization
Subsection 9.8.3: Memory fences
Section 9.9: Kernel launch bounds
Section 9.10: Memory management
Section 9.11: Querying GPU hardware features
Chapter 10: Parallel programming examples
Section 10.1: Gamma correction
Section 10.2: Fractals
Section 10.3: Image rotation, translation and scaling [basic]
Section 10.4: 2D Haar in-place wavelet transform using lifting
Section 10.5: Convolution
Section 10.6: Parallel reduction sum
Section 10.7: A more accurate parallel sum
Section 10.8: Parallel sort
Section 10.9: Matrix multiplication
Chapter 11: Multi-GPU programming
Section 11.1: A quick glance
Section 11.2: Setting up the device configuration
Section 11.3: Three levels of concurrency
Section 11.4: Manual vs. automatic multi-GPU scheduling
Section 11.5: Host synchronization
Section 11.6: Key principles for efficient multi-GPU processing
Section 11.7: Supported libraries
Section 11.8: Profiling techniques
Section 11.9: Automatic GPU scheduling
Section 11.10: Developing multi-GPU applications
Chapter 12: SIMD processing on CPU and GPU
Section 12.1: Storage versus computation types
Section 12.2: x86/x64 SIMD-accelerated operations
Subsection 12.2.1: Example: AVX image filtering on CPU
Section 12.3: CUDA SIMD-accelerated operations
Subsection 12.3.1: Example: 8-bit image filtering
Subsection 12.3.2: Example: 16-bit half-float image filtering
Section 12.4: ARM Neon-accelerated operations
Section 12.5: Automatic alignment
Section 12.6: Automatic SIMD code generation
Chapter 13: Best practices
Section 13.1: Use main functions
Section 13.2: Shared memory usage
Section 13.3: Loop parallelization
Section 13.4: Output arguments
Section 13.5: Writing numerically stable programs
Section 13.6: Writing deterministic kernels
Chapter 14: Built-in function quick reference
Chapter 15: Functional image processing in Quasar
Section 15.1: Example: translation and filtering
Chapter 16: The Quasar runtime system
Section 16.1: Program interpretation and execution
Section 16.2: Abstraction layer for computation devices
Section 16.3: Object management
Section 16.4: Memory management
Section 16.5: Load balancing and runtime scheduling
Section 16.6: Optimizing memory transfers with const and nocopy
Section 16.7: Controlling the runtime system programmatically
Chapter 17: The Quasar compiler/optimizer
Section 17.1: Function transforms
Subsection 17.1.1: Automatic For-Loop Parallelizer (ALP)
Subsection 17.1.2: Automatic Kernel Generator
Subsection 17.1.3: Automatic Function Instantiation
Subsection 17.1.4: High-Level Inference
Subsection 17.1.5: Function inlining
Subsection 17.1.6: Kernel fusion
Section 17.2: Kernel transforms
Subsection 17.2.1: Parallel Reduction Transform
Subsection 17.2.2: Local Windowing Transform
Subsection 17.2.3: Kernel Tiling Transform
Subsection 17.2.4: Kernel Boundary Checks
Subsection 17.2.5: Target-specific programming and manually invoking the runtime scheduler
Subsection 17.2.6: Compile-time specialization through the $target() meta-function
Section 17.3: Common compilation settings
Section 17.4: CUDA target architecture
Chapter 18: Development tools
Section 18.1: Redshift - integrated development environment
Section 18.2: Spectroscope - command-line debugger
Section 18.3: Redshift Profiler
Subsection 18.3.1: Security settings
Subsection 18.3.2: Peer-to-peer transfers
Subsection 18.3.3: GPU event view
Subsection 18.3.4: Timeline view
Subsection 18.3.5: Kernel line information
Subsection 18.3.6: Kernel metric reports