Lectures

LECTURE 1: INTRODUCTION I

LECTURE 2: INTRODUCTION II

  • Overview of a parallel architecture taxonomy. Data-level vs. thread-level parallelism.
  • Slides: [PDF]

LECTURE 3: OPENMP PROGRAMMING I

LECTURE 4: OPENMP PROGRAMMING II

LECTURE 5: OPENMP PROGRAMMING III

  • Debugging and testing strategies for OpenMP programs. Programming OpenMP with C using the parallel directive and a critical region (a minimal sketch follows this list).
  • Exercises:
  • Reading:
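  • Sketch: a minimal illustration (not one of the course's exercise files) of the parallel directive with a critical region. Each thread increments a shared counter, and the critical region keeps the update race-free. Build with gcc -fopenmp.

        #include <stdio.h>
        #include <omp.h>

        int main(void) {
            int count = 0;

            /* Every thread in the team executes this block. */
            #pragma omp parallel
            {
                /* Only one thread at a time may enter the critical
                   region, so the read-modify-write cannot race. */
                #pragma omp critical
                {
                    count++;
                }
            }

            printf("threads counted: %d\n", count); /* equals the team size */
            return 0;
        }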

LECTURE 6: OPENMP PROGRAMMING IV

  • We will review the expressiveness and trade-offs of atomic statements vs. critical regions. First, we will look at the benefits of atomic statements over critical regions; then we will examine the use of multiple critical regions with different names (a sketch of both follows this list).
  • Exercises:
    • atomic_ex1.c – an example of the trade-offs between atomic and critical.
    • critical_ex1.c – an example of two independent critical regions within the same parallel region.
  • Readings:
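  • Sketch: an illustrative contrast (not the course's atomic_ex1.c or critical_ex1.c). An atomic statement cheaply protects a single scalar update, while two differently named critical regions act as independent locks: threads in one do not block threads in the other.

        #include <stdio.h>
        #include <omp.h>

        int main(void) {
            int a = 0, b = 0, c = 0;

            #pragma omp parallel
            {
                /* atomic: lightweight protection for one simple update. */
                #pragma omp atomic
                a++;

                /* Named critical regions are independent: a thread inside
                   "first" does not block a thread inside "second". */
                #pragma omp critical(first)
                b++;

                #pragma omp critical(second)
                c++;
            }

            printf("a=%d b=%d c=%d\n", a, b, c);
            return 0;
        }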

LECTURE 7: OPENMP PROGRAMMING V

  • Programming OpenMP with C using the parallel directive and critical regions.
  • Exercises:
    • pi_serial.c – a serial program that calculates pi as the sum of the areas of rectangles under a curve.
    • pi_parallel_ex1.c – parallelization of the serial pi program using the parallel directive.
    • pi_parallel_ex2.c – improving the performance of pi_parallel_ex1.c. The original program suffers from “false sharing”: different threads write to distinct array elements that sit on the same cache line. This program uses an architecture-specific solution (padding) to solve the problem (a padding sketch follows this list).
  • Reading:
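  • Sketch: the padding idea might look like this (illustrative; it assumes a 64-byte cache line and may differ from pi_parallel_ex2.c). Each thread accumulates into its own row of a padded array, so the per-thread accumulators land on separate cache lines.

        #include <stdio.h>
        #include <omp.h>

        #define NUM_THREADS 4
        #define PAD 8                 /* assumed 64-byte lines: 8 doubles */
        #define NUM_STEPS 100000000

        int main(void) {
            static double sum[NUM_THREADS][PAD];
            double step = 1.0 / (double)NUM_STEPS;
            int nthreads = 1;

            omp_set_num_threads(NUM_THREADS);
            #pragma omp parallel
            {
                int id = omp_get_thread_num();
                int nthrds = omp_get_num_threads();
                if (id == 0) nthreads = nthrds;

                /* Each thread writes only sum[id][0]; the PAD columns push
                   the accumulators onto separate cache lines, so no false
                   sharing between threads. */
                for (int i = id; i < NUM_STEPS; i += nthrds) {
                    double x = (i + 0.5) * step;
                    sum[id][0] += 4.0 / (1.0 + x * x);
                }
            }

            double pi = 0.0;
            for (int i = 0; i < nthreads; i++)
                pi += sum[i][0] * step;
            printf("pi ~= %.10f\n", pi);
            return 0;
        }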

LECTURE 8: OPENMP PROGRAMMING VI

  • Programming OpenMP with C using the parallel directive and atomic statements (a sketch of the atomic approach follows below).
  • Exercises:
    • pi_parallel_ex3.c – parallelization of the serial pi program using the parallel directive and a critical region.
    • pi_parallel_ex3b.c – demonstrates the danger of placing critical regions inside loops. This is not a best practice :).
    • pi_parallel_ex4.c – parallelization of the serial pi program using the parallel directive and an atomic statement.
  • Whiteboard:
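  • Sketch: a minimal illustration of the atomic version (not necessarily how pi_parallel_ex4.c is written). Each thread accumulates a private partial sum and commits it once with a single atomic update, keeping the synchronization out of the hot loop.

        #include <stdio.h>
        #include <omp.h>

        #define NUM_STEPS 100000000

        int main(void) {
            double pi = 0.0;
            double step = 1.0 / (double)NUM_STEPS;

            #pragma omp parallel
            {
                int id = omp_get_thread_num();
                int nthrds = omp_get_num_threads();
                double partial = 0.0;   /* private: no sharing in the loop */

                for (int i = id; i < NUM_STEPS; i += nthrds) {
                    double x = (i + 0.5) * step;
                    partial += 4.0 / (1.0 + x * x);
                }

                /* One synchronized update per thread, outside the loop. */
                #pragma omp atomic
                pi += partial * step;
            }

            printf("pi ~= %.10f\n", pi);
            return 0;
        }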

LECTURE 9: OPENMP PROGRAMMING VII

  • We will now introduce loops: the process involved in parallelizing serial loops, the nowait clause, nested parallel loops with the collapse clause, customizing loops with the schedule clause, and the operators available for loop reductions (e.g., +, *, min, max). We will also conclude our look at OpenMP with a review of best practices. A sketch combining these clauses follows this list.
  • Exercises:
    • loop_ex1.c – parallelizing loops by combining the parallel and for worksharing constructs.
    • loop_ex2.c – an example of parallelizing a serial loop by first modifying the loop to ensure all iterations are independent.
    • loop_ex3.c – an example of two parallel loops in the same parallel block that also employs the nowait clause.
    • loop_ex4.c – parallelizing nested loops with the collapse clause.
    • loop_ordered_clause.c – the ordered clause can be used with a parallel for loop to order the loop's output as if it were executed sequentially. The ordered clause does have a performance cost.
    • loop_if_clause.c – an if clause can be added to a parallel construct, allowing code to be parallelized under some conditions (e.g., a high number of threads) and run serially under others (e.g., a low number of threads).
  • Readings:
  • Whiteboard:
    • Process for parallelizing loops
    • Best Practices for Parallelizing with OpenMP
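  • Sketch: an illustrative program (not one of the course's exercise files) combining the clauses above: a reduction, two worksharing loops with nowait, and a collapsed nested loop with a dynamic schedule.

        #include <stdio.h>
        #include <omp.h>

        #define N 1000000

        int main(void) {
            static double a[N], b[N], c[512][512];
            double sum = 0.0;

            /* reduction(+:sum): each thread gets a private sum that is
               combined at the end of the loop. */
            #pragma omp parallel for reduction(+:sum) schedule(static)
            for (int i = 0; i < N; i++)
                sum += 0.5 * i;

            #pragma omp parallel
            {
                /* Two independent worksharing loops; nowait removes the
                   implicit barrier after the first so threads flow
                   straight into the second. */
                #pragma omp for nowait
                for (int i = 0; i < N; i++) a[i] = i;

                #pragma omp for
                for (int i = 0; i < N; i++) b[i] = 2.0 * i;
            }

            /* collapse(2) fuses the nested loops into one iteration space;
               schedule(dynamic, 64) hands out chunks of 64 iterations. */
            #pragma omp parallel for collapse(2) schedule(dynamic, 64)
            for (int i = 0; i < 512; i++)
                for (int j = 0; j < 512; j++)
                    c[i][j] = a[i] + b[j];

            printf("sum=%f c[1][1]=%f\n", sum, c[1][1]);
            return 0;
        }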

LECTURE 10: POSIX THREAD PROGRAMMING I

  • We will introduce explicit parallelism in C through a discussion of threading and the pthread library. First, we will look at threads vs. processes in UNIX (a sketch contrasting the two follows below). Second, we will look at pthreads and explain what they are and why they are useful. We will also discuss testing threaded programs.
  • Resources:
  • Whiteboard:
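  • Sketch: an illustrative contrast of a process created with fork() vs. a pthread. The forked child gets its own copy of the address space, so its increment is invisible to the parent, while the thread shares the parent's globals. Build with gcc -pthread.

        #include <stdio.h>
        #include <unistd.h>
        #include <sys/wait.h>
        #include <pthread.h>

        int shared = 0;

        void *thread_body(void *arg) {
            shared++;                 /* same address space: parent sees it */
            return NULL;
        }

        int main(void) {
            pid_t pid = fork();
            if (pid == 0) {           /* child: private copy of 'shared' */
                shared++;
                return 0;
            }
            wait(NULL);
            printf("after fork:   shared = %d\n", shared);  /* still 0 */

            pthread_t t;
            pthread_create(&t, NULL, thread_body, NULL);
            pthread_join(t, NULL);
            printf("after thread: shared = %d\n", shared);  /* now 1 */
            return 0;
        }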

LECTURE 11: POSIX THREAD PROGRAMMING II

  • Creating and destroying threads. Passing data to threads. Creating joinable vs. detached threads (a sketch of each follows this list).
  • Exercises:
  • Resources:
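  • Sketch: a minimal illustration (not one of the course's exercise files) of creating a thread with an argument, joining it, and creating a detached thread through a pthread_attr_t.

        #include <stdio.h>
        #include <unistd.h>
        #include <pthread.h>

        void *worker(void *arg) {
            int id = *(int *)arg;     /* data passed in through the void* */
            printf("worker %d running\n", id);
            return NULL;
        }

        int main(void) {
            int id1 = 1, id2 = 2;

            /* Joinable (the default): the creator can wait for completion. */
            pthread_t t1;
            pthread_create(&t1, NULL, worker, &id1);
            pthread_join(t1, NULL);

            /* Detached: resources are reclaimed automatically on exit,
               but the thread can no longer be joined. */
            pthread_attr_t attr;
            pthread_attr_init(&attr);
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
            pthread_t t2;
            pthread_create(&t2, &attr, worker, &id2);
            pthread_attr_destroy(&attr);

            sleep(1);   /* crude wait so the detached thread gets to run */
            return 0;
        }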

LECTURE 12: POSIX THREAD PROGRAMMING III

  • Accessing shared data with a pthread mutex (mutual exclusion). The difference between pthread_mutex_lock() and pthread_mutex_trylock() (see the sketch after this list).
  • Exercises:
  • Resources:
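  • Sketch: an illustrative contrast of the two calls. pthread_mutex_lock() blocks until the mutex is free, while pthread_mutex_trylock() returns immediately (EBUSY) when it is held, letting the caller do other work instead of sleeping.

        #include <stdio.h>
        #include <pthread.h>

        pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        long counter = 0;

        void *blocking_worker(void *arg) {
            for (int i = 0; i < 100000; i++) {
                pthread_mutex_lock(&lock);          /* blocks until acquired */
                counter++;
                pthread_mutex_unlock(&lock);
            }
            return NULL;
        }

        void *polling_worker(void *arg) {
            int done = 0;
            while (done < 100000) {
                if (pthread_mutex_trylock(&lock) == 0) {   /* 0 = acquired */
                    counter++;
                    done++;
                    pthread_mutex_unlock(&lock);
                }
                /* On EBUSY: the lock is held; a real program would do
                   other useful work here and retry later. */
            }
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, blocking_worker, NULL);
            pthread_create(&t2, NULL, polling_worker, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            printf("counter = %ld (expect 200000)\n", counter);
            return 0;
        }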

LECTURE 13: POSIX THREAD PROGRAMMING IV

  • An in-depth look at the performance impacts of pthread_mutex_lock(), pthread_mutex_trylock(), and the combined use of both.
  • Exercises:
    • pthread_trymutex_lockmutex.c – an example of a hybrid trylock/lock approach to improve performance with respect to real time and CPU time (a sketch of the idea follows this list).
  • Whiteboard:
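  • Sketch: the hybrid idea might look like this (illustrative; not necessarily how pthread_trymutex_lockmutex.c is written). Each thread tries the lock opportunistically and keeps accumulating locally when it is busy, paying for a blocking lock only once at the end.

        #include <stdio.h>
        #include <pthread.h>

        pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        long long shared_total = 0;

        void *worker(void *arg) {
            long long local = 0;
            for (int i = 0; i < 1000000; i++) {
                local += i;                      /* local work: no lock needed */

                if (pthread_mutex_trylock(&lock) == 0) {
                    /* Got the lock cheaply: flush the local total now. */
                    shared_total += local;
                    local = 0;
                    pthread_mutex_unlock(&lock);
                }
                /* Otherwise keep working locally and try again later. */
            }

            /* Nothing left to overlap: pay for one blocking lock. */
            pthread_mutex_lock(&lock);
            shared_total += local;
            pthread_mutex_unlock(&lock);
            return NULL;
        }

        int main(void) {
            pthread_t t[4];
            for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
            for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
            printf("total = %lld\n", shared_total);
            return 0;
        }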

LECTURE 14: POSIX THREAD PROGRAMMING V

LECTURE 15: OPENCL PROGRAMMING I

  • A high-level overview of data parallelism with OpenCL. 
  • Resources:

LECTURE 16: OPENCL PROGRAMMING II

  • Data types in OpenCL C kernels (a short kernel sketch follows below).
  • Notes: [PDF]
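  • Sketch: an illustrative kernel showing OpenCL C data types. Scalar types mirror C's, and vector types such as float4 support swizzles (v.xy) and component-wise arithmetic.

        /* OpenCL C kernel (illustrative): scalar and vector data types. */
        __kernel void types_demo(__global float4 *vecs, __global float *out) {
            size_t i = get_global_id(0);

            float4 v  = vecs[i];      /* four floats handled as one value */
            float2 xy = v.xy;         /* swizzle: first two components    */
            int   gid = (int)i;       /* scalar types mirror C's          */

            /* Component-wise arithmetic comes for free with vector types. */
            float4 scaled = v * (float4)(2.0f, 2.0f, 2.0f, 2.0f);

            out[i] = scaled.x + scaled.y + xy.x + xy.y + (float)gid;
        }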

LECTURE 17: OPENCL PROGRAMMING III

  • Programming a host program and kernel code with OpenCL (an illustrative host-side sketch follows the exercise below).
  • Exercises:
    • opencl_square.c – an OpenCL program that squares an array of values.
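  • Sketch: an illustrative host-side program (error checking omitted; not necessarily how opencl_square.c is structured, and it assumes an OpenCL 1.2 runtime): select a device, build the kernel from source, move data to the device, enqueue N work-items, and read the results back.

        #include <stdio.h>
        #ifdef __APPLE__
        #include <OpenCL/opencl.h>
        #else
        #include <CL/cl.h>
        #endif

        static const char *src =
            "__kernel void square(__global const float *in,"
            "                     __global float *out) {"
            "    size_t i = get_global_id(0);"
            "    out[i] = in[i] * in[i];"
            "}";

        int main(void) {
            enum { N = 1024 };
            float in[N], out[N];
            for (int i = 0; i < N; i++) in[i] = (float)i;

            /* Pick the first platform/device; set up a context + queue. */
            cl_platform_id plat;  cl_device_id dev;
            clGetPlatformIDs(1, &plat, NULL);
            clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
            cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
            cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

            /* Build the kernel from source. */
            cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
            clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
            cl_kernel k = clCreateKernel(prog, "square", NULL);

            /* Move data to the device, run N work-items, read back. */
            cl_mem din  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                         sizeof in, in, NULL);
            cl_mem dout = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof out, NULL, NULL);
            clSetKernelArg(k, 0, sizeof din, &din);
            clSetKernelArg(k, 1, sizeof dout, &dout);

            size_t global = N;
            clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
            clEnqueueReadBuffer(q, dout, CL_TRUE, 0, sizeof out, out, 0, NULL, NULL);

            printf("out[5] = %f (expect 25.0)\n", out[5]);

            clReleaseMemObject(din);  clReleaseMemObject(dout);
            clReleaseKernel(k);       clReleaseProgram(prog);
            clReleaseCommandQueue(q); clReleaseContext(ctx);
            return 0;
        }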

LECTURE 18: HETEROGENEOUS COMPUTING

  • A discussion of heterogeneous computing, in which multiple different processor types are used together.