{"id":371,"date":"2025-01-02T20:23:45","date_gmt":"2025-01-02T20:23:45","guid":{"rendered":"https:\/\/www.sqrlab.ca\/csci4060u\/?page_id=371"},"modified":"2025-03-20T12:11:07","modified_gmt":"2025-03-20T12:11:07","slug":"lectures","status":"publish","type":"page","link":"http:\/\/www.sqrlab.ca\/csci4060u\/lectures\/","title":{"rendered":"Lectures"},"content":{"rendered":"\n<ul class=\"wp-block-list\">\n<li><a href=\"#lecture01\">Lecture 1: Introduction I<\/a><\/li>\n\n\n\n<li><a href=\"#lecture02\">Lecture 2: Introduction II<\/a><\/li>\n\n\n\n<li><a href=\"#lecture03\">Lecture 3: OpenMP Programming I<\/a><\/li>\n\n\n\n<li><a href=\"#lecture04\">Lecture 4: OpenMP Programming II<\/a><\/li>\n\n\n\n<li><a href=\"#lecture05\">Lecture 5: OpenMP Programming&nbsp;III<\/a><\/li>\n\n\n\n<li><a href=\"#lecture06\">Lecture 6: OpenMP Programming&nbsp;IV<\/a><\/li>\n\n\n\n<li><a href=\"#lecture07\">Lecture 7: OpenMP Programming V<\/a><\/li>\n\n\n\n<li><a href=\"#lecture08\">Lecture 8: OpenMP Programming VI<\/a><\/li>\n\n\n\n<li><a href=\"#lecture09\">Lecture 9: OpenMP Programming VII<\/a><\/li>\n\n\n\n<li><strong>Test #1 [Feb 6, 2025] (Lectures 1-9)<\/strong><\/li>\n\n\n\n<li><a href=\"#lecture10\">Lecture 10: POSIX Thread Programming I<\/a><\/li>\n\n\n\n<li><a href=\"#lecture11\">Lecture 11: POSIX Thread Programming II<\/a><\/li>\n\n\n\n<li><a href=\"#lecture12\">Lecture 12: POSIX Thread Programming III<\/a><\/li>\n\n\n\n<li><a href=\"#lecture13\">Lecture 13: POSIX Thread Programming IV<\/a><\/li>\n\n\n\n<li><a href=\"#lecture14\">Lecture 14: POSIX Thread Programming V<\/a> (online asynchronous)<\/li>\n\n\n\n<li><strong>Test #2 [Mar. 
6, 2025] (Lectures 10-14)<\/strong><\/li>\n\n\n\n<li><a href=\"#lecture15\">Lecture 15: OpenCL Programming I<\/a><\/li>\n\n\n\n<li><a href=\"#lecture16\">Lecture 16: OpenCL Programming II<\/a><\/li>\n\n\n\n<li><a href=\"#lecture17\">Lecture 17: OpenCL Programming III<\/a><\/li>\n\n\n\n<li><a href=\"#lecture18\">Lecture 18: Heterogeneous Computing<\/a><\/li>\n\n\n\n<li>Lecture 19: Parallel Programming &amp; AI<\/li>\n\n\n\n<li>Lecture 20: Parallel Programming &amp; AI<\/li>\n\n\n\n<li><strong>Test #3 [April 3, 2025 \u2013 take home] (Lectures 15-20)<\/strong><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">LECTURE 1: INTRODUCTION I<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Overview of course material. What is concurrency and why is it important? Concurrency and the free lunch.<\/li>\n\n\n\n<li><em>Slides:<\/em>&nbsp;<a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/CSCI_4060U_Lecture_01.pdf\">[PDF]<\/a><\/li>\n\n\n\n<li><em>Readings:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/blogs.nvidia.com\/blog\/2009\/12\/16\/whats-the-difference-between-a-cpu-and-a-gpu\/\">What\u2019s the Difference Between a CPU and a GPU?<\/a><\/li>\n\n\n\n<li><a href=\"http:\/\/www.futurechips.org\/tips-for-power-coders\/parallel-programming.html\">What Makes Parallel Programming Hard?<\/a><\/li>\n\n\n\n<li><a href=\"http:\/\/www.gotw.ca\/publications\/concurrency-ddj.htm\">The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.apple.com\/ca\/newsroom\/2022\/03\/apple-unveils-m1-ultra-the-worlds-most-powerful-chip-for-a-personal-computer\/\">Apple unveils M1 Ultra, the world\u2019s most powerful chip for a personal computer<\/a><a name=\"lecture01\"><\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture02\">LECTURE 2: INTRODUCTION II<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Overview of a parallel architecture taxonomy. Data-level vs. 
thread-level parallelism.<\/li>\n\n\n\n<li><em>Slides:<\/em>&nbsp;<a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/CSCI_4060U_Lecture_02.pdf\">[PDF]<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture03\">LECTURE 3: OpenMP Programming I<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Introduction to implicit parallelism and OpenMP programming in C.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/helloworld.c\">helloworld.c<\/a>&nbsp;\u2013 a first C program with OpenMP<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Readings:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"http:\/\/www.openmp.org\/wp-content\/uploads\/Intro_To_OpenMP_Mattson.pdf\">A \u201cHands-on\u201d Introduction to OpenMP<\/a>&nbsp;\u2013 slides 1-39<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Whiteboard:<\/em><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"374\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-1024x374.jpg\" alt=\"\" class=\"wp-image-414\" style=\"width:614px;height:224px\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-1024x374.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-300x109.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-768x280.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-1536x560.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-2048x747.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_1-500x182.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image 
size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"377\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-1024x377.jpg\" alt=\"\" class=\"wp-image-415\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-1024x377.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-300x110.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-768x283.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-1536x565.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-2048x754.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_2-500x184.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"367\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-1024x367.jpg\" alt=\"\" class=\"wp-image-416\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-1024x367.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-300x108.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-768x275.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-1536x550.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-2048x734.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture03_3-500x179.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture04\">LECTURE 
4: OPENMP PROGRAMMING II<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Programming OpenMP with C using barriers.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/helloworld_v2.c\">helloworld_v2.c<\/a>, <a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/helloworld_v3.c\">helloworld_v3.c<\/a>, <a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/helloworld_v4.c\">helloworld_v4.c<\/a>&nbsp;\u2013 examples of parallelization using a barrier.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/barrier_grades.c\">barrier_grades.c<\/a>&nbsp;\u2013 a more detailed example of parallelization using a barrier to separate two calculations involving grades.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Readings:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"http:\/\/jakascorner.com\/blog\/2016\/07\/omp-barrier.html\">OpenMP: Barrier<\/a><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Whiteboard:<\/em><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"356\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-1024x356.jpg\" alt=\"\" class=\"wp-image-423\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-1024x356.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-300x104.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-768x267.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-1536x535.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-2048x713.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_04_1-500x174.jpg 500w\" 
sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture05\">LECTURE 5: OPENMP PROGRAMMING III<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Debugging\/testing strategies for OpenMP programs. Programming OpenMP with C using the parallel directive and a critical region.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/barrier_grades_bugfree.c\">barrier_grades_bugfree.c<\/a> &#8211; a bug-free version of <a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/barrier_grades.c\">barrier_grades.c<\/a> from last class.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/barrier_grades_v2.c\">barrier_grades_v2.c<\/a> &#8211; a complete version of the program that includes both the generation of grades and the calculation of a running average of grades that uses a critical region.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/barrier_grades_v3.c\">barrier_grades_v3.c<\/a> &#8211; a version of the program that calculates the average of grades at the end of the program.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Reading:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"http:\/\/www.openmp.org\/wp-content\/uploads\/Intro_To_OpenMP_Mattson.pdf\">A \u201cHands-on\u201d Introduction to OpenMP<\/a>&nbsp;\u2013 Unit 2, slides 40-91<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture06\">LECTURE 6: OPENMP PROGRAMMING IV<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We will review the expressiveness and trade-offs of atomic statements vs. critical regions. First, we will review the benefits of atomic statements over critical regions. 
Next, we will examine the use of multiple critical regions with different names.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/atomic_ex1.c\">atomic_ex1.c<\/a>&nbsp;\u2013 an example of the trade-offs between atomic and critical.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/critical_ex1.c\">critical_ex1.c<\/a>&nbsp;\u2013 an example of two independent critical regions within the same parallel region.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Readings:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.openmp.org\/spec-html\/5.0\/openmpsu95.html\">Atomic statements<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.openmp.org\/spec-html\/5.0\/openmpsu89.html\">Critical regions<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture07\">LECTURE 7: OPENMP PROGRAMMING V<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Programming OpenMP with C using the parallel directive and critical regions.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pi_serial.c\">pi_serial.c<\/a>&nbsp;\u2013 a serial program for calculating pi as the sum of the areas of rectangles under a curve.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pi_parallel_ex1.c\">pi_parallel_ex1.c<\/a>&nbsp;\u2013 parallelization of the serial pi program using the parallel directive.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pi_parallel_ex2.c\">pi_parallel_ex2.c<\/a>&nbsp;\u2013 improving the performance&nbsp;of pi_parallel_ex1.c. The original program suffers from \u201cfalse sharing\u201d between array elements on the same cache line. 
This program uses an architecture-specific solution (padding) to solve the problem.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Reading:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"http:\/\/www.openmp.org\/wp-content\/uploads\/Intro_To_OpenMP_Mattson.pdf\">A \u201cHands-on\u201d Introduction to OpenMP<\/a>&nbsp;\u2013 Unit 2, slides 40-91<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture08\">LECTURE 8: OPENMP PROGRAMMING VI<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Programming OpenMP with C using the parallel directive and atomic statements.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pi_parallel_ex3.c\">pi_parallel_ex3.c<\/a>&nbsp;\u2013&nbsp;parallelization of the serial pi program using the parallel directive and a critical region.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pi_parallel_ex3b.c\">pi_parallel_ex3b.c<\/a>&nbsp;\u2013 the danger of placing critical regions in loops is&nbsp;demonstrated in this program. 
This is not a best practice :).<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pi_parallel_ex4.c\">pi_parallel_ex4.c<\/a>&nbsp;\u2013&nbsp;parallelization of the serial pi program using the parallel directive and an atomic statement.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Whiteboard:<\/em><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"601\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-1024x601.jpg\" alt=\"\" class=\"wp-image-451\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-1024x601.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-300x176.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-768x450.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-1536x901.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-2048x1201.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/01\/lecture_08_1-500x293.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture09\">LECTURE 9: OPENMP PROGRAMMING VII<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We will now introduce loops. We will look at the process involved in parallelizing serial loops, the use of the nowait clause, nested parallel loops with the collapse clause, customizing loops with the schedule clause, and finally the operators available for loop reductions (e.g., +, *, min, max). 
We will also conclude our look at OpenMP with a review of best practices.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/loop_ex1.c\">loop_ex1.c<\/a>&nbsp;\u2013 parallelizing loops by combining the parallel and for worksharing constructs.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/loop_ex2.c\">loop_ex2.c<\/a>&nbsp;\u2013 an example of parallelizing a serial loop by first modifying the loop to ensure all iterations are independent.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/loop_ex3.c\">loop_ex3.c<\/a>&nbsp;\u2013 an example of two parallel loops in the same parallel block that also employs the nowait clause.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/loop_ex4.c\">loop_ex4.c<\/a>&nbsp;\u2013 parallelizing nested loops with the collapse clause.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/loop_ordered_clause.c\">loop_ordered_clause.c<\/a>&nbsp;\u2013 an ordered clause can be used with a parallel for loop to order the output of the loop the same as if it was executed sequentially. 
The ordered clause does have a performance cost.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/loop_if_clause.c\">loop_if_clause.c<\/a>&nbsp;\u2013 an if clause can be added to a parallel construct and allows for code to be parallelized under some conditions (e.g., a high number of threads) and not parallelized under other conditions (e.g., a low number of threads).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Readings:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.openmp.org\/spec-html\/5.0\/openmpsu44.html\">OpenMP Loops<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.openmp.org\/spec-html\/5.2\/openmpse94.html\">OpenMP nowait clause<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.openmp.org\/spec-html\/5.0\/openmpsu97.html\">OpenMP ordered clause<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/www.openmp.org\/spec-html\/5.0\/openmpse16.html#x58-990002.8\">OpenMP Worksharing Constructs<\/a><\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Whiteboard:<\/em><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"435\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_08_1-1024x435.jpg\" alt=\"Process for parallelizing loops\" class=\"wp-image-197\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_08_1-1024x435.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_08_1-300x127.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_08_1-768x326.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_08_1-500x212.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"452\" 
src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_09_1-1024x452.jpg\" alt=\"Best Practices for Parallelizing with OpenMP\" class=\"wp-image-200\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_09_1-1024x452.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_09_1-300x132.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_09_1-768x339.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2019\/02\/lecture_09_1-500x220.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture10\">LECTURE 10: POSIX THREAD PROGRAMMING I<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We will introduce explicit parallelism in C through discussion of threading and the pthread library. First, we will look at threads vs. processes in UNIX. Second, we will look at pthreads and explain what they are and why they are useful. 
We will also discuss testing threaded programs.<\/li>\n\n\n\n<li><em>Resources:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hpc-tutorials.llnl.gov\/posix\/\">Pthreads Overview<\/a> [Section 2]<\/li>\n\n\n\n<li><a href=\"https:\/\/hpc-tutorials.llnl.gov\/posix\/pthreads_api\/\">The Pthreads API<\/a> [Section 3]<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Whiteboard:<\/em><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"336\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-1024x336.jpg\" alt=\"\" class=\"wp-image-478\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-1024x336.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-300x98.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-768x252.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-1536x504.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-2048x672.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_1-500x164.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"319\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-1024x319.jpg\" alt=\"\" class=\"wp-image-479\" srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-1024x319.jpg 1024w, 
http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-300x93.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-768x239.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-1536x478.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-2048x638.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture10_2-500x156.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture11\">LECTURE 11: POSIX THREAD PROGRAMMING II<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Creating and destroying threads. Passing data to threads. Creating joinable vs detached threads.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_helloworld.c\">pthread_helloworld.c<\/a>&nbsp;\u2013 a first example of creating and destroying threads.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_args.c\">pthread_args.c<\/a>&nbsp;\u2013 an example of passing data to threads using a structure.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_join.c\">pthread_join.c<\/a>&nbsp;\u2013 an example of joining worker threads to the main thread upon termination.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_detach.c\">pthread_detach.c<\/a>&nbsp;\u2013 an example of detached worker threads.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Resources:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hpc-tutorials.llnl.gov\/posix\/\">Thread Management<\/a>&nbsp;[Section 5]<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture12\">LECTURE 12: POSIX THREAD PROGRAMMING 
III<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Accessing shared data with a pthread mutex (mutual exclusion). The difference between pthread_mutex_lock() and pthread_mutex_trylock().<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_mutex.c\">pthread_mutex.c<\/a>&nbsp;\u2013 an example of using a mutex to access a shared variable from multiple threads.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_trymutex.c\">pthread_trymutex.c<\/a>,&nbsp;<a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_trymutex2.c\">pthread_trymutex2.c<\/a>&nbsp;\u2013 examples of using pthread_mutex_trylock() instead of pthread_mutex_lock() with a mutex.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Resources:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hpc-tutorials.llnl.gov\/posix\/\">Mutex Variables<\/a>&nbsp;[Section 7]<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture13\">LECTURE 13: POSIX THREAD PROGRAMMING IV<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>An in-depth look at the performance impacts of mutex, trymutex and the combined use of both.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_trymutex_lockmutex.c\">pthread_trymutex_lockmutex.c<\/a>&nbsp;\u2013 an example of using a hybrid approach of a trylock and lock to improve performance with respect to real time and CPU time.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Whiteboard:<\/em><\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-scaled.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"605\" src=\"https:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-1024x605.jpg\" alt=\"\" class=\"wp-image-483\" 
srcset=\"http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-1024x605.jpg 1024w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-300x177.jpg 300w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-768x454.jpg 768w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-1536x908.jpg 1536w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-2048x1210.jpg 2048w, http:\/\/www.sqrlab.ca\/csci4060u\/wp-content\/uploads\/sites\/14\/2025\/02\/lecture13_1-500x295.jpg 500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture14\">LECTURE 14: POSIX THREAD PROGRAMMING V<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pthread condition variables and condition attributes.<\/li>\n\n\n\n<li><em>Exercises:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/pthread_condition.c\">pthread_condition.c<\/a>&nbsp;\u2013 an example of using a condition variable with a mutex in a stock exchange.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Resources:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/hpc-tutorials.llnl.gov\/posix\/\">Condition Variables<\/a>&nbsp;[Section 8]<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><em>Video:<\/em> <a href=\"https:\/\/drive.google.com\/file\/d\/1KPiDJPXXXYFHkTkW8yf-cwRdUSrbSSbb\/view?usp=drive_link\">[Google Drive]<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture15\">LECTURE 15: OPENCL PROGRAMMING I<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A high-level overview of data parallelism with OpenCL.&nbsp;<\/li>\n\n\n\n<li><em>Resources:<\/em>\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/github.com\/KhronosGroup\/OpenCL-Guide\">OpenCL Guide<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" 
id=\"lecture16\">LECTURE 16: OPENCL PROGRAMMING II<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data types in OpenCL C kernels.<\/li>\n\n\n\n<li><em>Notes:<\/em> <a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/CSCI_4060U_Lecture_16.pdf\">[PDF]<\/a><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture17\">LECTURE 17: OPENCL PROGRAMMING III<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Programming a host program and kernel code with OpenCL.<\/li>\n\n\n\n<li><a href=\"https:\/\/www.sqrlab.ca\/courses\/csci4060u-w25\/exercises\/opencl_square.c\">opencl_square.c<\/a> &#8211; An OpenCL program that squares an array of values.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"lecture18\">LECTURE 18: Heterogeneous Computing<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A discussion of heterogeneous computing where multiple different processor types are used.<\/li>\n<\/ul>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>LECTURE 1: INTRODUCTION I LECTURE 2: INTRODUCTION II LECTURE 3: OpenMP Programming I LECTURE 4: OPENMP PROGRAMMING II LECTURE 5: OPENMP PROGRAMMING III LECTURE 6: OPENMP PROGRAMMING IV LECTURE 7: OPENMP PROGRAMMING V LECTURE 8: OPENMP PROGRAMMING VI LECTURE 9: &hellip; <a href=\"http:\/\/www.sqrlab.ca\/csci4060u\/lectures\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"sidebar-page.php","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"class_list":["post-371","page","type-page","status-publish","hentry"],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/P739Pv-5Z","_links":{"self":[{"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/pages\/371","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/comments?post=371"}],"version-history":[{"count":44,"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/pages\/371\/revisions"}],"predecessor-version":[{"id":498,"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/pages\/371\/revisions\/498"}],"wp:attachment":[{"href":"http:\/\/www.sqrlab.ca\/csci4060u\/wp-json\/wp\/v2\/media?parent=371"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}