Online Parallel Programming Class

Spring Quarter 2015

March 30 - June 5, 2015

http://cs.oregonstate.edu/~mjb/onlineparallelclass


This page was last updated: February 27, 2015


Really? A college-level online parallel programming class?

Oregon State University is taking its Desktop Parallel Programming course online through its award-winning Ecampus program! This will let you learn this important skill any time you'd like, while being dressed any way you'd like. :-)

This course is targeted towards:

  1. Computer Science students or recent graduates who want to see how to apply what they have been learning to the new multicore-based and GPU-based programming platforms.
  2. Mid-career CS people who could use some extra knowledge and experience to advance their careers.

So, whether you are a game developer, an engineer, a chemist, a biologist, a physicist, etc., this is a great time to explore how to get more performance from the computers you probably already have.

Here's what you get:
  1. It's not a MOOC -- you get 4 real-live college credits for it.
  2. You are expected to keep up with the lectures and do all of the programming assignments.
  3. You get graded (in addition to A-F, pass/no-pass is an option).
  4. You can earn that credit at either the undergraduate or the graduate level. (You do extra work to earn the grad-level credit.)
  5. There is a record that you took this course: it will show on your Oregon State University transcript.
  6. Your credit can potentially transfer to other programs you are in now, or might be in later.

In other words, you will be able to prove that you were here and that you learned something significant!

For more information about the content of the course, read on!

For more information about the online-ness of the class, see Oregon State's Ecampus site.

Registration opens February 25! To register, go here.

What will we be doing in this class?

The goals of this course are to leave you "career-ready" (i.e., both work-ready and research-ready) for tasks that require desktop parallelism, both on a CPU and on a GPU.

Topics will include: multithreading and multicore programming with OpenMP; timing, speedup, and parallel efficiency; caches and false sharing; GPU programming with OpenCL; and SIMD vector programming. (See the full lecture schedule below.)

We will attempt to help you use your own computers for the assignments. After all, it is a blast to have cool stuff running on your own machine so that you can show your friends and family! We get that.

However, if that doesn't end up working for you, fear not! We have machines here at OSU that you will be able to get remote access to. This includes the ability to run OpenCL code on a GPU.

Definitions

Data Parallel: A computing situation where a single operation is being performed identically on many pieces of data.
GPU Programming: "GPU" stands for Graphics Processing Unit. Computer graphics chips have been designed to handle the streaming-processing of vertices and pixels. It is now possible to redirect that streaming-computing to data parallel problems.
Multicore: Most CPU chips produced today have more than one processing unit on them. Each of them is called a core. Each core can be executing its own set of instructions. Thus, if you arrange your program properly, you can gain increased performance from the same hardware.
Vector Computing: Many pieces of computing hardware allow multiple arithmetic operations to take place simultaneously. On the Intel Xeon Phi, for example, 16 floating-point operations can take place at the same time. Vector computing is also referred to as Single-Instruction-Multiple-Data, or SIMD.
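
To make the first two definitions concrete, here is a minimal sketch of a data-parallel, multicore loop using OpenMP. It is illustrative only, assuming a compiler with OpenMP support (e.g., gcc/g++ with -fopenmp); the array name and size are made up:

    #include <omp.h>
    #include <stdio.h>

    #define ARRAY_SIZE 1000000

    float A[ARRAY_SIZE], B[ARRAY_SIZE];

    int main()
    {
        // Data parallel: the same operation is performed identically on many
        // pieces of data; OpenMP divides the loop iterations among the cores.
        #pragma omp parallel for
        for (int i = 0; i < ARRAY_SIZE; i++)
            B[i] = 2.f * A[i];

        printf("This machine can run up to %d threads\n", omp_get_max_threads());
        return 0;
    }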

Prerequisites

This course will use C/C++ in all assignments and examples. Already being comfortable with function calls, arrays, for loops, structures, arrays of structures, structures of arrays, pointers, trees, and linked lists is important. It is strongly suggested that you not use this class as an opportunity to learn C for the first time.
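
If the "arrays of structures" vs. "structures of arrays" distinction is unfamiliar, here is a quick, hypothetical refresher in C. The two layouts hold the same data but arrange it differently in memory, which matters for the cache and SIMD topics in this course; the names Point and NUM are made up:

    #include <stdio.h>

    #define NUM 1000

    // Array of Structures (AoS): the x, y, z of one point sit together.
    struct Point { float x, y, z; };
    struct Point aos[NUM];

    // Structure of Arrays (SoA): all the x's sit together, then all the y's...
    struct Points
    {
        float x[NUM];
        float y[NUM];
        float z[NUM];
    };
    struct Points soa;

    int main()
    {
        aos[0].x = 1.f;     // AoS access pattern
        soa.x[0] = 1.f;     // SoA access pattern
        printf("%f %f\n", aos[0].x, soa.x[0]);
        return 0;
    }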

Beyond that, understanding differential calculus, analysis of algorithms, and Big-O notation is highly recommended.

Already having some knowledge about computer architecture (e.g., cores, cache) would be a plus, but it is not critical.

You will be expected to take data and produce scientifically-literate graphs and reports. You will be expected to have access to a graphing program, a word processor, and a way to turn your reports into PDF.

Learning Outcomes

On completion of the course, students will have demonstrated the ability to:

  1. Explain the clock-speed limitations of computing using physics and Moore's Law
  2. Explain the limitations of parallel computing using Amdahl's Law and Gustafson's Law (a small sketch of both laws follows this list)
  3. Demonstrate "parallel thinking" in program design
  4. Explain the difference between ILP, TLP, DLP, and SIMD
  5. Demonstrate the ability to program parallel algorithms in TLP, DLP, and SIMD.
  6. Characterize what types of problems are best able to be parallelized
  7. Characterize different parallel programming patterns and what types of problems they best address
  8. Characterize how cache issues affect parallel performance
  9. Demonstrate the proper use of synchronization to avoid race conditions and deadlock
  10. Characterize the benefits of using a CPU versus using a GPU for parallel programming
  11. Characterize the benefits of using a GPU versus using a CPU for parallel programming
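
For reference, here is one common statement of both laws, wrapped in a small C sketch (the fraction F, the function names, and the example numbers are hypothetical):

    #include <stdio.h>

    // Amdahl's Law: if a fraction F of a program must remain sequential,
    // the best possible speedup on n processors is 1 / (F + (1-F)/n),
    // which approaches 1/F as n grows.
    float AmdahlSpeedup( float F, int n )
    {
        return 1.f / ( F + (1.f - F) / (float)n );
    }

    // Gustafson's Law scales the problem up with n instead of holding it
    // fixed: speedup = n - F*(n - 1).
    float GustafsonSpeedup( float F, int n )
    {
        return (float)n - F * (float)( n - 1 );
    }

    int main()
    {
        // Example: 10% of the program is sequential, 8 processors:
        printf( "Amdahl:    %.2f\n", AmdahlSpeedup( 0.10f, 8 ) );     // prints 4.71
        printf( "Gustafson: %.2f\n", GustafsonSpeedup( 0.10f, 8 ) );  // prints 7.30
        return 0;
    }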

In addition, those taking the grad-student version of this course will also have demonstrated the ability to:

  1. Read a parallel-programming-related research paper and write a 5-page analysis paper of it. (I will make some of these available for you, or you can propose your own. It has to be a real research paper, though.)

A significant part of your projects will be characterizing parallel performance based on certain parameters in order to develop insights into how to get the most from parallel systems. Be prepared to produce graphs of, for example, performance and speedup versus the number of threads.

Also be prepared to look for patterns in the graphs and draw conclusions from them. After all, this is Computer Science.
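
As a hint of what those experiments look like, here is a minimal, hypothetical timing sketch using OpenMP: it times the same parallel sum at several thread counts and prints the speedup relative to one thread (the reduction clause is covered in week 3):

    #include <omp.h>
    #include <stdio.h>

    #define ARRAY_SIZE (1024*1024)

    float A[ARRAY_SIZE];

    // Time one trial of a parallel sum using numThreads threads.
    double TimeTrial( int numThreads, float *result )
    {
        omp_set_num_threads( numThreads );
        float sum = 0.f;
        double time0 = omp_get_wtime();
        // reduction(+:sum) gives each thread its own partial sum and
        // combines them safely at the end -- no race condition.
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < ARRAY_SIZE; i++)
            sum += A[i];
        double time1 = omp_get_wtime();
        *result = sum;
        return time1 - time0;
    }

    int main()
    {
        for (int i = 0; i < ARRAY_SIZE; i++)
            A[i] = 1.f;

        float sum;
        double oneThreadTime = TimeTrial( 1, &sum );    // sequential baseline
        for (int t = 1; t <= 8; t *= 2)
            printf( "%d threads: speedup = %5.2f\n", t, oneThreadTime / TimeTrial( t, &sum ) );
        return 0;
    }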

Canvas

This course will be delivered via the Canvas Learning Management System where you will interact with your classmates and with me. Within the course's Canvas site you will access the learning materials, such as the syllabus, handouts, class discussions, assignments, projects, and quizzes.

Professor

The class is being taught by Professor Mike Bailey.

Office: Kelley Engineering Center 2117
Phone: 541-737-2542
E-mail: mjb@cs.oregonstate.edu
Web site: http://cs.oregonstate.edu/~mjb

Lecture Schedule

Week 1:
  Introduction. Syllabus. What this course is ... and isn't.
  Project notes: timing, graphing. Examples.
  Parallel programming background information. The three things we care about Parallel Processing for. Von Neumann architecture.
  Multithreading.

Week 2:
  Moore's Law: what holds, what doesn't.
  Multicore. Hyperthreading.
  Timing. Speedup. Amdahl's Law. Parallel efficiency. Gustafson's Law.
  OpenMP: fork-join model, pragmas, what it does for you, what it doesn't do for you.
  OpenMP: parallelizing for-loops.
  OpenMP: variable sharing, dynamic vs. static thread assignment. Chunksize.

Week 3:
  Summing: not doing anything special vs. critical vs. atomic vs. reduction.
  Trapezoid integration.
  Project #1.
  Mutexes. Barriers.
  Project #2.
  OpenMP: sections, tasks, graph traversal.
  Caches: architecture, hits, misses.

Week 4:
  Caches, cont. False sharing.
  Project #3.
  Designing parallel programs.

Week 5:
  Tasks. Barriers.
  Project #4.
  Test #1.

Week 6:
  Go over the test answers.
  GPU 101. Architecture.
  What GPUs are good at. What they are not good at. Why?

Week 7:
  OpenCL: What is it? Diagram. Mapping onto GPU architecture. (A small kernel sketch appears after this schedule.)
  OpenCL library. Querying configurations.
  Project #5.

Week 8:
  OpenCL reduction.
  OpenCL events.
  OpenCL / OpenGL interoperability.
  Project #6.

Week 9:
  Looking at OpenCL assembly language.
  A special kind of parallelism: Single Instruction Multiple Data (SIMD). SSE, AVX, AVX-512 instructions: what they are, how to use them. Types of problems that work this way.
  Guest Speaker: Patrick Neill, NVIDIA: "GPU Architectures".

Week 10:
  SIMD: arithmetic on two large arrays.
  Project #7.
  Fourier analysis. Autocorrelation.
  Project #8.
  Guest Speaker: Michael Wrinn, Intel: "Parallel Design Patterns".
  OpenGL Compute Shaders. More information.

Week 11:
  Test #2.
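
To give a flavor of the OpenCL weeks, here is what a minimal data-parallel OpenCL kernel looks like. This is a sketch only -- the kernel and argument names are made up, and the host-side work of querying devices, building the program, and enqueueing the kernel is what the lectures and sample code cover:

    // Each work-item multiplies one pair of array elements; the "loop"
    // is implicit in the number of work-items the host launches.
    kernel void ArrayMult( global const float *dA,
                           global const float *dB,
                           global       float *dC )
    {
        int gid = get_global_id( 0 );   // this work-item's global index
        dC[ gid ] = dA[ gid ] * dB[ gid ];
    }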

Projects

Project 1:  OpenMP: Numeric integration
Project 2:  OpenMP: N-body problem
Project 3:  False sharing
Project 4:  Task-based Functional Decomposition
Project 5:  OpenCL Array Multiplication and Reduction
Project 6:  OpenCL/OpenGL Particle System
Project 7:  SIMD: Array multiplication (a small intrinsics sketch appears after this list)
Grad-only:  Research paper analysis
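
And as a taste of Project #7, here is a hypothetical fragment of SIMD array multiplication using SSE intrinsics, which process four floats per instruction. The real project also deals with alignment, leftover elements, and performance measurement:

    #include <xmmintrin.h>   // SSE intrinsics
    #include <stdio.h>

    #define LEN 1024         // assumed here to be a multiple of 4

    float a[LEN], b[LEN], c[LEN];

    int main()
    {
        for (int i = 0; i < LEN; i++)
            a[i] = b[i] = (float)i;

        // Single Instruction, Multiple Data: each _mm_mul_ps multiplies
        // four pairs of floats at once.
        for (int i = 0; i < LEN; i += 4)
        {
            __m128 va = _mm_loadu_ps( &a[i] );
            __m128 vb = _mm_loadu_ps( &b[i] );
            _mm_storeu_ps( &c[i], _mm_mul_ps( va, vb ) );
        }

        printf( "c[10] = %f\n", c[10] );   // 10*10 = 100.000000
        return 0;
    }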

Downloadable Files

You will be given sample code to work with and to use as a basis for your project assignments.

Handouts

There is no textbook for this class. All topics will be covered via handouts.


Comments? Suggestions? Questions? Contact:
Prof. Mike Bailey
Oregon State University, Computer Science
2117 Kelley Engineering Center
Corvallis, OR 97331-5501
541-737-2542
mjb@cs.oregonstate.edu