CS 475/575 -- Spring Quarter 2024
Test #2 Review
This page was last updated: March 24, 2024
Under Construction:
This content is not final until you see this message go away!
Test date and time range:
Test #2 will open in Finals Week on Wednesday, June 12 at 12:01 AM PDT (one minute after midnight).
It will close on Saturday, June 15, at 11:59 PM PDT (one minute before midnight).
This gives you 95 hours, 58 minutes in which to take a 1-hour test.
Test Information:
- This will be a multiple choice test cast as a Canvas "Quiz".
- There will be 40 questions, worth 2.5 points each.
- You will have 60 minutes to complete it. Once you start, you must finish. Canvas does not allow you to pause, leave, then come back and resume.
- The test is open notes and closed friends. Warning! "Open Notes" is not the same as "I don't need to study for it"! You will run out of time if you have to look up every question in the notes.
- Clearly, I cannot stop you from accessing information on the Internet. However, the test has been written against our class notes. If you miss a particular question, any protest of the form "But somethingsomething.com said that..." will be ignored.
- You are responsible for:
  - what is in the handouts
  - what was said in class and the videos, including the Live Lectures
  - what was covered on the quizzes
  - what you have done in the projects
Grade Cutoffs
As a reminder, our grade cutoffs are:
Points | Grade
------ | -----
1060   | A
1040   | A-
1020   | B+
1000   | B
980    | B-
960    | C+
940    | C
920    | C-
900    | D+
880    | D
860    | D-
The test can potentially cover any of the following:
Class Topics:
- GPU 101:
GPU performance vs. CPU performance,
"CUDA Cores" vs. "Intel cores",
what GPUs are good at, what GPUs are not good at,
Compute Units,
Processing Elements,
the Yellow Robot.
- CUDA:
general idea,
two programs together in the same file,
the nvcc compiler,
relationship between nvcc and gcc/g++ and Visual Studio,
executing the kernel,
"chevrons",
the GPU consists of a grid of blocks,
each block contains a grid of threads,
1D or 2D,
thousands of threads,
threads per block,
number of threads in a "Warp" (32),
gridDim, blockIdx, blockDim, threadIdx,
types of memory (and who can share them),
steps in creating and running a CUDA program,
host (CPU) memory vs. device (GPU) memory,
transferring buffers to/from the GPU,
cudaMalloc( ),
cudaMemcpy( ),
performance.
[ You won't need to be able to reproduce exact function syntax. ]
- DGX system:
what it is,
what slurm is,
using srun,
using sbatch.
- CUDA ↔ OpenCL Transition:
relationship between the CUDA and OpenCL compilers and gcc/g++ and Visual Studio,
the four things that both CUDA and OpenCL must do:
- Allocate data space in GPU memory
- Transfer data from CPU to GPU
- Execute a kernel to compute on that data
- Transfer data back from the GPU to the CPU
- OpenCL:
general idea,
two programs, each in a separate file (C/C++ and .cl),
the command queue,
work-groups,
work-items,
1D or 2D or 3D,
thousands of threads,
work-items per work-group,
get_num_groups( ),
get_global_size( ),
get_local_size( ),
get_global_id( ),
get_local_id( ),
SIMD parallelism (float2, float4, float8, float16),
types of memory (and who can share them),
steps in creating and running an OpenCL program,
host (CPU) memory vs. device (GPU) memory,
compiling and building .cl code,
where the OpenCL compiler lives (in the OpenCL driver),
enqueuing,
executing a kernel,
transferring buffers to/from the GPU,
performance.
[ You won't need to be able to reproduce exact function syntax. ]
- OpenCL Events:
throwing events,
waiting for one or more events,
creating a kernel-execution graph structure.
[ You won't need to be able to reproduce exact function syntax. ]
- OpenCL Assembly Language:
difference between sqrt( ), distance( ), length( ), normalize( ) and
fast_sqrt( ), fast_distance( ), fast_length( ), fast_normalize( ),
which you should use when,
registers,
fused multiply-add (FMA).
- GPU Reduction:
general idea,
workgroup-shared memory array,
mask, offset,
barriers.
- OpenCL / OpenGL Interoperability:
general idea,
OpenGL creates a vertex buffer,
in this case, the Vertex Buffer is a table of positions and colors,
OpenCL acquires the buffer,
clCreateFromGLBuffer,
particles.cpp,
particles.cl.
[ You are not responsible for any of the OpenGL code. ]
- Message Passing Interface (MPI):
general idea,
multiple computers networked together,
single-program-multiple-data (SPMD) programming model,
broadcast,
sending,
receiving,
reduction,
scatter / gather,
barriers,
derived types.
[ You won't need to be able to reproduce the exact syntax of the function calls. ]
- Compute-to-communicate ratio:
what it is,
why it is good to make it bigger (more computing accomplished before a communication is needed),
why it is good to make it smaller (bring more simultaneous compute power to bear),
can result in a non-obvious "sweet spot",
N:2, N:4, N:6.
- Parallelism Jeopardy:
the four different types of parallelism you could bring to bear on a specific problem.
Projects:
- Project #3: Mutexes in a stack implementation
- Project #4: SIMD Array Multiplication and Summing: advantage of SIMD
- Project #5: CUDA -- Monte Carlo Simulation: performance characteristics
- Project #6: OpenCL -- Matrix Multiplication
- Project #7: MPI -- Fourier analysis
[ You don't need to know the details of a Fourier analysis. ]
Hints:
- Hint #1: You won't have to write any code.
- Hint #2: I might give you code and ask what it does.
- Hint #3: I might give you code and ask what is wrong with it and how to fix it.
- Hint #4: Any arithmetic on the test will be things that you can do in your head, but you can have a calculator handy if you want.