In this lab, you will modify a CUDA kernel, then empirically test its performance and compare it to that of the given code. The given program performs vector addition but does not coalesce its memory accesses. You will modify it so that it does.
Resources that may be useful for this assignment include:
You will need to use a Linux machine with an NVIDIA graphics card. The machines in the CSU 325 lab have such a card. They are named after fish (anchovy .. ); to find them, see these machines and grep for 325.
*** Exception: Do not use wahoo for your CUDA experiments.***
You'll need to take a few additional steps in order to use CUDA on the department machines.
Minimally, you need to load the CUDA module:
module load cuda
Download and untar the CUDAL5.tar file. To compile a CUDA program, use the CUDA compiler nvcc. You should use the provided Makefile as a starting point for this lab and for your CUDA assignments. The Makefile shows you which gcc is to be used.
As discussed in class, vecadd is a microbenchmark for determining the effectiveness of coalescing. You are provided with a non-coalescing version, and it is your job to create a coalescing version and to measure the difference in performance between the two codes. For this lab you must create a new kernel file called vecaddKernel01.cu, which implements the coalesced version of the vector addition.
To implement coalescing, you need to change which part of the computation each thread performs. In the base version, each thread computes N values that are consecutive in the array, so at any moment neighboring threads access addresses N elements apart. In the coalesced version, consecutive threads compute consecutive elements, and each thread then advances by the number of threads to reach its next element, so the simultaneous accesses of neighboring threads fall on adjacent addresses that the hardware can combine into fewer memory transactions. You must change the code to compute the result this way.
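The two access patterns can be sketched as follows. This is a minimal illustration under stated assumptions, not the provided code: the kernel names addBlocked and addStrided, the helpers blockedIndex and stridedIndex, and all parameter names are hypothetical, and the kernels in CUDAL5.tar may be organized differently (for example, by striding within a block rather than across the whole grid). The sketch also assumes the arrays hold exactly (total threads) x n floats.

```cuda
#include <cuda_runtime.h>

// Hypothetical helpers (not from the provided code): the array index that
// thread `tid` touches on its i-th iteration under each scheme.

// Blocked (non-coalesced): thread tid owns n consecutive elements.
__host__ __device__ inline int blockedIndex(int tid, int i, int n) {
    return tid * n + i;
}

// Strided (coalesced): on iteration i, thread tid touches element
// i * totalThreads + tid, so neighboring threads touch neighboring elements.
__host__ __device__ inline int stridedIndex(int tid, int i, int totalThreads) {
    return i * totalThreads + tid;
}

// Each thread adds n consecutive elements. Within a warp, simultaneous
// accesses are n elements apart, so they cannot be combined.
// Assumes the arrays hold exactly gridDim.x * blockDim.x * n floats.
__global__ void addBlocked(const float* a, const float* b, float* c, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < n; ++i) {
        int idx = blockedIndex(tid, i, n);
        c[idx] = a[idx] + b[idx];
    }
}

// Each thread adds n elements spaced totalThreads apart. Within a warp,
// simultaneous accesses are adjacent, so the hardware can coalesce them.
// Same array-size assumption as addBlocked.
__global__ void addStrided(const float* a, const float* b, float* c, int n) {
    int totalThreads = gridDim.x * blockDim.x;
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < n; ++i) {
        int idx = stridedIndex(tid, i, totalThreads);
        c[idx] = a[idx] + b[idx];
    }
}
```

For example, with 4 threads and n = 2, thread 1 touches elements 2 and 3 under the blocked scheme, but elements 1 and 5 under the strided scheme. Both schemes cover every element exactly once; only the assignment of elements to threads differs.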
Here is a set of files provided in CUDAL5.tar for Vecadd:
Compile and run the provided program vecadd00 and collect data on the time the program takes with 60,000 values per thread. Then test your new program and compare the time it takes to perform the same work with the time taken by the original.
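When comparing the two timings, it can help to reduce them to a speedup and an effective memory bandwidth. The host-side helpers below are a hypothetical sketch, not part of CUDAL5.tar: the names speedup and effectiveBandwidthGBs are made up for illustration, and the bandwidth formula assumes float elements, with two reads and one write per element.

```cuda
#include <cstddef>

// Hypothetical helpers for comparing two measured kernel times
// (not part of the provided code).

// Speedup of the coalesced kernel over the baseline; > 1 means faster.
inline double speedup(double baselineMs, double coalescedMs) {
    return baselineMs / coalescedMs;
}

// Effective bandwidth in GB/s. Vector addition reads two values and writes
// one per element; this assumes each value is a 4-byte float.
inline double effectiveBandwidthGBs(std::size_t numElements, double elapsedMs) {
    double bytes = 3.0 * sizeof(float) * static_cast<double>(numElements);
    double seconds = elapsedMs * 1.0e-3;
    return bytes / seconds / 1.0e9;
}
```

For instance, adding 1,000,000 floats in 1 ms moves 12 MB and corresponds to 12 GB/s. Comparing the effective bandwidth of each version against the card's peak memory bandwidth indicates how much of the available bandwidth each access pattern actually uses.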