Objectives
The objective of this assignment is to explore implementations of
Polynomial Multiplication. Record the performance of your programs on
a Fish machine
(Anchovy to Wahoo in the machines list), and experimentally determine
the gains you get with 1-8 threads.
Download and untar PA3.tar. You should not need
to update the Makefile. Do not set the number of threads in the
program; set it at the Linux OS level (using the export or setenv
command). Otherwise we cannot run your program with different
numbers of threads. Write your report in PDF format.
The following two sequential programs have been implemented for you.
- [PolyMultGSQ] (Grade School Quadratic) This is the
naive implementation of Polynomial Multiplication (a sketch of
the method appears after this list).
- [PolyMultGold] This (sequential) implementation includes
all of the optimizations up to PolyMultBLQ. It is provided only
as a binary; use it as a benchmark against which to compare your
programs.
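For concreteness, here is a minimal sketch of the grade-school
method, assuming the polynomials are stored as coefficient arrays of
length n (degree n-1) with the product in an array of length 2n-1.
The function name and interface are illustrative only; the provided
PolyMultGSQ may organize its loops differently (the PolyMultINQ item
below implies it uses two computational loop nests).

    /* Illustrative naive (grade-school) multiply; not the provided code.
     * a and b each hold n coefficients; c must hold 2*n - 1. */
    void naive_poly_mult(const double *a, const double *b, double *c, int n)
    {
        for (int k = 0; k < 2 * n - 1; k++)   /* clear the result */
            c[k] = 0.0;

        for (int i = 0; i < n; i++)           /* every term of a ...       */
            for (int j = 0; j < n; j++)       /* ... times every term of b */
                c[i + j] += a[i] * b[j];      /* accumulate like powers    */
    }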
You will need to implement the following programs.
- [PolyMultINQ] (Inverted Quadratic)
- Invert the inner and outer loops of both computational loops
from the PolyMultGSQ implementation.
- Hint: once the loops have been inverted, you can condense
the computation into a single double for loop (see the sketch
after this list).
- Explore loop parallelism. Report the optimal strategy and
number of threads.
- [PolyMultOPQ] (Outer Product Quadratic)
- Rewrite PolyMultGSQ to use a square iteration space (as
shown in lecture).
- Explore loop parallelism. Report the optimal strategy and
number of threads.
- [PolyMultBLQ] (Blocked Quadratic)
- Block both the inner and outer loops from PolyMultOPQ (see
the blocked sketch after this list).
- Use the same parameter (tune1) to set the block size for each
loop.
- Explore loop parallelism. Report the optimal strategy, number
of threads, and block size.
- [PolyMultDCQ] (Divide and Conquer Quadratic)
- Rewrite PolyMultOPQ to use the divide and conquer strategy
presented in lecture.
- Use tune2 to set the minimum leaf size.
- Explore task parallelism and test the use of different
non-recursive algorithms in the leaves
(e.g. PolyMult{INQ,OPQ,BLQ}). Determine and report which
configuration gives you the best performance (a structural
sketch of the recursion appears after this list).
- [PolyMultDCNew] (Divide and Conquer New)
- The divide and conquer strategy used in PolyMultDCQ is a
preliminary step to writing Karatsuba's algorithm (which
combines the upper right and lower left recursive calls).
However, PolyMultDCQ does not parallelize well because only the
upper right and lower left recursive calls can be executed in
parallel (without allocating more memory).
- Use PolyMultINQ as a starting point to write a divide and
conquer algorithm which does parallelize well (one possible
shape is sketched after this list).
- Divide relative to the i index, so that both recursive calls
are independent.
- You will need to write your own leaf case because the leaf
domain will not be rectangular.
- Any divide and conquer strategy which achieves the same (or
better) speedup as this strategy will receive credit.
- Add this task to your project by editing the Makefile and
PolyMult-wrapper.c to produce
PolyMultDCNew.{time,time-par,check,check-rand,check-par}.
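The sketches below are illustrative only: the interfaces, variable
names, and any parallelization shown are my assumptions, not the
provided code, and not necessarily the optimal configurations you are
asked to find.

For PolyMultINQ, one plausible shape for the condensed double loop is
shown here, assuming the inversion leaves the output-coefficient index
as the outer loop (the same reading that makes the PolyMultDCNew split
work). OpenMP is assumed as the threading model, since the handout
sets thread counts through the environment; the pragma shows one
candidate loop-parallel strategy, nothing more.

    /* Sketch of a condensed inverted-quadratic loop (assumed form).
     * The outer index k runs over output coefficients; the inner bounds
     * keep both a[i] and b[k - i] inside [0, n).  Each outer iteration
     * writes only c[k], so this particular loop has no races. */
    void poly_mult_inq_sketch(const double *a, const double *b, double *c, int n)
    {
        #pragma omp parallel for schedule(static)   /* one candidate strategy */
        for (int k = 0; k < 2 * n - 1; k++) {
            int lo = (k - n + 1 > 0) ? k - n + 1 : 0;
            int hi = (k < n - 1) ? k : n - 1;
            double s = 0.0;
            for (int i = lo; i <= hi; i++)
                s += a[i] * b[k - i];
            c[k] = s;
        }
    }

If your inverted loops instead come out with an input index outermost,
the condensed form is the familiar c[i + j] += a[i] * b[j] double loop,
and the safe parallelization strategy changes because different
iterations then update the same output entry.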
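For PolyMultOPQ and PolyMultBLQ together, the sketch below walks the
square n-by-n (i, j) iteration space, in which each cell contributes
a[i] * b[j] to c[i + j], and tiles both loops with a single block size
B (standing in for the handout's tune1, assumed to be at least 1). It
is left serial on purpose: because different cells update the same
c[i + j], choosing a safe and fast parallelization is exactly the
exploration the assignment asks for.

    /* Blocked walk over the square iteration space (illustrative).
     * a and b each hold n coefficients; c holds 2*n - 1; B = tune1. */
    void poly_mult_blq_sketch(const double *a, const double *b, double *c,
                              int n, int B)
    {
        for (int k = 0; k < 2 * n - 1; k++)   /* clear the result */
            c[k] = 0.0;

        for (int ii = 0; ii < n; ii += B)
            for (int jj = 0; jj < n; jj += B)
                /* One B-by-B tile: reuses the a[ii..] and b[jj..] ranges
                 * while they are still in cache. */
                for (int i = ii; i < ii + B && i < n; i++)
                    for (int j = jj; j < jj + B && j < n; j++)
                        c[i + j] += a[i] * b[j];
    }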
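For PolyMultDCQ, this sketch shows only the structure of the
recursion over the square iteration space: split both index ranges in
half, recurse into the four quadrants, and switch to a plain quadratic
kernel once a range is no larger than the leaf size (the handout's
tune2, assumed to be at least 1). Which recursive calls may run as
concurrent tasks, and which non-recursive leaf kernel wins, are
deliberately not shown, since reporting those is part of the
assignment.

    /* Structural sketch of the quadrant recursion (illustrative).
     * c (2*n - 1 entries) must be zeroed first; call as
     * dcq_sketch(a, b, c, 0, n, 0, n, tune2). */
    static void dcq_sketch(const double *a, const double *b, double *c,
                           int ilo, int ihi, int jlo, int jhi, int leaf)
    {
        if (ihi - ilo <= leaf || jhi - jlo <= leaf) {
            /* Leaf: plain quadratic kernel over this rectangle; swapping
             * in an INQ/OPQ/BLQ-style kernel here is part of the exercise. */
            for (int i = ilo; i < ihi; i++)
                for (int j = jlo; j < jhi; j++)
                    c[i + j] += a[i] * b[j];
            return;
        }
        int im = (ilo + ihi) / 2, jm = (jlo + jhi) / 2;
        dcq_sketch(a, b, c, ilo, im, jlo, jm, leaf);   /* low  x low  */
        dcq_sketch(a, b, c, ilo, im, jm, jhi, leaf);   /* low  x high */
        dcq_sketch(a, b, c, im, ihi, jlo, jm, leaf);   /* high x low  */
        dcq_sketch(a, b, c, im, ihi, jm, jhi, leaf);   /* high x high */
    }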
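For PolyMultDCNew, one way to read the description above is sketched
here: assuming "the i index" is the output-coefficient index (dividing
on it is what makes the two halves write disjoint parts of c and
therefore be independent), the recursion splits the output range, the
leaf walks a trapezoidal rather than rectangular domain, and OpenMP
tasks (again an assumption) provide the task parallelism. The
leaf-size parameter and all names are placeholders.

    /* Illustrative i-split divide and conquer over the output range
     * [klo, khi); the two halves touch disjoint parts of c.  leaf is a
     * placeholder minimum leaf size (assumed >= 1). */
    static void dcnew_sketch(const double *a, const double *b, double *c,
                             int n, int klo, int khi, int leaf)
    {
        if (khi - klo <= leaf) {
            /* Leaf: non-rectangular domain, because the inner bounds
             * depend on the output index k. */
            for (int k = klo; k < khi; k++) {
                int lo = (k - n + 1 > 0) ? k - n + 1 : 0;
                int hi = (k < n - 1) ? k : n - 1;
                double s = 0.0;
                for (int i = lo; i <= hi; i++)
                    s += a[i] * b[k - i];
                c[k] = s;
            }
            return;
        }
        int km = (klo + khi) / 2;
        #pragma omp task                           /* one half as a child task */
        dcnew_sketch(a, b, c, n, klo, km, leaf);
        dcnew_sketch(a, b, c, n, km, khi, leaf);   /* other half right here    */
        #pragma omp taskwait
    }

    /* Possible driver: c has 2*n - 1 coefficients. */
    void poly_mult_dcnew_sketch(const double *a, const double *b, double *c,
                                int n, int leaf)
    {
        #pragma omp parallel
        #pragma omp single
        dcnew_sketch(a, b, c, n, 0, 2 * n - 1, leaf);
    }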
Report
- Include the details from the explore steps in each of the
programs above.
- When measuring speedup and finding optimal tuning parameters,
use a degree of 65535 (2^16-1).
- Justify the optimal design and configuration of each program
with either experimental evidence or design principles from this
class (for some programs you may need to use both).
- For each parallelization, report speedup on 1-8 threads, using
the speedup equation from lecture 2 (see the note below).
- As always, report the details of the machine where you are
running your tests.
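(For reference, the usual definition of speedup is speedup(p) =
T(1) / T(p), where T(p) is the wall-clock time with p threads; if
lecture 2 defines it differently, for example relative to a
sequential baseline such as PolyMultGold, use the lecture's
definition.)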
Submission
Submit a PA3.tar file through Checkin. Create this file by using the
rule for PA3.tar in the provided Makefile. (This will ensure that your
project has the correct directory structure.)
Testing
Good luck and have fun!