CS560 Suggested Class Projects (Spring 2008)
There are essentially two kinds of projects: application/algorithm studies,
and tools. A tool oriented project requires prior discussion with Sanjay to
ensure that the efforts complement efforts in the Mélange group.
We expect most students to do application/algorithm studies, where an
application/algorithm is parallelized and implemented on one of two platforms:
- a Sony Playstation 3, that uses the Sony/Toshiba/IBM Cell processor; or
- the FPGA boards (either AlphaData or Digilent) in the department.
It is strongly recommended that the Playstation projects involve a complete
application, and the FPGA ones tackle an algorithmic kernel (since there is a
considerable amount of interface VHDL code that must be written by hand). It
is acceptable to do the project with a partner, with the expectation of
correspondingly larger scope and size. FPGA based projects should probably be
done with a partner.
Application Studies
Here is a list of suggested applications that you may explore. I've tried to
make this list reasonably large, but it also gives you an idea of what I'm
looking for. It is also somewhat restrictive because I know that their
computationally intensive kernels are amenable to the techniques we develop in
this class. If there's something else you'd like to pursue, please discuss
with Sanjay.
- Linear Space Sequence Alignment for bioinformatics: Driga,
Lu, Schaeffer, Szafron, Charter and Parsons propose a memory efficient
biological sequence alignment algorithm: FastLSA:
A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence
Alignment, in Algorithmica, Volume 45 , Issue 3, Pages: 337-375 (July
2006). A simple parallelization of this algorithm is available from the Biobench Suite at University of
Maryland, under the name of PLSA: Parallel Linear Space Alignment.
- Watershed computation models: TREX:
Two-Dimensional Runoff Erosion and Export is a software package developed
over the years by CSU's Civil Engineering faculty. There is a wealth of
information on this project there, but if you plan to do it, we have access to
a more recent version of the software (written in C++).
- Weather modeling I: The Weather Research and Forecasting
Model is a next generation mesoscale numerical weather prediction system
developed at NCAR, Boulder. The source is freely available and can be
installed to run sequentially, and we are in touch with the authors.
- Weather modeling II: CSU's own Atmospheric Science
Department also has a number of software packages for weather prediction and
modeling. We have been collaborating with Dave Randall's group to investigate
data mapping and parallelization strategies for the so called "geodesic
grid". We have access to a shallow water model (written in Fortran) on
this grid, and a number of students have worked on parts of this code. We
have proposed a "smashing technique" to avoid "wrap-around dependences."
- Biomimetic Vision and Focus of Attention: Bruce Draper and
his research group are exploring biomimetic vision, and have developed
software to determine "Focus of Attention." The initial and computationally
intensive part of this application involves constructing what is known as the
"image pyramid." Thomson Comer has very generously provided a stand-alone
implementation of this part of the software called AttendAsYou.
Furthermore he has agreed to be available to help, and also has prior
experience with implementing image convolutions on FPGAs (he tool CS560 last
year). Another student is already working with Wim Bohm to implement this
algorithm on FPGAs (Digilent and/or AlphaData) so this target is not
recommended for CS560, although the Cell would be complementary.
Algorithmic Kernels
- Algorithmic Tiling for the APP: The algebraic path problem
(APP) is a generalized problem that unifies many problems (such as the
all-pairs shortest paths in a graph, the transitive closure of a boolean
relation, matrix inversion, among others) into a single seting. It (i.e., all
instances of the problem) can be resolved by a generic algorithm
schema, simply by changing the nature of some of the operations. Many
parallel algorithms/architectures have been proposed fopr the APP, the most
recent ones may be found in this
and in
this recent papers that propose a technique called "algorithmic tiling"
that transforms a standard "scalar" algorithm to a "blocked" version. Such
blocked programs have all the advantages of tiling even though, the two
programs are not semantically equivalent ( all intermediate results
are not necesarily identical, although the final values are).
- Protein folding and the optimal parenthesization problem:
The optimal parenthesization problem is a classic algorithm for which a neat
systolic aray has been proposed more than twenty years ago by Guibas, Kung
and Thompson. The problem has recently become important due to its immediate
application in the determination of the secondary structure of proteins. The
project would require you to implement the array on an FPGA co-processor,
measure the performance, determine the bottlenecks and develop solutions to
resolve them. There are many issues to resolve: the proposed architecture is
a triangular array with about 0.5 n^2 processors, and the total
computation time is about 1.5n, leading to a work of 0.75 n^3,
but the algorithm has only a ninth as many operations. Because the
"core" of each processor implements just two add-compare operations, it may
be possible to get a fairly large degree of parallelism (as high as 100
procesors?), and so offset the work-inefficiency. However, an even more
interesting challenge is to obtain an efficient architecture which can also
exploit the parallelism. Finally, a potetial way to improve efficiency is by
judicious partitioning of the computation between the host and the FPGA.
- Stencil kernels: Successive-over-relaxation (SOR) is an
algorithmic technique commonly used in the numerical simulation of many
physical phenomena (eg. the so called "heat equation", electromagnetic wave
propagation using Maxwell's equations, etc.) It is an iterative technique,
that lends itself to elegant parallelization on a systolic array. However,
many interesting issues would arise when actually implementing such an
architecture on an FPGA co-processor: the tradeoff between precision and the
number of iterations (on an FPGA there is flexibility regarding the arithmetic
oprerations, from full IEEE floating point implementations to fixed
precision); the choice among different systolic arrays possible and the
resulting architectural tradeoffs, and finally issues related to tiling which
could be pursued if time permits.
- LU Decomposition:
Classic example, and if you choose
this, we should expect some strong results.