Foundations of Fine Grain Parallelism
CS/ECE 560: Spring 2011
Combined on-campus and online sections
Specific course details are being revised. The information below is subject
to change.
Class meets: Tu-Th 9:30-10:45; BEH 107
Announcements:
Be sure to watch this space: late-breaking information will be posted here
rather than sent by email. Also be sure to check the Class Schedule
frequently.
Course Overview
Objectives:
CS/ECE 560, which teaches the foundations of a model of parallelism called
the polyhedral equational model, is unlike
any other course offered anywhere in the world. Here, you will
- learn how to optimize compute-intensive kernels by exploiting fine-grain
parallelism, memory, and locality for a variety of target architectures
such as multi-core processors, GPUs, FPGAs, and vector units (SSE
instructions);
- understand the foundations of the polyhedral model, a
mathematical formalism to specify, develop, analyze, and transform
high-level programs for parallelism, locality, memory, and complexity
optimization;
- appreciate the elegance of equational programming in the
polyhedral model, where computations are specified as high-level
mathematical equations (see the sketch after this list); and
- acquire the skills to do research in architectural and programming
abstractions for massive parallelism.
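To make this concrete, here is a minimal sketch, written in C, of the kind of
compute-intensive kernel the polyhedral model targets. The sketch is
illustrative only (it is not taken from the course materials), and the
equational notation in the comment is informal, meant only to convey the
flavor of the model.

    /*
     * Classic matrix product: a canonical polyhedral kernel. In the
     * equational view, this triply nested loop is simply the equation
     *
     *     C[i][j] = sum over k of A[i][k] * B[k][j]
     *
     * defined on the polyhedral index domain { (i,j,k) : 0 <= i,j,k < N }.
     * Because the loop bounds and array subscripts are affine functions of
     * the loop indices, a polyhedral compiler can analyze the dependences
     * exactly, and then tile, re-schedule, or parallelize the loops
     * automatically.
     */
    void matmul(int N, double A[N][N], double B[N][N], double C[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                C[i][j] = 0.0;                     /* initialize the sum  */
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];  /* accumulate product  */
            }
    }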
The polyhedral model is now used for automatic parallelization in a number of
production compilers, such as gcc, IBM's XL series, and Reservoir Labs'
R-Stream. There are active research groups on the
polyhedral model at MIT (Amarasinghe), Illinois (Padua), Utah (Hall), Ohio
State (Sadayappan), Louisiana State (Ramanujam), IBM Research (Renganarayanan,
Bondhugula, etc.), Leiden (Kienhuis, Deprettere), and many groups in France:
IRISA, Rennes (Derrien, Quinton), ENS Lyon (Feautrier, Darte, Alias), Bordeaux
(Barthou), and INRIA (Cohen, Bastoul, etc.).
Rajopadhye is one of the inventors of the polyhedral model and has been
active in the field since his Ph.D. dissertation (Utah, 1986). At CSU, we
take a unique view of the polyhedral model, combining its analytic,
quantitative power with the expressivity and clean semantics of declarative,
functional programming, in the form of equational programming.
Motivation
Computer architecture is in constant flux, with ever-increasing parallelism
that now comes in many forms: instruction-level parallelism, SIMD
instructions, and, more recently, multiple cores.
- Emerging "many-core" processor architectures in the general-purpose arena
are taking us towards tens to hundreds of processors on a single die.
- Graphics Processing Units (GPUs) are specialized hardware devices,
previously used exclusively to accelerate display engines, that are now
finding their way into general-purpose computing (GPGPU).
- Embedded systems offer yet other forms of parallelism, such as FPGAs
(Field Programmable Gate Arrays: chips that can be re-programmed "at the
circuit level," allowing tremendous performance gains for certain niche
applications) and ASICs (Application Specific Integrated Circuits, which
provide highly tuned but specialized functionality).
Future processors, whether general-purpose or specialized, will have a large
number of cores, typically fine-grained, possibly with dedicated and often
distributed memories, and lower power consumption. Collectively, they will be
much more powerful than today's machines.
In the past, programmers were shielded from the details of processor
evolution. A single architectural abstraction (the von Neumann machine) and a
single programming/algorithmic abstraction (the random access machine, or
RAM) allowed a clean separation of algorithmic, programming, and architectural
advances: programmers could simply wait for the next generation of processors
to speed up their code. These abstractions are crumbling today, a shift that
has been called the end of "La-Z-Boy Programming." Programmers, library
writers, and application developers must now be aware of parallelism and
memory locality at multiple levels, or risk losing many of the Moore's Law
gains of modern architectures. This raises a number of issues.
- The immediate goal is to develop highly tuned applications that can best
exploit today's emerging architectures.
- The medium-term challenge is to ensure that the skills learned in doing so
carry over to tomorrow's architectures, even though we do not yet know their
details.
- The long-term goal is to enable the foundational research that renders the
first two challenges moot. This can be achieved through automatic compilation
and code-generation tools, and will enable the "return to La-Z-Boy
Programming."
Details