The goal of the Cameron project is to develop a language and optimizing compiler that will make reconfigurable computers available to applications programmers, particularly in the area of image processing and computer vision. It is not to develop image processing applications. However, we do write applications in SA-C both to test the system and to demonstrate its power. This task has been made easier by the fact that two of the principal investigators (Bruce Draper and Ross Beveridge) are computer vision researchers.
ARAGTAP is a large automatic target recognition (ATR) system for synthetic aperture radar (SAR) images developed by the U.S. Air Force. The ARAGTAP pre-screener is the front end of ARAGTAP. It is a focus of attention (FOA) system designed to detect possible targets in SAR images, returning regions of interest (ROIs) to be verified and identified. The ARAGTAP pre-screener was ported to the Khoros software development environment by Khoral Research, Inc.; we subsequently translated it into SA-C.
The ARAGTAP pre-screener relies on morphology operators, including eight repetitions of image dilation (alternating between a square and a cross-shaped kernel) and eight repetitions of image erosion (again, alternating kernels). The complete ARAGTAP pre-screener has been implemented in SA-C and run on an AMS WildStar adaptive coprocessor with a single Xilinx 1000 FPGA.
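As an illustration of the alternating-kernel morphology described above, here is a minimal Python sketch (not the SA-C source). The 3x3 kernel size, the choice to start with the square kernel, the border handling (out-of-image neighbors are skipped), and all function names are assumptions made for this example.

```python
# Offsets (d_row, d_col) for two 3x3 structuring elements.
# The 3x3 size is an assumption; the text only names the shapes.
SQUARE = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def morph(image, kernel, reduce_fn):
    """Apply reduce_fn (max for dilation, min for erosion) over each neighborhood."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Neighbors that fall outside the image are skipped (an assumption).
            vals = [image[r + dr][c + dc] for dr, dc in kernel
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            row.append(reduce_fn(vals))
        out.append(row)
    return out


def dilate(image, kernel):
    return morph(image, kernel, max)


def erode(image, kernel):
    return morph(image, kernel, min)


def alternate(image, op, repetitions=8):
    """Apply op repeatedly, switching between the square and cross kernels."""
    for i in range(repetitions):
        kernel = SQUARE if i % 2 == 0 else CROSS  # starting with SQUARE is an assumption
        image = op(image, kernel)
    return image
```

Under these assumptions, the two morphology stages described above would correspond to `alternate(image, dilate)` and `alternate(image, erode)`.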
We have also implemented some canonical image processing functions in SA-C and tested them on an AMS StarFire reconfigurable processor. The simplest such function is an edge detection program that calculates the square root of the sum of the squares of responses to horizontal and vertical Prewitt edge masks. Since this same task can be performed using the Intel Image Processing Library (IPL), we are able to compare our results to the results of a hand-optimized Pentium program.
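The edge magnitude computation can be sketched in a few lines of Python. This is an illustrative version, not the SA-C program or the IPL routine; the function name is hypothetical, and leaving border pixels at zero is an assumption.

```python
import math

# Standard 3x3 Prewitt masks: one responds to intensity changes along
# columns (x), the other to changes along rows (y).
PREWITT_X = [[-1, 0, 1],
             [-1, 0, 1],
             [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1],
             [0, 0, 0],
             [1, 1, 1]]


def prewitt_magnitude(image):
    """Return sqrt(gx^2 + gy^2) per interior pixel; border pixels stay 0.0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += PREWITT_X[i][j] * pixel
                    gy += PREWITT_Y[i][j] * pixel
            out[r][c] = math.sqrt(gx * gx + gy * gy)
    return out
```

Both mask responses are accumulated in the same pass over each 3x3 window, so every pixel is read once per window rather than once per mask.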
A more sophisticated edge detector is the Canny operator. The Canny operator is a four-step process: 1) image smoothing, 2) computing edge magnitudes and (discretized) orientations, 3) non-maximal suppression in the gradient direction, and 4) hysteresis labeling. The last of these steps is a connected components algorithm that is beyond the current abilities of our compiler. Therefore, we have implemented the first three steps in SA-C, under the assumption that the last step will be performed on the host.
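Steps 2 and 3 can be sketched as follows in Python. This is a simplified illustration, not the SA-C implementation: it assumes the gradient components `gx` and `gy` have already been computed from a smoothed image, that orientations are discretized into four bins, and that border pixels are left at zero. All names are hypothetical.

```python
import math

# Neighbor offset (d_row, d_col) along each discretized gradient direction:
# bin 0 = 0 degrees, 1 = 45, 2 = 90, 3 = 135 (rows grow downward).
DIRECTION_OFFSETS = {0: (0, 1), 1: (1, 1), 2: (1, 0), 3: (1, -1)}


def discretize(gx, gy):
    """Map a gradient vector to one of four orientation bins."""
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return int((angle + 22.5) // 45) % 4


def non_max_suppress(gx, gy):
    """Keep a pixel's magnitude only if it is a maximum along its gradient direction."""
    rows, cols = len(gx), len(gx[0])
    mag = [[math.hypot(gx[r][c], gy[r][c]) for c in range(cols)]
           for r in range(rows)]
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dr, dc = DIRECTION_OFFSETS[discretize(gx[r][c], gy[r][c])]
            m = mag[r][c]
            # Compare against the two neighbors on either side along the gradient.
            if m >= mag[r + dr][c + dc] and m >= mag[r - dr][c - dc]:
                out[r][c] = m
    return out
```

Each output pixel depends only on a fixed 3x3 window, unlike hysteresis labeling, which must follow connected edge chains of arbitrary length.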
Wavelets are commonly used for multi-scale analysis in computer vision, as well as for image compression. Honeywell has defined a set of benchmarks for reconfigurable computing systems, including a wavelet-based image compression algorithm. We have translated their wavelet program into SA-C, generalized to operate on any size image.
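To show the general shape of a separable wavelet decomposition, here is a minimal Python sketch of one level of a 2D Haar transform. The Haar filter is used only because it is the simplest wavelet; the filter used by the Honeywell benchmark is not stated here, so this is not a rendering of that program. The sketch also assumes even image dimensions, and the function names are hypothetical.

```python
def haar_1d(values):
    """One Haar level: pairwise averages followed by pairwise half-differences."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def haar_2d(image):
    """Apply haar_1d to every row, then to every column of the result."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(row) for row in zip(*cols)]
```

After one level, the top-left quadrant holds a half-resolution approximation of the image; applying `haar_2d` again to that quadrant gives the next coarser scale.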
Current FPGAs are I/O limited: they can perform hundreds of operations in parallel, but can only read or write a few bytes at a time. As a result, very simple image operations are I/O bound on FPGAs and are often faster on traditional processors. For example, it is faster to add two images on a Pentium than on a Xilinx XV-1000 FPGA (at least using our system). The benefits of reconfigurable computing appear when you consider larger programs, such as those above.
Nonetheless, we thought it was important to implement a standard library of image processing algorithms to test the expressiveness of SA-C. The image processing component of the Vector, Signal and Image Processing Library (VSIPL) is a standard interface for image processing routines on parallel coprocessors. It is being proposed by a consortium of government, universities and industry to reduce the problem of proliferating APIs for special-purpose hardware. To make sure that the SA-C language was general enough to support this community, we implemented the VSIPL IP library in SA-C. (Using the November 2000 version of the SA-C compiler, all SA-C VSIPL routines compile to the FPGAs except for the histogram routines, since histograms had not yet been implemented as a VHDL component.)
Unfortunately, the VSIPL IP library gives us no basis for performance comparisons, since there is no standard VSIPL IP implementation to compare to. We have therefore also implemented a subset of Intel's Image Processing Library (IPL). Since the IPL is hand-optimized (by Intel) to maximize performance on Pentium processors, it provides a state-of-the-art baseline for comparison. Not surprisingly, Pentiums outperform the FPGAs on individual IPL operators, although the FPGAs outperform the Pentiums on sequences of IPL operations.
More on the SA-C VSIPL Implementation...
More on the SA-C Implementation of the Intel IPL...
After hearing about SA-C, researchers from CSU's Department of Atmospheric Sciences asked us if we could develop a program for solving diagonal systems of equations on FPGAs. We said sure. They provided an iterative algorithm, which we translated into SA-C.....