To test the performance of SA-C programs on simple image processing procedures, we implemented 32 routines that exactly match the API of routines from Intel's Image Processing Library (IPL). These routines were verified by comparing their results to the output of Intel's routines; the results match exactly, including the use of saturating arithmetic and (for some routines) rounding remainders of 0.5 up or down depending on the column number.
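As an illustration of the saturating arithmetic mentioned above, here is a minimal Python sketch. It assumes unsigned 8-bit pixels and assumes that an "S" routine such as AddS combines an image with a scalar constant; the function names `saturate_u8`, `add_s`, and `subtract_s` are hypothetical and are not taken from SA-C or the IPL. The column-dependent rounding is not modeled here.

```python
def saturate_u8(value):
    """Clamp an integer to the unsigned 8-bit range [0, 255]."""
    return max(0, min(255, value))


def add_s(pixels, constant):
    """Add a constant to every pixel, saturating at 255 instead of wrapping."""
    return [saturate_u8(p + constant) for p in pixels]


def subtract_s(pixels, constant):
    """Subtract a constant from every pixel, saturating at 0 instead of wrapping."""
    return [saturate_u8(p - constant) for p in pixels]


if __name__ == "__main__":
    row = [0, 100, 250]
    print(add_s(row, 10))       # 250 + 10 is clamped to 255
    print(subtract_s(row, 10))  # 0 - 10 is clamped to 0
```

With wrapping (modulo 256) arithmetic, 250 + 10 would instead yield 4, which is why an exact match with the reference output depends on saturation.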
The comparisons were made by compiling the SA-C routines with the November 2000 version of the SA-C compiler and executing them on an Annapolis Microsystems StarFire with a Xilinx XV-1000 FPGA. The Intel IPL routines were executed under Windows NT on a 450 MHz Pentium II. (We believe the XV-1000 and the 450 MHz Pentium II are of approximately the same age.) The test images were 8-bit 512x512 images.
Execution times are reported in seconds, as are the data download and upload times for the reconfigurable system (RCS). FPGA clock frequencies are reported in MHz.
Routine | Pentium Exec. (s) | RCS Exec. (s) | RCS data download (s) | RCS data upload (s) | Frequency (MHz) |
AddS | 0.081531 | 0.008355 | 0.02221 | 0.03292 | 39.5 |
And | 0.003179 | 0.008492 | 0.04418 | 0.03298 | 38.9 |
AndS | 0.001865 | 0.008331 | 0.02222 | 0.03275 | 39.6 |
Close | 0.018069 | 0.012800 | 0.02337 | 0.03308 | 25.0 |
Convolve2D | 0.006548 | 0.006624 | 0.02341 | 0.03385 | 25.1 |
Dilate | 0.011578 | 0.028910 | 0.02376 | 0.03385 | 25.0 |
Erode | 0.016764 | 0.028910 | 0.02375 | 0.03385 | 25.0 |
Gaussian3x3 | 0.005670 | 0.006637 | 0.03386 | 0.03390 | 25.1 |
Greater | 0.000109 | 0.008461 | 0.04430 | 0.00413 | 39.0 |
GreaterS | 0.011431 | 0.009563 | 0.02220 | 0.02238 | 34.5 |
LShiftS | 0.001469 | 0.008537 | 0.02239 | 0.02256 | 38.7 |
Less | 0.000074 | 0.008438 | 0.04434 | 0.00413 | 39.1 |
LessS | 0.011567 | 0.008179 | 0.02176 | 0.00409 | 40.4 |
MaxFilter | 0.005189 | 0.021259 | 0.02304 | 0.03341 | 28.2 |
MinFilter | 0.005328 | 0.021755 | 0.02304 | 0.03342 | 27.5 |
Multiply | 0.003541 | 0.009055 | 0.04322 | 0.03306 | 36.4 |
MultiplyS | 0.039078 | 0.008707 | 0.02228 | 0.03291 | 37.9 |
MultiplySScale | 0.002057 | 0.009470 | 0.02224 | 0.03293 | 34.9 |
MultiplyScale | 0.003659 | 0.011854 | 0.04510 | 0.03336 | 27.8 |
NormC | 0.001053 | 0.010593 | | 0.00008 | 31.0 |
NormL1 | 0.002023 | 0.009099 | | 0.00008 | 36.0 |
Not | 0.001883 | 0.008530 | 0.02227 | 0.03291 | 38.7 |
Open | 0.017859 | 0.034167 | 0.02386 | 0.03380 | 25.0 |
Or | 0.003093 | 0.008492 | 0.04490 | 0.03289 | 38.6 |
OrS | 0.001976 | 0.008331 | 0.02220 | 0.03272 | 39.6 |
RShiftS | 0.001359 | 0.008537 | 0.02233 | 0.03302 | 38.7 |
Square | 0.045160 | 0.008530 | 0.02613 | 0.03297 | 38.7 |
Subtract | 0.030182 | 0.007858 | 0.04330 | 0.03248 | 42.0 |
SubtractS | 0.001854 | 0.008546 | 0.02226 | 0.03293 | 38.6 |
Threshold | 0.001251 | 0.007806 | 0.02223 | 0.03703 | 42.0 |
Xor | 0.002554 | 0.009390 | 0.04451 | 0.03302 | 35.1 |
XorS | 0.001367 | 0.008331 | 0.02183 | 0.03272 | 39.6 |
As you would expect, the Pentium II generally outperforms the reconfigurable system on simple image processing operators. This is because these tasks are I/O bound, and the I/O paths on the FPGAs are no wider than those on the Pentium and operate at a slower clock speed. As a result, the FPGA is unable to exploit its advantage in parallelism. (See ARAGTAP for an example of the reconfigurable system outperforming the Pentium on more complex tasks.)
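To make the comparison concrete, the following Python sketch computes two quantities for a few rows copied from the table: the ratio of RCS execution time to Pentium execution time, and the RCS end-to-end time including both transfers. The dictionary `TIMINGS` and the helper names are hypothetical; only the numbers come from the table.

```python
# (pentium_exec, rcs_exec, rcs_download, rcs_upload), all in seconds,
# copied from the table rows for these three routines.
TIMINGS = {
    "AddS": (0.081531, 0.008355, 0.02221, 0.03292),
    "Convolve2D": (0.006548, 0.006624, 0.02341, 0.03385),
    "Threshold": (0.001251, 0.007806, 0.02223, 0.03703),
}


def exec_ratio(row):
    """RCS execution time divided by Pentium execution time.

    A value above 1 means the Pentium finished the computation faster.
    """
    pentium_exec, rcs_exec, _, _ = row
    return rcs_exec / pentium_exec


def rcs_total(row):
    """RCS execution time plus the download and upload transfer times."""
    _, rcs_exec, download, upload = row
    return rcs_exec + download + upload


if __name__ == "__main__":
    for name, row in TIMINGS.items():
        print(f"{name}: exec ratio {exec_ratio(row):.2f}, "
              f"RCS total {rcs_total(row):.6f} s, Pentium {row[0]:.6f} s")
```

The ratio varies from routine to routine, so it is worth running the same calculation over the whole table rather than relying on any single row.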
Readers should also note the other numbers provided here: the time required to download the source image(s) to the RCS, and the time required to upload the output back to the host. These times are artifacts of the FPGAs being on a separate co-processor board, while the Pentium is the main processor. Future reconfigurable systems may have the FPGAs and RISC processors on the same chip, in which case these transfer times go away. In the meantime, it takes about 0.02 seconds to download a 512x512 8-bit image across our PCI bus; operators that take two images as arguments have twice that download time. Upload times depend on whether the result is an image or a single value and, if it is an image, whether it is binary, 8-bit, or wider.
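The transfer figures above imply a rough effective bandwidth, which the following Python sketch works out. It assumes one byte per pixel for an 8-bit image and uses the approximate 0.02 second figure quoted in the text; the function names are hypothetical.

```python
WIDTH = 512
HEIGHT = 512
BYTES_PER_PIXEL = 1           # 8-bit pixels
SECONDS_PER_IMAGE = 0.02      # approximate download time quoted in the text


def image_bytes(width=WIDTH, height=HEIGHT, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of one image in bytes."""
    return width * height * bytes_per_pixel


def effective_bandwidth(num_bytes, seconds):
    """Effective transfer rate in bytes per second."""
    return num_bytes / seconds


def download_time(num_images, seconds_per_image=SECONDS_PER_IMAGE):
    """Estimated download time when an operator takes several source images."""
    return num_images * seconds_per_image


if __name__ == "__main__":
    size = image_bytes()
    rate = effective_bandwidth(size, SECONDS_PER_IMAGE)
    print(f"{size} bytes per image, about {rate / 1e6:.1f} MB/s effective")
    print(f"two-image operator: about {download_time(2):.2f} s to download")
```

Under these assumptions a single image is 262,144 bytes, so the effective download rate is roughly 13 MB/s, and a two-image operator such as And or Or needs about 0.04 seconds, in line with the download column of the table.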