TMS320C6000 Highest Performance DSP Platform
C67x Floating-Point Benchmarks: Filters, Vector, FFTs, Search, Math, 3D-Graphics and Imaging

Filters

Benchmark | Description | Formula |
---|---|---|
Block FIR | The FIR assumes that the number of filter coefficients (numH) is a multiple of 2 and greater than or equal to 4 and the number of outputs (numY) is a multiple of 4 and greater than or equal to 4. The input, output, and coefficient arrays must start on the same double-word boundary to avoid memory bank hits. | ((2*numH)+10)*(numY/4)+8 cycles. For numH=64 and numY=64: 2216 cycles or 13.296 µsec |
Block IIR | The IIR assumes that the order is a multiple of 2 and greater than or equal to 4, and the number of outputs (numY) is a multiple of 2 and greater than or equal to order+2. To avoid bank hits, the input and output arrays must be aligned on opposite double-word boundaries, and the a and b coefficient arrays must be aligned on opposite double-word boundaries. | (order+10)*(numY-order)+15 cycles. For order=16 and numY=64: 1263 cycles or 7.578 µsec |
Cascaded IIR Biquads | The Biquad assumes that the number of biquads (numB) is a multiple of 2 and greater than or equal to 2, and it processes one input and produces one output. There are no memory bank hits regardless of where the arguments are placed in memory. | 4*(numB)+29 cycles. For numB=8: 61 cycles or 366 nsec |
Circular Block FIR | The circular FIR assumes that the number of filter coefficients (hsize) is a multiple of 2 and greater than or equal to 4 and the number of outputs (ysize) is a multiple of 4 and greater than or equal to 4. The input, output, and coefficient arrays must start on the same double-word boundary to avoid memory bank hits. Circular addressing is used for the input array (x) with a circular buffer size 2^(size+1) and the routine uses "index" to define the initial offset into the buffer. | ((2*hsize)+10)*(ysize/4)+9 cycles. For hsize=64 and ysize=64: 2217 cycles or 13.302 µsec |
Convolution | The convolution assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1) where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4, there will be no memory bank hits (if it is a multiple of 4 there will be nr/4 bank hits). | (nb/2)*nr+(nr/2)*5+8 cycles. For nb=8 and nr=20: 138 cycles or 828 nsec |
Cross Correlation | The Correlation assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1) where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4, there will be no memory bank hits (if it is a multiple of 4 there will be nr/4 bank hits). | (nb/2)*nr+(nr/2)*5+8 cycles. For nb=8 and nr=20: 138 cycles or 828 nsec |
Autocorrelation | Autocorrelation assumes that the correlation is length M, the output array is length M and the input array is length (M+N) where the first M values are zero. The value of N should be a multiple of 2 and greater than or equal to 4. The value of M should be a multiple of 4 and greater than or equal to 4. To prevent memory bank hits, the input array should be aligned on an even double-word boundary (bank 0), and the output array should be aligned on the next word boundary (bank 2). | (N/2)*M+(M/2)*5+9 cycles. For M=8 and N=18: 101 cycles or 606 nsec |
LMS FIR Filter | The Least Mean Squares adaptive FIR filter assumes that the number of coefficients (numH) is a multiple of 4 and at least 4. The number of inputs must be equal to numH+numY-1, where numY is the number of outputs. | ((5*numH)/4+27)*numY+17 cycles. For numH=64 and numY=64: 6865 cycles or 41.19 µsec |
Complex FIR Filter | The complex FIR filter assumes that the number of complex coefficients (numH) is a multiple of 2 and at least 4. The number of complex inputs must be equal to numH+numY-1, where numY is the number of complex outputs. | ((2*numH)+14)*numY+17+numY-1 cycles. For numH=64 and numY=64: 9168 cycles or 55.008 µsec |
Inverse Analysis Lattice Filter | This routine implements an inverse analysis lattice filter (FIR filter or IIR filter with no poles) and stores the result in f. The filter consists of n stages. The value of f is calculated by doing a multiply accumulate on the backward error coefficients, b, and filter gains, k. New backward error coefficients are also calculated. | 4*n+22 cycles. For n=8: 54 cycles or 324 nsec |
Forward Synthesis Lattice Filter | This routine implements a forward synthesis lattice filter (IIR filter with no zeros) and stores the result in f. The filter consists of n stages. The value of f is calculated by doing a multiply accumulate on the backward error coefficients, b, and filter gains, k. New backward error coefficients are also calculated. The value of n must be at least 4. | 4*n+24 cycles. For n=8: 56 cycles or 336 nsec |
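
The filter formulas above can be evaluated directly to budget cycles for other sizes. The following is a minimal Python sketch, not TI code: the function names are illustrative, only four of the filters are covered, and the 6 ns cycle time is an assumption inferred from the table's own pairs (2216 cycles listed as 13.296 µsec). Each function rejects sizes that violate the constraints stated in the Description column.

```python
# Sketch only: function names are illustrative, not TI library symbols.
CYCLE_NS = 6  # assumption: inferred from "2216 cycles or 13.296 µsec" in the table


def _require(condition, message):
    """Raise ValueError when a size constraint from the table is violated."""
    if not condition:
        raise ValueError(message)


def block_fir_cycles(num_h, num_y):
    """Block FIR: ((2*numH)+10)*(numY/4)+8."""
    _require(num_h >= 4 and num_h % 2 == 0, "numH must be a multiple of 2 and >= 4")
    _require(num_y >= 4 and num_y % 4 == 0, "numY must be a multiple of 4 and >= 4")
    return (2 * num_h + 10) * (num_y // 4) + 8


def block_iir_cycles(order, num_y):
    """Block IIR: (order+10)*(numY-order)+15."""
    _require(order >= 4 and order % 2 == 0, "order must be a multiple of 2 and >= 4")
    _require(num_y % 2 == 0 and num_y >= order + 2, "numY must be even and >= order+2")
    return (order + 10) * (num_y - order) + 15


def biquad_cycles(num_b):
    """Cascaded IIR biquads, one input sample: 4*numB+29."""
    _require(num_b >= 2 and num_b % 2 == 0, "numB must be a multiple of 2 and >= 2")
    return 4 * num_b + 29


def lms_fir_cycles(num_h, num_y):
    """LMS FIR: ((5*numH)/4+27)*numY+17."""
    _require(num_h >= 4 and num_h % 4 == 0, "numH must be a multiple of 4 and >= 4")
    return ((5 * num_h) // 4 + 27) * num_y + 17


def cycles_to_ns(cycles):
    """Convert a cycle count to nanoseconds at the assumed 6 ns cycle."""
    return cycles * CYCLE_NS


if __name__ == "__main__":
    rows = [
        ("Block FIR (numH=64, numY=64)", block_fir_cycles(64, 64)),
        ("Block IIR (order=16, numY=64)", block_iir_cycles(16, 64)),
        ("Biquads (numB=8)", biquad_cycles(8)),
        ("LMS FIR (numH=64, numY=64)", lms_fir_cycles(64, 64)),
    ]
    for name, cycles in rows:
        print(f"{name}: {cycles} cycles, {cycles_to_ns(cycles)} ns")
```

With the table's example sizes, these functions reproduce the listed cycle counts; a different clock rate only changes `CYCLE_NS`.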

Vector

Benchmark | Description | Formula |
---|---|---|
Dot Product | The function performs the dot product of two vectors of length N where N is a multiple of 2 and greater than or equal to 10. No memory bank hits occur if the arrays are aligned on opposite double-word boundaries. | N/2 + 24 cycles. For N=100: 74 cycles or 444 nsec |
Matrix-Vector Multiply (any size) | The function performs the multiplication of a n x m matrix by a m x 1 vector. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits. | (n+20)*m+1 cycles. For m=3 and n=3: 70 cycles or 420 nsec |
Matrix-Vector Multiply (with even number of columns) | The function performs the multiplication of a n x m matrix by a m x 1 vector. The column dimension (m) must be greater than or equal to 2 and a multiple of 2. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits. | ((n/2)+24)*m+7 cycles. For m=3 and n=20: 109 cycles or 654 nsec |
Weighted Vector Sum | The function performs an N element vector sum of two vectors with one vector weighted by a constant. The result is stored in a third vector. The value of N must be a multiple of 2 and greater than or equal to 12. To prevent bank hits, the two input vectors should be aligned on opposite double-word boundaries. | N+12 cycles. For N=100: 112 cycles or 672 nsec |
Vector Sum | The function calculates the sum of two vectors of length N where N is a multiple of 2 and greater than or equal to 6. To avoid memory bank hits, the vectors should be aligned on opposite double-word boundaries. | N+8 cycles. For N=100: 108 cycles or 648 nsec |
Sum of Squares | The function calculates the sum of the squares of the N elements of the vector. The value N must be a multiple of 2 and greater than or equal to 12. This function performs extraneous loads. | N/2 + 24 cycles. For N=100: 74 cycles or 444 nsec |
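
Each vector formula is a per-element cost plus a fixed overhead, so short vectors pay proportionally more. The Python sketch below (illustrative names, not TI code) computes cycles per element for three of the routines to show how the overhead amortizes as N grows.

```python
# Sketch only: function names are illustrative, not TI library symbols.

def dot_product_cycles(n):
    """Dot product: N/2 + 24, with N a multiple of 2 and >= 10."""
    if n < 10 or n % 2:
        raise ValueError("N must be a multiple of 2 and >= 10")
    return n // 2 + 24


def vector_sum_cycles(n):
    """Vector sum: N + 8, with N a multiple of 2 and >= 6."""
    if n < 6 or n % 2:
        raise ValueError("N must be a multiple of 2 and >= 6")
    return n + 8


def weighted_vector_sum_cycles(n):
    """Weighted vector sum: N + 12, with N a multiple of 2 and >= 12."""
    if n < 12 or n % 2:
        raise ValueError("N must be a multiple of 2 and >= 12")
    return n + 12


def cycles_per_element(cycle_fn, n):
    """Average cycles spent per vector element, fixed overhead included."""
    return cycle_fn(n) / n


if __name__ == "__main__":
    for n in (12, 100, 1000, 10000):
        print(
            n,
            round(cycles_per_element(dot_product_cycles, n), 3),
            round(cycles_per_element(vector_sum_cycles, n), 3),
            round(cycles_per_element(weighted_vector_sum_cycles, n), 3),
        )
```

For large N the dot product tends toward 0.5 cycles per element and the two sums toward 1 cycle per element, which is simply the leading term of each formula.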

FFTs

Benchmark | Description | Formula |
---|---|---|
Complex Radix 4 FFT | The function calculates the complex Radix 4 DIF FFT of size N with digit-reversed output and normal order input. | (log4(N))*(14*N/4+23)+20 cycles. For N=1024: 18,055 cycles or 108.33 µsec |
Complex Radix 2 FFT | The function calculates the complex Radix 2 DIT FFT of size N with normal order input, bit-reversed output, and bit-reversed coefficients. | ((2*N)+23)*log2(N)+6 cycles. For N=1024: 20,716 cycles or 124.30 µsec |
Inverse Complex Radix 2 FFT | The function calculates the inverse complex Radix 2 DIF FFT of size N with bit-reversed input, normal order output, and bit-reversed coefficients. | ((2*N)+16)*log2(N)+25 cycles. For N=1024: 20,665 cycles or 124 µsec |
Complex Bit-Reverse | The function performs the bit-reversal for an array of N complex SP floating point numbers. N must be a power of 2. | (N/4)*11+9 cycles. For N=1024: 2,825 cycles or 16.95 µsec |
Two-level-cache efficient mixed-radix forward FFT | The function performs a mixed radix forward FFT for floating point input and coefficient data using a special sequence of coefficients. This FFT uses a redundant sequence of twiddle factors to allow a linear access through the data. | 3.25*ceil(log4(N)-1)*N+3*N+179 cycles. For N=1024: 16,563 cycles |
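
To compare the FFT variants at sizes other than N=1024, the formulas can be coded with integer arithmetic. This Python sketch uses illustrative function names and makes two assumptions beyond the table: N is a power of 2 for every routine, and the radix-4 routine additionally needs N to be a power of 4 so that log4(N) is an integer.

```python
# Sketch only: function names are illustrative, not TI library symbols.

def _log2_exact(n):
    """Return log2(n) for a power of 2, else raise ValueError."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    return n.bit_length() - 1


def radix4_cycles(n):
    """Complex radix-4 FFT: log4(N)*(14*N/4+23)+20."""
    log2n = _log2_exact(n)
    if log2n % 2:
        raise ValueError("assumed: radix-4 needs N to be a power of 4")
    return (log2n // 2) * (14 * n // 4 + 23) + 20


def radix2_cycles(n):
    """Complex radix-2 FFT: ((2*N)+23)*log2(N)+6."""
    return (2 * n + 23) * _log2_exact(n) + 6


def inverse_radix2_cycles(n):
    """Inverse complex radix-2 FFT: ((2*N)+16)*log2(N)+25."""
    return (2 * n + 16) * _log2_exact(n) + 25


def bit_reverse_cycles(n):
    """Complex bit-reverse: (N/4)*11+9."""
    _log2_exact(n)
    return (n // 4) * 11 + 9


def mixed_radix_cycles(n):
    """Mixed-radix FFT: 3.25*ceil(log4(N)-1)*N + 3*N + 179."""
    log2n = _log2_exact(n)
    stages = (log2n + 1) // 2 - 1      # ceil(log4(N) - 1) without floating point
    return 13 * stages * n // 4 + 3 * n + 179   # 3.25 written as 13/4


if __name__ == "__main__":
    n = 1024
    print("radix-4:", radix4_cycles(n))
    print("radix-2:", radix2_cycles(n))
    print("inverse radix-2:", inverse_radix2_cycles(n))
    print("bit-reverse:", bit_reverse_cycles(n))
    print("mixed-radix:", mixed_radix_cycles(n))
```

If an application needs normal-order output from the forward radix-2 routine, a separate bit-reverse pass would presumably add its own cycle count on top; the table does not say whether any of the FFT figures already include reordering.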

Search

Benchmark | Description | Formula |
---|---|---|
Vector Max | The function finds the maximum value in a vector of length N where N is a multiple of 5 and greater than or equal to 10. No memory bank hits occur regardless of where arguments are in memory. | 3*N/5+14 cycles. For N=100: 74 cycles or 444 nsec |

Math

Benchmark | Description | Formula |
---|---|---|
Single Precision Floating Point Reciprocal | The function performs the reciprocal using the RCPSP instruction and 2 iterations of the Newton-Raphson algorithm to produce 23 bits of accuracy. 8 bits of accuracy can be achieved by simply using the RCPSP instruction by itself. 16 bits of accuracy is achieved with only one Newton-Raphson iteration. | 28 cycles |
Double Precision Floating Point Reciprocal | The function performs the reciprocal using the RCPDP instruction and 2 iterations of the Newton-Raphson algorithm. | 84 cycles |
Single Precision Floating Point Reciprocal Square Root | The function performs the reciprocal square root using the RSQRSP instruction and 2 iterations of the Newton-Raphson algorithm to produce 23 bits of accuracy. 8 bits of accuracy can be achieved by simply using the RSQRSP instruction by itself. 16 bits of accuracy is achieved with only one Newton-Raphson iteration. | 34 cycles |
Double Precision Floating Point Reciprocal Square Root | This function performs the double-precision reciprocal square root using the RSQRDP instruction and 3 iterations of the Newton-Raphson algorithm. | 113 cycles |
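
The accuracy figures in the table follow from the way Newton-Raphson roughly doubles the number of correct bits per iteration. The Python sketch below illustrates this under stated assumptions: the hardware estimate instructions are not available in Python, so the seed is emulated by truncating the exact result to 8 fractional mantissa bits, and the helper names are hypothetical. It runs in double precision, so the second iteration goes past 23 bits here, whereas a single-precision result is capped by its mantissa width.

```python
import math


def truncate_mantissa(value, bits=8):
    """Keep `bits` fractional mantissa bits of a positive float.

    Stand-in for a hardware seed estimate (assumption, not the real instruction).
    """
    mantissa, exponent = math.frexp(value)   # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    scale = 1 << (bits + 1)                  # leading 1 bit plus `bits` fraction bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)


def reciprocal(x, iterations=2):
    """Approximate 1/x for x > 0 from an 8-bit seed plus Newton-Raphson steps."""
    y = truncate_mantissa(1.0 / x)
    for _ in range(iterations):
        y = y * (2.0 - x * y)                # relative error e becomes e**2
    return y


def reciprocal_sqrt(x, iterations=2):
    """Approximate 1/sqrt(x) for x > 0 from an 8-bit seed plus Newton-Raphson steps."""
    y = truncate_mantissa(1.0 / math.sqrt(x))
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)      # relative error e becomes about 1.5*e**2
    return y


def relative_error(approx, exact):
    """Relative error of an approximation."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    x = 3.0
    for iterations in (0, 1, 2):
        err = relative_error(reciprocal(x, iterations), 1.0 / x)
        bits = math.inf if err == 0 else -math.log2(err)
        print(f"1/x, {iterations} iteration(s): about {bits:.1f} correct bits")
```

For the reciprocal, a seed with relative error e gives exactly e squared after one step (ignoring rounding), which is why an 8-bit seed reaches 16 bits after one iteration and full single precision after two.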

3D-Graphics and Imaging

Benchmark | Description | Formula |
---|---|---|
3D Geometry Transformation | This function performs the "front end" of a 3D graphics transformation pipeline. It performs geometry transformation, clipping preprocessing, perspective projection, and viewport mapping. | Approx 10.4M vertices/second |
Collision Detection | This function takes a vector of 3D points and translates them in one dimension. The 1D distance from the translated point to the parameter "point" is calculated. If the distance is less than the parameter "distance", a collision is detected and the address of the point is returned. There are no memory bank hits regardless of where the function parameters are placed in memory, but the function performs extraneous loads. | (N/2)*3+32 cycles (worst case). For N=10,000: 15,032 cycles or 90.192 µsec |
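
The collision-detection description can be read as a simple one-dimensional scan. The Python sketch below is one possible interpretation, not the TI routine: the parameter names `axis` and `offset` are hypothetical, `point` is assumed to be a scalar coordinate on the chosen axis, the first match wins, and an index is returned in place of an address. The worst-case cycle formula from the table is included alongside, assuming an even N.

```python
# Sketch only: one reading of the description, with hypothetical parameter names.

def find_collision(points, axis, offset, point, distance):
    """Return the index of the first colliding point, or None.

    points   -- sequence of (x, y, z) tuples
    axis     -- 0, 1 or 2: the dimension being translated (assumption)
    offset   -- translation applied along that axis (assumption)
    point    -- reference coordinate on that axis
    distance -- collision threshold
    """
    for index, coords in enumerate(points):
        translated = coords[axis] + offset
        if abs(translated - point) < distance:   # 1D distance test
            return index
    return None


def collision_worst_case_cycles(n):
    """Worst case (no early exit): (N/2)*3+32, assuming N is even."""
    return (n // 2) * 3 + 32


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (5.0, 1.0, 2.0), (9.0, 0.0, 0.0)]
    print(find_collision(pts, axis=0, offset=1.0, point=10.0, distance=0.5))
    print(collision_worst_case_cycles(10000))
```

The worst case corresponds to scanning all N points without finding a collision; an early hit would presumably finish sooner, although the table gives no formula for that case.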