TMS320C6000 Assembly Benchmarks at Texas Instruments

   TMS320C6000™ Highest Performance DSP Platform

C64x™ DSP Benchmarks

FILTERS

Benchmark	Description	Formula
General FIR	Implements a general purpose 16-bit FIR filter. Requires at least 5 filter taps and generates a minimum of one output sample.	cycles = (8 + nh') * nr'/4 + 19 For nh = 32 and nr = 100, cycles = 1100.
Horizontal Scaling	Scales a set of 'in_len' samples to a set of 'out_len' samples using a polyphase filter bank. The number of filters is given by 'n_hh' and the length of each filter by 'l_hh'. The scale factor is the ratio 'out_len / in_len'.	cycles = 0.5 * out_len * l_hh * (1+k) + 30, where k = 1/(4*l_hh) if (l_hh % 8) == 0 and k = 0 otherwise. For l_hh = 16, in_len = 1024, and out_len = 1366, cycles = 11129. For l_hh = 8, in_len = 640, and out_len = 120, cycles = 525.
Vertical Scaling	Calculates a single row of output from multiple rows of input for the vertical pass of a scaling filter, using a polyphase filter bank. Each row of input and output is 'cols' pixels long. The polyphase filter bank contains 'n_hh' filters that are 'l_hh' taps long.	cycles = (0.33 * l_hh * cols) + 6 * l_hh + 32 For cols = 800 and l_hh = 4, cycles = 1056.
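
The Horizontal Scaling formula has a conditional term, so a small evaluator may make it easier to check. The following is a minimal sketch, not TI code: the function name `horizontal_scaling_cycles` is hypothetical, and exact fractions are used only to avoid floating-point rounding in the k term. Rounding the result up to a whole cycle is an assumption that happens to match the first worked example in the table.

```python
from fractions import Fraction
import math


def horizontal_scaling_cycles(out_len: int, l_hh: int) -> Fraction:
    """Evaluate the Horizontal Scaling cycle formula from the table."""
    # k is nonzero only when the filter length is a multiple of 8.
    k = Fraction(1, 4 * l_hh) if l_hh % 8 == 0 else Fraction(0)
    return Fraction(1, 2) * out_len * l_hh * (1 + k) + 30


if __name__ == "__main__":
    first = horizontal_scaling_cycles(out_len=1366, l_hh=16)
    second = horizontal_scaling_cycles(out_len=120, l_hh=8)
    print(float(first), math.ceil(first))    # 11128.75 11129
    print(float(second), math.ceil(second))  # 525.0 525
```

Note that 'in_len' does not appear in the formula; only the output length and filter length drive the estimate.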

VECTOR

Benchmark	Description	Formula
Dot Product	Computes the dot product of two arrays by multiplying corresponding elements and summing the products.	cycles = count/4 + 16 For count = 720, cycles = 196.
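
For reference, the operation being benchmarked and its cycle estimate can be sketched as follows. This is an illustrative plain-Python model, not the C64x assembly routine; the names `dot_product` and `dot_product_cycles` are hypothetical.

```python
def dot_product(x, y):
    """Sum of element-wise products of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")
    return sum(a * b for a, b in zip(x, y))


def dot_product_cycles(count: int) -> float:
    """Evaluate the table's formula: cycles = count/4 + 16."""
    return count / 4 + 16


if __name__ == "__main__":
    print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
    print(dot_product_cycles(720))            # 196.0
```

The count/4 term corresponds to four multiply-accumulates per cycle in steady state, with 16 cycles of fixed overhead.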

FFTs

Benchmark	Description	Formula
Complex Radix 4 FFT	Performs a Complex Radix 4 FFT over N points. N must be a power of 4.	cycles = 1.25*nsamp*log₄(nsamp) - 0.5*nsamp + 23*log₄(nsamp) - 1, where nsamp = N. For N = 64, cycles = 276. For N = 256, cycles = 1243. For N = 1024, cycles = 6002.
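
The FFT formula can be evaluated as below. This is a sketch only: the helper names `log4_exact` and `fft_radix4_cycles` are hypothetical, and the base-4 logarithm is computed by repeated division so that sizes that are not powers of 4 are rejected, in line with the constraint in the table.

```python
def log4_exact(nsamp: int) -> int:
    """Return log base 4 of nsamp, or raise if nsamp is not a power of 4."""
    if nsamp < 4:
        raise ValueError("nsamp must be a power of 4 and at least 4")
    stages = 0
    n = nsamp
    while n > 1:
        if n % 4 != 0:
            raise ValueError("nsamp must be a power of 4")
        n //= 4
        stages += 1
    return stages


def fft_radix4_cycles(nsamp: int) -> float:
    """Evaluate the Complex Radix 4 FFT cycle formula from the table."""
    stages = log4_exact(nsamp)
    return 1.25 * nsamp * stages - 0.5 * nsamp + 23 * stages - 1


if __name__ == "__main__":
    for n in (64, 256, 1024):
        print(n, fft_radix4_cycles(n))  # 276.0, 1243.0, 6002.0
```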

IMAGING

Benchmark	Description	Formula
3x3 Correlation	Performs the 3x3 correlation of an input image with a 3x3 mask that is typically part of the image. The correlation sum is rounded and shifted to produce an 8-bit result.	cycles = (cols - 2) * 1.25 + 20 For cols = 720, cycles = 925.
8x8 Block Motion Estimation	Locates the position in a reference image that most closely matches an 8x8 block from a source image, using the Minimum Absolute Difference metric. Searches over a range that is 'H' pixels wide and 'V' pixels tall within a reference image that is 'pitch' pixels wide.	cycles = H * V * 8 + 54 For H = 64 and V = 32, cycles = 16438.
3x3 Median Filter	Performs a 3x3 median filter operation for a row of image data. Each output point is the median value selected from the 9 points under the filter kernel.	cycles = cols * 2 + 32 For cols = 256, cycles = 544.
Horizontal Wavelet	Performs a 1-D periodic orthogonal wavelet decomposition. An input signal x[n] is low-pass and high-pass filtered and decimated by two to produce the signals r1[n] and d[n], respectively.	cycles = (cols * 2) + 25 For cols = 256, cycles = 537. For cols = 512, cycles = 1049.
Vertical Wavelet	Performs the vertical pass of a 2-D wavelet transform on 8 rows to produce two lines of output, one being the low-pass result and the other the high-pass result. Each row of input and output is of length 'cols'.	cycles = 4*cols + 96 For cols = 256, cycles = 1120. For cols = 512, cycles = 2144.
8x8 Block IDCT, IEEE-1180 Compliant	Performs a 2-D Inverse Discrete Cosine Transform on 'num_idcts' 8x8 blocks. Input data is expected in 12Q4 format.	cycles = 62 + 92 * num_idcts For num_idcts = 6, cycles = 614. For num_idcts = 24, cycles = 2270.
Quantize	Quantizes blocks of values by multiplying their contents with a second block of values that contains reciprocals of the quantization terms. This step corresponds to the quantization that is performed in 2-D DCT-based compression techniques.	cycles = 26 + (blk_size/16) * num_blks * 8 For blk_size = 64 and num_blks = 6, cycles = 218. For blk_size = 256 and num_blks = 24, cycles = 3098.
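
Several of the imaging formulas are simple linear expressions, and they can be collected into small functions for quick estimates. The sketch below covers the motion estimation, median filter, wavelet, IDCT, and quantize rows. All function names are hypothetical, and the requirement that 'blk_size' be a multiple of 16 is an assumption made so that blk_size/16 stays an integer; the table does not state it.

```python
def motion_estimation_cycles(h: int, v: int) -> int:
    """8x8 Block Motion Estimation: cycles = H * V * 8 + 54."""
    return h * v * 8 + 54


def median_3x3_cycles(cols: int) -> int:
    """3x3 Median Filter: cycles = cols * 2 + 32."""
    return cols * 2 + 32


def horizontal_wavelet_cycles(cols: int) -> int:
    """Horizontal Wavelet: cycles = (cols * 2) + 25."""
    return cols * 2 + 25


def vertical_wavelet_cycles(cols: int) -> int:
    """Vertical Wavelet: cycles = 4*cols + 96."""
    return 4 * cols + 96


def idct_8x8_cycles(num_idcts: int) -> int:
    """8x8 Block IDCT: cycles = 62 + 92 * num_idcts."""
    return 62 + 92 * num_idcts


def quantize_cycles(blk_size: int, num_blks: int) -> int:
    """Quantize: cycles = 26 + (blk_size/16) * num_blks * 8."""
    # Assumption: blk_size is a multiple of 16 (not stated in the table).
    if blk_size % 16 != 0:
        raise ValueError("blk_size is assumed to be a multiple of 16")
    return 26 + (blk_size // 16) * num_blks * 8


if __name__ == "__main__":
    print(motion_estimation_cycles(64, 32))  # 16438
    print(median_3x3_cycles(256))            # 544
    print(horizontal_wavelet_cycles(512))    # 1049
    print(vertical_wavelet_cycles(512))      # 2144
    print(idct_8x8_cycles(24))               # 2270
    print(quantize_cycles(256, 24))          # 3098
```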

TELECOM

Benchmark	Description	Formula
Viterbi GSM	Performs the Viterbi Decode for Full-Rate GSM, including traceback.	cycles = 14 * n + 33 For n = 189, cycles = 1377.
REED SOLOMON: Syndrome Accumulate	Calculates the syndrome values for the received codeword array. These are used to detect errors in the incoming data. This is the first stage of the Reed-Solomon decoder.	cycles = (T/4)*N + 32
REED SOLOMON: Berlekamp Massey	Solves the error locator polynomial equation, Lambda * S = 0, using the Berlekamp-Massey algorithm. (The * denotes convolution.)	cycles = 30*T + 6
REED SOLOMON: Chien Search	Uses the Chien Search algorithm to locate the zeros for an error polynomial in a Reed-Solomon decoder.	cycles = 263 (for GF256)
REED SOLOMON: Forney	Corrects errors in the received codeword array in-place. It uses the inputs generated by Syndrome Accumulate, Berlekamp-Massey, and Chien Search to perform the correction.	cycles = 12*T + 50
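
The four Reed-Solomon rows describe successive decoder stages, so their formulas can be summed for a rough per-codeword estimate. The sketch below rests on assumptions the table does not state: that N is the codeword length in symbols, that T is the number of correctable symbol errors, and that the stages run back to back with no extra overhead. The function name `rs_decoder_cycles` and the example values N = 204 and T = 8 are illustrative only and do not come from the table.

```python
def rs_decoder_cycles(n: int, t: int):
    """Per-stage and total cycle estimates for the Reed-Solomon rows.

    Assumes n = codeword length in symbols and t = correctable symbol
    errors; the table uses N and T without defining them.
    """
    stages = {
        "syndrome_accumulate": (t / 4) * n + 32,
        "berlekamp_massey": 30 * t + 6,
        "chien_search": 263,  # fixed figure quoted for GF256
        "forney": 12 * t + 50,
    }
    return stages, sum(stages.values())


if __name__ == "__main__":
    stages, total = rs_decoder_cycles(n=204, t=8)
    for name, cycles in stages.items():
        print(f"{name}: {cycles}")
    print("total:", total)  # total: 1095.0
```

With these example values the syndrome stage accounts for 440 of the 1095 cycles, since it is the only term that scales with the codeword length.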