DSP bootloader C5000

Aguilera


Before use, please be aware of the following points:

The library functions are hand-written in assembly to make full use of the device. TI also provides C and linear assembly source code for them.

For some specialized applications, using DSPLIB may cost additional cycles.

TI DSPLIB differs between platforms and changes between releases. Refer to the manual for your version for specifics, and use the library with care.

Precautions for use

1. Almost all array accesses require word/double-word alignment. Double-word alignment is recommended.

2. TI gives the cycle consumption of each library function, assuming that all code and data accesses occur in the L1 cache. If storage access occurs in L2/off-chip memory, the actual number of cycles consumed will increase.

3. For some functions (such as the FIR and FFT routines), scale down the input data to prevent overflow.

4. Library functions fall into three classes: fully interruptible, partially interruptible, and non-interruptible. All of them can be used in systems with interrupts, and there is no need to disable interrupts before calling them; a function disables interrupts internally where it needs to. An interrupt can arrive at any time while a function is running, but the function's class determines how long the library function can delay the interrupt handler.

How to use

1. Add rts6400.lib (optional) and dsp64x.lib to the project.

2. Include the corresponding header file (different library functions correspond to different header files).

Library Overview

1. Adaptive Filtering

LMS adaptive filter; each call processes one new sample

long DSP_firlms2(short *h, short *x, short b, int nh)

h[nh]: filter coefficients, updated internally after calling

x[nh+1]: the previous nh samples followed by one new sample

b: Error term of the previous filtering calculation

nh: number of filter coefficients, 4N
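As a rough illustration of what one call does, here is a portable C sketch that follows the parameter description above. The name firlms2_ref, the Q15 update h[i] += (x[i]*b) >> 15, and the use of x[i+1] for the output are assumptions made for illustration; this is not the optimized library routine.

```c
/* Hypothetical reference model of one LMS step (not the DSPLIB code).
 * h[nh]   : coefficients, updated in place
 * x[nh+1] : nh previous samples followed by one new sample
 * b       : error term from the previous call
 * Returns the accumulated filter output. */
long firlms2_ref(short *h, const short *x, short b, int nh)
{
    long r = 0;
    for (int i = 0; i < nh; i++) {
        /* Assumed Q15 coefficient update (relies on arithmetic >>). */
        h[i] = (short)(h[i] + (((int)x[i] * b) >> 15));
        /* Assumed output term: updated coefficient times the next sample. */
        r += (long)x[i + 1] * h[i];
    }
    return r;
}
```

The caller would derive the next error term b from the returned value and the desired signal before the following call.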

2. AutoCorrelation

void DSP_autocor(short *r, short *x, int nx, int nr)

x[nx+nr]: The first nx data plus nr new data, double word aligned

r[nr]: nr autocorrelation outputs

nx: correlator length, 8N

nr: number of autocorrelation lags, 4N
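The autocorrelation above can be sketched in plain C as follows. The name autocor_ref, the summation range (starting at index nr so that x[k-i] stays inside the array), and the final right shift by 15 are assumptions for illustration, and the sketch ignores the alignment and length constraints of the library routine.

```c
/* Hypothetical reference model of a fixed-point autocorrelation.
 * x[nx+nr] : input samples
 * r[nr]    : one output per lag
 * Assumes each sum fits in a 32-bit int. */
void autocor_ref(short *r, const short *x, int nx, int nr)
{
    for (int i = 0; i < nr; i++) {          /* lag index */
        int sum = 0;
        for (int k = nr; k < nx + nr; k++)  /* nx products per lag */
            sum += x[k] * x[k - i];
        r[i] = (short)(sum >> 15);          /* assumed Q15 output scaling */
    }
}
```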

3. Fast Fourier Transform (FFT)

The twiddle factors for each FFT routine must be generated with its dedicated generation function

16*16 complex FFT, real and imaginary parts are stored alternately, real part is in even subscript, imaginary part is in odd subscript

void DSP_fft16x16(short *w, int nx, short *x, short *y)

w[2nx]: complex twiddle factor, Q15, double word aligned

x[2nx]: complex input, double word aligned

y[2nx]: complex output, double word aligned

nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536

Same as DSP_fft16x16(), except that the real and imaginary parts are swapped: the imaginary part is in even subscripts and the real part is in odd subscripts.

void DSP_fft16x16_imre(short *w, int nx, short *x, short *y)

16*16 complex forward mixed-radix FFT with rounding. Used to compute a sub-FFT of a mixed-radix main FFT. The real and imaginary parts are interleaved, with the real part in even subscripts and the imaginary part in odd subscripts.

void DSP_fft16x16r(int nx, short *x, short *w, short *y, int radix, int offset, int nmax)

w[2nx]: complex twiddle factor, Q15, double word aligned

x[2nx]: complex input, double word aligned, must be scaled down by 2^(log2(nx) - ceil[log4(nx) - 1]) to prevent overflow

y[2nx]: complex output, double word aligned

nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536

radix: the radix for decomposing the FFT into sub-FFTs

offset: The complex subscript of the sub-FFT relative to the start of the main FFT

nmax: number of complex samples of the main FFT
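To make the input scaling concrete, the sketch below computes the shift amount, reading the scale factor as 2^(log2(nx) - ceil[log4(nx) - 1]). That reading of the formula, the interpretation of "scaling" as a right shift, and the helper names ilog2, fft_scale_shift and fft_prescale are assumptions for illustration.

```c
/* Integer log2 of a power-of-two nx. */
static int ilog2(unsigned nx)
{
    int n = 0;
    while (nx > 1) { nx >>= 1; n++; }
    return n;
}

/* Shift = log2(nx) - ceil(log4(nx) - 1), using ceil(log4) = ceil(log2 / 2). */
int fft_scale_shift(unsigned nx)
{
    int l2 = ilog2(nx);
    int ceil_log4_minus_1 = (l2 + 1) / 2 - 1;
    return l2 - ceil_log4_minus_1;
}

/* Scale 2*nx interleaved shorts down in place (relies on arithmetic >>). */
void fft_prescale(short *x, unsigned nx)
{
    int s = fft_scale_shift(nx);
    for (unsigned i = 0; i < 2 * nx; i++)
        x[i] = (short)(x[i] >> s);
}
```

Under this reading, nx = 16 and nx = 32 both give a shift of 3 bits, and nx = 1024 gives 6 bits.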

16*32 complex FFT, real and imaginary parts are stored alternately, real part is in even subscript, imaginary part is in odd subscript

void DSP_fft16x32(short *w, int nx, int *x, int *y)

w[2nx]: complex twiddle factor, Q15, double word aligned

x[2nx]: 32-bit complex input, double-word aligned, must be scaled down by 2^(log2(nx) - ceil[log4(nx) - 1]) to prevent overflow

y[2nx]: 32-bit complex output, double-word aligned

nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536

32*32 complex FFT, real and imaginary parts are interleaved, real part is in even subscript, imaginary part is in odd subscript

void DSP_fft32x32(int *w, int nx, int *x, int *y)

w[2nx]: complex twiddle factor, Q31, double word aligned, scale factor = 2147483647.5

x[2nx]: 32-bit complex input, double-word aligned, must be scaled down by 2^log2(nx) to prevent overflow

y[2nx]: 32-bit complex output, double-word aligned

nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536

32*32 complex FFT with scaling, real and imaginary parts are interleaved, real part is in even subscript, imaginary part is in odd subscript

void DSP_fft32x32s(int *w, int nx, int *x, int *y)

w[2nx]: complex twiddle factor, Q31, double word aligned, scale factor = 1073741823.5

x[2nx]: 32-bit complex input, double-word aligned, must be scaled down by 2^(log2(nx) - ceil[log4(nx) - 1]) to prevent overflow

y[2nx]: 32-bit complex output, double-word aligned

nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536

16*16 complex inverse FFT, similar to DSP_fft16x16(). Conjugating the input, calling DSP_fft16x16(), and then conjugating the output gives the same result as DSP_ifft16x16()

void DSP_ifft16x16(short *w, int nx, short *x, short *y)
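The conjugation identity mentioned above can be checked on a tiny example. The sketch below uses a naive 4-point DFT on interleaved integer data (real part at even subscripts, imaginary part at odd subscripts). It is not a DSPLIB routine (the library requires nx ≥ 16 and uses fixed-point scaling), and the names dft4 and idft4_via_forward are hypothetical.

```c
/* cos and sin of 2*pi*m/4 for m = 0..3 */
static const int COS4[4] = { 1, 0, -1, 0 };
static const int SIN4[4] = { 0, 1, 0, -1 };

/* Naive 4-point DFT on interleaved complex data.
 * sign = -1: forward transform; sign = +1: unnormalised inverse. */
void dft4(const int *x, int *y, int sign)
{
    for (int k = 0; k < 4; k++) {
        int re = 0, im = 0;
        for (int t = 0; t < 4; t++) {
            int c = COS4[(k * t) % 4];
            int s = sign * SIN4[(k * t) % 4];
            re += x[2 * t] * c - x[2 * t + 1] * s;
            im += x[2 * t] * s + x[2 * t + 1] * c;
        }
        y[2 * k] = re;
        y[2 * k + 1] = im;
    }
}

/* Inverse computed with the forward transform only:
 * conjugate the input, transform, conjugate the output. */
void idft4_via_forward(const int *x, int *y)
{
    int tmp[8];
    for (int i = 0; i < 4; i++) {
        tmp[2 * i] = x[2 * i];
        tmp[2 * i + 1] = -x[2 * i + 1];
    }
    dft4(tmp, y, -1);
    for (int i = 0; i < 4; i++)
        y[2 * i + 1] = -y[2 * i + 1];
}
```

Both paths produce the unnormalised inverse; any 1/nx factor or per-stage scaling is a separate matter.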

16*32 complex inverse FFT, similar to DSP_fft16x32(), input x must be scaled down by 2^log2(nx) to prevent overflow

void DSP_ifft16x32(short *w, int nx, int *x, int *y)

32*32 complex inverse FFT, similar to DSP_fft32x32()

void DSP_ifft32x32(int *w, int nx, int *x, int *y)

4. Filtering and Convolution

Complex FIR Filter

void DSP_fir_cplx (short *x, short *h, short *r, int nh, int nr)

x[2*(nr+nh-1)]: complex input, the first 2*(nh-1) data plus the new 2nr data

h[2nh]: filter coefficient, complex number

r[2nr]: complex number output, 32 bits are used internally to store temporary results, and the output is shifted right by 15 bits

nh: number of coefficients, 2N

nr: number of output samples, 4N

Complex FIR filter, the difference from DSP_fir_cplx() is that nh satisfies 4N

void DSP_fir_cplx_hM4X4 (short *x, short *h, short *r, int nh, int nr)

FIR Filters

void DSP_fir_gen (short *x, short *h, short *r, int nh, int nr)

x[nr+nh-1]: input, the first nh-1 data plus the new nr data

h[nh]: filter coefficient

r[nr]: Output, 32 bits are used internally to store temporary results, and the output is shifted right by 15 bits

nh: number of coefficients, nh ≥ 5

nr: number of output samples, 4N
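The behaviour described above (32-bit accumulation, output shifted right by 15 bits) can be sketched in portable C. The name fir_gen_ref and the indexing x[i + j] * h[i] are assumptions for illustration; check the manual for the coefficient order the library routine expects.

```c
/* Hypothetical reference model of a Q15 FIR filter.
 * x[nr+nh-1] : nh-1 previous samples followed by nr new samples
 * h[nh]      : coefficients
 * r[nr]      : outputs */
void fir_gen_ref(const short *x, const short *h, short *r, int nh, int nr)
{
    for (int j = 0; j < nr; j++) {
        int sum = 0;                    /* 32-bit intermediate result */
        for (int i = 0; i < nh; i++)
            sum += x[i + j] * h[i];
        r[j] = (short)(sum >> 15);      /* back to Q15 (arithmetic >>) */
    }
}
```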

FIR filter, the difference from DSP_fir_gen() is that nr satisfies 8N

void DSP_fir_gen_hM17_rA8X8 (short *x, short *h, short *r, int nh, int nr)

FIR filter, the difference from DSP_fir_gen() is that nh satisfies 4N, and nh ≥ 8

void DSP_fir_r4 (short *x, short *h, short *r, int nh, int nr)

FIR filter, the difference from DSP_fir_gen() is that nh satisfies 8N

void DSP_fir_r8 (short *x, short *h, short *r, int nh, int nr)

FIR filter, the difference from DSP_fir_gen() is that nh satisfies 8N, and nh ≥ 16; nr satisfies 8N

void DSP_fir_r8_hM16_rM8A8X8 (short *x, short *h, short *r, int nh, int nr)

FIR filter, only half of the original filter coefficients are required (due to symmetry)

void DSP_fir_sym (short *x, short *h, short *r, int nh, int nr, int s)

x[nr+2nh]: input, the first 2nh data plus the new nr data

h[nh+1]: filter coefficient, half of the original filter coefficient

r[nr]: Output, uses 32 bits to store temporary results internally, and shifts right by s bits when outputting

nh: number of coefficients, the original number of coefficients is 2nh+1, 8N

nr: number of output samples, 4N

IIR filter, input single data, output single data, state vector b is updated internally, and the filtering result is returned

short DSP_iir(short x, short *h, int nh, short *b)

x: input data

h[nh]: filter coefficient, Q14

nh: number of coefficients 8N

b[nh]: state vector

All-pole IIR lattice filter; the filter consists of nk lattice stages

void DSP_iir_lat(short *x, int nx, short *k, int nk, int *b, short *r)

x[nx]: input data

k[nk]: reflection coefficient, Q15

b[nk+1]: Delay line data from the previous call, should be initialized to 0

r[nx]: output data

nx: input length

nk: number of reflection coefficients, 2N, and nk≥4

5. Math

Computes the dot product of vector x and vector y and stores it in r; also adds the squares of the elements of y to G and returns the result

int DSP_dotp_sqr(int G, short *x, short *y, int *r, int nx)

G: y^2 accumulated value

x[nx]: input data vector 1

y[nx]: input data vector 2

r: pointer to the dot product of x and y

nx: data length, 4N, and N ≥ 12
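A minimal C sketch of this combined operation, treating r as a pointer to a single result. The name dotp_sqr_ref is hypothetical, and whether the library overwrites or accumulates into *r is not stated here; this sketch overwrites it.

```c
/* Hypothetical reference model: dot product plus sum of squares of y.
 * G  : running sum of squares passed in by the caller
 * *r : receives the dot product of x and y (overwritten in this sketch)
 * Returns G plus the sum of y[i]^2. */
int dotp_sqr_ref(int G, const short *x, const short *y, int *r, int nx)
{
    int dot = 0;
    for (int i = 0; i < nx; i++) {
        dot += x[i] * y[i];
        G   += y[i] * y[i];
    }
    *r = dot;
    return G;
}
```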

Returns the dot product of vector x and vector y

int DSP_dotprod(short *x, short *y, int nx)

x[nx]: input data vector 1

y[nx]: input data vector 2

nx: data length, 4N

Returns the maximum value of a vector

short DSP_maxval (short *x, int nx)

x[nx]: input data vector

nx: data length, 8N, and N ≥ 32

Returns the minimum value of a vector

short DSP_minval (short *x, int nx)

x[nx]: input data vector

nx: data length, 4N, and N ≥ 8

Returns the subscript corresponding to the maximum value of the vector

int DSP_maxidx (short *x, int nx)

x[nx]: input data vector

nx: data length, 16N, and N ≥ 32

32x32 data multiplication, output the high 32 bits of the product, the input should be scaled to Q31

void DSP_mul32(int *x, int *y, int *r, short nx)

x[nx]: input data vector 1

y[nx]: input data vector 2

r[nx]: the product of x and y

nx: data length, 8N, and N ≥ 16
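Keeping the high 32 bits of a 32x32 product can be sketched with a 64-bit intermediate. The name mul32_ref is hypothetical, and the library routine may differ in rounding or in the exact bits it keeps.

```c
#include <stdint.h>

/* Hypothetical reference model: r[i] = upper 32 bits of x[i] * y[i]. */
void mul32_ref(const int32_t *x, const int32_t *y, int32_t *r, int nx)
{
    for (int i = 0; i < nx; i++) {
        int64_t p = (int64_t)x[i] * y[i];   /* full 64-bit product */
        r[i] = (int32_t)(p >> 32);          /* keep the high word */
    }
}
```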

Vector Negation

void DSP_neg32(int *x, int * r, short nx)

x[nx]: input data vector

r[nx]: output data vector, r[i] = -x[i]

nx: data length, 4N, and N ≥ 8

Computes the fractional part and the exponent of the reciprocal of each input value (floating-point-style representation)

void DSP_recip16(short *x, short *rfrac, short *rexp, short nx)

x[nx]: input data vector

rfrac[nx]: Output fractional value

rexp[nx]: output exponential value

nx: data length

Returns the sum of the squares of the vector data

int DSP_vecsumsq (short *x, int nx)

x[nx]: input data vector

nx: data length, 4N, and N ≥ 8

Weighted sum: r[i] = ((m * x[i]) >> 15) + y[i]

void DSP_w_vec(short *x, short *y, short m, short *r, short nx)

x[nx]: weighted data vector 1

y[nx]: input data vector 2

r[nx]: output data vector

nx: data length, 8N, and N ≥ 8
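Reading the weighted sum as r[i] = ((m * x[i]) >> 15) + y[i], with the shift applied before the addition, a plain C sketch looks like this. The name w_vec_ref is hypothetical, and the length constraint of the library routine is ignored.

```c
/* Hypothetical reference model: scale x by the Q15 weight m, then add y. */
void w_vec_ref(const short *x, const short *y, short m, short *r, int nx)
{
    for (int i = 0; i < nx; i++)
        r[i] = (short)(((m * x[i]) >> 15) + y[i]);
}
```

The explicit parentheses matter in C: without them, + binds tighter than >>, so the shift count would become 15 + y[i].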

6. Matrix

Matrix multiplication: r[r1*c2] = x[r1*c1] * y[c1*c2]

void DSP_mat_mul(short *x, int r1, int c1, short *y, int c2, short *r, int qs)

x[r1*c1]: input matrix x

r1: the number of rows in matrix x, 1~32767

c1: the number of columns of matrix x/the number of rows of matrix y, 1~32767

y[c1*c2]: input matrix y

c2: the number of columns of matrix y, 1~32767

r[r1*c2]: output matrix.

qs: number of right shifts of the element result
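The multiplication with a final right shift can be sketched as follows. Row-major storage and the name mat_mul_ref are assumptions for illustration.

```c
/* Hypothetical reference model: r = (x * y) >> qs, element by element.
 * x is r1 x c1, y is c1 x c2, r is r1 x c2, all assumed row-major. */
void mat_mul_ref(const short *x, int r1, int c1,
                 const short *y, int c2, short *r, int qs)
{
    for (int i = 0; i < r1; i++) {
        for (int j = 0; j < c2; j++) {
            int sum = 0;                          /* 32-bit accumulator */
            for (int k = 0; k < c1; k++)
                sum += x[i * c1 + k] * y[k * c2 + j];
            r[i * c2 + j] = (short)(sum >> qs);   /* scale the result */
        }
    }
}
```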

Matrix transpose (interchange rows and columns)

void DSP_mat_trans(short *x, short rows, short columns, short *r)

x[rows*columns]: input matrix

rows: the number of rows in the matrix, 4N

columns: the number of columns in the matrix, 4N

r[columns*rows]: output matrix
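For reference, a transpose of a row-major matrix reduces to an index swap. The name mat_trans_ref and the row-major layout are assumptions, and the 4N constraints of the library routine are ignored.

```c
/* Hypothetical reference model: r (columns x rows) = transpose of x. */
void mat_trans_ref(const short *x, int rows, int columns, short *r)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            r[j * rows + i] = x[i * columns + j];
}
```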

7. Others

Computes the minimum number of redundant sign bits over the elements of a vector (the block exponent), which can be used to find the scaling factor of a data block.

short DSP_bexp(const int *x, short nx)

x[nx]: input data vector

nx: data length, 8N
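One way to picture the block exponent is to count, for each element, how many bits below the sign bit merely repeat the sign, and take the minimum over the block. The sketch below does that; the names redundant_sign_bits and bexp_ref are hypothetical, and the exact return convention of the library routine is an assumption.

```c
#include <stdint.h>

/* Number of bits below the sign bit that only repeat the sign (0..31). */
static int redundant_sign_bits(int32_t v)
{
    /* Fold negative values so that only leading zeros need counting. */
    uint32_t u = (v < 0) ? ~(uint32_t)v : (uint32_t)v;
    int n = 0;
    for (int bit = 30; bit >= 0 && !((u >> bit) & 1u); bit--)
        n++;
    return n;
}

/* Hypothetical reference model: minimum over the block, i.e. how far
 * every element can be shifted left without overflow. */
int bexp_ref(const int32_t *x, int nx)
{
    int min_n = 31;
    for (int i = 0; i < nx; i++) {
        int n = redundant_sign_bits(x[i]);
        if (n < min_n)
            min_n = n;
    }
    return min_n;
}
```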

Endianness swap (big <-> little), 16-bit

void DSP_blk_eswap16(void * restrict x, void * restrict r, int nx)

x[nx]: input data vector

r[nx]: output data vector. If the pointer is NULL, the result is written back to x.

nx: data length, 8N, and N ≥ 8

Same as DSP_blk_eswap16(), 32-bit

void DSP_blk_eswap32(void * restrict x, void * restrict r, int nx)

Same as DSP_blk_eswap16(), 64-bit

void DSP_blk_eswap64(void * restrict x, void * restrict r, int nx)
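The 16-bit case, including the in-place behaviour when the output pointer is NULL, can be sketched as below. The name eswap16_ref and the typed uint16_t pointers are assumptions for illustration; the library functions take void pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: swap the two bytes of every 16-bit word.
 * If r is NULL the result is written back into x. */
void eswap16_ref(uint16_t *x, uint16_t *r, int nx)
{
    uint16_t *dst = (r != NULL) ? r : x;
    for (int i = 0; i < nx; i++)
        dst[i] = (uint16_t)((x[i] << 8) | (x[i] >> 8));
}
```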

Block move (copies a block of data)

void DSP_blk_move(short * restrict x, short * restrict r, int nx)

x[nx]: input data vector

r[nx]: target data vector

nx: data length, 8N, and N ≥ 32

Convert IEEE floating point numbers to Q15 fixed point numbers

void DSP_fltoq15(float *x, short *r, short nx)

x[nx]: input floating point vector, [-1,1)

r[nx]: output fixed-point vector, Q15

nx: data length, 2N

Q15 fixed point number converted to IEEE floating point number

void DSP_q15tofl(short * restrict x, float * restrict r, short nx)

x[nx]: input fixed-point vector, Q15

r[nx]: output floating point vector

nx: data length, 2N
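The two conversions can be sketched in portable C as follows. The names fltoq15_ref and q15tofl_ref are hypothetical, and the saturation and truncation behaviour shown here are assumptions rather than a statement of what the library does.

```c
/* Hypothetical reference model: float in [-1, 1) to Q15.
 * Values outside the Q15 range are saturated. */
void fltoq15_ref(const float *x, short *r, int nx)
{
    for (int i = 0; i < nx; i++) {
        float a = 32768.0f * x[i];
        if (a > 32767.0f)  a = 32767.0f;
        if (a < -32768.0f) a = -32768.0f;
        r[i] = (short)a;                /* the cast truncates toward zero */
    }
}

/* Hypothetical reference model: Q15 back to float. */
void q15tofl_ref(const short *x, float *r, int nx)
{
    for (int i = 0; i < nx; i++)
        r[i] = (float)x[i] / 32768.0f;
}
```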