Before use, please be aware of the following points:
The library functions are hand-written in assembly to make full use of the device. TI also makes C and linear-assembly versions of the code available.
In some specialized applications, a DSPLIB function may cost more cycles than a routine written for that specific case.
DSPLIB differs between platforms and library versions. Refer to the manual for your specific version and use it with caution.
Precautions for use
1. Almost all arrays must be word- or double-word-aligned. Double-word alignment is recommended.
2. TI quotes the cycle count of each library function assuming that all code and data accesses hit the L1 cache. If accesses go to L2 or off-chip memory, the actual cycle count will be higher.
3. For some functions (such as the FIR and FFT routines), scale down the input data beforehand to prevent overflow.
4. Library functions fall into three categories: fully interruptible, partially interruptible, and non-interruptible. Nevertheless, all of them can be used in systems with interrupts, and interrupts need not be disabled before calling them; each function disables interrupts internally where necessary. An interrupt may occur at any point while a function is running; the function's category determines how long the interrupt handler may be delayed by the library function.
How to use
1. Link rts6400.lib (optional) and dsp64x.lib into the project.
2. Include the header file of each function you call (each library function has its own header file).
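Library Overview
Notation: "4N" means the value must be a multiple of 4 (likewise 2N, 8N, 16N), and "2^N" means a power of two.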
1. Adaptive Filtering
LMS adaptive filter; each call computes one output sample
long DSP_firlms2(short *h, short *x, short b, int nh)
h[nh]: filter coefficients, updated in place on each call
x[nh+1]: the previous nh samples followed by one new sample
b: error term from the previous filter output
nh: number of filter coefficients, 4N
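The following portable C model makes the update order concrete. It is a sketch based on our understanding of the behavior described above, not TI's source. The name ref_firlms2 is hypothetical. Two points are assumptions that you should check against the natural-C file shipped with your DSPLIB version:
- the coefficients are updated with x[0..nh-1];
- the output is then formed from x[1..nh].

```c
#include <assert.h>

/* Hypothetical reference model of one DSP_firlms2 call (not TI's code).
 * h[nh]   : Q15 coefficients, updated in place
 * x[nh+1] : x[0..nh-1] are the previous samples, x[nh] is the new one
 * b       : error term; any step-size scaling is assumed to be folded in
 * Returns the unshifted accumulator for the new output sample. */
long ref_firlms2(short *h, const short *x, short b, int nh)
{
    long r = 0;
    int i;

    assert(nh > 0 && nh % 4 == 0); /* nh must be a multiple of 4 */
    for (i = 0; i < nh; i++) {
        h[i] += (short)((x[i] * b) >> 15); /* LMS update with the older sample */
        r += (long)x[i + 1] * h[i];        /* filter with the updated tap */
    }
    return r;
}
```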
2. Autocorrelation
void DSP_autocor(short *r, short *x, int nx, int nr)
x[nx+nr]: input, double-word aligned; the first nr samples are history and the last nx samples are the current window
r[nr]: autocorrelation outputs, one per lag
nx: autocorrelation length, 8N
nr: number of lags, 4N
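The hypothetical reference model below (ref_autocor, not a DSPLIB function) sketches the computation. It assumes that the first nr samples of x are history and that each lag sum is shifted right by 15. You can use it as a test oracle against the optimized routine, after confirming this convention in your library version's documentation.

```c
#include <assert.h>

/* Hypothetical reference model of DSP_autocor (not TI's code).
 * x[0..nr-1] is history, x[nr..nr+nx-1] is the current window.
 * r[i] = (sum over k = nr..nr+nx-1 of x[k] * x[k-i]) >> 15, lag i = 0..nr-1 */
void ref_autocor(short *r, const short *x, int nx, int nr)
{
    int i, k;

    assert(nx % 8 == 0 && nr % 4 == 0);
    for (i = 0; i < nr; i++) {
        int sum = 0; /* 32-bit accumulator */
        for (k = nr; k < nx + nr; k++)
            sum += x[k] * x[k - i];
        r[i] = (short)(sum >> 15);
    }
}
```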
3. Fast Fourier Transform (FFT)
The twiddle factors must be generated with the dedicated generation function provided for each FFT variant.
16x16 complex FFT; real and imaginary parts are interleaved, with the real part in even subscripts and the imaginary part in odd subscripts
void DSP_fft16x16(short *w, int nx, short *x, short *y)
w[2nx]: complex twiddle factor, Q15, double word aligned
x[2nx]: complex input, double word aligned
y[2nx]: complex output, double word aligned
nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536
Same as DSP_fft16x16(), except that the order is swapped: the imaginary part is stored in even subscripts and the real part in odd subscripts.
void DSP_fft16x16_imre(short *w, int nx, short *x, short *y)
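Converting between the two layouts is a pairwise swap. The hypothetical helper below (not part of DSPLIB) does it in place for nx complex points.

```c
#include <assert.h>

/* Hypothetical helper: swap each (even, odd) pair of an interleaved complex
 * array of nx points in place. This converts between the re/im layout of
 * DSP_fft16x16() and the im/re layout of DSP_fft16x16_imre(). */
void swap_re_im(short *x, int nx)
{
    int i;

    assert(nx > 0);
    for (i = 0; i < nx; i++) {
        short t = x[2 * i];
        x[2 * i] = x[2 * i + 1];
        x[2 * i + 1] = t;
    }
}
```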
16x16 complex mixed-radix FFT with rounding, used to compute one sub-FFT of a larger mixed-radix main FFT. Real and imaginary parts are interleaved, with the real part in even subscripts and the imaginary part in odd subscripts.
void DSP_fft16x16r(int nx, short *x, short *w, short *y, int radix, int offset, int nmax)
w[2nx]: complex twiddle factor, Q15, double word aligned
x[2nx]: complex input, double-word aligned; must be scaled down by 2^(log2(nx) - ceil(log4(nx) - 1)) to prevent overflow
y[2nx]: complex output, double word aligned
nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536
radix: the radix for decomposing the FFT into sub-FFTs
offset: index (in complex samples) of the start of this sub-FFT relative to the start of the main FFT
nmax: number of complex samples in the main FFT
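The scaling exponent log2(nx) - ceil(log4(nx) - 1) is easy to get wrong by hand, so a small helper can compute it. The sketch below uses hypothetical names (fft_input_shift, fft_scale_input). Because nx is a power of two, ceil(log4(nx) - 1) equals ceil(log2(nx) / 2) - 1, which keeps the arithmetic in integers. The same exponent is quoted for the 32-bit inputs of DSP_fft16x32() and DSP_fft32x32s(), but the scaling loop shown here is for 16-bit data.

```c
#include <assert.h>

/* Hypothetical helper: right-shift count log2(nx) - ceil(log4(nx) - 1)
 * for an nx-point FFT, nx a power of two, 16 <= nx <= 65536. */
int fft_input_shift(int nx)
{
    int log2n = 0;

    assert(nx >= 16 && nx <= 65536 && (nx & (nx - 1)) == 0);
    while ((1 << log2n) < nx)
        log2n++;
    /* ceil(log4(nx) - 1) == ceil(log2n / 2) - 1 == (log2n + 1) / 2 - 1 */
    return log2n - ((log2n + 1) / 2 - 1);
}

/* Scale an interleaved complex array of nx points (2*nx shorts) in place.
 * Right-shifting a negative value is implementation-defined in C; common
 * compilers perform an arithmetic shift. */
void fft_scale_input(short *x, int nx)
{
    int s = fft_input_shift(nx), i;

    for (i = 0; i < 2 * nx; i++)
        x[i] = (short)(x[i] >> s);
}
```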
16x32 complex FFT; real and imaginary parts are interleaved, with the real part in even subscripts and the imaginary part in odd subscripts
void DSP_fft16x32(short *w, int nx, int *x, int *y)
w[2nx]: complex twiddle factor, Q15, double word aligned
x[2nx]: 32-bit complex input, double-word aligned; must be scaled down by 2^(log2(nx) - ceil(log4(nx) - 1)) to prevent overflow
y[2nx]: 32-bit complex output, double-word aligned
nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536
32x32 complex FFT; real and imaginary parts are interleaved, with the real part in even subscripts and the imaginary part in odd subscripts
void DSP_fft32x32(int *w, int nx, int *x, int *y)
w[2nx]: complex twiddle factors, Q31, double-word aligned, scale factor = 2147483647.5
x[2nx]: 32-bit complex input, double-word aligned; must be scaled down by 2^log2(nx) (that is, divided by nx) to prevent overflow
y[2nx]: 32-bit complex output, double-word aligned
nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536
32x32 complex FFT with scaling; real and imaginary parts are interleaved, with the real part in even subscripts and the imaginary part in odd subscripts
void DSP_fft32x32s(int *w, int nx, int *x, int *y)
w[2nx]: complex twiddle factors, Q31, double-word aligned, scale factor = 1073741823.5
x[2nx]: 32-bit complex input, double-word aligned; must be scaled down by 2^(log2(nx) - ceil(log4(nx) - 1)) to prevent overflow
y[2nx]: 32-bit complex output, double-word aligned
nx: number of FFT points, 2^N, 16 ≤ nx ≤ 65536
16x16 complex inverse FFT, analogous to DSP_fft16x16(). The same result as DSP_ifft16x16() can be obtained by conjugating the input, calling DSP_fft16x16(), and conjugating the output.
void DSP_ifft16x16(short *w, int nx, short *x, short *y)
16x32 complex inverse FFT, analogous to DSP_fft16x32(); the input x must be scaled down by 2^log2(nx) to prevent overflow
void DSP_ifft16x32(short *w, int nx, int *x, int *y)
32x32 complex inverse FFT, analogous to DSP_fft32x32()
void DSP_ifft32x32(int *w, int nx, int *x, int *y)
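The conjugation step in the inverse-FFT equivalence described above can be written as a small helper. The sketch below uses a hypothetical name, conj_q15. It negates the odd subscripts of an interleaved Q15 array and saturates -32768 to 32767, because +32768 is not representable.

With the prototypes listed above, usage would be:

conj_q15(x, nx);
DSP_fft16x16(w, nx, x, y);
conj_q15(y, nx);

```c
#include <assert.h>

/* Hypothetical helper: conjugate an interleaved (re, im) Q15 array of nx
 * complex points in place by negating the odd subscripts. -32768 has no
 * positive counterpart in Q15, so it saturates to 32767. */
void conj_q15(short *x, int nx)
{
    int i;

    assert(nx > 0);
    for (i = 0; i < nx; i++) {
        short im = x[2 * i + 1];
        x[2 * i + 1] = (im == -32768) ? (short)32767 : (short)-im;
    }
}
```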
4. Filtering and Convolution
Complex FIR Filter
void DSP_fir_cplx (short *x, short *h, short *r, int nh, int nr)
x[2*(nr+nh-1)]: complex input; the previous 2*(nh-1) values followed by 2nr new values
h[2nh]: complex filter coefficients
r[2nr]: complex output; intermediate results are accumulated in 32 bits and shifted right by 15 bits on output
nh: number of complex coefficients, 2N
nr: number of complex output samples, 4N
Complex FIR filter; differs from DSP_fir_cplx() in that nh must be 4N
void DSP_fir_cplx_hM4X4 (short *x, short *h, short *r, int nh, int nr)
FIR Filters
void DSP_fir_gen (short *x, short *h, short *r, int nh, int nr)
x[nr+nh-1]: input; the previous nh-1 samples followed by nr new samples
h[nh]: filter coefficients
r[nr]: output; intermediate results are accumulated in 32 bits and shifted right by 15 bits on output
nh: number of coefficients, nh ≥ 5
nr: number of output samples, 4N
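A plain C model helps when you check results from the optimized filters. In this sketch (ref_fir_gen is hypothetical), h[i] multiplies x[j+i], so h[nh-1] weights the newest sample of each window. To realize an impulse response g, store g in reverse order in h. As far as we know, TI's documentation for DSP_fir_gen also states that the coefficients must be in reverse order, but confirm this for your version.

```c
#include <assert.h>

/* Hypothetical reference model of DSP_fir_gen (not TI's code).
 * r[j] = (sum over i = 0..nh-1 of x[j+i] * h[i]) >> 15, j = 0..nr-1
 * x holds nr+nh-1 samples: nh-1 older samples followed by nr new ones. */
void ref_fir_gen(const short *x, const short *h, short *r, int nh, int nr)
{
    int i, j;

    assert(nh >= 5 && nr % 4 == 0);
    for (j = 0; j < nr; j++) {
        int sum = 0; /* 32-bit accumulator */
        for (i = 0; i < nh; i++)
            sum += x[j + i] * h[i];
        r[j] = (short)(sum >> 15);
    }
}
```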
FIR filter; differs from DSP_fir_gen() in that nr must be 8N
void DSP_fir_gen_hM17_rA8X8 (short *x, short *h, short *r, int nh, int nr)
FIR filter; differs from DSP_fir_gen() in that nh must be 4N and nh ≥ 8
void DSP_fir_r4 (short *x, short *h, short *r, int nh, int nr)
FIR filter; differs from DSP_fir_gen() in that nh must be 8N
void DSP_fir_r8 (short *x, short *h, short *r, int nh, int nr)
FIR filter; differs from DSP_fir_gen() in that nh must be 8N with nh ≥ 16, and nr must be 8N
void DSP_fir_r8_hM16_rM8A8X8 (short *x, short *h, short *r, int nh, int nr)
FIR filter with symmetric coefficients; only half of the coefficients need to be stored
void DSP_fir_sym (short *x, short *h, short *r, int nh, int nr, int s)
x[nr+2nh]: input; the previous 2nh samples followed by nr new samples
h[nh+1]: filter coefficients: one half of the symmetric set plus the centre tap
r[nr]: output; intermediate results are accumulated in 32 bits and shifted right by s bits on output
nh: 8N; the full filter has 2nh+1 coefficients
nr: number of output samples, 4N
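Because the full filter is symmetric, its 2nh+1 taps satisfy full[k] = full[2nh - k], so the nh+1 stored values determine all of them. The hypothetical helper below expands them, which is useful for cross-checking the symmetric routine against a general FIR.

Symmetry also means that a direct implementation needs only nh+1 multiplies per output. Each h[i] multiplies x[j+i] + x[j+2nh-i], and the centre tap h[nh] multiplies x[j+nh].

```c
#include <assert.h>

/* Hypothetical helper: expand the nh+1 stored coefficients of a symmetric
 * filter into its full 2*nh+1 taps using full[k] = full[2*nh - k].
 * h[0..nh-1] is one half of the set and h[nh] is the centre tap. */
void expand_sym(const short *h, short *full, int nh)
{
    int k;

    assert(nh > 0);
    for (k = 0; k <= nh; k++) {
        full[k] = h[k];
        full[2 * nh - k] = h[k];
    }
}
```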
IIR filter: takes one input sample and returns one filtered output sample; the state vector b is updated internally
short DSP_iir(short x, short *h, int nh, short *b)
x: input data
h[nh]: filter coefficient, Q14
nh: number of coefficients, 8N
b[nh]: state vector
All-pole IIR lattice filter consisting of nk lattice stages
void DSP_iir_lat(short *x, int nx, short *k, int nk, int *b, short *r)
x[nx]: input data
k[nk]: reflection coefficient, Q15
b[nk+1]: delay-line state carried over from the previous call; initialize to 0 before the first call
r[nx]: output data
nx: input length
nk: number of reflection coefficients, 2N, and nk≥4
5. Math
Computes the dot product of vectors x and y and returns G plus the sum of the squares of y
int DSP_dotp_sqr(int G, short *x, short *y, int *r, int nx)
G: initial value to which the squares of y are added
x[nx]: input data vector 1
y[nx]: input data vector 2
r: pointer to a single int that receives the dot product of x and y
nx: data length, 4N, and nx ≥ 12
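Here is a sketch of the computation that can serve as a test oracle (ref_dotp_sqr is a hypothetical name). It accumulates into *r, so initialize that int to 0 before the call. Check in your version's documentation whether the library routine accumulates into *r or overwrites it.

```c
#include <assert.h>

/* Hypothetical reference model of DSP_dotp_sqr (not TI's code).
 * Adds the dot product of x and y into *r and returns G plus the sum of
 * the squares of y. */
int ref_dotp_sqr(int G, const short *x, const short *y, int *r, int nx)
{
    int i;

    assert(nx % 4 == 0 && nx >= 12);
    for (i = 0; i < nx; i++) {
        *r += x[i] * y[i]; /* dot product */
        G += y[i] * y[i];  /* energy of y */
    }
    return G;
}
```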
Returns the dot product of vector x and vector y
int DSP_dotprod(short *x, short *y, int nx)
x[nx]: input data vector 1
y[nx]: input data vector 2
nx: data length, 4N
Returns the maximum value of a vector
short DSP_maxval (short *x, int nx)
x[nx]: input data vector
nx: data length, 8N, and nx ≥ 32
Returns the minimum value of a vector
short DSP_minval (short *x, int nx)
x[nx]: input data vector
nx: data length, 4N, and nx ≥ 8
Returns the index of the maximum value of a vector
int DSP_maxidx (short *x, int nx)
x[nx]: input data vector
nx: data length, 16N, and nx ≥ 32
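Portable models of the three search routines are handy for testing. The names below are hypothetical. When the maximum occurs more than once, ref_maxidx returns the first occurrence. The text above does not say which index the library returns, so avoid depending on it.

```c
#include <assert.h>

/* Hypothetical reference models for DSP_maxval, DSP_minval and DSP_maxidx. */
short ref_maxval(const short *x, int nx)
{
    short m;
    int i;

    assert(nx > 0);
    m = x[0];
    for (i = 1; i < nx; i++)
        if (x[i] > m) m = x[i];
    return m;
}

short ref_minval(const short *x, int nx)
{
    short m;
    int i;

    assert(nx > 0);
    m = x[0];
    for (i = 1; i < nx; i++)
        if (x[i] < m) m = x[i];
    return m;
}

/* Returns the index of the first occurrence of the maximum. */
int ref_maxidx(const short *x, int nx)
{
    int i, idx = 0;

    assert(nx > 0);
    for (i = 1; i < nx; i++)
        if (x[i] > x[idx]) idx = i;
    return idx;
}
```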
32x32-bit multiplication; outputs the high 32 bits of each 64-bit product. Inputs should be scaled to Q31.
void DSP_mul32(int *x, int *y, int *r, short nx)
x[nx]: input data vector 1
y[nx]: input data vector 2
r[nx]: high 32 bits of the element-wise product of x and y
nx: data length, 8N, and nx ≥ 16
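For Q31 inputs the 64-bit product is Q62, so keeping its high 32 bits yields a Q30 result. The sketch below (ref_mul32, hypothetical) computes that value exactly with int64_t. It uses int32_t for portability, since int is 32 bits on C64x. When you use it as a test oracle, allow a small tolerance in case the library's internal rounding differs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical exact reference: upper 32 bits of each 64-bit product.
 * Q31 * Q31 = Q62, so the result is in Q30. Right-shifting a negative
 * int64_t is implementation-defined; common compilers shift arithmetically. */
void ref_mul32(const int32_t *x, const int32_t *y, int32_t *r, short nx)
{
    int i;

    assert(nx > 0);
    for (i = 0; i < nx; i++)
        r[i] = (int32_t)(((int64_t)x[i] * y[i]) >> 32);
}
```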
Vector Negation
void DSP_neg32(int *x, int *r, short nx)
x[nx]: input data vector
r[nx]: output data vector, r[i] = -x[i]
nx: data length, 4N, and nx ≥ 8
Computes the reciprocal of each Q15 input as a fractional part and an exponent (a floating-point-style representation), such that rfrac[i] * 2^rexp[i] equals the reciprocal of x[i]
void DSP_recip16(short *x, short *rfrac, short *rexp, short nx)
x[nx]: input data vector, Q15
rfrac[nx]: output fractional part, Q15
rexp[nx]: output exponent
nx: data length
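To interpret the two outputs, combine them as rfrac[i] * 2^rexp[i], with rfrac taken as a Q15 fraction. The hypothetical helper below does this in double precision, for example to check results on a host PC. It avoids <math.h>, so no extra link library is needed.

```c
#include <assert.h>

/* Hypothetical helper: value represented by one (rfrac, rexp) output pair,
 * i.e. (rfrac / 32768) * 2^rexp. */
double recip16_value(short rfrac, short rexp)
{
    double v = rfrac / 32768.0; /* Q15 fraction as a real number */
    int e;

    for (e = 0; e < rexp; e++) v *= 2.0;
    for (e = 0; e > rexp; e--) v /= 2.0;
    return v;
}
```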
Returns the sum of the squares of the vector data
int DSP_vecsumsq (short *x, int nx)
x[nx]: input data vector
nx: data length, 4N, and nx ≥ 8
Weighted vector sum: r[i] = ((m * x[i]) >> 15) + y[i]
void DSP_w_vec(short *x, short *y, short m, short *r, short nx)
x[nx]: input data vector 1, weighted by m
y[nx]: input data vector 2
r[nx]: output data vector
nx: data length, 8N, and nx ≥ 8
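A reference model makes the operator precedence explicit. In C, + binds tighter than >>, so the product must be shifted before y[i] is added. The sketch (ref_w_vec, hypothetical) applies no saturation, so choose inputs that keep the sum within 16 bits.

```c
#include <assert.h>

/* Hypothetical reference model: r[i] = ((m * x[i]) >> 15) + y[i].
 * The inner parentheses are required; m * x[i] >> 15 + y[i] would shift
 * by (15 + y[i]). No saturation is applied. */
void ref_w_vec(const short *x, const short *y, short m, short *r, short nx)
{
    int i;

    assert(nx % 8 == 0);
    for (i = 0; i < nx; i++)
        r[i] = (short)(((m * x[i]) >> 15) + y[i]);
}
```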
6. Matrix
Matrix multiplication: r[r1*c2] = x[r1*c1] * y[c1*c2]
void DSP_mat_mul(short *x, int r1, int c1, short *y, int c2, short *r, int qs)
x[r1*c1]: input matrix x
r1: the number of rows in matrix x, 1~32767
c1: the number of columns of matrix x/the number of rows of matrix y, 1~32767
y[c1*c2]: input matrix y
c2: the number of columns of matrix y, 1~32767
r[r1*c2]: output matrix.
qs: number of bits by which each accumulated result is shifted right
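The element formula can be sketched as follows, assuming row-major storage as the array sizes suggest (ref_mat_mul is hypothetical). Products are accumulated in a 32-bit int before the shift by qs. Choose qs so that the shifted result fits in 16 bits.

```c
#include <assert.h>

/* Hypothetical reference model of matrix multiplication, row-major:
 * r[i*c2 + j] = (sum over k of x[i*c1 + k] * y[k*c2 + j]) >> qs */
void ref_mat_mul(const short *x, int r1, int c1, const short *y, int c2,
                 short *r, int qs)
{
    int i, j, k;

    assert(r1 >= 1 && c1 >= 1 && c2 >= 1);
    for (i = 0; i < r1; i++)
        for (j = 0; j < c2; j++) {
            int sum = 0; /* 32-bit accumulator */
            for (k = 0; k < c1; k++)
                sum += x[i * c1 + k] * y[k * c2 + j];
            r[i * c2 + j] = (short)(sum >> qs);
        }
}
```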
Matrix transpose (interchange rows and columns)
void DSP_mat_trans(short *x, short rows, short columns, short *r)
x[rows*columns]: input matrix
rows: the number of rows in the matrix, 4N
columns: the number of columns in the matrix, 4N
r[columns*rows]: output matrix
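With row-major storage (an assumption, consistent with the array sizes above), element (i, j) of x moves to element (j, i) of r. Here is a hypothetical reference:

```c
#include <assert.h>

/* Hypothetical reference model of a row-major transpose:
 * r[j*rows + i] = x[i*columns + j] */
void ref_mat_trans(const short *x, short rows, short columns, short *r)
{
    int i, j;

    assert(rows % 4 == 0 && columns % 4 == 0);
    for (i = 0; i < rows; i++)
        for (j = 0; j < columns; j++)
            r[j * rows + i] = x[i * columns + j];
}
```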
7. Others
Block exponent: computes the minimum number of redundant sign bits over all elements of the vector, which can be used to determine the scaling factor of a data block.
short DSP_bexp(const int *x, short nx)
x[nx]: input data vector
nx: data length, 8N
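Here is a portable model of the block exponent for host-side testing; the function names are hypothetical. The helper counts redundant sign bits the same way as the C6000 _norm intrinsic (0 and -1 both give 31) and assumes 32-bit int. Shifting every element left by the returned amount normalizes the block without overflow.

```c
#include <assert.h>

/* Hypothetical helper: number of redundant sign bits of a 32-bit value,
 * i.e. how far it can be shifted left without overflow. */
static int redundant_sign_bits(int v)
{
    unsigned u = (v < 0) ? ~(unsigned)v : (unsigned)v; /* fold negatives */
    int n = 0;

    while (n < 31 && !(u & (1u << (30 - n))))
        n++;
    return n;
}

/* Hypothetical reference: minimum redundant sign bits over the block. */
short ref_bexp(const int *x, short nx)
{
    int i, m = 31;

    assert(nx % 8 == 0);
    for (i = 0; i < nx; i++) {
        int n = redundant_sign_bits(x[i]);
        if (n < m) m = n;
    }
    return (short)m;
}
```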
Endianness swap (big <-> little) of 16-bit elements
void DSP_blk_eswap16(void * restrict x, void * restrict r, int nx)
x[nx]: input data vector
r[nx]: output data vector; if r is NULL, the result is written back to x
nx: data length, 8N, and nx ≥ 8
Same as DSP_blk_eswap16(), but for 32-bit elements
void DSP_blk_eswap32(void * restrict x, void * restrict r, int nx)
Same as DSP_blk_eswap16(), but for 64-bit elements
void DSP_blk_eswap64(void * restrict x, void * restrict r, int nx)
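The 16-bit case reduces to swapping the two bytes of each element. The hypothetical sketch below (ref_eswap16, not the library routine) also follows the NULL convention described above by writing the result back to x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference for a 16-bit block endian swap. If r is NULL the
 * result is written back to x. */
void ref_eswap16(void *x, void *r, int nx)
{
    unsigned short *src = (unsigned short *)x;
    unsigned short *dst = r ? (unsigned short *)r : src;
    int i;

    assert(nx % 8 == 0);
    for (i = 0; i < nx; i++) {
        unsigned short v = src[i];
        dst[i] = (unsigned short)((v << 8) | (v >> 8)); /* swap the two bytes */
    }
}
```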
Block move: copies nx 16-bit values from x to r
void DSP_blk_move(short * restrict x, short * restrict r, int nx)
x[nx]: source data vector
r[nx]: destination data vector
nx: data length, 8N, and nx ≥ 32
Converts IEEE floating-point numbers to Q15 fixed-point numbers
void DSP_fltoq15(float *x, short *r, short nx)
x[nx]: input floating-point vector, range [-1, 1)
r[nx]: output fixed-point vector, Q15
nx: data length, 2N
Converts Q15 fixed-point numbers to IEEE floating-point numbers
void DSP_q15tofl(short * restrict x, float * restrict r, short nx)
x[nx]: input fixed-point vector, Q15
r[nx]: output floating-point vector
nx: data length, 2N
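Reference conversions for host-side testing follow; the names are hypothetical. ref_fltoq15 scales by 32768, truncates toward zero, and clamps to the Q15 range, so 1.0 becomes 32767. The library routine's rounding may differ, so compare with a tolerance of 1 LSB.

```c
#include <assert.h>

/* Hypothetical reference: float in [-1, 1) to Q15, truncating toward zero
 * and clamping to [-32768, 32767]. */
void ref_fltoq15(const float *x, short *r, short nx)
{
    int i;

    assert(nx % 2 == 0);
    for (i = 0; i < nx; i++) {
        float v = x[i] * 32768.0f;
        if (v >= 32767.0f)       r[i] = 32767;     /* clamp at the Q15 maximum */
        else if (v <= -32768.0f) r[i] = -32768;    /* clamp at the Q15 minimum */
        else                     r[i] = (short)v;  /* truncates toward zero */
    }
}

/* Hypothetical reference: Q15 to float, x / 32768. */
void ref_q15tofl(const short *x, float *r, short nx)
{
    int i;

    assert(nx % 2 == 0);
    for (i = 0; i < nx; i++)
        r[i] = x[i] / 32768.0f;
}
```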