
RM0020
Reference manual
SPC56xx DSP function library 2
September 2013, Rev 2

Introduction
The SPC56xx DSP function library 2 contains optimized functions for the SPC56xx family of processors with the signal processing engine (SPE APU).

www.st.com
Contents
1 Library functions
1.1 FFT
1.2 Pressure sensing
1.3 Image
1.4 Filter
1.5 Macros
2 Supported compilers
3 Directory structure
4 How to use the library in a project
5 Example projects
6 Function API
6.1 Bit reversal permutation for 16-bit data
6.2 Bit reversal permutation for 32-bit data
6.3 N-point radix-4 complex to complex float in-place FFT
6.4 N-point radix-4 complex to complex frac16/frac32 in-place FFT
6.5 N-point quad radix-2 complex to complex float in-place FFT
6.6 N-point quad radix-2 complex to complex frac16/frac32 in-place FFT
6.7 Last stage of N-point radix-2 complex to complex float in-place FFT
6.8 Last stage of N-point radix-2 complex to complex frac32 in-place FFT
6.9 Split function of N-point real to complex float FFT
6.10 Split function of N-point real to complex frac32 FFT
6.11 Complex float windowing function
6.12 Complex frac16 windowing function
6.13 Real float windowing function
6.14 Real frac16 windowing function
6.15 mfb50_pmh - pressure sensing function 1
6.16 mfb50_index - pressure sensing function 2
6.17 conv3x3 - 2-D convolution with 3x3 kernel
6.18 sobel3x3 - Sobel filter
6.19 sobel3x3_horizontal - horizontal Sobel filter
6.20 sobel3x3_vertical - vertical Sobel filter
6.21 corr_frac16 - frac16 correlation function
6.22 fir_float - float FIR filter
6.23 fir_frac16 - frac16 FIR filter
6.24 iir_float_1st - float first-order IIR filter
6.25 iir_float_2nd - float second-order IIR filter
6.26 iir_float_casc - cascade of float second-order IIR filters
6.27 iir_frac16_1st - frac16 first-order IIR filter
6.28 iir_frac16_2nd - frac16 second-order IIR filter
6.29 iir_frac16_casc - cascade of frac16 second-order IIR filters
6.30 iir_frac16_2nd_hc - frac16 second-order IIR filter with half coefficients
7 Performance
8 Revision history
List of tables
Table 1. bitrev_table_16bit arguments
Table 2. bitrev_table_32bit arguments
Table 3. fft_radix4_float arguments
Table 4. fft_radix4_frac32 arguments
Table 5. fft_quad_float arguments
Table 6. fft_quad_frac32 arguments
Table 7. fft_radix2_last_stage_float arguments
Table 8. fft_radix2_last_stage_frac32 arguments
Table 9. fft_real_split_float arguments
Table 10. fft_real_split_frac32 arguments
Table 11. window_apply_complex_float arguments
Table 12. window_apply_complex_frac16 arguments
Table 13. window_apply_real_float arguments
Table 14. window_apply_real_frac16 arguments
Table 15. mfb50_pmh arguments
Table 16. mfb50_index arguments
Table 17. conv3x3 arguments
Table 18. sobel3x3 arguments
Table 19. sobel3x3_horizontal arguments
Table 20. sobel3x3_vertical arguments
Table 21. corr_frac16 arguments
Table 22. fir_float arguments
Table 23. fir_frac16 arguments
Table 24. iir_float_1st arguments
Table 25. iir_float_2nd arguments
Table 26. iir_float_casc arguments
Table 27. iir_frac16_1st arguments
Table 28. iir_frac16_2nd arguments
Table 29. iir_frac16_casc arguments
Table 30. iir_frac16_2nd_hc arguments
Table 31. Code size
Table 32. Radix-4 complex to complex float in-place FFT
Table 33. Quad radix-2 complex to complex float in-place FFT
Table 34. Radix-2 complex to complex float in-place FFT
Table 35. Radix-4 complex to complex frac16/frac32 in-place FFT, scaling on
Table 36. Quad radix-2 complex to complex frac16/frac32 in-place FFT, scaling on
Table 37. Radix-2 complex to complex frac32 in-place FFT, scaling on
Table 38. Real to complex float in-place FFT (N odd power of two)
Table 39. Real to complex float in-place FFT (N even power of two)
Table 40. Real to complex frac16/frac32 in-place FFT, scaling on (N odd power of two)
Table 41. Real to complex frac16/frac32 in-place FFT, scaling on (N even power of two)
Table 42. Complex float windowing function
Table 43. Complex frac16 windowing function
Table 44. Real float windowing function
Table 45. Real frac16 windowing function
Table 46. mfb50_pmh - pressure sensing function 1
Table 47. mfb50_index - pressure sensing function 2
Table 48. conv3x3 - 2-D convolution with 3x3 kernel
Table 49. sobel3x3 - Sobel filter
Table 50. sobel3x3_horizontal - horizontal Sobel filter
Table 51. sobel3x3_vertical - vertical Sobel filter
Table 52. corr_frac16 - correlation function
Table 53. fir_float - FIR filter for float data
Table 54. fir_frac16 - FIR filter for frac16 data
Table 55. iir_float_1st - first-order IIR filter for float data
Table 56. iir_float_2nd - second-order IIR filter for float data
Table 57. iir_float_casc - cascade of second-order IIR filters for float data
Table 58. iir_frac16_1st - first-order IIR filter for frac16 data
Table 59. iir_frac16_2nd - second-order IIR filter for frac16 data
Table 60. iir_frac16_casc - cascade of second-order IIR filters for frac16 data
Table 61. iir_frac16_2nd_hc - second-order IIR filter for frac16 data with half coefficients
Table 62. Document revision history
1 Library functions
The library functions are presented in the following sections.

1.1 FFT
bitrev_table_16bit - bit reversal permutation for 16-bit data (frac16)
bitrev_table_32bit - bit reversal permutation for 32-bit data (frac32 and float)
fft_radix4_float - N-point radix-4 complex to complex float in-place FFT, N is an even power of two
fft_radix4_frac32 - N-point radix-4 complex to complex frac16/frac32 in-place FFT, N is an even power of two
fft_quad_float - N-point quad radix-2 complex to complex float in-place FFT, N is an even power of two
fft_quad_frac32 - N-point quad radix-2 complex to complex frac16/frac32 in-place FFT, N is an even power of two
fft_radix2_last_stage_float - calculates the last stage of an N-point radix-2 complex to complex float in-place FFT; used together with fft_quad_float to calculate an N-point complex to complex float in-place FFT where N is an odd power of two
fft_radix2_last_stage_frac32 - calculates the last stage of an N-point radix-2 complex to complex frac32 in-place FFT; used together with fft_quad_frac32 to calculate an N-point complex to complex frac16/frac32 in-place FFT where N is an odd power of two
fft_real_split_float - split function for N-point real to complex float FFT
fft_real_split_frac32 - split function for N-point real to complex frac32 FFT
window_apply_complex_float - complex float windowing function
window_apply_complex_frac16 - complex frac16 windowing function
window_apply_real_float - real float windowing function
window_apply_real_frac16 - real frac16 windowing function

1.2 Pressure sensing
mfb50_pmh - calculates the mass fraction burned (MFB), the MFB50 value and the PMH index
mfb50_index - calculates the MFB50 position index

1.3 Image
conv3x3 - 2-D convolution with 3x3 kernel
sobel3x3 - Sobel filter
sobel3x3_horizontal - horizontal Sobel filter
sobel3x3_vertical - vertical Sobel filter
1.4 Filter
corr_frac16 - frac16 correlation function
fir_float - float FIR filter
fir_frac16 - frac16 FIR filter
iir_float_1st - float first-order IIR filter
iir_float_2nd - float second-order IIR filter
iir_float_casc - cascade of float second-order IIR filters
iir_frac16_1st - frac16 first-order IIR filter
iir_frac16_2nd - frac16 second-order IIR filter
iir_frac16_casc - cascade of frac16 second-order IIR filters
iir_frac16_2nd_hc - frac16 second-order IIR filter with half coefficients

1.5 Macros
The library defines macros in libdsp2.h that wrap some of the functions above for easier use; see the libdsp2.h file and the projects in the example directory.
2 Supported compilers
The library was built and tested with the following compilers:
CodeWarrior for PowerPC v1.5 beta2
Green Hills MULTI for PowerPC v4.2.1
Wind River Compiler version 5.2.1.0
3 Directory structure
doc - contains the library user's manual
example - contains the library example projects for the Axiom-0321 MPC5554 development board, for each supported compiler
include - contains the library header file libdsp2.h
src - contains the library source files for the CodeWarrior compiler and the C source files
src\src_ghs - contains the library source files for the Green Hills compiler
src\src_wr - contains the library source files for the Wind River compiler
4 How to use the library in a project
CodeWarrior
• Add the library source files with the required functions to your project window.
• Add the path to the library include file libdsp2.h under "Target Settings (Alt-F7) -> Access Paths -> User Paths".
• Include libdsp2.h in your source file.
Green Hills
• Add the library source files with the required functions to your project window.
• Use the -I compiler option: -I{path to libdsp2.h}.
• Include libdsp2.h in your source file.
Wind River
• Add the library source files with the required functions to your project window.
• Use the -I compiler option: -I{path to libdsp2.h}.
• Include libdsp2.h in your source file.
Code example:
```c
#include "libdsp2.h"

#define n_max 512

/* +8 floats ensure there are 32 bytes of readable memory behind the data;
   the buffer must be double-word aligned */
float inout_buffer[2*n_max+8];

void main(void)
{
    /* compute complex to complex float in-place FFT */
    fft_quad_float_512(inout_buffer);
}
```
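The two buffer requirements above (double-word alignment and 32 bytes of readable memory after the data) can be made explicit in C. The sketch below assumes a C11 compiler with `_Alignas`; the older toolchains listed in Section 2 may not support it and would need a vendor-specific alignment pragma instead. The names N_MAX, PAD_FLOATS, fft_buffer, is_double_word_aligned and check_fft_buffer are hypothetical and not part of libdsp2.

```c
#include <assert.h>
#include <stdint.h>

#define N_MAX 512      /* hypothetical maximum FFT length */
#define PAD_FLOATS 8   /* 8 floats = 32 bytes of padding, assuming 4-byte float */

/* Interleaved complex buffer (re, im pairs), 8-byte aligned, plus padding
   so the FFT functions can safely read past the last sample. */
static _Alignas(8) float fft_buffer[2 * N_MAX + PAD_FLOATS];

/* Hypothetical helper: nonzero if p is double-word (8-byte) aligned. */
static int is_double_word_aligned(const void *p)
{
    return ((uintptr_t)p % 8u) == 0u;
}

/* Hypothetical debug check to run before passing a buffer to an FFT. */
static void check_fft_buffer(const float *buf)
{
    assert(is_double_word_aligned(buf));
}
```

In a debug build, calling check_fft_buffer(fft_buffer) before fft_quad_float catches a misaligned buffer early instead of producing wrong results on the target.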
5 Example projects
The library contains ready-to-run example projects, located in the example directory. The projects are created for the Axiom MPC5554 development board.
CodeWarrior
• cw_example.mcp - examples of float FFTs
• cw_example1.mcp - examples of frac16/frac32 FFTs
• cw_example2.mcp - examples of real to complex float FFTs
• cw_example3.mcp - examples of real to complex frac16/frac32 FFTs
• cw_example4.mcp - examples of real and complex float windowing functions
• cw_example5.mcp - examples of real and complex frac16 windowing functions
• cw_example6.mcp - examples of pressure sensing functions mfb50_pmh and mfb50_index
• cw_example7.mcp - examples of 2-D convolution with 3x3 kernel
• cw_example8.mcp - examples of Sobel filters
• cw_example9.mcp - examples of frac16 data correlation and auto-correlation functions
• cw_example9a.mcp - examples of FIR filters
• cw_example9b.mcp - examples of float data IIR filters
• cw_example9c.mcp - examples of frac16 data IIR filters
Green Hills
• ghs_example.gpj - examples of float FFTs
• ghs_example1.gpj - examples of frac16/frac32 FFTs
• ghs_example2.gpj - examples of real to complex float FFTs
• ghs_example3.gpj - examples of real to complex frac16/frac32 FFTs
• ghs_example4.gpj - examples of real and complex float windowing functions
• ghs_example5.gpj - examples of real and complex frac16 windowing functions
• ghs_example6.gpj - examples of pressure sensing functions mfb50_pmh and mfb50_index
• ghs_example7.gpj - examples of 2-D convolution with 3x3 kernel
• ghs_example8.gpj - examples of Sobel filters
• ghs_example9.gpj - examples of frac16 data correlation and auto-correlation functions
• ghs_example9a.gpj - examples of FIR filters
• ghs_example9b.gpj - examples of float data IIR filters
• ghs_example9c.gpj - examples of frac16 data IIR filters
Wind River
• makefile - examples of float FFTs
• makefile1 - examples of frac16/frac32 FFTs
• makefile2 - examples of real to complex float FFTs
• makefile3 - examples of real to complex frac16/frac32 FFTs
• makefile4 - examples of real and complex float windowing functions
• makefile5 - examples of real and complex frac16 windowing functions
• makefile6 - examples of pressure sensing functions mfb50_pmh and mfb50_index
• makefile7 - examples of 2-D convolution with 3x3 kernel
• makefile8 - examples of Sobel filters
• makefile9 - examples of frac16 data correlation and auto-correlation functions
• makefile9a - examples of FIR filters
• makefile9b - examples of float data IIR filters
• makefile9c - examples of frac16 data IIR filters
6 Function API
6.1 Bit reversal permutation for 16-bit data
Function call:
void bitrev_table_16bit(unsigned int n, short *inout_buffer, unsigned short *seed_table);
Arguments:
Table 1. bitrev_table_16bit arguments
n (in): number of groups.
  For radix-2 bit reverse permutation: n = 8 for 64- and 128-point FFT; n = 16 for 256- and 512-point FFT; n = 32 for 1024- and 2048-point FFT; n = 64 for 4096-point FFT.
  For radix-4 2-bit reverse permutation: n = 4 for 64-point FFT; n = 16 for 256- and 1024-point FFT; n = 64 for 4096-point FFT.
inout_buffer (in/out): pointer to the input/output vector of length 2*N, where N is the FFT length; the vector must be word aligned.
  Memory layout: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(N-1) x_im(N-1) (re - real part, im - imaginary part).
seed_table (in): pointer to the seed table; use one of the predefined tables:
  seed_radix2_16bit_64, seed_radix2_16bit_128, seed_radix2_16bit_256, seed_radix2_16bit_512, seed_radix2_16bit_1024, seed_radix2_16bit_2048, seed_radix2_16bit_4096, seed_radix4_16bit_64, seed_radix4_16bit_256, seed_radix4_16bit_1024, seed_radix4_16bit_4096.
Description:
Implements a fast bit reversal permutation on 16-bit complex data using a seed table, as described in David M. W. Evans, "An improved digit-reversal permutation algorithm for the fast Fourier and Hartley transforms", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 8, August 1987.
Performance: see Section 7.
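The permutation itself is simple to state: the sample at index i moves to the index whose bits are those of i reversed. The sketch below performs this radix-2 permutation on interleaved 16-bit complex data with a plain swap loop, which can help check results on a host PC. It is not Evans' seed-table algorithm used by bitrev_table_16bit, it takes the FFT length N rather than the number of groups, and the names bitrev_complex_16bit and reverse_bits are hypothetical.

```c
#include <assert.h>

/* Reverse the lowest `bits` bits of index i (e.g. 001 -> 100 for bits = 3). */
static unsigned reverse_bits(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* In-place radix-2 bit-reversal permutation of n_points complex values
   stored as interleaved 16-bit (re, im) pairs. Plain swap loop, not the
   seed-table method used by bitrev_table_16bit. */
static void bitrev_complex_16bit(unsigned n_points, short *buf)
{
    unsigned bits = 0;
    assert(n_points >= 2 && (n_points & (n_points - 1)) == 0); /* power of two */
    while ((1u << bits) < n_points)
        bits++;
    for (unsigned i = 0; i < n_points; i++) {
        unsigned j = reverse_bits(i, bits);
        if (j > i) {               /* swap each pair of positions only once */
            short re = buf[2 * i], im = buf[2 * i + 1];
            buf[2 * i] = buf[2 * j];
            buf[2 * i + 1] = buf[2 * j + 1];
            buf[2 * j] = re;
            buf[2 * j + 1] = im;
        }
    }
}
```

For an 8-point vector the index order becomes 0, 4, 2, 6, 1, 5, 3, 7; the real and imaginary parts of each sample move together.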
Example 1. bitrev_table_16bit
```c
#include "libdsp2.h"

int inout_buffer[2*256];

void main()
{
    /* bit reverse permutation and calculation of 256-point FFT;
       the 16-bit input data occupy the second half of inout_buffer */
    bitrev_table_16bit(16, (short *)(inout_buffer+256), seed_radix2_16bit_256);
    fft_quad_frac32(256, inout_buffer, w_table_radix2_frac32_256);
}
```
6.2 Bit reversal permutation for 32-bit data
Function call:
void bitrev_table_32bit(unsigned int n, float *inout_buffer, unsigned short *seed_table);
Arguments:
Table 2. bitrev_table_32bit arguments
n (in): number of groups.
  For radix-2 bit reverse permutation: n = 8 for 64- and 128-point FFT; n = 16 for 256- and 512-point FFT; n = 32 for 1024- and 2048-point FFT; n = 64 for 4096-point FFT.
  For radix-4 2-bit reverse permutation: n = 4 for 64-point FFT; n = 16 for 256- and 1024-point FFT; n = 64 for 4096-point FFT.
inout_buffer (in/out): pointer to the input/output vector of length 2*N, where N is the FFT length; the vector must be double-word aligned.
  Memory layout: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(N-1) x_im(N-1) (re - real part, im - imaginary part).
seed_table (in): pointer to the seed table; use one of the predefined tables:
  seed_radix2_32bit_64, seed_radix2_32bit_128, seed_radix2_32bit_256, seed_radix2_32bit_512, seed_radix2_32bit_1024, seed_radix2_32bit_2048, seed_radix2_32bit_4096, seed_radix4_32bit_64, seed_radix4_32bit_256, seed_radix4_32bit_1024, seed_radix4_32bit_4096.
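The radix-4 seed tables listed in Table 2 produce a 2-bit (base-4 digit) reversal: the base-4 digits of each index are reversed instead of its individual bits. A hypothetical plain-C sketch for interleaved float (32-bit) data follows; it assumes N is a power of four (an even power of two such as 64, 256, 1024 or 4096) and uses a simple swap loop rather than the seed tables. The names reverse_base4 and digitrev4_complex_float are illustrative only.

```c
#include <assert.h>

/* Reverse the lowest `digits` base-4 digits of index i
   (e.g. digits = 2: 6 = "12" in base 4 -> "21" = 9). */
static unsigned reverse_base4(unsigned i, unsigned digits)
{
    unsigned r = 0;
    for (unsigned d = 0; d < digits; d++) {
        r = (r << 2) | (i & 3u);
        i >>= 2;
    }
    return r;
}

/* In-place 2-bit (digit) reversal of n_points interleaved float complex
   values; n_points must be a power of four. */
static void digitrev4_complex_float(unsigned n_points, float *buf)
{
    unsigned digits = 0;
    while ((1u << (2 * digits)) < n_points)
        digits++;
    assert(n_points != 0 && (1u << (2 * digits)) == n_points);
    for (unsigned i = 0; i < n_points; i++) {
        unsigned j = reverse_base4(i, digits);
        if (j > i) {               /* digit reversal is its own inverse */
            float re = buf[2 * i], im = buf[2 * i + 1];
            buf[2 * i] = buf[2 * j];
            buf[2 * i + 1] = buf[2 * j + 1];
            buf[2 * j] = re;
            buf[2 * j + 1] = im;
        }
    }
}
```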
Description:
Implements a fast bit reversal permutation on 32-bit complex data using a seed table, as described in David M. W. Evans, "An improved digit-reversal permutation algorithm for the fast Fourier and Hartley transforms", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-35, no. 8, August 1987.
Performance: see Section 7.
Example 2. bitrev_table_32bit
```c
#include "libdsp2.h"

float inout_buffer[2*256];

void main()
{
    /* bit reverse permutation and calculation of 256-point FFT */
    bitrev_table_32bit(16, inout_buffer, seed_radix2_32bit_256);
    fft_quad_float(256, inout_buffer, w_table_radix2_float_256);
}
```
6.3 N-point radix-4 complex to complex float in-place FFT
Function call:
void fft_radix4_float(unsigned int n, float *inout_buffer, float *twiddle_factor_table);
Arguments:
Table 3. fft_radix4_float arguments
n (in): FFT length N; must be an even power of two; tested for 64, 256, 1024 and 4096.
inout_buffer (in/out): pointer to the input/output vector of length 2*N; the vector must be double-word aligned.
  Memory layout: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(N-1) x_im(N-1) (re - real part, im - imaginary part).
twiddle_factor_table (in): pointer to the vector of twiddle factors of length 3*N/2-4; the vector must be double-word aligned.
  Memory layout: w_re(0) w_im(0) w_re(1) w_im(1) ... w_re(3*N/4-3) w_im(3*N/4-3).
  Use the predefined w_table_radix4_float_n arrays, computed as $\mathrm{w\_re}(k) + i\,\mathrm{w\_im}(k) = W_N^{k}$ for k = 0, 1, ..., 3*N/4-3.
Description:
Computes the N-point radix-4 complex to complex float in-place fast Fourier transform (FFT). The input data, in 2-bit reversed order, are stored in inout_buffer; the output data overwrite the input data in inout_buffer. The function uses a radix-4 FFT algorithm.
Algorithm:
$$X(n) = \sum_{k=0}^{N-1} x(k)\,W_N^{nk} \tag{1}$$
for each n from 0 to N-1, where
$$W_N = e^{-2\pi i/N} \tag{2}$$
and i is the imaginary unit, with $i^2 = -1$.
Note: There must be at least 32 bytes of readable memory behind inout_buffer.
Performance: see Section 7 and Table 32.
Example 3. fft_radix4_float
```c
#include "libdsp2.h"

float inout_buffer[2*256];

void main()
{
    /* 2-bit reverse permutation and calculation of 256-point FFT */
    bitrev_table_32bit(16, inout_buffer, seed_radix4_32bit_256);
    fft_radix4_float(256, inout_buffer, w_table_radix4_float_256);
}
```
6.4 N-point radix-4 complex to complex frac16/frac32 in-place FFT
Function call:
void fft_radix4_frac32(unsigned int n, int *inout_buffer, int *twiddle_factor_table);
Arguments:
Table 4. fft_radix4_frac32 arguments
n (in): FFT length N; must be an even power of two; tested for 64, 256, 1024 and 4096.
inout_buffer (in/out): pointer to the input/output vector of length 2*N; the vector must be double-word aligned.
  Input data layout: 16-bit fractional input data in the range -1 to 1-2^-15 are stored in the second half of inout_buffer in the order real, imag, real, imag, ...:
  0, 0, 0, 0, ..., x_re(0) x_im(0), x_re(1) x_im(1), ..., x_re(N-1) x_im(N-1)
  Output data layout: 32-bit fractional output data in the range -1 to 1-2^-31 are stored as:
  x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(N-1) x_im(N-1) (re - real part, im - imaginary part).
twiddle_factor_table (in): pointer to the vector of twiddle factors of length 3*N/2-4; the vector must be double-word aligned.
  Memory layout: w_re(0) w_im(0) w_re(1) w_im(1) ... w_re(3*N/4-3) w_im(3*N/4-3).
  Use the predefined w_table_radix4_frac32_n arrays, computed as $\mathrm{w\_re}(k) + i\,\mathrm{w\_im}(k) = W_N^{k}$ for k = 0, 1, ..., 3*N/4-3.
Description:
Computes the N-point radix-4 complex to complex frac16/frac32 in-place fast Fourier transform (FFT). The input 16-bit fractional data, in 2-bit reversed order, are stored in the second half of inout_buffer; the output 32-bit fractional data overwrite the input data in inout_buffer. The function uses a radix-4 FFT algorithm.
The output scaling is configurable by a constant definition in the function source file:
.set scaling 1 #/* 0 - off, 1 - on */
By default, scaling is turned on. The scaling is performed by dividing by 4 before each radix-4 stage.
Algorithm:
$$X(n) = \sum_{k=0}^{N-1} x(k)\,W_N^{nk} \tag{3}$$
for each n from 0 to N-1, scaled down by N (that is, the output is X(n)/N) if scaling is turned on, where
$$W_N = e^{-2\pi i/N} \tag{4}$$
and i is the imaginary unit, with $i^2 = -1$.
Note: There must be at least 32 bytes of readable memory behind inout_buffer.
Performance: see Section 7 and Table 35.
Example 4. fft_radix4_frac32
```c
#include "libdsp2.h"

int inout_buffer[2*256];

void main()
{
    /* 2-bit reverse permutation and calculation of 256-point FFT */
    bitrev_table_16bit(16, (short *)(inout_buffer+256), seed_radix4_16bit_256);
    fft_radix4_frac32(256, inout_buffer, w_table_radix4_frac32_256);
}
```
6.5 N-point quad radix-2 complex to complex float in-place FFT
Function call:
void fft_quad_float(unsigned int n, float *inout_buffer, float *twiddle_factor_table);
Arguments:
Table 5. fft_quad_float arguments
n (in): FFT length N; must be an even power of two; tested for 64, 256, 1024 and 4096.
inout_buffer (in/out): pointer to the input/output vector of length 2*N; the vector must be double-word aligned.
  Memory layout: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(N-1) x_im(N-1) (re - real part, im - imaginary part).
twiddle_factor_table (in): pointer to the vector of twiddle factors of length N; the vector must be double-word aligned.
  Memory layout: w_re(0) w_im(0) w_re(1) w_im(1) ... w_re(N/2-1) w_im(N/2-1).
  Use the predefined w_table_radix2_float_n arrays, computed as $\mathrm{w\_re}(k) + i\,\mathrm{w\_im}(k) = W_N^{k}$ for k = 0, 1, ..., N/2-1.
Description:
Computes the N-point quad radix-2 complex to complex float in-place fast Fourier transform (FFT). The input data, in bit-reversed order, are stored in inout_buffer; the output data overwrite the input data in inout_buffer. The function uses a radix-2 algorithm in which two stages are calculated in one loop (quad butterfly).
Algorithm:
$$X(n) = \sum_{k=0}^{N-1} x(k)\,W_N^{nk} \tag{5}$$
for each n from 0 to N-1, where
$$W_N = e^{-2\pi i/N} \tag{6}$$
and i is the imaginary unit, with $i^2 = -1$.
Note: There must be at least 32 bytes of readable memory behind inout_buffer.
Performance: see Section 7 and Table 33.
Example 5. fft_quad_float
```c
#include "libdsp2.h"

float inout_buffer[2*256];

void main()
{
    /* bit reverse permutation and calculation of 256-point FFT */
    bitrev_table_32bit(16, inout_buffer, seed_radix2_32bit_256);
    fft_quad_float(256, inout_buffer, w_table_radix2_float_256);
}
```
6.6 N-point quad radix-2 complex to complex frac16/frac32 in-place FFT
Function call:
void fft_quad_frac32(unsigned int n, int *inout_buffer, int *twiddle_factor_table);
Description:
Computes the N-point quad radix-2 complex to complex frac16/frac32 in-place fast Fourier transform (FFT). The input 16-bit fractional data, in bit-reversed order, are stored in the second half of inout_buffer; the output 32-bit fractional data overwrite the input data in inout_buffer. The function uses a radix-2 algorithm in which two stages are calculated in one loop (quad butterfly).
The output scaling is configurable by a constant definition in the function source file:
.set scaling 1 #/* 0 - off, 1 - on */
Arguments:
Table 6. fft_quad_frac32 arguments
n (in): FFT length N; must be an even power of two; tested for 64, 256, 1024 and 4096.
inout_buffer (in/out): pointer to the input/output vector of length 2*N; the vector must be double-word aligned.
  Input data layout: 16-bit fractional input data in the range -1 to 1-2^-15 are stored in the second half of inout_buffer in the order real, imag, real, imag, ...:
  0, 0, 0, 0, ..., x_re(0) x_im(0), x_re(1) x_im(1), ..., x_re(N-1) x_im(N-1)
  Output data layout: 32-bit fractional output data in the range -1 to 1-2^-31 are stored as:
  x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(N-1) x_im(N-1) (re - real part, im - imaginary part).
twiddle_factor_table (in): pointer to the vector of twiddle factors of length N; the vector must be double-word aligned.
  Memory layout: w_re(0) w_im(0) w_re(1) w_im(1) ... w_re(N/2-1) w_im(N/2-1).
  Use the predefined w_table_radix2_frac32_n arrays, computed as $\mathrm{w\_re}(k) + i\,\mathrm{w\_im}(k) = W_N^{k}$ for k = 0, 1, ..., N/2-1.
function api RM0020 20/69 by default it is turned on. the scaling is performed by dividing by 4 before each pair of radix-2 stages. algorithm: equation 7 for each n from 0 to n-1, scaled do wn by n if scaling is turned on equation 8 where i is the imaginary unit with the property that: . note: there must be at least 16 bytes of readable memory behind inout_buffer. performance: see section 7 and ta b l e 3 6 . example 6. fft_quad_frac32 #include "libdsp2.h" int inout_buffer[2*256]; void main() { /* bit reverse permutation and calculation of 256-point fft */ bitrev_table_16bit(16, (short *)(inout_buffer+256), seed_radix2_16bit_256); fft_quad_frac32(256, inout_buffer, w_table_radix2_frac32_256); } 6.7 last stage of n-point radix- 2 complex to complex float in- place fft function call: void fft_radix2_last_stage_float(unsigned int n, float *inout_buffer, float *twiddle_factor_table); arguments: xn ?? xk ?? w n nk ? ? k0 = n1 ? ? = w n e 2 ? i n -------- ? = i 2 1 ? = table 7. fft_radix2_last_stage_float arguments n in fft length, tested for 128, 512, 2048 inout_buffer in/ou t pointer to the input/output vector of length 2*n, vector must be double word aligned memory layout: x_re(0) x_im(0) x_re(1) x_i m(1) ... x_re(n -1) x_im(n-1) (re - real part, im - imaginary part)
Table 7. fft_radix2_last_stage_float arguments (continued)
  twiddle_factor_table (in): pointer to the vector of twiddle factors of length n. The vector must be double-word aligned.
    Memory layout: w_re(0) w_im(0) w_re(1) w_im(1) ... w_re(n/2-1) w_im(n/2-1)
    Use the predefined w_table_radix2_float_n arrays. The w_table_radix2_float_n table is computed according to this formula, for k = 0, 1, ..., n/2-1:
    $$w_{\mathrm{re}}(k) + i\,w_{\mathrm{im}}(k) = W_N^k$$

Description: Computes the last stage of the n-point radix-2 complex to complex float in-place fast Fourier transform (FFT). Input data are stored in inout_buffer; output data are written over the input data in inout_buffer. This function is used to calculate the FFT for lengths that are odd powers of two (128, 512, 2048).

Note: There must be at least 16 bytes of readable memory behind inout_buffer.

Performance: see Section 7 and Table 34.

Example 7. fft_radix2_last_stage_float

    #include "libdsp2.h"

    float inout_buffer[2*512 + 4]; /* +4 to ensure there are 16 bytes of readable memory behind the buffer */

    void main(void)
    {
        /* bit reverse permutation and calculation of 512-point FFT */
        bitrev_table_32bit(16, inout_buffer, seed_radix2_32bit_512);
        fft_quad_float(512, inout_buffer, w_table_radix2_float_512);
        /* 512 is an odd power of two, so the final radix-2 stage is computed separately */
        fft_radix2_last_stage_float(512, inout_buffer, w_table_radix2_float_512);
    }

6.8 Last stage of N-point radix-2 complex to complex frac32 in-place FFT

Function call:

    void fft_radix2_last_stage_frac32(unsigned int n, int *inout_buffer, int *twiddle_factor_table);

Arguments:

Table 8. fft_radix2_last_stage_frac32 arguments
  n (in): FFT length; tested for 128, 512 and 2048.
  inout_buffer (in/out): pointer to the input/output vector of length 2*n. The vector must be double-word aligned.
    Memory layout: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(n-1) x_im(n-1) (re - real part, im - imaginary part)
Table 8. fft_radix2_last_stage_frac32 arguments (continued)
  twiddle_factor_table (in): pointer to the vector of twiddle factors of length n. The vector must be double-word aligned.
    Memory layout: w_re(0) w_im(0) w_re(1) w_im(1) ... w_re(n/2-1) w_im(n/2-1)
    Use the predefined w_table_radix2_frac32_n arrays. The w_table_radix2_frac32_n table is computed according to this formula, for k = 0, 1, ..., n/2-1:
    $$w_{\mathrm{re}}(k) + i\,w_{\mathrm{im}}(k) = W_N^k$$

Description: Computes the last stage of the n-point radix-2 complex to complex frac32 in-place fast Fourier transform (FFT). Input data are stored in inout_buffer; output data are written over the input data in inout_buffer. This function is used to calculate the FFT for lengths that are odd powers of two (128, 512, 2048).

Note: There must be at least 16 bytes of readable memory behind inout_buffer.

Performance: see Section 7 and Table 37.

Example 8. fft_radix2_last_stage_frac32

    #include "libdsp2.h"

    int inout_buffer[2*512 + 4]; /* +4 to ensure there are 16 bytes of readable memory behind the buffer */

    void main(void)
    {
        /* bit reverse permutation and calculation of 512-point FFT */
        bitrev_table_16bit(16, (short *)(inout_buffer + 512), seed_radix2_16bit_512);
        fft_quad_frac32(512, inout_buffer, w_table_radix2_frac32_512);
        /* 512 is an odd power of two, so the final radix-2 stage is computed separately */
        fft_radix2_last_stage_frac32(512, inout_buffer, w_table_radix2_frac32_512);
    }

6.9 Split function of N-point real to complex float FFT

Function call:

    void fft_real_split_float(unsigned int n, float *y, float *wa, float *wb);
Arguments:

Table 9. fft_real_split_float arguments
  n (in): FFT length; tested for 128, 256, 512, 1024, 2048 and 4096.
  y (in/out): pointer to the input/output vector of length n. The vector must be double-word aligned.
    Input: pass the output of the complex to complex float FFT of half the length, with data ordered x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(n/2-1) x_im(n/2-1).
    Output memory layout: X_re(0) {0 or X_re(n/2)} X_re(1) X_im(1) ... X_re(n/2-1) X_im(n/2-1) (re - real part, im - imaginary part)
  wa (in): pointer to the wa vector of twiddle factors of length n/4+1. The vector must be double-word aligned.
    Memory layout: wa_re(0) wa_im(0) wa_re(1) wa_im(1) ... wa_re(n/4+1) wa_im(n/4+1)
    Use the predefined wa_table_float_n arrays. The wa_table_float_n table is computed according to this formula, for k = 0, 1, ..., n/4+1, where i is the imaginary unit:
    $$wa(k) = \frac{1}{2}\left(1 - i\,W_N^k\right), \qquad W_N = e^{-i\frac{2\pi}{N}}$$
  wb (in): pointer to the wb vector of twiddle factors of length n/4+1. The vector must be double-word aligned.
    Memory layout: wb_re(0) wb_im(0) wb_re(1) wb_im(1) ... wb_re(n/4+1) wb_im(n/4+1)
    Use the predefined wb_table_float_n arrays. The wb_table_float_n table is computed according to this formula, for k = 0, 1, ..., n/4+1, where i is the imaginary unit:
    $$wb(k) = \frac{1}{2}\left(1 + i\,W_N^k\right), \qquad W_N = e^{-i\frac{2\pi}{N}}$$

Description: Split function of the n-point real to complex float in-place fast Fourier transform (FFT). Input data are stored in y; output data are written over the input data in y. The function produces the first half of the output spectrum. Calculation of the n/2 item is configurable by a constant definition in the function source file:

    .set n2_realpart, 0 #/* 0/1 - real part of n/2 item isn't/is inserted in imaginary part of 0th item, default value is 0 */

By default, the n/2 output item is not calculated.
The real to complex float in-place FFT can be calculated with the split function as follows:
  • On the real input data in_re(0) in_re(1) ... in_re(n-1), calculate the complex to complex FFT of half the length.
  • Call the split function, which calculates the first half of the spectrum X_re(0) X_im(0) X_re(1) X_im(1) ... X_re(n/2-1) X_im(n/2-1).
  • If n2_realpart is defined to 1, the split function saves X_re(n/2) into X_im(0).

Algorithm:

Input data in memory: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(n/2-1) x_im(n/2-1) (re - real part, im - imaginary part)

Equation 9:
$$x(k) = x_{\mathrm{re}}(k) + i\,x_{\mathrm{im}}(k)$$

The split function calculates the output sequence from the input sequence according to the following equations:

Equation 10:
$$X(0) = x_{\mathrm{re}}(0) + x_{\mathrm{im}}(0)$$
Equation 11:
$$X\!\left(\tfrac{N}{2}\right) = x_{\mathrm{re}}(0) - x_{\mathrm{im}}(0)$$
Equation 12:
$$X(k) = wa(k)\,x(k) + wb(k)\,x^{*}\!\left(\tfrac{N}{2} - k\right)$$
for each k from 1 to N/2-1, where:
Equation 13:
$$wa(k) = \frac{1}{2}\left(1 - i\,W_N^k\right)$$
Equation 14:
$$wb(k) = \frac{1}{2}\left(1 + i\,W_N^k\right)$$
Equation 15:
$$W_N = e^{-i\frac{2\pi}{N}}$$
* denotes the complex conjugate operation and i is the imaginary unit.
Equation 16:
$$X(k) = X_{\mathrm{re}}(k) + i\,X_{\mathrm{im}}(k)$$

Output data in memory:
  When n2_realpart is defined to 0: X_re(0) X_im(0) X_re(1) X_im(1) ... X_re(N/2-1) X_im(N/2-1)
  When n2_realpart is defined to 1: X_re(0) X_re(N/2) X_re(1) X_im(1) ... X_re(N/2-1) X_im(N/2-1)
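For checking results on a host PC, Equations 9 to 16 can be modelled in portable C. The following is a minimal sketch, not the library code: split_ref and split_coeffs are hypothetical names, the coefficients are passed as one complex pair per k = 0 ... N/2-1 (an assumed plain layout, not the wa_table_float_n / wb_table_float_n layout), and the caller supplies each twiddle factor W_N^k as c + i s with c = cos(2πk/N) and s = -sin(2πk/N). The output follows the documented memory layout, including the n2_realpart option.

```c
#include <assert.h>

/* Builds wa(k) = (1 - i*W)/2 and wb(k) = (1 + i*W)/2 (Equations 13 and 14)
 * from one twiddle factor W = W_N^k = c + i*s, each stored as {re, im}. */
void split_coeffs(float c, float s, float wa[2], float wb[2])
{
    wa[0] = 0.5f * (1.0f + s);   /* 1 - i*W = (1 + s) - i*c */
    wa[1] = -0.5f * c;
    wb[0] = 0.5f * (1.0f - s);   /* 1 + i*W = (1 - s) + i*c */
    wb[1] = 0.5f * c;
}

/* Hypothetical reference model of Equations 9 to 16 (not part of libdsp2).
 * z:   N/2-point complex FFT of the real input, interleaved re/im.
 * wa, wb: coefficients for k = 0 ... N/2-1, interleaved re/im (assumed layout).
 * out: first half of the spectrum in the documented output layout.
 * out must not overlap z, because z(N/2 - k) is read after out(k) is written. */
void split_ref(unsigned int n, const float *z, const float *wa,
               const float *wb, float *out, int n2_realpart)
{
    unsigned int k, half = n / 2;

    assert(n >= 4 && n % 2 == 0);

    /* Equations 10 and 11: X(0) and X(N/2) are real-valued */
    out[0] = z[0] + z[1];
    out[1] = n2_realpart ? z[0] - z[1] : 0.0f;

    /* Equation 12: X(k) = wa(k) x(k) + wb(k) x*(N/2 - k), k = 1 ... N/2-1 */
    for (k = 1; k < half; k++) {
        float xr = z[2 * k], xi = z[2 * k + 1];
        float cr = z[2 * (half - k)], ci = -z[2 * (half - k) + 1]; /* conjugate */
        const float *a = &wa[2 * k], *b = &wb[2 * k];

        out[2 * k]     = (a[0] * xr - a[1] * xi) + (b[0] * cr - b[1] * ci);
        out[2 * k + 1] = (a[0] * xi + a[1] * xr) + (b[0] * ci + b[1] * cr);
    }
}
```

With N = 4 and real input {1, 2, 3, 4}, the half-length FFT of {1+2i, 3+4i} is {4+6i, -2-2i}; the sketch then gives X(0) = 10 and X(1) = -2+2i, and with n2_realpart = 1 it stores X(2) = -2 in the second word, which matches a direct 4-point DFT of the real input.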
Performance: see Section 7 and Table 38.

Example 9. fft_real_split_float

    #include "libdsp2.h"

    float inout_buffer[128+8]; /* +8 to ensure there are 32 bytes of readable memory behind the buffer */

    void main(void)
    {
        int i;

        /* example real input data, sequence from 1 to 128 */
        for (i = 0; i < 128; i++)
            inout_buffer[i] = i+1;

        /* compute complex to complex FFT of half the length */
        fft_radix4_float_64(inout_buffer);

        /* call split function to get the correct FFT result */
        fft_real_split_float(128, inout_buffer, wa_table_float_128, wb_table_float_128);
    }

6.10 Split function of N-point real to complex frac32 FFT

Function call:

    void fft_real_split_frac32(unsigned int n, int *y, int *wa, int *wb);

Arguments:

Table 10. fft_real_split_frac32 arguments
  n (in): FFT length; tested for 128, 256, 512, 1024, 2048 and 4096.
  y (in/out): pointer to the input/output vector of length n. The vector must be double-word aligned.
    Input: pass the output of the complex to complex frac32 FFT of half the length, with data ordered x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(n/2-1) x_im(n/2-1).
    Output memory layout: X_re(0) {0 or X_re(n/2)} X_re(1) X_im(1) ... X_re(n/2-1) X_im(n/2-1) (re - real part, im - imaginary part)
Table 10. fft_real_split_frac32 arguments (continued)
  wa (in): pointer to the wa vector of twiddle factors of length n/4+1. The vector must be double-word aligned.
    Memory layout: wa_re(0) wa_im(0) wa_re(1) wa_im(1) ... wa_re(n/4+1) wa_im(n/4+1)
    Use the predefined wa_table_frac32_n arrays. The wa_table_frac32_n table is computed according to this formula, for k = 0, 1, ..., n/4+1, where i is the imaginary unit:
    $$wa(k) = \frac{1}{2}\left(1 - i\,W_N^k\right), \qquad W_N = e^{-i\frac{2\pi}{N}}$$
  wb (in): pointer to the wb vector of twiddle factors of length n/4+1. The vector must be double-word aligned.
    Memory layout: wb_re(0) wb_im(0) wb_re(1) wb_im(1) ... wb_re(n/4+1) wb_im(n/4+1)
    Use the predefined wb_table_frac32_n arrays. The wb_table_frac32_n table is computed according to this formula, for k = 0, 1, ..., n/4+1, where i is the imaginary unit:
    $$wb(k) = \frac{1}{2}\left(1 + i\,W_N^k\right), \qquad W_N = e^{-i\frac{2\pi}{N}}$$

Description: Split function of the n-point real to complex frac32 in-place fast Fourier transform (FFT). Input 32-bit fractional data are stored in y; output 32-bit fractional data are written over the input data in y. The function produces the first half of the output spectrum. Calculation of the n/2 item is configurable by a constant definition in the function source file:

    .set n2_realpart, 0 #/* 0/1 - real part of n/2 item isn't/is inserted in imaginary part of 0th item, default value is 0 */

By default, the n/2 output item is not calculated. The output scaling is also configurable by a constant definition in the function source file:

    .set scaling 1 #/* 0 - off, 1 - on */

By default it is turned on. The scaling is performed by dividing the input data by 2 before the calculation.
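When comparing frac16/frac32 results with a float reference, explicit conversions help. The sketch below uses hypothetical helper names (they are not libdsp2 functions). float_to_frac16 rounds and saturates to the range -1 to 1-2^-15 used for the 16-bit inputs; frac32_to_float maps 32-bit fractional words (value q/2^31) back to float and multiplies by a caller-chosen scale. Assuming scaling is on here (division by 2) and the half-length complex FFT was itself scaled down by its length N/2, as fft_quad_frac32 is with its scaling on, the overall factor is N, so scale = n recovers the unscaled spectrum in the units of the 16-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a float in [-1, 1) to signed fractional 16-bit (value = q / 2^15),
 * rounding to nearest and saturating to [-1, 1 - 2^-15]. */
int16_t float_to_frac16(float v)
{
    float s = v * 32768.0f;

    if (s >= 32767.0f)
        return INT16_MAX;
    if (s <= -32768.0f)
        return INT16_MIN;
    return (int16_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

/* Converts signed fractional 32-bit words (value = q / 2^31) to float and
 * multiplies by 'scale' to undo an overall down-scaling of the result. */
void frac32_to_float(const int32_t *q, float *out, unsigned int count, float scale)
{
    unsigned int i;

    assert(scale > 0.0f);
    for (i = 0; i < count; i++)
        out[i] = (float)q[i] / 2147483648.0f * scale;
}
```

A float keeps only 24 significant bits, so the lowest bits of each 32-bit word are rounded away; that is usually acceptable for a host-side comparison but not for bit-exact checks.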
The real to complex frac32 in-place FFT can be calculated with the split function as follows:
  • On the real input data in_re(0) in_re(1) ... in_re(n-1), calculate the complex to complex FFT of half the length.
  • Call the split function, which calculates the first half of the spectrum X_re(0) X_im(0) X_re(1) X_im(1) ... X_re(n/2-1) X_im(n/2-1).
  • If n2_realpart is defined to 1, the split function saves X_re(n/2) into X_im(0).
  • Note that the FFT output is divided by n, since scaling is turned on by default.

Algorithm:

Input data in memory: x_re(0) x_im(0) x_re(1) x_im(1) ... x_re(n/2-1) x_im(n/2-1) (re - real part, im - imaginary part)

Equation 17:
$$x(k) = \frac{x_{\mathrm{re}}(k)}{2} + i\,\frac{x_{\mathrm{im}}(k)}{2}$$
where the division by two applies only when scaling is set to 1.

The split function calculates the output sequence from the input sequence according to the following equations:

Equation 18:
$$X(0) = x_{\mathrm{re}}(0) + x_{\mathrm{im}}(0)$$
Equation 19:
$$X\!\left(\tfrac{N}{2}\right) = x_{\mathrm{re}}(0) - x_{\mathrm{im}}(0)$$
Equation 20:
$$X(k) = wa(k)\,x(k) + wb(k)\,x^{*}\!\left(\tfrac{N}{2} - k\right)$$
for each k from 1 to N/2-1, where:
Equation 21:
$$wa(k) = \frac{1}{2}\left(1 - i\,W_N^k\right)$$
Equation 22:
$$wb(k) = \frac{1}{2}\left(1 + i\,W_N^k\right)$$
Equation 23:
$$W_N = e^{-i\frac{2\pi}{N}}$$
* denotes the complex conjugate operation and i is the imaginary unit.
Equation 24:
$$X(k) = X_{\mathrm{re}}(k) + i\,X_{\mathrm{im}}(k)$$

Output data in memory:
  When n2_realpart is defined to 0: X_re(0) X_im(0) X_re(1) X_im(1) ... X_re(N/2-1) X_im(N/2-1)
  When n2_realpart is defined to 1: X_re(0) X_re(N/2) X_re(1) X_im(1) ... X_re(N/2-1) X_im(N/2-1)

Performance: see Section 7 and Table 40.

Example 10. fft_real_split_frac32

    #include "libdsp2.h"

    int inout_buffer[128+8]; /* +8 to ensure there are 32 bytes of readable memory behind the buffer */

    void main(void)
    {
        int i;

        /* example real input data, sequence from 1/2^15 to 128/2^15,
           stored as 16-bit values in the second half of the buffer */
        for (i = 0; i < 128; i++)
            *((short *)(inout_buffer + 64) + i) = (short)(i+1);

        /* compute complex to complex FFT of half the length */
        fft_radix4_frac32_64(inout_buffer);

        /* call split function to get the correct FFT result divided by n */
        fft_real_split_frac32(128, inout_buffer, wa_table_frac32_128, wb_table_frac32_128);
    }

6.11 Complex float windowing function

Function call:

    void window_apply_complex_float(unsigned int n, float *x, float *y, float *w);

Arguments:

Table 11. window_apply_complex_float arguments
  n (in): length of the complex vectors; must be a multiple of 4.
  x (in, see note 1): pointer to the input vector of length 2*n.
  y (out, see note 1): pointer to the output vector of length 2*n. The y vector can be the same as the x vector (in-place windowing).
  w (in, see note 1): pointer to the window vector of length 2*n.
  1. All vectors must be double-word aligned. All vectors have the following memory layout: re(0) im(0) re(1) im(1) ... re(n-1) im(n-1), where re is the real part and im is the imaginary part.

Description: Windowing function for complex float data. Input data are stored in the x vector; output data are written into the y vector. The y vector can be the same as the x vector (in-place windowing).
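A plain C model of the per-element operation is handy as a host-side reference. The sketch below (window_complex_ref is a hypothetical name, not part of libdsp2) assumes the interleaved layout of Table 11 and computes each output as a complex product. It reads both parts of x(k) before writing y(k), so it also works in place, as the library function does.

```c
#include <assert.h>

/* Hypothetical reference model of window_apply_complex_float:
 * y(k) = x(k) * w(k) as a complex product, k = 0 ... n-1.
 * All vectors are interleaved re(0) im(0) re(1) im(1) ...; y may equal x. */
void window_complex_ref(unsigned int n, const float *x, float *y, const float *w)
{
    unsigned int k;

    assert(n % 4 == 0); /* same length restriction as Table 11 */

    for (k = 0; k < n; k++) {
        /* read both parts of x(k) before writing y(k), so y == x is safe */
        float xr = x[2 * k], xi = x[2 * k + 1];
        float wr = w[2 * k], wi = w[2 * k + 1];

        y[2 * k]     = xr * wr - xi * wi;
        y[2 * k + 1] = xr * wi + xi * wr;
    }
}
```

For example, x(1) = 3+4i windowed by w(1) = i gives -4+3i, not the element-wise products 0 and 4.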
Algorithm:

Equation 25:
$$y(k) = x(k) \times w(k)$$
for each k = 0, 1, ..., n-1, where × denotes complex multiplication.

Note: There must be 16 bytes of readable memory behind the x and w buffers.

Performance: see Section 7 and Table 42.

Example 11. window_apply_complex_float

    #include "libdsp2.h"

    float x[2*64+4]; /* +4 to ensure there are 16 bytes of readable memory behind the buffer; must be aligned to 8 bytes */
    float w[2*64+4]; /* +4 to ensure there are 16 bytes of readable memory behind the buffer; must be aligned to 8 bytes */

    void main(void)
    {
        unsigned int i, n;

        /* length of complex vectors is 64 */
        n = 64;

        /* prepare example input data */
        for (i = 0; i < 2*n; i++)
            x[i] = i+1;
        for (i = 0; i < 2*n; i++)
            w[i] = i+2*n;

        /* windowing function on complex float data, x(k) = x(k) * w(k) as a complex product */
        window_apply_complex_float(n, x, x, w);
    }

6.12 Complex frac16 windowing function

Function call:

    void window_apply_complex_frac16(unsigned int n, short *x, short *y, short *w);

Arguments:

Table 12. window_apply_complex_frac16 arguments
  n (in): length of the complex vectors; must be a multiple of 4.
  x (in, see note 1): pointer to the input vector of length 2*n.
  y (out, see note 1): pointer to the output vector of length 2*n. The y vector can be the same as the x vector (in-place windowing).
  w (in, see note 1): pointer to the window vector of length 2*n.
  1. All vectors must be word aligned. All vectors have the following memory layout: re(0) im(0) re(1) im(1) ... re(n-1) im(n-1), where re is the real part and im is the imaginary part.

Description: Windowing function for complex frac16 data. All the data are handled as signed fractional 16-bit values (range -1 to 1-2^-15). Input data are stored in the x vector, and output data
are written into the y vector. The y vector can be the same as the x vector (in-place windowing). The output scaling is configurable by a constant definition in the function source file:

    .set scaling, 1 #/* 0 - off, 1 - on (output divided by 2) */

By default it is turned on. The scaling is performed by dividing the output data by 2.

Algorithm:

Equation 26:
$$y(k) = x(k) \times w(k)$$
for each k = 0, 1, ..., n-1, where × denotes complex multiplication. The output value is divided by 2 when scaling == 1.

Note: There must be 8 bytes of readable memory behind the x and w buffers. If scaling == 1, the w array must not contain any value equal to -1 (-32768), because integer multiplication is used for the fractional data.

Performance: see Section 7 and Table 43.

Example 12. window_apply_complex_frac16

    #include "libdsp2.h"

    short x[2*64+4]; /* +4 to ensure there are 8 bytes of readable memory behind the buffer; must be aligned to 4 bytes */
    short w[2*64+4]; /* +4 to ensure there are 8 bytes of readable memory behind the buffer; must be aligned to 4 bytes */

    void main(void)
    {
        unsigned int i, n;

        /* length of complex vectors is 64 */
        n = 64;

        /* prepare example input data */
        for (i = 0; i < 2*n; i++)
            x[i] = (i+1)<<5;
        for (i = 0; i < 2*n; i++)
            w[i] = (i+2*n)<<5;

        /* windowing function on complex frac16 data, x(k) = x(k) * w(k) as a complex product */
        window_apply_complex_frac16(n, x, x, w);
    }

6.13 Real float windowing function

Function call:

    void window_apply_real_float(unsigned int n, float *x, float *y, float *w);
Arguments:

Table 13. window_apply_real_float arguments
  n (in): length of the vectors; must be a multiple of 16.
  x (in, see note 1): pointer to the input vector of length n.
  y (out, see note 1): pointer to the output vector of length n. The y vector can be the same as the x vector (in-place windowing).
  w (in, see note 1): pointer to the window vector of length n.
  1. All vectors must be double-word aligned. All vectors have the following memory layout: x(0) x(1) ... x(n-1).

Description: Windowing function for real float data. Input data are stored in the x vector; output data are written into the y vector. The y vector can be the same as the x vector (in-place windowing).

Algorithm:

Equation 27:
$$y(k) = x(k)\,w(k)$$
for each k = 0, 1, ..., n-1.

Note: There must be 16 bytes of readable memory behind the input vector x and 8 bytes of readable memory behind the window vector w.

Performance: see Section 7 and Table 44.

Example 13. window_apply_real_float

    #include "libdsp2.h"

    float x[64+4]; /* +4 to ensure there are 16 bytes of readable memory behind the buffer */
    float w[64+2]; /* +2 to ensure there are 8 bytes of readable memory behind the buffer */

    void main(void)
    {
        unsigned int i, n;

        /* length of vectors is 64 */
        n = 64;

        /* prepare example input data */
        for (i = 0; i < n; i++)
            x[i] = i+1;
        for (i = 0; i < n; i++)
            w[i] = i+n;

        /* windowing function on real float data, x[i] = x[i]*w[i] */
        window_apply_real_float(n, x, x, w);
    }

6.14 Real frac16 windowing function

Function call:

    void window_apply_real_frac16(unsigned int n, short *x, short *y, short *w);
Arguments:

Table 14. window_apply_real_frac16 arguments
  n (in): length of the vectors; must be a multiple of 16.
  x (in, see note 1): pointer to the input vector of length n.
  y (out, see note 1): pointer to the output vector of length n. The y vector can be the same as the x vector (in-place windowing).
  w (in, see note 1): pointer to the window vector of length n.
  1. All vectors must be word aligned. All vectors have the following memory layout: x(0) x(1) ... x(n-1).

Description: Windowing function for real frac16 data. All the data are handled as signed fractional 16-bit values (range -1 to 1-2^-15). Input data are stored in the x vector; output data are written into the y vector. The y vector can be the same as the x vector (in-place windowing).

Algorithm:

Equation 28:
$$y(k) = x(k)\,w(k)$$
for each k = 0, 1, ..., n-1.

Note: There must be 16 bytes of readable memory behind the input vector x and 8 bytes of readable memory behind the window vector w.

Performance: see Section 7 and Table 45.

Example 14. window_apply_real_frac16

    #include "libdsp2.h"

    short x[64+8]; /* +8 to ensure there are 16 bytes of readable memory behind the buffer; must be aligned to 4 bytes */
    short w[64+4]; /* +4 to ensure there are 8 bytes of readable memory behind the buffer; must be aligned to 4 bytes */

    void main(void)
    {
        unsigned int i, n;

        /* length of vectors is 64 */
        n = 64;

        /* prepare example input data */
        for (i = 0; i < n; i++)
            x[i] = (i+1)<<5;
        for (i = 0; i < n; i++)
            w[i] = (i+n)<<5;

        /* windowing function on real frac16 data, x[i] = x[i]*w[i] */
        window_apply_real_frac16(n, x, x, w);
    }

6.15 mfb50_pmh - pressure sensing function 1

Function call:

    float mfb50_pmh(unsigned int n, float *p_mfb, float *v, float k, float *pmh);
Arguments:

Table 15. mfb50_pmh arguments
  n (in): length of the pressure vector p_mfb and the volume vector v. n-2 must be a multiple of 8, and n must be greater than or equal to 18.
  p_mfb (in/out, see note 1): pointer to the input pressure vector of length n. Input memory layout: p(0) p(1) ... p(n-1); output memory layout: mfb(1) mfb(2) ... mfb(n-2) mfb(n-1) 0.
  v (in, see note 1): pointer to the input volume vector of length n. Memory layout: v(0) v(1) ... v(n-1).
  k (in): constant related to the engine setup.
  pmh (out): pointer to the variable where the pmh index is returned.
  1. Both the pressure and the volume vector must be double-word aligned.

Description: Pressure sensing function 1. The function is used for in-cylinder pressure sensor management. For each cylinder and every engine revolution, it calculates the mass fraction burned function (MFB), the MFB50 value and the pmh torque index. The inputs are the vector of in-cylinder pressure values p_mfb, the vector of cylinder volume values v, and the constant k. The function returns the MFB50 value; the MFB is returned in the p_mfb vector and the pmh index in the variable addressed by the pmh pointer.

Algorithm:

Calculation of MFB and MFB50 value:
1. Acquisition of the in-cylinder pressure at each sampling position θ[i] (the acquisition window ranges from -180° before TDC to 180° after TDC): p[0] p[1] ... p[n-1]
2. Calculation of the cylinder volume V[i] at each sampling position θ[i]. V[i] can be precalculated and stored in a calibration vector: V[0] V[1] ... V[n-1]
3. Calculation of the heat release rate (Qist), assuming Δθ is constant (e.g. 0.5°), for each i = 1, 2, ..., n-2:
   Equation 29:
   $$Q_v = \frac{k\,p[i]\,\big(V[i+1] - V[i-1]\big)}{(k-1)\cdot 2\,\Delta\theta}, \qquad Q_p = \frac{V[i]\,\big(p[i+1] - p[i-1]\big)}{(k-1)\cdot 2\,\Delta\theta}, \qquad Q_{ist}[i] = Q_v + Q_p$$
   Note: The function calculates the heat release rate without multiplication by the constant $\frac{1}{(k-1)\cdot 2\,\Delta\theta}$ in both the Qv and Qp equations.
4. Calculation of the mass fraction burned MFB (integration of the heat release rate). The initial condition is MFB[1] = 0; then, for each i = 2, 3, ..., n-2:
   Equation 30:
   $$MFB[i] = MFB[i-1] + \big(Q_{ist}[i] + Q_{ist}[i-1]\big)\,\frac{\Delta\theta}{2}$$
   Note: The function calculates the mass fraction burned without multiplication by the constant $\frac{\Delta\theta}{2}$.
   Also note that the function writes the calculated MFB[1] at position 0 of the p_mfb array.
5. Find the maximum MFBmax and the minimum MFBmin of MFB[i], i = 1, 2, ..., n-2.
6. Calculate the MFB50 value as:
   Equation 31:
   $$MFB50 = \frac{MFB_{max} + MFB_{min}}{2}$$

Calculation of the pmh index:
7. Starting with pmh_old = 0, calculate for each i = 1, 2, ..., n-1:
   Equation 32:
   $$dV_a = V[i] - V[i-1], \qquad p_{ma} = \frac{p[i] + p[i-1]}{2}, \qquad pmh = pmh_{old} + p_{ma}\,dV_a$$

Note: There must be 8 bytes of readable memory behind the input vectors p_mfb and v.

Performance: see Section 7 and Table 46.

Example 15. mfb50_pmh

    #include "libdsp2.h"

    float p[740];
    float v[740];
    float k = 1.4f;
    unsigned int n = 714;
    float pmh;
    unsigned int mfb50_pos; /* must not be named mfb50_index, which is the library function */
    float mfb50;

    void main(void)
    {
        /* from pressure vector p and volume vector v, calculate the mfb50 value,
           pmh index and mass fraction burned (mfb); mfb is returned in the p vector,
           with mfb[1] at p array position 0 */
        mfb50 = mfb50_pmh(n, p, v, k, &pmh);

        /* search for the mfb50 value in the mass fraction burned (p) vector and
           return the index at which the mfb50 value was found */
        mfb50_pos = mfb50_index(n, p, mfb50);
    }

6.16 mfb50_index - pressure sensing function 2

Function call:

    unsigned int mfb50_index(unsigned int n, float *mfb, float mfb50);
Arguments:

Table 16. mfb50_index arguments
  n (in): length of the mass fraction burned vector mfb. n-2 must be a multiple of 4, and n must be greater than or equal to 10.
  mfb (in, see note 1): pointer to the input mass fraction burned vector of length n, with layout mfb(1) mfb(2) ... mfb(n-2) mfb(n-1) 0. Use the vector calculated by the mfb50_pmh function.
  mfb50 (in): MFB50 value; use the value returned by the mfb50_pmh function.
  1. The mfb vector must be double-word aligned.

Description: Pressure sensing function 2. The function is used for in-cylinder pressure sensor management and should follow the mfb50_pmh function call. It searches the mass fraction burned (MFB) vector and returns the index at which the MFB50 value was found. The returned index is called the MFB50% position index.

Algorithm:
1. For each i = 2, 3, ..., n-2, calculate the absolute value of the difference between MFB[i] and MFB50:
   Equation 33:
   $$diff[i] = \big|MFB[i] - MFB50\big|$$
2. Find the index i at which diff[i] has its minimal value.

Note: There must be 8 bytes of readable memory behind the input vector mfb.

Performance: see Section 7 and Table 47.

Example 16. mfb50_index

    #include "libdsp2.h"

    float p[740];
    float v[740];
    float k = 1.4f;
    unsigned int n = 714;
    float pmh;
    unsigned int mfb50_pos; /* must not be named mfb50_index, which is the library function */
    float mfb50;

    void main(void)
    {
        /* from pressure vector p and volume vector v, calculate the mfb50 value,
           pmh index and mass fraction burned (mfb); mfb is returned in the p vector,
           with mfb[1] at p array position 0 */
        mfb50 = mfb50_pmh(n, p, v, k, &pmh);

        /* search for the mfb50 value in the mass fraction burned (p) vector and
           return the index at which the mfb50 value was found */
        mfb50_pos = mfb50_index(n, p, mfb50);
    }
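Steps 3 to 7 of mfb50_pmh and the search in mfb50_index can be modelled in plain C for offline checking. This is a minimal sketch, not the library code: mfb50_pmh_ref and mfb50_index_ref are hypothetical names. Like the library, it omits the constant factors 1/((k-1)·2Δθ) and Δθ/2; unlike the library, it does not work in place and keeps MFB[i] at array index i, so its indices are not guaranteed to match the positions used by the library's in-place layout (MFB[1] stored at position 0).

```c
#include <assert.h>

/* Hypothetical reference model of Equations 29-32 (not part of libdsp2).
 * mfb must hold n entries; mfb[i] = MFB[i] for i = 1 ... n-2, and mfb[0],
 * mfb[n-1] are set to 0. Returns MFB50 and stores the pmh index in *pmh. */
float mfb50_pmh_ref(unsigned int n, const float *p, const float *v, float k,
                    float *mfb, float *pmh)
{
    unsigned int i;
    float q_prev = 0.0f, mfb_max, mfb_min, acc = 0.0f;

    assert(n >= 4);
    mfb[0] = 0.0f;
    mfb[n - 1] = 0.0f;

    /* Steps 3 and 4: heat release rate and trapezoidal integration */
    for (i = 1; i <= n - 2; i++) {
        float qv = k * p[i] * (v[i + 1] - v[i - 1]);
        float qp = v[i] * (p[i + 1] - p[i - 1]);
        float q = qv + qp;                          /* Qist[i] */

        mfb[i] = (i == 1) ? 0.0f : mfb[i - 1] + q + q_prev;
        q_prev = q;
    }

    /* Steps 5 and 6: MFB50 is the midpoint of the MFB range */
    mfb_max = mfb_min = mfb[1];
    for (i = 2; i <= n - 2; i++) {
        if (mfb[i] > mfb_max) mfb_max = mfb[i];
        if (mfb[i] < mfb_min) mfb_min = mfb[i];
    }

    /* Step 7: sum of mean pressure times volume change */
    for (i = 1; i <= n - 1; i++)
        acc += 0.5f * (p[i] + p[i - 1]) * (v[i] - v[i - 1]);
    *pmh = acc;

    return 0.5f * (mfb_max + mfb_min);
}

/* Equation 33: index i in 2 ... n-2 minimising |MFB[i] - MFB50|
 * (the first one wins on ties), in the indexing of mfb50_pmh_ref. */
unsigned int mfb50_index_ref(unsigned int n, const float *mfb, float mfb50)
{
    unsigned int i, best = 2;
    float best_diff = -1.0f;

    for (i = 2; i <= n - 2; i++) {
        float d = mfb[i] - mfb50;
        if (d < 0.0f)
            d = -d;
        if (best_diff < 0.0f || d < best_diff) {
            best_diff = d;
            best = i;
        }
    }
    return best;
}
```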
6.17 conv3x3 - 2-D convolution with 3x3 kernel

Function call:

    void conv3x3(unsigned char *a, unsigned char *b, signed char *c, unsigned short m, unsigned short n, unsigned short shift);

Arguments:

Table 17. conv3x3 arguments
  a (in, see note 1): input image of size m x n.
  b (out, see note 1): output image of size m x n.
  c (in, see note 2): 3x3 convolution kernel (mask).
  m (in): number of rows of a and b; must be >= 3.
  n (in): number of columns of a and b; must be a multiple of 4 and >= 8.
  shift (in): number of bits the result is shifted down.
  1. a and b must be word aligned.
  2. c must be half-word aligned, and there must be 1 byte of readable memory behind c.

Description: Computes the 2-D convolution of an m x n input image with a 3x3 convolution kernel. Each computed convolution value is shifted right by shift bits and then range limited to 0..255. The output image has the same size as the input image and a one-pixel border that is not computed: the first and the last rows of the output image are not changed, the first column is zeroed, and the last column contains results that are not meaningful. The last column can be zeroed by defining zero_lastcol to 1, at the cost of one more instruction inside the inner loop. zero_lastcol is a symbol defined in the function source file.

Algorithm: A is the input matrix, C the 3x3 convolution kernel and B the output matrix:

$$C = \begin{pmatrix} c_{00} & c_{01} & c_{02} \\ c_{10} & c_{11} & c_{12} \\ c_{20} & c_{21} & c_{22} \end{pmatrix}$$

$$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} & \cdots & a_{0(n-1)} \\ a_{10} & a_{11} & a_{12} & \cdots & a_{1(n-1)} \\ a_{20} & a_{21} & a_{22} & \cdots & a_{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{(m-1)0} & a_{(m-1)1} & a_{(m-1)2} & \cdots & a_{(m-1)(n-1)} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{00} & b_{01} & b_{02} & \cdots & b_{0(n-1)} \\ b_{10} & b_{11} & b_{12} & \cdots & b_{1(n-1)} \\ b_{20} & b_{21} & b_{22} & \cdots & b_{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ b_{(m-1)0} & b_{(m-1)1} & b_{(m-1)2} & \cdots & b_{(m-1)(n-1)} \end{pmatrix}$$

1. Compute the 2-D convolution, i.e. for each i = 1, 2, ..., m-2 and j = 1, 2, ..., n-2 compute b_ij:
   Equation 34:
   $$b_{ij} = \sum_{k=0}^{2} \sum_{p=0}^{2} a_{(i+k-1)(j+p-1)}\; c_{(2-k)(2-p)}$$
2. Shift each computed b_ij right by shift bits:
   $$b_{ij} = \frac{b_{ij}}{2^{shift}}$$
3. Range limit to 0..255: if (b_ij < 0) then b_ij = 0; if (b_ij > 255) then b_ij = 255.

Note: The input/output images and the convolution kernel must be aligned in memory as described in Table 17, and there must be 1 byte of readable memory behind the convolution kernel c.

Performance: see Section 7 and Table 48.

Example 17. conv3x3

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char a[16*12] align_mwerks(4);

    /* #pragma and align macros used to align to 2 bytes */
    #ifdef __ghs__
    #pragma alignvar (2)
    #endif
    align_dcc(2) signed char c[3*3 + 1] align_mwerks(2) = {  /* +1: readable byte behind the kernel */
        -1, -2, -1,
         0,  0,  0,
         1,  2,  1,
         0
    };

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char b[16*12] align_mwerks(4);

    unsigned short m, n, shift;

    void main(void)
    {
        /******** 2-D convolution with 3x3 kernel *********/
        m = 16;
        n = 12;
        shift = 4;
        conv3x3(a, b, c, m, n, shift);
    }
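Equation 34 and steps 2 and 3 translate directly into a host-side reference. The sketch below is not the library code: conv3x3_ref is a hypothetical name, images and kernel are assumed to be stored row by row, and only the interior pixels i = 1 ... m-2, j = 1 ... n-2 are written (the border of b is left untouched, which differs from the library's border handling described above).

```c
#include <assert.h>

/* Hypothetical reference model of Equation 34 plus steps 2 and 3.
 * a, b: m x n images stored row by row; c: 3x3 kernel stored row by row
 * (c[0..2] = c00 c01 c02, c[3..5] = c10 c11 c12, c[6..8] = c20 c21 c22). */
void conv3x3_ref(const unsigned char *a, unsigned char *b, const signed char *c,
                 unsigned short m, unsigned short n, unsigned short shift)
{
    int i, j, k, p;

    assert(m >= 3 && n >= 3);

    for (i = 1; i < m - 1; i++) {
        for (j = 1; j < n - 1; j++) {
            int acc = 0;

            /* Equation 34: the kernel index runs backwards (c_(2-k)(2-p)) */
            for (k = 0; k <= 2; k++)
                for (p = 0; p <= 2; p++)
                    acc += a[(i + k - 1) * n + (j + p - 1)] * c[(2 - k) * 3 + (2 - p)];

            /* Steps 2 and 3. Negative sums are clamped to 0 first: this gives
             * the same result as shifting first, and avoids right-shifting a
             * negative int, which is implementation-defined in C. */
            if (acc < 0)
                acc = 0;
            else
                acc >>= shift;
            if (acc > 255)
                acc = 255;
            b[i * n + j] = (unsigned char)acc;
        }
    }
}
```

Because Equation 34 indexes the kernel as c_(2-k)(2-p), the mask is rotated by 180 degrees relative to the image window (a true convolution rather than a correlation). For the example kernel this flips the sign of the response, which matters because negative results are clamped to 0: an image whose top row is brighter than its bottom row gives a positive result.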
6.18 sobel3x3 - Sobel filter

Function call:

    void sobel3x3(unsigned char *a, unsigned char *b, unsigned short m, unsigned short n);

Arguments:

Table 18. sobel3x3 arguments
  a (in, see note 1): input image of size m x n.
  b (out, see note 1): output image of size m x n.
  m (in): number of rows of a and b; must be >= 3.
  n (in): number of columns of a and b; must be a multiple of 4 and >= 8.
  1. a and b must be word aligned.

Description: Computes the Sobel filter on an m x n input image. The function adds together the absolute values of the horizontal and vertical Sobel filters and then limits the output to 0..255. The output image has the same size as the input image and a one-pixel border that is not computed: the first and the last rows of the output image are not changed, the first column is zeroed, and the last column contains results that are not meaningful. The last column can be zeroed by defining zero_lastcol to 1, at the cost of one more instruction inside the inner loop. zero_lastcol is a symbol defined in the function source file.

Algorithm:
1. Compute the horizontal Sobel filter, i.e. compute the 2-D convolution with the following kernel:
   $$G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$$
2. Compute the vertical Sobel filter, i.e. compute the 2-D convolution with the following kernel:
   $$G_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$$
3. Add the absolute values of the horizontal and vertical Sobel filters.
4. Range limit to 0..255.

Note: The input/output images must be aligned in memory as described in Table 18.

Performance: see Section 7 and Table 49.

Example 18. sobel3x3

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char a[16*12] align_mwerks(4);

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char b[16*12] align_mwerks(4);

    unsigned short m, n;

    void main(void)
    {
        /******** Sobel filter *********/
        m = 16;
        n = 12;
        sobel3x3(a, b, m, n);
    }

6.19 sobel3x3_horizontal - horizontal Sobel filter

Function call:

    void sobel3x3_horizontal(unsigned char *a, unsigned char *b, unsigned short m, unsigned short n);

Arguments:

Table 19. sobel3x3_horizontal arguments
  a (in, see note 1): input image of size m x n.
  b (out, see note 1): output image of size m x n.
  m (in): number of rows of a and b; must be >= 3.
  n (in): number of columns of a and b; must be a multiple of 4 and >= 8.
  1. a and b must be word aligned.

Description: Computes the horizontal Sobel filter on an m x n input image. The absolute value of the horizontal Sobel filter result is range limited to 0..255. The output image has the same size as the input image and a one-pixel border that is not computed: the first and the last rows of the output image are not changed, the first column is zeroed, and the last column contains results that are not meaningful. The last column can be zeroed by defining zero_lastcol to 1, at the cost of one more instruction inside the inner loop. zero_lastcol is a symbol defined in the function source file.
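The three Sobel functions (sobel3x3, sobel3x3_horizontal and sobel3x3_vertical) differ only in which kernel results are summed, so one host-side reference can cover all of them. The sketch below is not the library code: sobel_ref and enum sobel_mode are hypothetical names, the kernels are G_y (horizontal) and G_x (vertical) from the algorithm descriptions of these functions, images are stored row by row, and only interior pixels are written.

```c
#include <assert.h>
#include <stdlib.h> /* abs */

enum sobel_mode { SOBEL_BOTH, SOBEL_HORIZONTAL, SOBEL_VERTICAL };

static const int sobel_gy[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}}; /* horizontal */
static const int sobel_gx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}}; /* vertical */

/* Applies a 3x3 kernel to the window centred on (i, j). Rotating a Sobel
 * kernel by 180 degrees only negates it, and abs() below removes the sign,
 * so convolution and correlation give the same output here. */
static int apply3x3(const unsigned char *a, unsigned short n, int i, int j,
                    const int g[3][3])
{
    int k, p, acc = 0;

    for (k = 0; k < 3; k++)
        for (p = 0; p < 3; p++)
            acc += g[k][p] * a[(i + k - 1) * n + (j + p - 1)];
    return acc;
}

/* Hypothetical reference model of the Sobel functions; interior pixels only. */
void sobel_ref(const unsigned char *a, unsigned char *b, unsigned short m,
               unsigned short n, enum sobel_mode mode)
{
    int i, j;

    assert(m >= 3 && n >= 3);

    for (i = 1; i < m - 1; i++) {
        for (j = 1; j < n - 1; j++) {
            int g = 0;

            if (mode != SOBEL_VERTICAL)
                g += abs(apply3x3(a, n, i, j, sobel_gy));
            if (mode != SOBEL_HORIZONTAL)
                g += abs(apply3x3(a, n, i, j, sobel_gx));
            b[i * n + j] = (unsigned char)(g > 255 ? 255 : g); /* range limit */
        }
    }
}
```

A horizontal edge (rows of different brightness) responds only to G_y, and a vertical edge only to G_x; strong edges saturate at 255.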
Algorithm:
1. Compute the horizontal Sobel filter, i.e. compute the 2-D convolution with the following kernel:
   $$G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix}$$
2. Get the absolute value of the result.
3. Range limit to 0..255.

Note: The input/output images must be aligned in memory as described in Table 19.

Performance: see Section 7 and Table 50.

Example 19. sobel3x3_horizontal

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char a[16*12] align_mwerks(4);

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char b[16*12] align_mwerks(4);

    unsigned short m, n;

    void main(void)
    {
        /******** horizontal Sobel filter *********/
        m = 16;
        n = 12;
        sobel3x3_horizontal(a, b, m, n);
    }

6.20 sobel3x3_vertical - vertical Sobel filter

Function call:

    void sobel3x3_vertical(unsigned char *a, unsigned char *b, unsigned short m, unsigned short n);
Arguments:

Table 20. sobel3x3_vertical arguments
  a (in, see note 1): input image of size m x n.
  b (out, see note 1): output image of size m x n.
  m (in): number of rows of a and b; must be >= 3.
  n (in): number of columns of a and b; must be a multiple of 4 and >= 8.
  1. a and b must be word aligned.

Description: Computes the vertical Sobel filter on an m x n input image. The absolute value of the vertical Sobel filter result is range limited to 0..255. The output image has the same size as the input image and a one-pixel border that is not computed: the first and the last rows of the output image are not changed, the first column is zeroed, and the last column contains results that are not meaningful. The last column can be zeroed by defining zero_lastcol to 1, at the cost of one more instruction inside the inner loop. zero_lastcol is a symbol defined in the function source file.

Algorithm:
1. Compute the vertical Sobel filter, i.e. compute the 2-D convolution with the following kernel:
   $$G_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$$
2. Get the absolute value of the result.
3. Range limit to 0..255.

Note: The input/output images must be aligned in memory as described in Table 20.

Performance: see Section 7 and Table 51.

Example 20. sobel3x3_vertical

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char a[16*12] align_mwerks(4);

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) unsigned char b[16*12] align_mwerks(4);

    unsigned short m, n;
    void main(void)
    {
        /******** vertical Sobel filter *********/
        m = 16;   /* number of rows */
        n = 12;   /* number of columns (multiple of 4) */
        sobel3x3_vertical(a, b, m, n);
    }

6.21 corr_frac16 - frac16 correlation function

Function call:

    void corr_frac16(unsigned short n, unsigned short m, short *x, short *y, short *r);

Arguments:

Table 21. corr_frac16 arguments (1)
Argument | In/out | Description
n | in | Length of input vectors x and y (must be a multiple of 4)
m | in | Length of output vector r (must be a multiple of 4, m <= n)
x | in | First input vector of length n
y | in | Second input vector of length n
r | out | Correlation of vectors x and y
1. The x, y, r data are in fractional 16-bit format in the range -1 to 1. The arrays x and r must be word aligned; the array y must be half-word aligned (implicit alignment of short data).

Description: Computes the correlation of the arrays x and y and writes the result to the array r. All data are signed 16-bit fractional values in the range -1 to 1-2^-15. Saturating arithmetic is used, so the output never overflows.

Algorithm: Correlation (Equation 35):
$$ r(j) = x(j)\,y(0) + x(j+1)\,y(1) + \dots + x(n-1)\,y(n-j-1) = \sum_{k=0}^{n-1-j} x(j+k)\,y(k) $$
for each j = 0, 1, ..., m-1.

Note: x and y can be the same array; in that case the autocorrelation is computed.

Performance: See Section 7 and Table 52.

Example 21. corr_frac16

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short x[8] align_mwerks(4) = {6554, 10393, 269, 12670, 2496, 6233, 9945, -303};
    short y[8] = {8192, -6113, 3558, 4066, -4590, 10723, -3612, 6019};

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short r[8] align_mwerks(4);

    unsigned short m, n;

    void main(void)
    {
        /* compute correlation of x and y */
        n = 8;   /* length of x and y (multiple of 4) */
        m = 8;   /* length of r (multiple of 4, m <= n) */
        corr_frac16(n, m, x, y, r);
    }

6.22 fir_float - float FIR filter

Function call:

    void fir_float(unsigned short n, unsigned short ntaps, float *x, float *y, float *h);

Arguments:

Table 22. fir_float arguments (1)
Argument | In/out | Description
n | in | Length of output array y (must be a multiple of 2)
ntaps | in | Number of filter coefficients (must be >= 4)
x | in | Array of input samples of length n+ntaps-1: x(-ntaps+1) x(-ntaps+2) ... x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
h | in | Array of filter coefficients stored in reversed order: h(ntaps-1) ... h(1) h(0)
1. The arrays x, y, h must be double-word aligned. There must be 8 bytes of readable memory behind the arrays h and x.

Description: Computes the real FIR filter on float data. The input samples are stored in the array x; the output samples are written to the array y.
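Storing the coefficients in reversed order lets the inner loop walk forward through both arrays. The hypothetical plain-C reference below is not part of libdsp2, and its names are illustrative. It shows the index mapping for the layout in Table 22. Since xbuf[i] holds x(i-ntaps+1) and hr[j] holds h(ntaps-1-j), the sum y(i) = sum_k h(k) x(i-k) becomes the contiguous dot product sum_j hr[j] * xbuf[i+j].

```c
/* Hypothetical helper: build the reversed coefficient array
   hr[j] = h(ntaps-1-j) from h(0) ... h(ntaps-1). */
void reverse_coeffs(const float *h, float *hr, unsigned short ntaps)
{
    for (unsigned j = 0; j < ntaps; j++)
        hr[j] = h[ntaps - 1 - j];
}

/* Hypothetical reference with the fir_float buffer layout:
   xbuf has n+ntaps-1 samples, xbuf[i] = x(i - ntaps + 1),
   so x(0) sits at xbuf[ntaps-1]. */
void fir_float_ref(unsigned short n, unsigned short ntaps,
                   const float *xbuf, float *y, const float *hr)
{
    for (unsigned i = 0; i < n; i++) {
        float acc = 0.0f;
        /* y(i) = sum_k h(k) x(i-k) = sum_j hr[j] * xbuf[i+j] */
        for (unsigned j = 0; j < ntaps; j++)
            acc += hr[j] * xbuf[i + j];
        y[i] = acc;
    }
}
```

To filter a continuous stream block by block, one common approach is to copy the last ntaps-1 input samples of a block to the front of the buffer before appending the next n samples. This manual does not state that approach; it is an assumption here.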
Algorithm (Equation 36):
$$ y(n) = \sum_{k=0}^{ntaps-1} h(k)\,x(n-k) $$
for each n from 0 to N-1, where N is the value of the argument n.

Note: The filter coefficients must be stored in reversed order.

Performance: See Section 7 and Table 53.

Example 22. fir_float

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 8 bytes */
    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float x[288 + 2] align_mwerks(8); /* n+ntaps-1 = 288 samples; +2 to ensure there are 8 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float y[256] align_mwerks(8);

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float hr[33 + 2] align_mwerks(8) = { /* +2 to ensure there are 8 bytes of readable memory behind hr */
        -0.0015288448f, -0.0019041377f, -0.0025120102f, -0.0031517229f, -0.0033815748f,
        -0.0025689987f,  0.0000000000f,  0.0049703708f,  0.0127558533f,  0.0233996311f,
         0.0364901802f,  0.0511553726f,  0.0661431077f,  0.0799794722f,  0.0911792726f,
         0.0984722196f,  0.1010036179f,  0.0984722196f,  0.0911792726f,  0.0799794722f,
         0.0661431077f,  0.0511553726f,  0.0364901802f,  0.0233996311f,  0.0127558533f,
         0.0049703708f,  0.0000000000f, -0.0025689987f, -0.0033815748f, -0.0031517229f,
        -0.0025120102f, -0.0019041377f, -0.0015288448f,
    };

    unsigned short n, ntaps;

    void main(void)
    {
        /* filter the samples in x with FIR filter of 32nd order, */
        /* write result into y, hr are filter coefficients stored */
        /* in reversed order */
        n = 256;
        ntaps = 33;
        fir_float(n, ntaps, x, y, hr);
    }

6.23 fir_frac16 - frac16 FIR filter

Function call:

    void fir_frac16(unsigned short n, unsigned short ntaps, short *x, short *y, short *h);

Arguments:

Table 23. fir_frac16 arguments (1)
Argument | In/out | Description
n | in | Length of output array y (must be a multiple of 2)
ntaps | in | Number of filter coefficients (must be >= 4)
x | in | Array of input samples of length n+ntaps-1: x(-ntaps+1) x(-ntaps+2) ... x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
h | in | Array of filter coefficients stored in reversed order: h(ntaps-1) ... h(1) h(0)
1. The x, y, h data are in fractional 16-bit format in the range -1 to 1. The arrays x and y must be word aligned; the array h must be half-word aligned (implicit alignment of short data). There must be 2 bytes of readable memory behind the array h and 4 bytes of readable memory behind the array x.

Description: Computes the real FIR filter on frac16 data. The input samples are stored in the array x; the output samples are written to the array y. Saturating arithmetic is used.

Algorithm (Equation 37):
$$ y(n) = \sum_{k=0}^{ntaps-1} h(k)\,x(n-k) $$
for each n from 0 to N-1, where N is the value of the argument n.

Note: The coefficients stored in the array h must be scaled to prevent saturation, i.e. the sum of the absolute values of all coefficients, taken as 16-bit integers, must be lower than 2^15 (Equation 38):
$$ \sum_{k=0}^{ntaps-1} \left| h(k) \right| < 2^{15} $$
The filter coefficients must be stored in reversed order.

Performance: See Section 7 and Table 54.

Example 23. fir_frac16

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short x[288 + 2] align_mwerks(4); /* +2 to ensure there are 4 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short y[256] align_mwerks(4);

    short hr[33 + 1] = { /* +1 to ensure there are 2 bytes of readable memory behind hr */
        -50, -62, -82, -103, -111, -84, 0, 163, 418, 767, 1196, 1676, 2167, 2621, 2988, 3227, 3310,
        3227, 2988, 2621, 2167, 1676, 1196, 767, 418, 163, 0, -84, -111, -103, -82, -62, -50,
    };

    unsigned short n, ntaps;

    void main(void)
    {
        /* filter the samples in x with FIR filter of 32nd order, */
        /* write result into y, hr are filter coefficients stored */
        /* in reversed order */
        n = 256;
        ntaps = 33;
        fir_frac16(n, ntaps, x, y, hr);
    }

6.24 iir_float_1st - float first-order IIR filter

Function call:

    void iir_float_1st(unsigned short n, float *x, float *y, float *c, float *s);

Arguments:

Table 24. iir_float_1st arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4 and >= 8)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients, c[3] = {b0, b1, a1}
s | in/out | Array of previous input and output values (state vector), s[2] = {x(-1), y(-1)}
1. The arrays x, y, s must be double-word aligned; the array c must be word aligned (implicit alignment of float data). There must be 16 bytes of readable memory behind the array x. The new values of x(-1) and y(-1) are returned in the array s, ready for the next function call.
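The state vector s lets a long signal be filtered in consecutive calls. The hypothetical plain-C reference below is not part of libdsp2. It implements the first-order recurrence y(n) = b0 x(n) + b1 x(n-1) - a1 y(n-1) with the c and s layouts of Table 24. On return, s holds the last input and output. The minus sign on a1 is the convention under which the Example 24 coefficients give unity DC gain: (b0 + b1)/(1 + a1) = 1.031426266/1.031426266 = 1.

```c
/* Hypothetical reference: c[3] = {b0, b1, a1}, s[2] = {x(-1), y(-1)}. */
void iir_float_1st_ref(unsigned short n, const float *x, float *y,
                       const float c[3], float s[2])
{
    float x1 = s[0];   /* x(n-1) */
    float y1 = s[1];   /* y(n-1) */
    for (unsigned i = 0; i < n; i++) {
        float out = c[0] * x[i] + c[1] * x1 - c[2] * y1;
        x1 = x[i];
        y1 = out;
        y[i] = out;
    }
    s[0] = x1;         /* new x(-1) for the next call */
    s[1] = y1;         /* new y(-1) for the next call */
}
```

Because the state is carried in s, filtering 8 samples in one call or in two calls of 4 gives the same output.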
Description: Computes the first-order IIR filter on float data. The input samples are stored in the array x; the output samples are written to the array y.

Algorithm (Equation 39):
$$ y(n) = b_0\,x(n) + b_1\,x(n-1) - a_1\,y(n-1) $$
for each n from 0 to N-1, where N is the value of the argument n.

Performance: See Section 7 and Table 55.

Example 24. iir_float_1st

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 8 bytes */
    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float x[256 + 4] align_mwerks(8); /* +4 to ensure there are 16 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float y[256] align_mwerks(8);

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float s[2] align_mwerks(8) = {0, 0};

    float c[3] = {0.5157131330f, 0.5157131330f, 0.0314262660f};

    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with first-order IIR filter, */
        /* write result into y */
        n = 256;
        iir_float_1st(n, x, y, c, s);
    }

6.25 iir_float_2nd - float second-order IIR filter

Function call:

    void iir_float_2nd(unsigned short n, float *x, float *y, float *c, float *s);
Arguments:

Table 25. iir_float_2nd arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4 and >= 8)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients, c[5] = {b0, b1, b2, a1, a2}
s | in/out | Array of previous input and output values (state vector), s[4] = {x(-2), x(-1), y(-2), y(-1)}
1. The arrays x, y, s must be double-word aligned; the array c must be word aligned (implicit alignment of float data). There must be 16 bytes of readable memory behind the array x. The new values of x(-2), x(-1), y(-2), y(-1) are returned in the array s, ready for the next function call.

Description: Computes the second-order IIR filter on float data. The input samples are stored in the array x; the output samples are written to the array y.

Algorithm (Equation 40):
$$ y(n) = b_0\,x(n) + b_1\,x(n-1) + b_2\,x(n-2) - a_1\,y(n-1) - a_2\,y(n-2) $$
for each n from 0 to N-1, where N is the value of the argument n.

Performance: See Section 7 and Table 56.

Example 25. iir_float_2nd

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 8 bytes */
    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float x[256 + 4] align_mwerks(8); /* +4 to ensure there are 16 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float y[256] align_mwerks(8);

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float s[4] align_mwerks(8) = {0, 0, 0, 0};

    float c[5] = {0.3115387742f, 0.6230775485f, 0.3115387742f, 0.0736238464f, 0.1725312505f};
    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with second-order IIR filter, */
        /* write result into y */
        n = 256;
        iir_float_2nd(n, x, y, c, s);
    }

6.26 iir_float_casc - cascade of float second-order IIR filters

Function call:

    #define iir_float_casc iir_float_casc_c
    void iir_float_casc_c(unsigned short n, float *x, float *y, float *c, float *s, unsigned short m);

Arguments:

Table 26. iir_float_casc arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4 and >= 8)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients for each second-order IIR filter, c[m*5] = {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ..., bm0, bm1, bm2, am1, am2}
s | in/out | Array of previous input and output values (state vector), s[m*4] = {x1(-2), x1(-1), y1(-2), y1(-1), x2(-2), x2(-1), y2(-2), y2(-1), ..., xm(-2), xm(-1), ym(-2), ym(-1)}
m | in | Number of second-order IIR filters in the cascade
1. The arrays x, y, s must be double-word aligned; the array c must be word aligned (implicit alignment of float data). There must be 16 bytes of readable memory behind the arrays x and y. The new values of x(-2), x(-1), y(-2), y(-1) are returned in the array s, ready for the next function call.

Description: Computes a cascade of second-order IIR filters on float data. The input samples are stored in the array x; the output samples are written to the array y. The function calls iir_float_2nd for each second-order IIR filter in the cascade.

Algorithm: The filter equation for the i-th order IIR filter (i = 2*m) is:
Equation 41:
$$ y(n) = b_0\,x(n) + b_1\,x(n-1) + \dots + b_i\,x(n-i) - a_1\,y(n-1) - \dots - a_i\,y(n-i) $$
for each n = 0, 1, ..., N-1, where N is the value of the argument n.

The i-th order filter is implemented as a cascade of m second-order IIR filters (Equations 42 to 44):
$$ y_1(n) = b_{10}\,x(n) + b_{11}\,x(n-1) + b_{12}\,x(n-2) - a_{11}\,y_1(n-1) - a_{12}\,y_1(n-2) $$
$$ y_2(n) = b_{20}\,y_1(n) + b_{21}\,y_1(n-1) + b_{22}\,y_1(n-2) - a_{21}\,y_2(n-1) - a_{22}\,y_2(n-2) $$
$$ \vdots $$
$$ y_m(n) = b_{m0}\,y_{m-1}(n) + b_{m1}\,y_{m-1}(n-1) + b_{m2}\,y_{m-1}(n-2) - a_{m1}\,y_m(n-1) - a_{m2}\,y_m(n-2) $$
for each n = 0, 1, ..., N-1. The cascade output is y(n) = y_m(n).

Performance: See Section 7 and Table 57.

Example 26. iir_float_casc

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 8 bytes */
    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float x[256 + 4] align_mwerks(8); /* +4 to ensure there are 16 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float y[256 + 4] align_mwerks(8); /* +4 to ensure there are 16 bytes of readable memory behind y */

    #ifdef __ghs__
    #pragma alignvar (8)
    #endif
    align_dcc(8) float s[3*4] align_mwerks(8) = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    float c[3*5] = {
        0.3284466238f, 0.6596805377f, 0.3312418003f, 0.0639408215f, 0.0183196767f,
        0.2531886900f, 0.5063763109f, 0.2531936959f, 0.0736238464f, 0.1725312505f,
        0.4280607669f, 0.8524906977f, 0.4244401939f, 0.0998014847f, 0.5894355624f,
    };
    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with sixth-order IIR filter, */
        /* write result into y */
        n = 256;
        iir_float_casc(n, x, y, c, s, 3);
    }

6.27 iir_frac16_1st - frac16 first-order IIR filter

Function call:

    void iir_frac16_1st(unsigned short n, short *x, short *y, short *c, short *s);

Arguments:

Table 27. iir_frac16_1st arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients, c[3] = {b0, b1, a1}
s | in/out | Array of previous input and output values (state vector), s[2] = {x(-1), y(-1)}
1. The x, y, c, s data are in fractional 16-bit format in the range -1 to 1. The arrays x, y, s must be word aligned; the array c must be half-word aligned (implicit alignment of short data). There must be 4 bytes of readable memory behind the array x. The new values of x(-1) and y(-1) are returned in the array s, ready for the next function call.

Description: Computes the first-order IIR filter on frac16 data. The input samples are stored in the array x; the output samples are written to the array y. Saturating arithmetic is used, so the output never overflows.

Algorithm (Equation 45):
$$ y(n) = b_0\,x(n) + b_1\,x(n-1) - a_1\,y(n-1) $$
for each n from 0 to N-1, where N is the value of the argument n.

Performance: See Section 7 and Table 58.

Example 27. iir_frac16_1st

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4)
    short x[256 + 2] align_mwerks(4); /* +2 to ensure there are 4 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short y[256] align_mwerks(4);

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short s[2] align_mwerks(4) = {0, 0};

    short c[3] = {16899, 16899, 1030};

    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with first-order IIR filter, */
        /* write result into y */
        n = 256;
        iir_frac16_1st(n, x, y, c, s);
    }

6.28 iir_frac16_2nd - frac16 second-order IIR filter

Function call:

    void iir_frac16_2nd(unsigned short n, short *x, short *y, short *c, short *s);

Arguments:

Table 28. iir_frac16_2nd arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients, c[5] = {b0, b1, b2, a1, a2}
s | in/out | Array of previous input and output values (state vector), s[4] = {x(-2), x(-1), y(-2), y(-1)}
1. The x, y, c, s data are in fractional 16-bit format in the range -1 to 1. The arrays x, y, s must be word aligned; the array c must be half-word aligned (implicit alignment of short data). There must be 4 bytes of readable memory behind the array x. The new values of x(-2), x(-1), y(-2), y(-1) are returned in the array s, ready for the next function call.

Description: Computes the second-order IIR filter on frac16 data. The input samples are stored in the array x; the output samples are written to the array y. Saturating arithmetic is used, so the output never overflows.
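The hypothetical C reference below is not part of libdsp2. It shows what saturating frac16 (Q15) arithmetic means for the biquad y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2), using the c and s layouts of Table 28. Products are accumulated in Q30 in a 64-bit integer, shifted back to Q15 by truncation, and clamped to the 16-bit range. The library's internal rounding and accumulator width may differ, so results need not match it bit for bit.

```c
#include <stdint.h>

/* Clamp to the frac16 range [-32768, 32767]. */
static int16_t sat_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference: c[5] = {b0, b1, b2, a1, a2} in Q15,
   s[4] = {x(-2), x(-1), y(-2), y(-1)}. */
void iir_q15_2nd_ref(unsigned short n, const int16_t *x, int16_t *y,
                     const int16_t c[5], int16_t s[4])
{
    int16_t x2 = s[0], x1 = s[1], y2 = s[2], y1 = s[3];
    for (unsigned i = 0; i < n; i++) {
        int64_t acc = (int64_t)c[0] * x[i] + (int64_t)c[1] * x1 + (int64_t)c[2] * x2
                    - (int64_t)c[3] * y1 - (int64_t)c[4] * y2;   /* Q30 */
        /* Q30 -> Q15; >> on a negative value is arithmetic on common compilers */
        int16_t out = sat_q15(acc >> 15);
        x2 = x1; x1 = x[i];
        y2 = y1; y1 = out;
        y[i] = out;
    }
    s[0] = x2; s[1] = x1; s[2] = y2; s[3] = y1;   /* state for the next call */
}
```

With b0 = b1 = 32767 and two full-scale inputs, the second output would be about 2.0. The clamp holds it at 32767 instead of letting it wrap to a negative value.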
Algorithm (Equation 46):
$$ y(n) = b_0\,x(n) + b_1\,x(n-1) + b_2\,x(n-2) - a_1\,y(n-1) - a_2\,y(n-2) $$
for each n from 0 to N-1, where N is the value of the argument n.

Performance: See Section 7 and Table 59.

Example 28. iir_frac16_2nd

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short x[256 + 2] align_mwerks(4); /* +2 to ensure there are 4 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short y[256] align_mwerks(4);

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short s[4] align_mwerks(4) = {0, 0, 0, 0};

    short c[5] = {10209, 20417, 10209, 2413, 5654};

    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with second-order IIR filter, */
        /* write result into y */
        n = 256;
        iir_frac16_2nd(n, x, y, c, s);
    }

6.29 iir_frac16_casc - cascade of frac16 second-order IIR filters

Function call:

    #define iir_frac16_casc iir_frac16_casc_c
    void iir_frac16_casc_c(unsigned short n, short *x, short *y, short *c, short *s, unsigned short m);
Arguments:

Table 29. iir_frac16_casc arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients for each second-order IIR filter, c[m*5] = {b10, b11, b12, a11, a12, b20, b21, b22, a21, a22, ..., bm0, bm1, bm2, am1, am2}
s | in/out | Array of previous input and output values (state vector), s[m*4] = {x1(-2), x1(-1), y1(-2), y1(-1), x2(-2), x2(-1), y2(-2), y2(-1), ..., xm(-2), xm(-1), ym(-2), ym(-1)}
m | in | Number of second-order IIR filters in the cascade
1. The x, y, c, s data are in fractional 16-bit format in the range -1 to 1. The arrays x, y, s must be word aligned; the array c must be half-word aligned (implicit alignment of short data). There must be 4 bytes of readable memory behind the arrays x and y. The new values of x(-2), x(-1), y(-2), y(-1) are returned in the array s, ready for the next function call.

Description: Computes a cascade of second-order IIR filters on frac16 data. The input samples are stored in the array x; the output samples are written to the array y. The function calls iir_frac16_2nd for each second-order IIR filter in the cascade. Saturating arithmetic is used, so the output never overflows.

Algorithm: The filter equation for the i-th order IIR filter (i = 2*m) is (Equation 47):
$$ y(n) = b_0\,x(n) + b_1\,x(n-1) + \dots + b_i\,x(n-i) - a_1\,y(n-1) - \dots - a_i\,y(n-i) $$
for each n = 0, 1, ..., N-1, where N is the value of the argument n.

The i-th order filter is implemented as a cascade of m second-order IIR filters (Equations 48 to 50):
$$ y_1(n) = b_{10}\,x(n) + b_{11}\,x(n-1) + b_{12}\,x(n-2) - a_{11}\,y_1(n-1) - a_{12}\,y_1(n-2) $$
$$ y_2(n) = b_{20}\,y_1(n) + b_{21}\,y_1(n-1) + b_{22}\,y_1(n-2) - a_{21}\,y_2(n-1) - a_{22}\,y_2(n-2) $$
$$ \vdots $$
$$ y_m(n) = b_{m0}\,y_{m-1}(n) + b_{m1}\,y_{m-1}(n-1) + b_{m2}\,y_{m-1}(n-2) - a_{m1}\,y_m(n-1) - a_{m2}\,y_m(n-2) $$
for each n = 0, 1, ..., N-1. The cascade output is y(n) = y_m(n).

Performance: See Section 7 and Table 60.
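iir_float_casc and iir_frac16_casc share the same coefficient and state layout: stage k uses five coefficients at c + 5*k and four state values at s + 4*k. The hypothetical sketch below is not part of libdsp2. It uses float to keep that layout readable; a frac16 version would apply saturating Q15 arithmetic inside each stage. The first stage reads x and every later stage filters y in place. That may be why Tables 26 and 29 require readable memory behind y as well as x, although this is an inference, not a statement from the manual.

```c
/* Hypothetical direct-form biquad: c = {b0, b1, b2, a1, a2},
   s = {x(-2), x(-1), y(-2), y(-1)}. x and y may be the same array. */
static void biquad_float_ref(unsigned short n, const float *x, float *y,
                             const float *c, float *s)
{
    float x2 = s[0], x1 = s[1], y2 = s[2], y1 = s[3];
    for (unsigned i = 0; i < n; i++) {
        float xi = x[i];                 /* read before y[i] may overwrite it */
        float out = c[0] * xi + c[1] * x1 + c[2] * x2 - c[3] * y1 - c[4] * y2;
        x2 = x1; x1 = xi;
        y2 = y1; y1 = out;
        y[i] = out;
    }
    s[0] = x2; s[1] = x1; s[2] = y2; s[3] = y1;
}

/* Hypothetical cascade of m >= 1 stages; stage k (0-based) uses
   c + 5*k and s + 4*k, matching c[m*5] and s[m*4] in Tables 26 and 29. */
void iir_casc_float_ref(unsigned short n, const float *x, float *y,
                        const float *c, float *s, unsigned short m)
{
    biquad_float_ref(n, x, y, c, s);                      /* stage 1: x -> y */
    for (unsigned k = 1; k < m; k++)
        biquad_float_ref(n, y, y, c + 5 * k, s + 4 * k);  /* later stages in place */
}
```

For example, two stages that are each a pure one-sample delay (c = {0, 1, 0, 0, 0}) turn the input 1, 2, 3 into 0, 0, 1.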
Example 29. iir_frac16_casc

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short x[256 + 2] align_mwerks(4); /* +2 to ensure there are 4 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short y[256 + 2] align_mwerks(4); /* +2 to ensure there are 4 bytes of readable memory behind y */

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short s[3*4] align_mwerks(4) = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};

    short c[3*5] = {
        10763, 21616, 10854, 2095, 600,
        8296, 16593, 8297, 2413, 5654,
        14027, 27934, 13908, 3270, 19315,
    };

    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with sixth-order IIR filter, */
        /* write result into y */
        n = 256;
        iir_frac16_casc(n, x, y, c, s, 3);
    }

6.30 iir_frac16_2nd_hc - frac16 second-order IIR filter with half coefficients

Function call:

    void iir_frac16_2nd_hc(unsigned short n, short *x, short *y, short *c, short *s);

Arguments:
Table 30. iir_frac16_2nd_hc arguments (1)
Argument | In/out | Description
n | in | Number of input and output samples (must be a multiple of 4 and >= 8)
x | in | Array of input samples of length n: x(0) x(1) ... x(n-1)
y | out | Array of output samples of length n: y(0) y(1) ... y(n-1)
c | in | Array of filter coefficients divided by 2, c[5] = {b0/2, b1/2, b2/2, a1/2, a2/2}
s | in/out | Array of previous input and output values (state vector), s[4] = {x(-2), x(-1), y(-2), y(-1)}
1. The x, y, c, s data are in fractional 16-bit format in the range -1 to 1. The arrays x, y, s must be word aligned; the array c must be half-word aligned (implicit alignment of short data). There must be 4 bytes of readable memory behind the array x. The new values of x(-2), x(-1), y(-2), y(-1) are returned in the array s, ready for the next function call.

Description: Computes the second-order IIR filter on frac16 data. The input samples are stored in the array x; the output samples are written to the array y. The filter coefficients are stored in the array c divided by 2. Saturating arithmetic is used, so the output never overflows.

Algorithm (Equation 51):
$$ y(n) = 2\left( \frac{b_0}{2}\,x(n) + \frac{b_1}{2}\,x(n-1) + \frac{b_2}{2}\,x(n-2) - \frac{a_1}{2}\,y(n-1) - \frac{a_2}{2}\,y(n-2) \right) $$
for each n from 0 to N-1, where N is the value of the argument n.

Note: The routine requires that the following conditions are met:
-1 <= b1/2 - a1*b0/2 < 1
-1 <= b2/2 - a1*b1/2 < 1
-1 <= -a1*b2/2 < 1
-1 <= a2/2 - a1*a1/2 < 1
-1 <= -a1*a2/2 < 1

Performance: See Section 7 and Table 61.

Example 30. iir_frac16_2nd_hc

    #include "libdsp2.h"

    /* #pragma and align macros used to align to 4 bytes */
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short x[256 + 2] align_mwerks(4); /* +2 to ensure there are 4 bytes of readable memory behind x */

    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short y[256] align_mwerks(4);
    #ifdef __ghs__
    #pragma alignvar (4)
    #endif
    align_dcc(4) short s[4] align_mwerks(4) = {9830, 13107, -3932, 11665};

    short c[5] = {10209, 20417, 10209, 2413, 5654};

    unsigned short n;

    void main(void)
    {
        /* filter the samples in x with second-order IIR filter, */
        /* write result into y, coefficients in c are stored */
        /* divided by 2 */
        n = 256;
        iir_frac16_2nd_hc(n, x, y, c, s);
    }
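Storing halved coefficients lets values with magnitude up to 2, such as b1 or a1 of many biquads, fit the frac16 range. The hypothetical helpers below are not part of libdsp2. They prepare the c[5] array of iir_frac16_2nd_hc from float coefficients and test the five conditions from the note. They assume that b0 ... a2 in those conditions are the actual, undivided coefficients. Conversion to frac16 uses a scale of 32768, rounds half away from zero, and saturates.

```c
/* Hypothetical: convert coefficient v to the stored value v/2 in frac16. */
short q15_half(double v)
{
    double scaled = v * 0.5 * 32768.0;
    if (scaled >= 32767.0)  return 32767;     /* saturate before converting */
    if (scaled <= -32768.0) return -32768;
    long r = (long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);  /* round */
    if (r > 32767) r = 32767;
    if (r < -32768) r = -32768;
    return (short)r;
}

/* -1 <= t < 1 */
static int in_frac_range(double t)
{
    return t >= -1.0 && t < 1.0;
}

/* Hypothetical: returns 1 if all five conditions of the note hold. */
int hc_conditions_ok(double b0, double b1, double b2, double a1, double a2)
{
    return in_frac_range(b1 / 2 - a1 * b0 / 2) &&
           in_frac_range(b2 / 2 - a1 * b1 / 2) &&
           in_frac_range(-a1 * b2 / 2) &&
           in_frac_range(a2 / 2 - a1 * a1 / 2) &&
           in_frac_range(-a1 * a2 / 2);
}
```

A lightly damped resonator such as a1 = 1.9, a2 = 0.95 fails the fourth condition, because a2/2 - a1*a1/2 is about -1.33.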
7 Performance

This section shows the code size and the clock cycles for each function. The clock cycles were measured under the following conditions:
- Code memory (internal flash): cache on
- Data memory (internal RAM): cache on
- Stack memory: locked in cache
- Data acquired on the MPC5554 with SIU_MIDR = 0x55540011
- Branch target buffer (BTB) enabled
- BIUCR = 0x00094bfd

The "Improvement to C function" column shows the speed-up of the assembly code over a respective C function. It is calculated as the ratio number_of_clock_cycles_of_C_function / number_of_clock_cycles_of_optimized_library_function, using the clock cycles of the third function call. The C code was compiled with CodeWarrior for PowerPC v1.5 beta2, with global optimizations set to level 4, optimize for faster execution speed, and instruction scheduling and peephole optimization turned on. The IPC column shows instructions per cycle.

Table 31. Code size
Function | Code size [bytes]
bitrev_table_16bit | 168
bitrev_table_32bit | 168
fft_radix4_float | 1216
fft_radix4_frac32 - scaling off | 1224
fft_radix4_frac32 - scaling on | 1344
fft_quad_float | 1268
fft_quad_frac32 - scaling off | 1276
fft_quad_frac32 - scaling on | 1388
fft_radix2_last_stage_float | 324
fft_radix2_last_stage_frac32 - scaling off | 328
fft_radix2_last_stage_frac32 - scaling on | 368
fft_real_split_float | 452 (1)
fft_real_split_frac32 - scaling off | 452 (1)
fft_real_split_frac32 - scaling on | 460 (2)
window_apply_complex_float | 260
window_apply_complex_frac16 | 244 (3)
window_apply_real_float | 172
window_apply_real_frac16 | 172
mfb50_pmh | 1100
mfb50_index | 180
conv3x3 | 672 (4)
sobel3x3 | 492 (4)
sobel3x3_horizontal | 348 (4)
sobel3x3_vertical | 352 (4)
corr_frac16 | 440
fir_float | 776
fir_frac16 | 568
iir_float_1st | 324
iir_float_2nd | 416
iir_float_casc (iir_float_casc_c) | 124 (5)
iir_frac16_1st | 164
iir_frac16_2nd | 208
iir_frac16_casc (iir_frac16_casc_c) | 124 (5)
iir_frac16_2nd_hc | 284
1. n2_realpart configured to 0; with n2_realpart configured to 1 the code size is 460 bytes.
2. The code size is the same for n2_realpart configured to 0 or 1.
3. Valid for both scaling on and off.
4. zero_lastcol configured to 0; if configured to 1, the size is 4 bytes greater for the conv3x3 function and 8 bytes greater for the Sobel functions.
5. The iir_float_casc (iir_frac16_casc) function calls the function iir_float_2nd (iir_frac16_2nd).

The code sizes in the table are valid for the CodeWarrior compiler with optimizations set to level 4 and instruction scheduling and peephole optimization turned on.

Table 32. Radix-4 complex to complex float in-place FFT (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
64 | 2546 | 2274 | 2259 | 2.57 | 0.97
256 | 12362 | 11653 | 11637 | 2.4 | 0.98
1024 | 59917 | 57374 | 56911 | 2.3 | 0.99
4096 | 307499 | 305244 | 304521 | 2.1 | 0.89
1. The clock cycles include bit reversal by bitrev_table_32bit and FFT computation by the fft_radix4_float function. w_table and seed_table are in internal flash.
2. N is the FFT length.
Table 33. Quad radix-2 complex to complex float in-place FFT (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
64 | 2683 | 2421 | 2415 | 2.2 | 0.96
256 | 12843 | 12196 | 12173 | 2.1 | 0.98
1024 | 62713 | 60186 | 60075 | 2.0 | 0.99
4096 | 320423 | 316501 | 316011 | 1.9 | 0.90
1. The clock cycles include bit reversal by bitrev_table_32bit and FFT computation by the fft_quad_float function. w_table and seed_table are in internal flash.
2. N is the FFT length.

Table 34. Radix-2 complex to complex float in-place FFT (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
128 | 6197 | 5749 | 5739 | 2.1 | 0.98
512 | 29801 | 28519 | 28473 | 2.1 | 0.99
2048 | 144575 | 138638 | 137365 | 2.0 | 0.99
1. The clock cycles include bit reversal by bitrev_table_32bit and FFT computation by the fft_quad_float and fft_radix2_last_stage_float functions. w_table and seed_table are in internal flash.
2. N is the FFT length.

Table 35. Radix-4 complex to complex frac16/frac32 in-place FFT, scaling on (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
64 | 2746 | 2480 | 2474 | - | 0.97
256 | 13411 | 12773 | 12740 | - | 0.98
1024 | 65051 | 62715 | 62334 | - | 0.99
4096 | 326464 | 324470 | 324053 | - | 0.91
1. The clock cycles include bit reversal by bitrev_table_16bit and FFT computation by the fft_radix4_frac32 function. w_table and seed_table are in internal flash.
2. N is the FFT length.

Table 36. Quad radix-2 complex to complex frac16/frac32 in-place FFT, scaling on (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
64 | 2901 | 2627 | 2620 | - | 0.97
256 | 13913 | 13234 | 13234 | - | 0.98
1024 | 67669 | 65418 | 65328 | - | 0.99
4096 | 340105 | 337619 | 337101 | - | 0.92
1. The clock cycles include bit reversal by bitrev_table_16bit and FFT computation by the fft_quad_frac32 function. w_table and seed_table are in internal flash.
2. N is the FFT length.

The number of clock cycles for the third function call of the frac16/frac32 FFTs without scaling is the same as for the respective float FFT plus N/8.

Table 37. Radix-2 complex to complex frac32 in-place FFT, scaling on (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
128 | 6763 | 6270 | 6261 | - | 0.98
512 | 32312 | 31060 | 31043 | - | 0.99
2048 | 156208 | 151037 | 149960 | - | 0.99
1. The clock cycles include bit reversal by bitrev_table_16bit and FFT computation by the fft_quad_frac32 and fft_radix2_last_stage_frac32 functions. w_table and seed_table are in internal flash.
2. N is the FFT length.

Table 38. Real to complex float in-place FFT (N odd power of two) (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
128 | 3659 | 3258 | 3233 | 2.3 | 0.97
512 | 16522 | 15384 | 15251 | 2.3 | 0.98
2048 | 76073 | 72880 | 71331 | 2.2 | 0.99
1. The clock cycles include bit reversal by bitrev_table_32bit and FFT computation by the fft_radix4_float and fft_real_split_float functions. w_table, seed_table, wa_table and wb_table are in internal flash. n2_realpart set to 0.
2. N is the FFT length.

Table 39. Real to complex float in-place FFT (N even power of two) (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
256 | 8288 | 7615 | 7593 | 2.1 | 0.98
1024 | 37929 | 35826 | 35607 | 2.0 | 0.99
4096 | 176668 | 173279 | 172939 | 1.9 | 0.95
1. The clock cycles include bit reversal by bitrev_table_32bit and FFT computation by the fft_quad_float, fft_radix2_last_stage_float and fft_real_split_float functions. w_table, seed_table, wa_table and wb_table are in internal flash. n2_realpart set to 0.
2. N is the FFT length.
The number of clock cycles for the third function call of the frac16/frac32 real to complex in-place FFTs without scaling is the same as for the respective float FFT plus N/16.

Table 40. Real to complex frac16/frac32 in-place FFT, scaling on (N odd power of two) (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
128 | 3881 | 3477 | 3450 | - | 0.97
512 | 17495 | 16492 | 16356 | - | 0.99
2048 | 81210 | 78233 | 76899 | - | 0.99
1. The clock cycles include bit reversal by bitrev_table_16bit and FFT computation by the fft_radix4_frac32 and fft_real_split_frac32 functions. w_table, seed_table, wa_table and wb_table are in internal flash. n2_realpart set to 0.
2. N is the FFT length.

Table 41. Real to complex frac16/frac32 in-place FFT, scaling on (N even power of two) (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
256 | 8887 | 8172 | 8117 | - | 0.98
1024 | 40376 | 38527 | 38179 | - | 0.99
4096 | 188384 | 185207 | 184761 | - | 0.96
1. The clock cycles include bit reversal by bitrev_table_16bit and FFT computation by the fft_quad_frac32, fft_radix2_last_stage_frac32 and fft_real_split_frac32 functions. w_table, seed_table, wa_table and wb_table are in internal flash. n2_realpart set to 0.
2. N is the FFT length.

Table 42. Complex float windowing function (1)
N (2) | 1st call [clock cycles] | 2nd call [clock cycles] | 3rd call [clock cycles] | Improvement to C function [-] | IPC [-]
64 | 815 | 666 | 666 | 2.0 | 0.99
128 | 1545 | 1295 | 1290 | 2.0 | 0.99
256 | 2991 | 2541 | 2538 | 2.0 | 0.99
512 | 5882 | 5078 | 5034 | 2.0 | 0.99
1024 | 11652 | 10192 | 10026 | 2.0 | 0.99
2048 | 23189 | 21928 | 21427 | 1.9 | 0.93
1. w vector in internal flash.
2. N is the window length.
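There are no tables for the frac16/frac32 FFTs with scaling off. The two notes give their third-call cycle counts instead: take the matching float FFT and add N/8 for the complex FFTs (Tables 32 to 34) or N/16 for the real-to-complex FFTs (Tables 38 and 39). The minimal sketch below only encodes that arithmetic, and the function names are hypothetical.

```c
/* Hypothetical helpers, not part of libdsp2. float_cycles is the
   third-call count of the respective float FFT, n is the FFT length. */
unsigned long frac_fft_noscale_cycles(unsigned long float_cycles, unsigned long n)
{
    return float_cycles + n / 8;    /* complex to complex FFTs */
}

unsigned long frac_real_fft_noscale_cycles(unsigned long float_cycles, unsigned long n)
{
    return float_cycles + n / 16;   /* real to complex FFTs */
}
```

For example, a 64-point radix-4 frac FFT with scaling off would take 2259 + 64/8 = 2267 cycles on the third call. With scaling on, Table 35 gives 2474.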

Table 43. Complex frac16 windowing function(1)
n (2)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
64      691                608                602                -                     0.99
128     1291               1162               1162               -                     0.99
256     2502               2282               2282               -                     0.99
512     4910               4522               4522               -                     0.99
1024    9724               9022               9002               -                     0.99
2048    19329              18061              17962              -                     0.99
1. w vector in internal flash. The table is valid for both scaling on and scaling off.
2. n is window length.

Table 44. Real float windowing function(1)
n (2)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
64      298                188                188                3.1                   0.86
128     524                360                360                3.1                   0.85
256     965                704                704                3.1                   0.84
512     1843               1398               1392               3.1                   0.84
1024    3609               2801               2768               3.1                   0.84
2048    7159               5658               5520               3.1                   0.84
4096    14289              13145              12564              2.8                   0.74
1. w vector in internal flash.
2. n is window length.

Table 45. Real frac16 windowing function(1)
n (2)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
64      264                188                188                -                     0.86
128     469                360                360                -                     0.85
256     862                704                704                -                     0.84
512     1644               1392               1392               -                     0.84
1024    3207               2777               2768               -                     0.84
2048    6365               5540               5520               -                     0.84
4096    12644              11143              11024              -                     0.84
1. w vector in internal flash.
2. n is window length.

Table 46. mfb50_pmh - pressure sensing function 1
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
18      390                308                301                2.9                   0.96
26      488                407                407                3.1                   0.96
34      598                507                507                3.3                   0.97
42      693                609                609                3.3                   0.97
362     4999               4702               4689               3.7                   1.00
722     9812               9301               9279               3.7                   1.00
1. n is the length of the p_mfb and v vectors.
Note: The number of clock cycles for the third function call for n greater than or equal to 42 is num_of_cycles = 609 + ((n-2)/8 - 5)*102 (609 is the number of clock cycles for n = 42 and 102 is the number of instructions in the loop).

Table 47. mfb50_index - pressure sensing function 2
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
18      128                91                 91                 2.5                   0.95
26      157                123                123                2.7                   0.96
34      191                155                155                2.9                   0.97
42      225                187                187                3.0                   0.97
362     1585               1467               1467               3.4                   1.00
722     3116               2907               2907               3.5                   1.00
1. n is the length of the mfb vector.
Note: The number of clock cycles for the third function call for n greater than or equal to 18 is num_of_cycles = 91 + ((n-2)/4 - 4)*16 (91 is the number of clock cycles for n = 18 and 16 is the number of instructions in the loop).

Table 48. conv3x3 - 2-D convolution with 3x3 kernel(1)
m x n (2)  1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
16x8       1453               1401               1399               3.0                   0.97
8x16       1367               1320               1318               3.0                   0.96
10x16      1767               1719               1717               3.1                   0.97
32x32      12584              12484              12477              3.3                   0.99
1. All measurements were executed with the convolution kernel c placed in internal RAM.
2. m x n is the size of the input and output matrix (m is the number of rows).
Note: The number of instructions executed is 76 + (m-2)*(39 + 53*(n-4)/4).

Table 49. sobel3x3 - Sobel filter(1)
m x n (2)  1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
16x8       1304               1256               1248               3.0                   0.98
8x16       1209               1162               1162               3.1                   0.97
10x16      1569               1528               1528               3.1                   0.97
32x32      11419              11318              11314              3.3                   0.99
1. The number of instructions executed is 38 + (m-2)*(37 + 48*(n-4)/4).
2. m x n is the size of the input and output matrix (m is the number of rows).

Table 50. sobel3x3_horizontal - horizontal Sobel filter(1)
m x n (2)  1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
16x8       904                871                871                2.7                   0.98
8x16       852                812                812                2.8                   0.96
10x16      1100               1066               1066               2.8                   0.96
32x32      7910               7820               7820               2.9                   0.99
1. The number of instructions executed is 28 + (m-2)*(26 + 33*(n-4)/4).
2. m x n is the size of the input and output matrix (m is the number of rows).

Table 51. sobel3x3_vertical - vertical Sobel filter(1)
m x n (2)  1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
16x8       935                907                907                2.9                   0.96
8x16       841                815                815                3.1                   0.96
10x16      1099               1071               1071               3.1                   0.97
32x32      7932               7847               7847               3.3                   0.99
1. The number of instructions executed is 28 + (m-2)*(27 + 33*(n-4)/4).
2. m x n is the size of the input and output matrix (m is the number of rows).

Table 52. corr_frac16 - correlation function
n (1)   m (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
4       4       97                 57                 57                 4.0                   0.86
16      16      400                368                355                4.6                   0.92
32      32      1168               1115               1115               4.8                   0.96
128     128     15161              15083              15083              4.7                   0.99
256     256     58938              58808              58795              4.6                   1.00
1. n is the length of the input vectors, m is the length of the output vector.

Table 53. fir_float - FIR filter for float data
n (1)   ntaps (1)  1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
256     20         12974              12856              12862              3.0                   0.98
256     22         13762              13633              13633              3.2                   0.98
256     32         19119              19008              19006              3.1                   0.99
256     64         35516              35392              35390              2.9                   0.99
1. n is the number of output samples, ntaps is the number of filter coefficients.

Table 54. fir_frac16 - FIR filter for frac16 data
n (1)   ntaps (1)  1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
256     20         10631              10544              10550              3.8                   0.97
256     22         11294              11196              11193              4.0                   0.97
256     32         16018              15926              15926              3.8                   0.98
256     64         30358              30262              30262              3.5                   0.99
1. n is the number of output samples, ntaps is the number of filter coefficients.

Table 55. iir_float_1st - first-order IIR filter for float data
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
128     1013               935                935                2.2                   0.97
256     1946               1831               1831               2.2                   0.97
1. n is the number of input/output samples.

Table 56. iir_float_2nd - second-order IIR filter for float data
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
128     1265               1171               1171               2.5                   1.00
256     2403               2291               2291               2.5                   1.00
1. n is the number of input/output samples.

Table 57. iir_float_casc - cascade of second-order IIR filters for float data
n (1)   m (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
256     1       2461               2339               2338               -                     0.99
256     2       4799               4649               4640               -                     0.99
256     3       7084               6936               6936               -                     0.99
1. n is the number of input/output samples, m is the number of second-order filters.

Table 58. iir_frac16_1st - first-order IIR filter for frac16 data
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
128     758                703                703                2.5                   1.00
256     1451               1375               1375               2.5                   1.00
1. n is the number of input/output samples.

Table 59. iir_frac16_2nd - second-order IIR filter for frac16 data
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
128     890                838                838                3.3                   1.00
256     1719               1638               1638               3.3                   1.00
1. n is the number of input/output samples.

Table 60. iir_frac16_casc - cascade of second-order IIR filters for frac16 data
n (1)   m (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
256     1       1777               1688               1685               -                     0.99
256     2       3454               3340               3334               -                     0.99
256     3       5099               4977               4977               -                     0.99
1. n is the number of input/output samples, m is the number of second-order filters.

Table 61. iir_frac16_2nd_hc - second-order IIR filter for frac16 data with half coefficients
n (1)   1st call [cycles]  2nd call [cycles]  3rd call [cycles]  Improvement to C [-]  IPC [-]
128     970                919                919                3.1                   1.00
256     1855               1783               1783               3.2                   1.00
1. n is the number of input/output samples.

8 Revision history

Table 62. Document revision history
Date          Revision  Changes
30-Jun-2008   1         Initial release.
18-Sep-2013   2         Updated disclaimer.

Please read carefully:

Information in this document is provided solely in connection with ST products. STMicroelectronics NV and its subsidiaries ("ST") reserve the right to make changes, corrections, modifications or improvements, to this document, and the products and services described herein at any time, without notice.

All ST products are sold pursuant to ST's terms and conditions of sale.

Purchasers are solely responsible for the choice, selection and use of the ST products and services described herein, and ST assumes no liability whatsoever relating to the choice, selection or use of the ST products and services described herein.

No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted under this document. If any part of this document refers to any third party products or services it shall not be deemed a license grant by ST for the use of such third party products or services, or any intellectual property contained therein or considered as a warranty covering the use in any manner whatsoever of such third party products or services or any intellectual property contained therein.

Unless otherwise set forth in ST's terms and conditions of sale ST disclaims any express or implied warranty with respect to the use and/or sale of ST products including without limitation implied warranties of merchantability, fitness for a particular purpose (and their equivalents under the laws of any jurisdiction), or infringement of any patent, copyright or other intellectual property right.

ST products are not designed or authorized for use in: (a) safety critical applications such as life supporting, active implanted devices or systems with product functional safety requirements; (b) aeronautic applications; (c) automotive applications or environments, and/or (d) aerospace applications or environments.
Where ST products are not designed for such use, the purchaser shall use products at purchaser's sole risk, even if ST has been informed in writing of such usage, unless a product is expressly designated by ST as being intended for "automotive, automotive safety or medical" industry domains according to ST product design specifications. Products formally ESCC, QML or JAN qualified are deemed suitable for use in aerospace by the corresponding governmental agency.

Resale of ST products with provisions different from the statements and/or technical features set forth in this document shall immediately void any warranty granted by ST for the ST product or service described herein and shall not create or extend in any manner whatsoever, any liability of ST.

ST and the ST logo are trademarks or registered trademarks of ST in various countries. Information in this document supersedes and replaces all information previously supplied.

The ST logo is a registered trademark of STMicroelectronics. All other names are the property of their respective owners.

© 2013 STMicroelectronics - All rights reserved

STMicroelectronics group of companies

Australia - Belgium - Brazil - Canada - China - Czech Republic - Finland - France - Germany - Hong Kong - India - Israel - Italy - Japan - Malaysia - Malta - Morocco - Philippines - Singapore - Spain - Sweden - Switzerland - United Kingdom - United States of America

www.st.com