Abstract
The video compression algorithms based on the 3D wavelet transform obtain excellent compression rates at the expense of huge memory requirements, which drastically affect the execution time of such applications. Our objective is to enable real-time video compression based on the 3D fast wavelet transform. We show the hardware and software interaction for this multimedia application on a general-purpose processor. First, we mitigate the memory problem by exploiting the memory hierarchy of the processor using several techniques. For instance, we implement and evaluate the blocking technique. In particular, we present two blocking approaches, cube and rectangular, which differ in the way the original working set is divided. We also propose reusing previous computations in order to decrease the number of memory accesses and floating-point operations. Afterwards, we present several optimizations that cannot be applied by the compiler due to the characteristics of the algorithm. On the one hand, the Streaming SIMD Extensions (SSE) are used for some of the dimensions of the sequence (y and time) to reduce the number of floating-point instructions, exploiting data-level parallelism. Then, we apply loop unrolling and data prefetching to specific parts of the code. On the other hand, the algorithm is vectorized by columns, allowing the use of SIMD instructions for the y dimension. Results show speedups of 5x in execution time over a version compiled with the maximum optimizations of the Intel C/C++ compiler, while maintaining the compression ratio and the video quality (PSNR) of the original encoder based on the 3D wavelet transform. Our experiments also show that allowing the compiler to perform some of these optimizations (i.e., automatic code vectorization) causes a performance slowdown, demonstrating the effectiveness of our optimizations.
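To make the blocking idea concrete, here is a minimal sketch of one plausible form of rectangular blocking for the temporal part of a 3D transform. It is not the authors' implementation: the abstract does not name the wavelet filter, so an orthonormal Haar filter is assumed, and the frame-major layout `v[t][y][x]`, the block height `BH`, and the names `haar_time_blocked` and `idx` are all hypothetical. Each block spans all frames but only `BH` rows, and every temporal decomposition level is completed for that block before the next block is loaded, so the block's data is reused from cache across levels instead of being streamed from memory once per level.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical volume layout for this sketch: v[t][y][x], T frames of H x W. */
static size_t idx(int t, int y, int x, int H, int W)
{
    return ((size_t)t * H + y) * W + x;
}

/* In-place multi-level orthonormal Haar transform along the time axis
   (Haar is an assumption; the source does not name its filter).
   The volume is cut into rectangular blocks of BH rows spanning all T
   frames; every temporal level is finished for one block before the next
   block is touched, so the block is reused from cache across levels.
   T must be a power of two. Returns 0 on success, -1 if malloc fails. */
int haar_time_blocked(float *v, int T, int H, int W, int BH)
{
    const float s = 0.70710678f;                     /* 1/sqrt(2) */
    assert(T >= 2 && (T & (T - 1)) == 0 && BH > 0);
    float *tmp = malloc((size_t)T * BH * W * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (int y0 = 0; y0 < H; y0 += BH) {
        int rows = (y0 + BH <= H) ? BH : H - y0;     /* last block may be short */
        for (int n = T; n >= 2; n /= 2) {            /* n = current low-pass frames */
            for (int k = 0; k < n / 2; k++)
                for (int r = 0; r < rows; r++)
                    for (int x = 0; x < W; x++) {
                        float a = v[idx(2 * k, y0 + r, x, H, W)];
                        float b = v[idx(2 * k + 1, y0 + r, x, H, W)];
                        tmp[idx(k, r, x, BH, W)] = (a + b) * s;         /* low-pass  */
                        tmp[idx(k + n / 2, r, x, BH, W)] = (a - b) * s; /* high-pass */
                    }
            /* Copy this level back; frames n..T-1 already hold finished high-pass data. */
            for (int t = 0; t < n; t++)
                for (int r = 0; r < rows; r++)
                    memcpy(&v[idx(t, y0 + r, 0, H, W)], &tmp[idx(t, r, 0, BH, W)],
                           (size_t)W * sizeof *v);
        }
    }
    free(tmp);
    return 0;
}
```

Calling `haar_time_blocked(v, T, H, W, H)` makes the whole frame a single block, which is the unblocked loop order. Because each coefficient is computed with exactly the same arithmetic whatever the block height, the blocked and unblocked calls produce bit-identical results; only the memory access pattern changes. In practice `BH` would be chosen so that the block (and the `tmp` buffer) fits in the targeted cache level.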
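The data-level parallelism that SSE exploits along the y and time dimensions can be illustrated with a small sketch. When filtering along time (or y), the two inputs are whole rows of neighbouring frames (or image rows), and adjacent x positions are independent, so four single-precision values fit in one 128-bit register and one instruction replaces four scalar ones. The intrinsics used (`_mm_set1_ps`, `_mm_loadu_ps`, `_mm_add_ps`, `_mm_sub_ps`, `_mm_mul_ps`, `_mm_storeu_ps`) are the standard SSE intrinsics from `<xmmintrin.h>`; the Haar filter and the function name `haar_pair_rows` are assumptions for illustration, not the paper's code. The SSE path is compiled only when the compiler defines `__SSE__`, with a scalar loop handling the remainder (or the whole row on other targets).

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* One Haar step between two rows a[] and b[] (two frames when filtering
   along time, two image rows when filtering along y).  Along these axes
   neighbouring x positions are independent, so four of them are packed
   into one 128-bit SSE register and processed by a single instruction. */
void haar_pair_rows(const float *a, const float *b,
                    float *low, float *high, size_t n)
{
    const float s = 0.70710678f;                    /* 1/sqrt(2) */
    size_t i = 0;
    assert(n == 0 || (a && b && low && high));
#if defined(__SSE__)
    const __m128 vs = _mm_set1_ps(s);               /* s broadcast to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned 4-float loads */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(low + i, _mm_mul_ps(_mm_add_ps(va, vb), vs));
        _mm_storeu_ps(high + i, _mm_mul_ps(_mm_sub_ps(va, vb), vs));
    }
#endif
    for (; i < n; i++) {            /* scalar tail, or the whole row without SSE */
        low[i] = (a[i] + b[i]) * s;
        high[i] = (a[i] - b[i]) * s;
    }
}
```

Unaligned loads and stores keep the sketch simple; if rows are allocated on 16-byte boundaries and the row length is a multiple of four, the aligned variants `_mm_load_ps` and `_mm_store_ps` could be used instead.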
Author information
Authors and Affiliations
Dpto. Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30071, Murcia, Spain
Gregorio Bernabé & José M. García
Intel Barcelona Research Center, Intel Labs, 08034, Barcelona, Spain
José González
Corresponding author: Gregorio Bernabé.
Additional information
Special Issue on Media and Communication Applications on General Purpose Processors: Hardware and Software Issues, Journal of VLSI Signal Processing Systems. Lead Guest Editor: Dr. Eric Debes. Contact author: Gregorio Bernabé.
Gregorio Bernabé was born in Antibes (Alpes-Maritimes, France) on 21 November 1974. He received his M.S. in Computer Science from the University of Murcia (Spain) in 1997. In 1998, he joined the Computer Engineering Department of the University of Murcia, where he is an Assistant Professor as well as a Ph.D. candidate. His current research interests include video compression using the wavelet transform and the development of optimizations to improve the performance of video compression algorithms based on the 3D wavelet transform.
Jose M. Garcia was born in Valencia, Spain, on 9 January 1962. He received the M.S. and Ph.D. degrees in electrical engineering from the Technical University of Valencia (Valencia, Spain) in 1987 and 1991, respectively. In 1987, he joined the Computer Science Department at the University of Castilla-La Mancha at the Campus of Albacete (Spain). From 1987 to 1993, he was an Assistant Professor of Computer Architecture. In 1994, he became an Associate Professor at the University of Murcia (Spain). From 1995 to 1997, he served as Vice-Dean of the School of Computer Science. At present, he is the Director of the Computer Engineering Department and also the Head of the Research Group on Parallel Computing and Architecture. He has developed several courses on Computer Structure, Peripheral Devices, Computer Architecture and Multicomputer Design. His current research interests include multiprocessor systems, interconnection networks, file systems, grid computing and its application in multimedia systems. He has published over 45 refereed papers in journals and conferences in these fields. Dr. Garcia is a member of several international associations, such as the IEEE Computer Society, ACM and USENIX, and also a member of some European associations (Euromicro and ATI).
Pepe Gonzalez received the M.S. and Ph.D. degrees from the Universitat Politecnica de Catalunya (UPC). In January 2000, he joined the Computer Engineering Department of the University of Murcia, Spain, and became an Associate Professor in June 2001. In March 2002, he joined the Intel Barcelona Research Center, where he is a Senior Researcher. Currently, Pepe is working on new paradigms for the IA-32 family, in particular, thermal- and power-aware clustered microarchitectures. E-mail: pepe.gonzalez@intel.com
About this article
Cite this article
Bernabé, G., García, J.M. & González, J. Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions. J VLSI Sign Process Syst Sign Image Video Technol 41, 209–223 (2005). https://doi.org/10.1007/s11265-005-6651-6