Abstract

Video compression algorithms based on the 3D wavelet transform obtain excellent compression rates at the expense of huge memory requirements, which drastically affect the execution time of such applications. The objective of this work is to enable real-time video compression based on the 3D fast wavelet transform. We show the hardware and software interaction for this multimedia application on a general-purpose processor. First, we mitigate the memory problem by exploiting the memory hierarchy of the processor using several techniques. For instance, we implement and evaluate the blocking technique. We present two blocking approaches in particular, cube and rectangular, which differ in the way the original working set is divided. We also propose reusing previous computations in order to decrease the number of memory accesses and floating-point operations. Afterwards, we present several optimizations that cannot be applied by the compiler due to the characteristics of the algorithm. On the one hand, the Streaming SIMD Extensions (SSE) are used for some of the dimensions of the sequence (y and time) to reduce the number of floating-point instructions by exploiting data-level parallelism. Then, we apply loop unrolling and data prefetching to specific parts of the code. On the other hand, the algorithm is vectorized by columns, allowing the use of SIMD instructions for the y dimension. Results show speedups of 5x in execution time over a version compiled with the maximum optimizations of the Intel C/C++ compiler, while maintaining the compression ratio and the video quality (PSNR) of the original encoder based on the 3D wavelet transform. Our experiments also show that allowing the compiler to perform some of these optimizations (i.e., automatic code vectorization) causes a performance slowdown, demonstrating the effectiveness of our optimizations.
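The blocking idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-level Haar step along the time axis as a stand-in for the paper's wavelet filter, a volume stored frame by frame in row-major order, and hypothetical names (`temporal_pass_naive`, `temporal_pass_blocked`) and block-size parameters (`BK`, `BY`, `BX`). The abstract does not define the cube and rectangular block shapes, so the block dimensions are left as free parameters here.

```c
#include <assert.h>
#include <stddef.h>

/* Index of sample (t, y, x) in a T x Y x X volume stored frame by frame. */
static size_t idx(size_t t, size_t y, size_t x, size_t Y, size_t X) {
    return (t * Y + y) * X + x;
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Haar step (assumed stand-in filter) on frames 2k and 2k+1 at pixel (y, x):
   the average goes to frame k, the difference to frame T/2 + k. */
static void haar_pair(const float *in, float *out, size_t k, size_t y,
                      size_t x, size_t T, size_t Y, size_t X) {
    float a = in[idx(2 * k, y, x, Y, X)];
    float b = in[idx(2 * k + 1, y, x, Y, X)];
    out[idx(k, y, x, Y, X)] = 0.5f * (a + b);
    out[idx(T / 2 + k, y, x, Y, X)] = 0.5f * (a - b);
}

/* Unblocked pass: each pixel walks the whole time axis with stride Y*X,
   so consecutive accesses land far apart in memory. */
void temporal_pass_naive(const float *in, float *out,
                         size_t T, size_t Y, size_t X) {
    assert(T % 2 == 0);
    for (size_t y = 0; y < Y; ++y)
        for (size_t x = 0; x < X; ++x)
            for (size_t k = 0; k < T / 2; ++k)
                haar_pair(in, out, k, y, x, T, Y, X);
}

/* Blocked pass: the volume is split into blocks of BK frame pairs by
   BY rows by BX columns, and each block is finished before moving on,
   so its working set can stay in cache. */
void temporal_pass_blocked(const float *in, float *out,
                           size_t T, size_t Y, size_t X,
                           size_t BK, size_t BY, size_t BX) {
    assert(T % 2 == 0 && BK > 0 && BY > 0 && BX > 0);
    size_t K = T / 2;
    for (size_t k0 = 0; k0 < K; k0 += BK)
        for (size_t y0 = 0; y0 < Y; y0 += BY)
            for (size_t x0 = 0; x0 < X; x0 += BX)
                for (size_t k = k0; k < min_sz(k0 + BK, K); ++k)
                    for (size_t y = y0; y < min_sz(y0 + BY, Y); ++y)
                        for (size_t x = x0; x < min_sz(x0 + BX, X); ++x)
                            haar_pair(in, out, k, y, x, T, Y, X);
}
```

Both functions compute the same coefficients; only the traversal order differs. With a two-tap filter such as Haar, blocks along time need no overlap. A longer filter would need samples from neighbouring blocks, which is one place where reusing previous computations, as the abstract suggests, could plausibly help.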

Author information

Authors and Affiliations

Dpto. Ingeniería y Tecnología de Computadores, Universidad de Murcia, 30071, Murcia, Spain
Gregorio Bernabé & José M. García
Intel Barcelona Research Center, Intel Labs, 08034, Barcelona, Spain
José González

Corresponding author

Correspondence to Gregorio Bernabé.

Additional information

Special Issue on Media and Communication Applications on General Purpose Processors: Hardware and Software Issues/Journal of VLSI Signal Processing Systems/Dr. Eric Debes, (Lead) Guest Editor. Contact Author: Gregorio Bernabé.

Gregorio Bernabé was born in Antibes (Alpes Maritimos, France) on 21 November 1974. He received the M.S. in Computer Science from the University of Murcia (Spain) in 1997. In 1998, he joined the Computer Engineering Department of the University of Murcia, where he is an Assistant Professor as well as a Ph.D. candidate. His current research interests include video compression using the wavelet transform and the development of optimizations to improve the performance of video compression algorithms based on the 3D wavelet transform.

Jose M. Garcia was born in Valencia, Spain, on 9 January 1962. He received the M.S. and Ph.D. degrees in electrical engineering from the Technical University of Valencia (Valencia, Spain) in 1987 and 1991, respectively. In 1987 he joined the Computer Science Department at the University of Castilla-La Mancha at the Campus of Albacete (Spain). From 1987 to 1993, he was an Assistant Professor of Computer Architecture. In 1994 he became an Associate Professor at the University of Murcia (Spain). From 1995 to 1997 he served as Vice-Dean of the School of Computer Science. At present, he is the Director of the Computer Engineering Department and also the Head of the Research Group on Parallel Computing and Architecture. He has developed several courses on Computer Structure, Peripheral Devices, Computer Architecture and Multicomputer Design. His current research interests include Multiprocessor Systems, Interconnection Networks, File Systems, Grid Computing and its Application in Multimedia Systems. He has published over 45 refereed papers in different journals and conferences in these fields. Dr. Garcia is a member of several international associations, such as the IEEE Computer Society, ACM and USENIX, and also a member of some European associations (Euromicro and ATI).

Pepe Gonzalez received the M.S. and Ph.D. degrees from the Universitat Politecnica de Catalunya (UPC). In January 2000, he joined the Computer Engineering Department of the University of Murcia, Spain, and became an Associate Professor in June 2001. In March 2002, he joined the Intel Barcelona Research Center, where he is a Senior Researcher. Currently, Pepe is working on new paradigms for the IA-32 family, in particular thermal- and power-aware clustered microarchitectures. pepe.gonzalez@intel.com

About this article

Cite this article

Bernabé, G., García, J.M. & González, J. Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions. J VLSI Sign Process Syst Sign Image Video Technol 41, 209–223 (2005). https://doi.org/10.1007/s11265-005-6651-6

Received: 13 February 2003
Revised: 29 December 2003
Accepted: 30 July 2004
Published: 01 September 2005
Issue Date: September 2005
DOI: https://doi.org/10.1007/s11265-005-6651-6
