Abstract
Recent extensions to the Intel® Architecture feature the SIMD technique to enhance the performance of computationally intensive applications that perform the same operation on different elements in a data set. To date, much of the code that exploits these extensions has been hand-coded. The programmer's task is substantially simplified, however, if a compiler performs this exploitation automatically. The high-performance Intel® C++/Fortran compiler supports automatic translation of serial loops into code that uses the SIMD extensions to the Intel® Architecture. This paper provides a detailed overview of the automatic vectorization methods used by this compiler, together with an experimental validation of their effectiveness.
Author information
Authors and Affiliations
Intel Corporation, 2200 Mission College Blvd. SC12-301, Santa Clara, California, 95052
Aart J. C. Bik, Milind Girkar, Paul M. Grey & Xinmin Tian
About this article
Cite this article
Bik, A.J.C., Girkar, M., Grey, P.M. et al. Automatic Intra-Register Vectorization for the Intel® Architecture. International Journal of Parallel Programming 30, 65–98 (2002). https://doi.org/10.1023/A:1014230429447