- Per Gunnar Kjeldsberg1,
- Francky Catthoor2,3,
- Sven Verdoolaege3,4,
- Martin Palkovic2,
- Arnout Vandecappelle2,
- Qubo Hu1 &
- …
- Einar J. Aas1
129Accesses
2Citations
Abstract
Data dominated signal processing applications are typically described using large and multi-dimensional arrays and loop nests. The order of production and consumption of array elements in these loop nests has huge impact on the amount of memory required during execution. This is essential since the size and complexity of the memory hierarchy is the dominating factor for power, performance and chip size in these applications. This paper presents a number of guiding principles for the ordering of the dimensions in the loop nests. They enable the designer, or design tools, to find the optimal ordering of loop nest dimensions for individual data dependencies in the code. We prove the validity of the guiding principles when no prior restrictions are given regarding fixation of dimensions. If some dimensions are already fixed at given nest levels, this is taken into account when fixing the remaining dimensions. In most cases an optimal ordering is found for this situation as well. The guiding principles can be used in the early design phases in order to enable minimization of the memory requirement through in-place mapping. We use real life examples to show how they can be applied to reach a cost optimized end product. The results show orders of magnitude improvement in memory requirement compared to using the declared array sizes, and similar penalties for choosing the suboptimal ordering of loops when in-place mapping is exploited.
This is a preview of subscription content,log in via an institution to check access.
Access this article
Subscribe and save
- Get 10 units per month
- Download Article/Chapter or eBook
- 1 Unit = 1 Article or 1 Chapter
- Cancel anytime
Buy Now
Price includes VAT (Japan)
Instant access to the full article PDF.











Similar content being viewed by others
Abbreviations
- DP:
dependency part
- LR:
length ratio
- DV:
dependency vector
- ND:
nonspanning dimension
- DVP:
dependency vector polytope
- SD:
spanning dimension
- ID:
iteration domain
- UB:
upper bound
- LB:
lower bound
References
Catthoor, F., Wuytack, S., De Greef, E., Balasa, F., Nachtergaele, L., & Vandecappelle, A. (1998).Custom memory management methodology—Exploration of memory organisation for embedded multimedia system design. Boston, USA: Kluwer.
Catthoor, F., Danckaert, K., Kulkarni, K. K., Brockmeyer, E., Kjeldsberg, P. G., van Achteren, T., et al. (2002).Data access and storage management for embedded programmable processors. Boston, USA: Kluwer.
Banerjee, U. (1988).Dependence analysis for supercomputing. Boston, USA: Kluwer.
Allen, J. R., & Kennedy, K. (1984). Automatic loop inter change.Proc. of the SIGPLAN’84 symposium on compiler construction, SIGPLAN Notices (Vol. 19, pp. 233–246) (June).
Pugh, W., & Wonnacott, D. (1993). An exact method for analysis of value-based array data dependences. InProc. 6th intnl. wsh. on languages and compilers for parallel computing. Portland OR, USA, (pp. 546–566) (August).
Vanbroekhoven, P., Janssens, G., Bruynooghe, M., Corporaal, H., & Catthoor, F. (2005). Transformation to dynamic single assignment using a simple data flow analysis. InProc. 3rd Asian symp. on programming languages and systems, APLAS’05, (Tsukuba, Japan), vol. 3780 ofLecture Notes Comp. Sc., Springer Verlag (pp. 330–346) (November).
Palkovic, M., Brockmeyer, E., Vanbroekhoven, P., Corporaal, H., & Catthoor, F. (2005). Systematic preprocessing of data dependent constructs for embedded systems. InProc. 15th intnl. wsh. on integrated circuit and system design, power and timing modeling, optimization and simulation (PATMOS), IEEE. Leuven, Belgium (pp. 89–98) (September).
Verbauwhede, I., Catthoor, F., Vandewalle, J., & De Man, H. (1989). Background memory management for the synthesis of algebraic algorithms on multi-processor dsp chips. InProc. VLSI’89, intnl. conf. on VLSI. Munich, Germany (pp. 209–218) (August).
Wolf, M. E., & Lam, M. S. (1991). A loop transformation theory and an algorithm to maximize parallelism.IEEE Transactions on Parallel and Distributed Systems, 2, 452–471) (October).
Kennedy, K., & McKinley, K. S. (1992). Optimizing for parallelism and data locality. InProc. of the 6th international conference on supercomputing. Washington, DC, USA (pp. 323–334) (August).
Clauss, P., & Loechner, V. (1998). Parametric analysis of polyhedral iteration spaces.Journal of VLSI Signal Processing, 19, 179–194 (July).
Allen, R. & Kennedy, K. (2002).Optimizing compilers for modern architectures. San Francisco, USA: KMorgan Kaufmann.
McKinley, K. S., Carr, S., & Tseng, C.-W. (1996). Improving data locality with loop transformations.ACM Transactions on Programming Languages and Systems, 18, 424–453 (July).
Danckaert, K., Catthoor, F., & De Man, H. (2000). A loop transformation approach for combined parallelization and data transfer and storage optimization. InProc. ACM conf. on par. and dist. proc. techniques and applications, PDPTA’00. Las Vegas NV, USA (pp. 2591–2597) (June).
Verdoolaege, S., Bruynooghe, M., Janssens, G., & Catthoor, F. (2003). Multi-dimensional incremental loop fusion for data locality. InProc. IEEE international conference on application-specific systems, architectures, and processors, ASAP’03. Leiden, The Netherlands (pp. 17–27) (June).
De Greef, E., Catthoor, F., & De Man, H. (1997). Array placement for storage size reduction in embedded multimedia systems. InProc. intnl. conf. on applic.-spec. systems arch. and processors. Zurich, Switzerland (pp. 66–75) (July).
Lefebvre, V., & Feautrier, P. (1997). Optimizing storage size for static control programs in automatic parallelizers. InProc. EuroPar conf., vol. 1300 ofLecture notes in computer science. Springer Verlag, Passau, Germany (pp. 356–363) (August).
Quillere, F., & Rajopadhye, S. (2000). Optimizing memory usage in the polyhedral model.ACM Transactions on Programming Languages and Systems, 22, 773–815 (September).
Darte, A., Schreiber, R., & Villard, G. (2005). Lattice-based memory allocation.IEEE Transactions on Computers, 54, 1242–1257 (October).
Chakrabarti, C. (2001). Cache design and exploration for low power embedded systems. InProc. intnl. conf. on performance, computing, and communications, IEEE. Phoenix, Arizona, USA (pp. 135–139) (April).
Kandemir, M., Ramanujam, J., Irwin, M. J., Vijaykrishnan, N., Kadayif, I., & Parikh, A. (2004). A compiler-based approach for dynamically managing scratch-pad memories in embedded systems.IEEE Transactions on Computer-Aided Design, 23, 243–260 (February).
Kirovski, D., Lee, C., Potkonjak, M., & Mangione-Smith, W. H. (1999). Application-driven synthesis of memory-intensive systems-on-chip.IEEE Transactions on Computer-Aided Design, 18, 1316–1326 (September).
Panda, P. R., Dutt, N. D., & Nicolau, A. (1999). Local memory exploration and optimization in embedded systems.IEEE Transactions on Computer-Aided Design, 18, 3–13 (January).
Kurdahi, F., & Parker, A. (1987). Real: A program for register allocation. InProc. 24th ACM/IEEE design automation conf. Miami FL, USA (pp. 210–215) (June).
Ohm, S. Y., Kurdahi, F. J., & Dutt, N. (1994). Comprehensive lower bound estimation from behavioral description. InIEEE/ACM Intnl. Conf. on Computer-Aided Design, IEEE. San Jose CA, USA (pp. 182–187) (November).
Paulin, P. G., & Knight, J. P. (1989). Force-directed scheduling for the behavioral synthesis of asics.IEEE Transactions on Computer-Aided Design, 8, 661–679 (June).
Tseng, C.-J., & Siewiorek, D. (1986). Automated synthesis of data paths in digital systems.IEEE Transactions on Computer-Aided Design, 5, 379–395 (July).
Gebotys, C. H., & Elmasry, M. I. (1991). Simultaneous scheduling and allocation for cost constrained optimal architectural synthesis. InProc. of the 28th ACM/IEEE design automation conf. San Jose CA, USA (pp. 2–7) (November).
Verbauwhede, I., Scheers, C., & Rabaey, J. (1994). Memory estimation for high-level synthesis. InProc. 31st ACM/IEEE design automation conf. San Diego CA, USA (pp. 143–148) (June).
Grun, P., Balasa, F., & Dutt, N. (1998). Memory size estimation for multimedia applications. InProc. ACM/IEEE wsh. on hardware/software co-design (Codes). Seattle WA, USA (pp. 145–149) (March).
Zhao, Y., & Malik, S. (1999). Exact memory size estimation for array computation without loop unrolling. In36th ACM/IEEE design automation conf. New Orleans, USA (pp. 811–816) (June).
Ramanujam, J., Hong, J., Kandemir, M., & Narayan, A. (2001). Reducing memory requirements of nested loops for embedded systems. In38th ACM/IEEE design automation conf. Las Vegas NV, USA (pp. 359–364) (June).
Balasa, F., Catthoor, F., & De Man, H. (1995). Background memory area estimation for multi-dimensional signal processing systems.IEEE Transactions on VLSI Systems, 3, 157–172 (June).
Balasa, F., Zhu, H., & Luican, I. (2007). Computation of storage requirements for multi-dimensional signal processing applications.IEEE Transactions on VLSI Systems, 15, 447–460 (April).
Hu, Q., Vandecappelle, A., Palkovic, M., Kjeldsberg, P. G., Brockmeyer, E., & Catthoor, F. (2006). Hierarchical memory size estimation for loop fusion and loop shifting in data-dominated applications. InProc. of the 11th Asia and South Pacific design automation conference, ASP-DAC 2006. Yokohama, Japan (pp. 606–611) (January).
Smailagic, A. (Guest ed.) (2001). Special issue on system level design.IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9 (December).
Kjeldsberg, P. G., Catthoor, F., & Aas, E. J. (2003). Detection of partially simultaneously alive signals in storage requirement estimation for data-intensive applications.IEEE Transactions on Computer-Aided Design, 22, 908–921 (July).
Kjeldsberg, P. G., Catthoor, F., & Aas, E. J. (2004). Storage requirement estimation for optimized design of data intensive applications.ACM Transactions on Design Automation of Electronic Systems, 9, 133–158 (April).
van Swaaij, M. Franssen, F., Catthoor, F., & De Man, H. (1992). Modeling data flow and control flow for high level memory management. InProc. of the European conference on design automation. Brussels, Belgium (pp. 8–13) (March).
De Greef, E., Catthoor, F., & De Man, H. (1997). Memory size reduction through storage order optimization for embedded parallel multimedia applications.Elsevier Parallel Computing Journal, 23, 1811–1837 (December).
Shang, W., Hodzic, E., & Chen, Z. (1996). On uniformization of affine dependence algorithms.IEEE Transactions on Computers, 45, 827–840 (July).
Knuth, D. (1997).The art of computer programming, volume 3: Sorting and searching, third edition. Addison-Wesley.
IMEC (2007). Atomium web site.http://www.imec.be/design/atomium/.
Kjeldsberg, P. G., Catthoor, F., & Aas, E. J. (2001). Detection of partially simultaneously alive signals in storage requirement estimation for data-intensive applications. In38th ACM/IEEE design automation conf. Las Vegas N, USA (pp. 365–370) (June).
Kulkarni, D., & Stumm, M. (1993). Loop and data transformations: A tutorial. Tech. Rep. CSRI-337, Computer Systems Research Inst., Univ. of Toronto (June).
Moonen, M., Dooren, P. V., & Vandewalle, J. (1992). An svd updating algorithm for subspace tracking.SIAM Journal on Matrix Analysis and Applications, 13(4), 1015–1038.
Danckaert, K., Catthoor, F., & De Man, H. (2000). A preprocessing step for global loop transformations for data transfer and storage optimization. InProc. intnl. conf. on compilers, arch. and synth. for emb. sys. San Jose, CA, USA (pp. 34–40) (November).
Wuytack, S., Diguet, J. P., Catthoor, F., & De Man, H. (1998). Formalized methodology for data reuse exploration for low-power hierarchical memory mappings.IEEE Transactions on VLSI Systems, 6, 529–537 (December).
Author information
Authors and Affiliations
Norwegian University of Science and Technology, Trondheim, Norway
Per Gunnar Kjeldsberg, Qubo Hu & Einar J. Aas
IMEC, Leuven, Belgium
Francky Catthoor, Martin Palkovic & Arnout Vandecappelle
Katholieke Universiteit Leuven, Leuven, Belgium
Francky Catthoor & Sven Verdoolaege
Leiden Institute of Advanced Computer Science, Leiden, The Netherlands
Sven Verdoolaege
- Per Gunnar Kjeldsberg
You can also search for this author inPubMed Google Scholar
- Francky Catthoor
You can also search for this author inPubMed Google Scholar
- Sven Verdoolaege
You can also search for this author inPubMed Google Scholar
- Martin Palkovic
You can also search for this author inPubMed Google Scholar
- Arnout Vandecappelle
You can also search for this author inPubMed Google Scholar
- Qubo Hu
You can also search for this author inPubMed Google Scholar
- Einar J. Aas
You can also search for this author inPubMed Google Scholar
Corresponding author
Correspondence toPer Gunnar Kjeldsberg.
Rights and permissions
About this article
Cite this article
Kjeldsberg, P.G., Catthoor, F., Verdoolaege, S.et al. Guidance of Loop Ordering for Reduced Memory Usage in Signal Processing Applications.J Sign Process Syst Sign Image Video Technol53, 301–321 (2008). https://doi.org/10.1007/s11265-008-0178-6
Received:
Revised:
Accepted:
Published:
Issue Date:
Share this article
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative