- Michael B. Taylor,
- Walter Lee,
- Jason E. Miller,
- David Wentzlaff,
- Ian Bratt,
- Ben Greenwald,
- Henry Hoffmann,
- Paul R. Johnson,
- Jason S. Kim,
- James Psota,
- Arvind Saraf,
- Nathan Shnidman,
- Volker Strumpen,
- Matthew I. Frank,
- Saman Amarasinghe &
- Anant Agarwal
Part of the book series: Integrated Circuits and Systems (ICIR)
Abstract
For the last few decades Moore’s Law has continually provided exponential growth in the number of transistors on a single chip. This chapter describes a class of architectures, called tiled multicore architectures, that are designed to exploit massive quantities of on-chip resources in an efficient, scalable manner. Tiled multicore architectures combine each processor core with a switch to create a modular element called a tile. Tiles are replicated on a chip as needed to create multicores with any number of tiles. The Raw processor, a pioneering example of a tiled multicore processor, is examined in detail to explain the philosophy, design, and strengths of such architectures. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance. Central to achieving this goal is Raw’s ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Raw approaches this challenge by implementing plenty of on-chip resources – including logic, wires, and pins – in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Compared to a traditional superscalar processor, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x–9x better for higher levels of ILP, and 10x–100x better when highly parallel applications are coded in a stream language or optimized by hand.
Based on “Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams”, by M.B. Taylor, W. Lee, J.E. Miller, et al., which appeared in The 31st Annual International Symposium on Computer Architecture (ISCA). © 2004 IEEE. [46]
References
A. Agarwal and M. Levy. Going multicore presents challenges and opportunities. Embedded Systems Design, 20(4), April 2007.
V. Agarwal, M. S. Hrishikesh, S. W. Keckler, and D. Burger. Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures. In ISCA ’00: Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 248–259, 2000.
E. Anderson, Z. Bai, J. Dongarra, A. Greenbaum, A. McKenney, J. Du Croz, S. Hammerling, J. Demmel, C. Bischof, and D. Sorensen. LAPACK: A Portable Linear Algebra Library for High-Performance Computers. In Supercomputing ’90: Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, pages 2–11, 1990.
M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam, O. Menzilicioglu, and J. A. Webb. The Warp Computer: Architecture, Implementation and Performance. IEEE Transactions on Computers, 36(12):1523–1538, December 1987.
J. Babb, M. Frank, V. Lee, E. Waingold, R. Barua, M. Taylor, J. Kim, S. Devabhaktuni, and A. Agarwal. The RAW Benchmark Suite: Computation Structures for General Purpose Computing. In Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines (FCCM), pages 134–143, 1997.
M. Baron. Low-key Intel 80-core Intro: The tip of the iceberg. Microprocessor Report, April 2007.
M. Baron. Tilera’s cores communicate better. Microprocessor Report, November 2007.
R. Barua, W. Lee, S. Amarasinghe, and A. Agarwal. Maps: A Compiler-Managed Memory System for Raw Machines. In ISCA ’99: Proceedings of the 26th Annual International Symposium on Computer Architecture, pages 4–15, 1999.
M. Bohr. Interconnect Scaling – The Real Limiter to High Performance ULSI. In 1995 IEDM, pages 241–244, 1995.
P. Bose, D. H. Albonesi, and D. Marculescu. Power and complexity aware design. IEEE Micro: Guest Editor’s Introduction for Special Issue on Power and Complexity Aware Design, 23(5):8–11, Sept/Oct 2003.
S. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer. PipeRench: A Coprocessor for Streaming Multimedia Acceleration. In ISCA ’99: Proceedings of the 26th Annual International Symposium on Computer Architecture, pages 28–39, 1999.
M. Gordon, W. Thies, and S. Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In ASPLOS-XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 75–86, October 2006.
M. I. Gordon, W. Thies, M. Karczmarek, J. Lin, A. S. Meli, A. A. Lamb, C. Leger, J. Wong, H. Hoffmann, D. Maze, and S. Amarasinghe. A Stream Compiler for Communication-Exposed Architectures. In ASPLOS-X: Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 291–303, 2002.
T. Gross and D. R. O’Halloron. iWarp, Anatomy of a Parallel Computing System. The MIT Press, Cambridge, MA, 1998.
J. R. Hauser and J. Wawrzynek. Garp: A MIPS Processor with Reconfigurable Coprocessor. In Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines (FCCM), pages 12–21, 1997.
R. Ho, K. W. Mai, and M. A. Horowitz. The Future of Wires. Proceedings of the IEEE, 89(4):490–504, April 2001.
H. Hoffmann, V. Strumpen, and A. Agarwal. Stream Algorithms and Architecture. Technical Memo MIT-LCS-TM-636, MIT Laboratory for Computer Science, 2003.
H. P. Hofstee. Power efficient processor architecture and the Cell processor. In HPCA ’05: Proceedings of the 11th International Symposium on High Performance Computer Architecture, pages 258–262, 2005.
U. Kapasi, W. J. Dally, S. Rixner, J. D. Owens, and B. Khailany. The Imagine Stream Processor. In ICCD ’02: Proceedings of the 2002 IEEE International Conference on Computer Design, pages 282–288, 2002.
A. KleinOsowski and D. Lilja. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research. Computer Architecture Letters, 1, June 2002.
P. Kongetira, K. Aingaran, and K. Olukotun. Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro, 25(2):21–29, 2005.
C. Kozyrakis and D. Patterson. A New Direction for Computer Architecture Research. IEEE Computer, 30(9):24–32, September 1997.
R. Krashinsky, C. Batten, M. Hampton, S. Gerding, B. Pharris, J. Casper, and K. Asanovic. The Vector-Thread Architecture. In ISCA ’04: Proceedings of the 31st Annual International Symposium on Computer Architecture, June 2004.
J. Kubiatowicz. Integrated Shared-Memory and Message-Passing Communication in the Alewife Multiprocessor. PhD thesis, Massachusetts Institute of Technology, 1998.
W. Lee, R. Barua, M. Frank, D. Srikrishna, J. Babb, V. Sarkar, and S. Amarasinghe. Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine. In ASPLOS-VIII: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 46–54, 1998.
W. Lee, D. Puppin, S. Swenson, and S. Amarasinghe. Convergent Scheduling. In MICRO-35: Proceedings of the 35th Annual International Symposium on Microarchitecture, pages 111–122, 2002.
K. Mai, T. Paaske, N. Jayasena, R. Ho, W. J. Dally, and M. Horowitz. Smart Memories: A Modular Reconfigurable Architecture. In ISCA ’00: Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 161–171, 2000.
D. Matzke. Will Physical Scalability Sabotage Performance Gains? IEEE Computer, 30(9):37–39, September 1997.
J. McCalpin. STREAM: Sustainable Memory Bandwidth in High Performance Computers. http://www.cs.virginia.edu/stream.
J. E. Miller. Software Instruction Caching. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, June 2007. http://hdl.handle.net/1721.1/40317.
C. A. Moritz, D. Yeung, and A. Agarwal. SimpleFit: A Framework for Analyzing Design Tradeoffs in Raw Architectures. IEEE Transactions on Parallel and Distributed Systems, pages 730–742, July 2001.
S. Naffziger and G. Hammond. The Implementation of the Next-Generation 64b Itanium Microprocessor. In Proceedings of the IEEE International Solid-State Circuits Conference, pages 344–345, 472, 2002.
R. Nagarajan, K. Sankaralingam, D. Burger, and S. W. Keckler. A Design Space Evaluation of Grid Processor Architectures. In MICRO-34: Proceedings of the 34th Annual International Symposium on Microarchitecture, pages 40–51, 2001.
M. Narayanan and K. A. Yelick. Generating Permutation Instructions from a High-Level Description. TR UCB-CS-03-1287, UC Berkeley, 2003.
S. Palacharla. Complexity-Effective Superscalar Processors. PhD thesis, University of Wisconsin–Madison, 1998.
J. Sanchez and A. Gonzalez. Modulo Scheduling for a Fully-Distributed Clustered VLIW Architecture. In MICRO-33: Proceedings of the 33rd Annual International Symposium on Microarchitecture, pages 124–133, December 2000.
K. Sankaralingam, R. Nagarajan, R. McDonald, R. Desikan, S. Drolia, M. S. Govindan, P. Gratz, D. Gulati, H. Hanson, C. Kim, H. Liu, N. Ranganathan, S. Sethumadhavan, S. Sharif, P. Shivakumar, S. W. Keckler, and D. Burger. Distributed microarchitectural protocols in the TRIPS prototype processor. In MICRO-39: Proceedings of the 39th Annual International Symposium on Microarchitecture, pages 480–491, Dec 2006.
D. Shoemaker, F. Honore, C. Metcalf, and S. Ward. NuMesh: An Architecture Optimized for Scheduled Communication. Journal of Supercomputing, 10(3):285–302, 1996.
G. Sohi, S. Breach, and T. Vijaykumar. Multiscalar Processors. In ISCA ’95: Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 414–425, 1995.
J. Suh, E.-G. Kim, S. P. Crago, L. Srinivasan, and M. C. French. A Performance Analysis of PIM, Stream Processing, and Tiled Processing on Memory-Intensive Signal Processing Kernels. In ISCA ’03: Proceedings of the 30th Annual International Symposium on Computer Architecture, pages 410–419, June 2003.
M. B. Taylor. Deionizer: A Tool for Capturing and Embedding I/O Calls. Technical Report MIT-CSAIL-TR-2004-037, MIT CSAIL/Laboratory for Computer Science, 2004. http://cag.csail.mit.edu/~mtaylor/deionizer.html.
M. B. Taylor. Tiled Processors. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, Feb 2007.
M. B. Taylor, J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B. Greenwald, H. Hoffman, J.-W. Lee, P. Johnson, W. Lee, A. Ma, A. Saraf, M. Seneski, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs. IEEE Micro, pages 25–35, Mar 2002.
M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures. In HPCA ’03: Proceedings of the 9th International Symposium on High Performance Computer Architecture, pages 341–353, 2003.
M. B. Taylor, W. Lee, S. Amarasinghe, and A. Agarwal. Scalar Operand Networks. IEEE Transactions on Parallel and Distributed Systems (Special Issue on On-chip Networks), Feb 2005.
M. B. Taylor, W. Lee, J. E. Miller, D. Wentzlaff, I. Bratt, B. Greenwald, H. Hoffmann, P. Johnson, J. Kim, J. Psota, A. Saraf, N. Shnidman, V. Strumpen, M. Frank, S. Amarasinghe, and A. Agarwal. Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams. In ISCA ’04: Proceedings of the 31st Annual International Symposium on Computer Architecture, pages 2–13, June 2004.
W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In 2002 Compiler Construction, pages 179–196, 2002.
E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal. Baring it All to Software: Raw Machines. IEEE Computer, 30(9):86–93, Sep 1997.
D. Wentzlaff. Architectural Implications of Bit-level Computation in Communication Applications. Master’s thesis, Massachusetts Institute of Technology, 2002.
D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.-C. Miao, J. F. Brown, and A. Agarwal. On-Chip Interconnection Architecture of the Tile Processor. IEEE Micro, 27(5):15–31, Sept–Oct 2007.
R. Whaley, A. Petitet, and J. J. Dongarra. Automated Empirical Optimizations of Software and the ATLAS Project. Parallel Computing, 27(1–2):3–35, 2001.
Acknowledgments
We thank our StreamIt collaborators, specifically M. Gordon, J. Lin, and B. Thies for the StreamIt backend and the corresponding section of this chapter. We are grateful to our collaborators from ISI East including C. Chen, S. Crago, M. French, L. Wang and J. Suh for developing the Raw motherboard, firmware components, and several applications. T. Konstantakopoulos, L. Jakab, F. Ghodrat, M. Seneski, A. Saraswat, R. Barua, A. Ma, J. Babb, M. Stephenson, S. Larsen, V. Sarkar, and several others too numerous to list also contributed to the success of Raw. The Raw chip was fabricated in cooperation with IBM. Raw is funded by DARPA, NSF, ITRI, and the Oxygen Alliance.
Author information
Authors and Affiliations
University of California, San Diego, CA, USA
Michael B. Taylor
Tilera Corporation, Westborough, MA, USA
Walter Lee & Ian Bratt
MIT CSAIL, Cambridge, MA, USA
Jason E. Miller, David Wentzlaff, Henry Hoffmann, Paul R. Johnson, Jason S. Kim, James Psota, Saman Amarasinghe & Anant Agarwal
Veracode, Burlington, MA, USA
Ben Greenwald
Swasth Foundation, Bangalore, India
Arvind Saraf
The MITRE Corporation, Bedford, MA, USA
Nathan Shnidman
IBM Austin Research Laboratory, Austin, TX, USA
Volker Strumpen
University of Illinois at Urbana-Champaign, Urbana, IL, USA
Matthew I. Frank
Editor information
Editors and Affiliations
College of Natural Sciences, University of Texas, Austin, University Station 1, Austin, 78712-0233, U.S.A.
Stephen W. Keckler
Dept. Electrical Engineering, Stanford University, Stanford, 94305-9510, U.S.A.
Kunle Olukotun
IBM Software Group, Burnet Rd. 11501, Austin, 78758, U.S.A.
H. Peter Hofstee
Copyright information
© 2009 Springer-Verlag US
About this chapter
Cite this chapter
Taylor, M.B. et al. (2009). Tiled Multicore Processors. In: Keckler, S., Olukotun, K., Hofstee, H. (eds) Multicore Processors and Systems. Integrated Circuits and Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-0263-4_1
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-0262-7
Online ISBN: 978-1-4419-0263-4
eBook Packages: Computer Science, Computer Science (R0)