Dominik Grewe & Michael F. P. O’Boyle

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6601)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

Heterogeneous multi-core platforms are increasingly prevalent due to their perceived superior performance over homogeneous systems. The best performance, however, can only be achieved if tasks are accurately mapped to the right processors. OpenCL programs can be partitioned to take advantage of all the available processors in a system. However, finding the best partitioning for any heterogeneous system is difficult and depends on the hardware and software implementation.

We propose a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems. We develop a purely static approach based on predictive modelling and program features. When evaluated over a suite of 47 benchmarks, our model achieves a speedup of 1.57 over a state-of-the-art dynamic run-time approach, a speedup of 3.02 over a purely multi-core approach and 1.55 over the performance achieved by using just the GPU.
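To make the notion of a static partition concrete, here is a minimal sketch of splitting a one-dimensional OpenCL index space between a GPU and a CPU. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a partition is represented, so the sketch assumes it is a single GPU fraction chosen before execution. The function name `split_work_items` and its parameters are hypothetical, and in the paper's setting the fraction would come from the predictive model rather than being passed in by hand.

```python
def split_work_items(global_size, gpu_fraction, work_group_size=1):
    """Split a 1-D index space into a GPU part and a CPU part.

    Assumption (not from the source): a partition is one fraction in [0, 1]
    giving the share of work-items assigned to the GPU; the CPU takes the rest.
    Returns ((gpu_offset, gpu_count), (cpu_offset, cpu_count)).
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must lie in [0, 1]")
    if work_group_size <= 0 or global_size % work_group_size != 0:
        raise ValueError("global_size must be a multiple of work_group_size")

    # Split on work-group boundaries so that no work-group straddles devices.
    num_groups = global_size // work_group_size
    gpu_groups = round(num_groups * gpu_fraction)
    gpu_count = gpu_groups * work_group_size

    # The GPU takes the first gpu_count items; the CPU takes the remainder.
    return (0, gpu_count), (gpu_count, global_size - gpu_count)


if __name__ == "__main__":
    gpu_part, cpu_part = split_work_items(1024, 0.75, work_group_size=64)
    print("GPU (offset, count):", gpu_part)
    print("CPU (offset, count):", cpu_part)
```

Each (offset, count) pair could then be used as the global work offset and global work size of one kernel launch per device. Because the split is fixed before the kernel runs, no run-time profiling or work stealing is involved, which is what separates a static scheme from the dynamic run-time approach the abstract compares against.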

Author information

Authors and Affiliations

School of Informatics, The University of Edinburgh, UK
Dominik Grewe & Michael F. P. O’Boyle

Editor information

Editors and Affiliations

Faculty of Informatics, Institute of Computer Languages, TU Vienna, Argentinierstr. 8 / E185.1, 1040, Vienna, Austria
Jens Knoop

About this paper

Cite this paper

Grewe, D., O’Boyle, M.F.P. (2011). A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL. In: Knoop, J. (eds) Compiler Construction. CC 2011. Lecture Notes in Computer Science, vol 6601. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19861-8_16

DOI: https://doi.org/10.1007/978-3-642-19861-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19860-1
Online ISBN: 978-3-642-19861-8
eBook Packages: Computer Science, Computer Science (R0)
