A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6601)


Abstract

Heterogeneous multi-core platforms are increasingly prevalent due to their perceived superior performance over homogeneous systems. The best performance, however, can only be achieved if tasks are accurately mapped to the right processors. OpenCL programs can be partitioned to take advantage of all the available processors in a system. However, finding the best partitioning for any heterogeneous system is difficult and depends on the hardware and software implementation.

We propose a portable partitioning scheme for OpenCL programs on heterogeneous CPU-GPU systems. We develop a purely static approach based on predictive modelling and program features. When evaluated over a suite of 47 benchmarks, our model achieves a speedup of 1.57 over a state-of-the-art dynamic run-time approach, a speedup of 3.02 over a purely multi-core approach and 1.55 over the performance achieved by using just the GPU.
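To make the idea of a static partition concrete, the sketch below splits a one-dimensional range of work-items between a GPU and a CPU according to a fixed fraction chosen before execution. It is only an illustration under stated assumptions, not the authors' implementation: the function name `split_ndrange`, the parameter `gpu_fraction` (standing in for whatever a predictive model would output), and the rounding to whole work-groups are all hypothetical choices made for this example.

```python
def split_ndrange(global_size, local_size, gpu_fraction):
    """Split a 1-D range of work-items into a GPU chunk and a CPU chunk.

    Hypothetical helper for illustration only. `gpu_fraction` stands in
    for the output of a predictive model; it is not computed here.
    Returns ((gpu_offset, gpu_size), (cpu_offset, cpu_size)).
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must lie in [0, 1]")

    # Partition at work-group granularity so that neither device
    # receives a partial work-group.
    num_groups = global_size // local_size
    gpu_groups = round(num_groups * gpu_fraction)
    gpu_items = gpu_groups * local_size

    # The GPU takes the first chunk; the CPU takes the remainder.
    gpu_chunk = (0, gpu_items)
    cpu_chunk = (gpu_items, global_size - gpu_items)
    return gpu_chunk, cpu_chunk


if __name__ == "__main__":
    gpu, cpu = split_ndrange(global_size=1024, local_size=64, gpu_fraction=0.7)
    print("GPU (offset, size):", gpu)
    print("CPU (offset, size):", cpu)
```

In an OpenCL host program, each (offset, size) pair could be passed as the global work offset and global work size of a separate kernel launch on the corresponding device. Because the split is fixed up front, no run-time profiling or work stealing is involved, which is the property that distinguishes a static scheme from a dynamic one.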



Author information

Authors and Affiliations

  1. School of Informatics, The University of Edinburgh, UK

    Dominik Grewe & Michael F. P. O’Boyle


Editor information

Editors and Affiliations

  1. Faculty of Informatics, Institute of Computer Languages, TU Vienna, Argentinierstr. 8 / E185.1, 1040, Vienna, Austria

    Jens Knoop


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grewe, D., O’Boyle, M.F.P. (2011). A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL. In: Knoop, J. (eds) Compiler Construction. CC 2011. Lecture Notes in Computer Science, vol 6601. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19861-8_16
