Movatterモバイル変換


[0]ホーム

URL:


Skip to main content

Advertisement

Springer Nature Link
Log in

Generate, Test, and Aggregate

A Calculation-based Framework for Systematic Parallel Programming with MapReduce

  • Conference paper

Part of the book series:Lecture Notes in Computer Science ((LNPSE,volume 7211))

Included in the following conference series:

Abstract

MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind, which has been intensively studied in the functional programming and program calculation fields. We show that MapReduce can be equipped with a programming theory in calculational form. By integrating the generate-and-test programing paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way. We give nontrivial examples that demonstrate how to apply our framework.

Similar content being viewed by others

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bird, R.: An introduction to the theory of lists. In: Broy, M. (ed.) Logic of Programming and Calculi of Discrete Design, pp. 5–42. Springer, Heidelberg (1987)

    Google Scholar 

  2. Bird, R.: Introduction to Functional Programming using Haskell. Prentice Hall (1998)

    Google Scholar 

  3. Bird, R., de Moor, O.: Algebras of Programming. Prentice Hall (1996)

    Google Scholar 

  4. Chin, W.N., Khoo, S.C., Hu, Z., Takeichi, M.: Deriving Parallel Codes via Invariants. In: SAS 2000. LNCS, vol. 1824, pp. 75–94. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  5. Cole, M.: Parallel programming with list homomorphisms. Parallel Processing Letters 5(2), 191–203 (1995)

    Article  Google Scholar 

  6. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Communications of the ACM 51, 107–113 (2008)

    Article  Google Scholar 

  7. Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. In: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI 1994), pp. 135–146. ACM (1994)

    Google Scholar 

  8. Gill, A., Launchbury, J., Peyton Jones, S.L.: A short cut to deforestation. In: Conference on Functional Programming Languages and Computer Architecture, pp. 223–232 (1993)

    Google Scholar 

  9. Goodman, J.: Semiring parsing. Computational Linguistics 25, 573–605 (1999)

    MathSciNet  Google Scholar 

  10. Gorlatch, S.: Systematic Efficient Parallelization of Scan and Other List Homomorphisms. In: Fraigniaud, P., Mignotte, A., Robert, Y., Bougé, L. (eds.) Euro-Par 1996. LNCS, vol. 1124, pp. 401–408. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  11. Grant-Duff, Z., Harrison, P.: Parallelism via homomorphism. Parallel Processing Letters 6(2), 279–295 (1996)

    Article MathSciNet  Google Scholar 

  12. He, Y.: Extended viterbi algorithm for second order hidden markov process. In: 9th International Conference on Pattern Recognition, vol. 2, pp. 718–720. IEEE Press (1988)

    Google Scholar 

  13. Ho, T.J., Chen, B.S.: Novel extended viterbi-based multiple-model algorithms for state estimation of discrete-time systems with markov jump parameters. IEEE Transactions on Signal Processing 54(2), 393–404 (2006)

    Article  Google Scholar 

  14. Hu, Z., Takeichi, M., Chin, W.N.: Parallelization in calculational forms. In: 25th ACM Symposium on Principles of Programming Languages (POPL 1998), pp. 316–328. ACM Press, San Diego (1998)

    Chapter  Google Scholar 

  15. Hu, Z., Yokoyama, T., Takeichi, M.: Program Optimizations and Transformations in Calculation Form. In: Lämmel, R., Saraiva, J., Visser, J. (eds.) GTTSE 2005. LNCS, vol. 4143, pp. 144–168. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  16. Karloff, H., Suri, S., Vassilvitskii, S.: A model of computation for mapreduce. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 938–948. SIAM (2010)

    Google Scholar 

  17. Lin, J., Dyer, C.: Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers (2010)

    Google Scholar 

  18. List, M.A.P. (2011),http://www.mendeley.com/groups/1058401/mapreduce-applications/papers/

  19. Liu, Y., Hu, Z., Matsuzaki, K.: Towards Systematic Parallel Programming over MapReduce. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 39–50. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  20. Morita, K., Morihata, A., Matsuzaki, K., Hu, Z., Takeichi, M.: Automatic inversion generates divide-and-conquer parallel programs. In: ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pp. 146–155. ACM Press (2007)

    Google Scholar 

  21. Sato, S., Iwasaki, H.: Automatic parallelization via matrix multiplication. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011), pp. 470–479. ACM (2011)

    Google Scholar 

  22. Skillicorn, D.B.: The Bird-Meertens Formalism as a Parallel Model. In: NATO ARW “Software for Parallel Computation” (1992)

    Google Scholar 

  23. Takano, A., Meijer, E.: Shortcut deforestation in calculational form. In: Proc. Conference on Functional Programming Languages and Computer Architecture, pp. 306–313. La Jolla, California (1995)

    Google Scholar 

  24. Takano, A., Hu, Z., Takeichi, M.: Program transformation in calculational form. ACM Computing Surveys 30(3) (1998)

    Google Scholar 

  25. Thomas, W.: Automata on infinite objects. In: Handbook of Theoretical Computer Science, Formal Models and Sematics, vol. B, pp. 133–192. Elsevier and MIT Press (1990)

    Google Scholar 

  26. Wadler, P.: Theorems for free! In: Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA 1989, pp. 347–359. ACM (1989)

    Google Scholar 

  27. White, T.: Hadoop: The Definitive Guide. O’Reilly Media (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

  1. The University of Tokyo, Japan

    Kento Emoto

  2. National Institute of Informatics, Tokyo, Japan

    Sebastian Fischer & Zhenjiang Hu

Authors
  1. Kento Emoto

    You can also search for this author inPubMed Google Scholar

  2. Sebastian Fischer

    You can also search for this author inPubMed Google Scholar

  3. Zhenjiang Hu

    You can also search for this author inPubMed Google Scholar

Editor information

Editors and Affiliations

  1. Technische Universität München, Boltzmannstrasse 3, 85748, Garching, Germany

    Helmut Seidl

Rights and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Emoto, K., Fischer, S., Hu, Z. (2012). Generate, Test, and Aggregate. In: Seidl, H. (eds) Programming Languages and Systems. ESOP 2012. Lecture Notes in Computer Science, vol 7211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28869-2_13

Download citation

Publish with us


[8]ページ先頭

©2009-2025 Movatter.jp