Part of the book series:Lecture Notes in Computer Science ((LNPSE,volume 7211))
Included in the following conference series:
Abstract
MapReduce, being inspired by the map and reduce primitives available in many functional languages, is the de facto standard for large scale data-intensive parallel programming. Although it has succeeded in popularizing the use of the two primitives for hiding the details of parallel computation, little effort has been made to emphasize the programming methodology behind, which has been intensively studied in the functional programming and program calculation fields. We show that MapReduce can be equipped with a programming theory in calculational form. By integrating the generate-and-test programing paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way. We give nontrivial examples that demonstrate how to apply our framework.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bird, R.: An introduction to the theory of lists. In: Broy, M. (ed.) Logic of Programming and Calculi of Discrete Design, pp. 5–42. Springer, Heidelberg (1987)
Bird, R.: Introduction to Functional Programming using Haskell. Prentice Hall (1998)
Bird, R., de Moor, O.: Algebras of Programming. Prentice Hall (1996)
Chin, W.N., Khoo, S.C., Hu, Z., Takeichi, M.: Deriving Parallel Codes via Invariants. In: SAS 2000. LNCS, vol. 1824, pp. 75–94. Springer, Heidelberg (2000)
Cole, M.: Parallel programming with list homomorphisms. Parallel Processing Letters 5(2), 191–203 (1995)
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Communications of the ACM 51, 107–113 (2008)
Fisher, A.L., Ghuloum, A.M.: Parallelizing complex scans and reductions. In: Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation (PLDI 1994), pp. 135–146. ACM (1994)
Gill, A., Launchbury, J., Peyton Jones, S.L.: A short cut to deforestation. In: Conference on Functional Programming Languages and Computer Architecture, pp. 223–232 (1993)
Goodman, J.: Semiring parsing. Computational Linguistics 25, 573–605 (1999)
Gorlatch, S.: Systematic Efficient Parallelization of Scan and Other List Homomorphisms. In: Fraigniaud, P., Mignotte, A., Robert, Y., Bougé, L. (eds.) Euro-Par 1996. LNCS, vol. 1124, pp. 401–408. Springer, Heidelberg (1996)
Grant-Duff, Z., Harrison, P.: Parallelism via homomorphism. Parallel Processing Letters 6(2), 279–295 (1996)
He, Y.: Extended viterbi algorithm for second order hidden markov process. In: 9th International Conference on Pattern Recognition, vol. 2, pp. 718–720. IEEE Press (1988)
Ho, T.J., Chen, B.S.: Novel extended viterbi-based multiple-model algorithms for state estimation of discrete-time systems with markov jump parameters. IEEE Transactions on Signal Processing 54(2), 393–404 (2006)
Hu, Z., Takeichi, M., Chin, W.N.: Parallelization in calculational forms. In: 25th ACM Symposium on Principles of Programming Languages (POPL 1998), pp. 316–328. ACM Press, San Diego (1998)
Hu, Z., Yokoyama, T., Takeichi, M.: Program Optimizations and Transformations in Calculation Form. In: Lämmel, R., Saraiva, J., Visser, J. (eds.) GTTSE 2005. LNCS, vol. 4143, pp. 144–168. Springer, Heidelberg (2006)
Karloff, H., Suri, S., Vassilvitskii, S.: A model of computation for mapreduce. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, pp. 938–948. SIAM (2010)
Lin, J., Dyer, C.: Data-Intensive Text Processing with MapReduce. Morgan and Claypool Publishers (2010)
List, M.A.P. (2011),http://www.mendeley.com/groups/1058401/mapreduce-applications/papers/
Liu, Y., Hu, Z., Matsuzaki, K.: Towards Systematic Parallel Programming over MapReduce. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part II. LNCS, vol. 6853, pp. 39–50. Springer, Heidelberg (2011)
Morita, K., Morihata, A., Matsuzaki, K., Hu, Z., Takeichi, M.: Automatic inversion generates divide-and-conquer parallel programs. In: ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), pp. 146–155. ACM Press (2007)
Sato, S., Iwasaki, H.: Automatic parallelization via matrix multiplication. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2011), pp. 470–479. ACM (2011)
Skillicorn, D.B.: The Bird-Meertens Formalism as a Parallel Model. In: NATO ARW “Software for Parallel Computation” (1992)
Takano, A., Meijer, E.: Shortcut deforestation in calculational form. In: Proc. Conference on Functional Programming Languages and Computer Architecture, pp. 306–313. La Jolla, California (1995)
Takano, A., Hu, Z., Takeichi, M.: Program transformation in calculational form. ACM Computing Surveys 30(3) (1998)
Thomas, W.: Automata on infinite objects. In: Handbook of Theoretical Computer Science, Formal Models and Sematics, vol. B, pp. 133–192. Elsevier and MIT Press (1990)
Wadler, P.: Theorems for free! In: Proceedings of the Fourth International Conference on Functional Programming Languages and Computer Architecture, FPCA 1989, pp. 347–359. ACM (1989)
White, T.: Hadoop: The Definitive Guide. O’Reilly Media (2009)
Author information
Authors and Affiliations
The University of Tokyo, Japan
Kento Emoto
National Institute of Informatics, Tokyo, Japan
Sebastian Fischer & Zhenjiang Hu
- Kento Emoto
You can also search for this author inPubMed Google Scholar
- Sebastian Fischer
You can also search for this author inPubMed Google Scholar
- Zhenjiang Hu
You can also search for this author inPubMed Google Scholar
Editor information
Editors and Affiliations
Technische Universität München, Boltzmannstrasse 3, 85748, Garching, Germany
Helmut Seidl
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Emoto, K., Fischer, S., Hu, Z. (2012). Generate, Test, and Aggregate. In: Seidl, H. (eds) Programming Languages and Systems. ESOP 2012. Lecture Notes in Computer Science, vol 7211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28869-2_13
Download citation
Publisher Name:Springer, Berlin, Heidelberg
Print ISBN:978-3-642-28868-5
Online ISBN:978-3-642-28869-2
eBook Packages:Computer ScienceComputer Science (R0)
Share this paper
Anyone you share the following link with will be able to read this content:
Sorry, a shareable link is not currently available for this article.
Provided by the Springer Nature SharedIt content-sharing initiative