A Frequency Scaling Based Performance Indicator Framework for Big Data Systems

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11446)

Abstract

Identifying the performance bottleneck is important for big data systems. However, popular indicators such as resource utilization are often misleading and cannot be compared with each other. In this paper, we propose a novel indicator framework that directly compares the impact of different indicators, so that the performance bottleneck can be identified and analyzed efficiently. We describe a methodology that constructs the indicators from the change in performance under CPU frequency scaling. Spark is used as an example big data system, and two typical SQL benchmarks serve as workloads to evaluate the proposed method. Experimental results show that the proposed method is more accurate than the resource utilization method and easier to implement than the white-box method. The analysis with our indicators also leads to some interesting findings and valuable performance optimization suggestions for big data systems.
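The abstract does not give the indicator's definition, so the following is only a minimal sketch of the general idea of reading a bottleneck from frequency scaling, not the paper's method. It assumes a simple two-part runtime model, T(f) = a / f + b, where a / f is the part of the runtime that shrinks as CPU frequency rises and b is the part that does not (for example, waiting on disk or network). The model, the function names, and the sample numbers are all assumptions introduced for illustration.

```python
def fit_frequency_model(freqs_ghz, runtimes_s):
    """Least-squares fit of the assumed model T(f) = a / f + b.

    Returns (a, b). This model is an illustrative assumption,
    not the indicator definition used in the paper.
    """
    if len(freqs_ghz) != len(runtimes_s) or len(freqs_ghz) < 2:
        raise ValueError("need at least two (frequency, runtime) pairs")
    xs = [1.0 / f for f in freqs_ghz]  # regress runtime on 1/f
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(runtimes_s) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runtimes_s))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def frequency_sensitive_fraction(freqs_ghz, runtimes_s, at_freq_ghz):
    """Share of the modeled runtime at `at_freq_ghz` that scales with frequency."""
    a, b = fit_frequency_model(freqs_ghz, runtimes_s)
    total = a / at_freq_ghz + b
    return (a / at_freq_ghz) / total


if __name__ == "__main__":
    # Synthetic runtimes generated from a = 120, b = 50; not measured data.
    freqs = [1.2, 1.8, 2.4]
    runtimes = [120 / f + 50 for f in freqs]
    print(frequency_sensitive_fraction(freqs, runtimes, at_freq_ghz=2.4))
```

Under this assumed model, a fraction near 1 would suggest the workload is limited by the CPU at that frequency, while a fraction near 0 would point to other resources. Because the result is a share of runtime, it can be compared across workloads in a way that raw utilization percentages cannot.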

Acknowledgement

This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000602, 2016YFB1000603); the Natural Science Foundation of China (No. 91646203, 61532016, 61532010, 61379050, 61762082); the Fundamental Research Funds for the Central Universities, Research Funds of Renmin University (No. 11XNL010); and the Science and Technology Opening up Cooperation project of Henan Province (172106000077).

Author information

Authors and Affiliations

  1. School of Information, Renmin University, Beijing, China

    Chen Yang, Xiaofeng Meng, Yongjie Du & Zhiqiang Duan

  2. Department of Computer Science and Technology, Tsinghua University, Beijing, China

    Zhihui Du

  3. School of Software, Zhengzhou University of Light Industry, Zhengzhou, China

    Chen Yang

Corresponding author

Correspondence to Xiaofeng Meng.

Editor information

Editors and Affiliations

  1. Tsinghua University, Beijing, China

    Guoliang Li

  2. Duke University, Durham, NC, USA

    Jun Yang

  3. University of Porto, Porto, Portugal

    Joao Gama

  4. Chiang Mai University, Chiang Mai, Thailand

    Juggapong Natwichai

  5. Beihang University, Beijing, China

    Yongxin Tong

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Yang, C., Du, Z., Meng, X., Du, Y., Duan, Z. (2019). A Frequency Scaling Based Performance Indicator Framework for Big Data Systems. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_2
