Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11446)
Abstract
It is important for big data systems to identify their performance bottlenecks. However, popular indicators such as resource utilization are often misleading and cannot be compared with each other. In this paper, a novel indicator framework that makes the impact of different indicators directly comparable is proposed, so that performance bottlenecks can be identified and analyzed efficiently. A methodology that constructs the indicator from the change in performance under CPU frequency scaling is described. Spark is used as an example of a big data system, and two typical SQL benchmarks are used as workloads to evaluate the proposed method. Experimental results show that the proposed method is more accurate than the resource utilization method and easier to implement than the white-box method. The analysis with our indicators also leads to some interesting findings and valuable performance optimization suggestions for big data systems.
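The abstract does not give the indicator's definition, so the following is only a hypothetical sketch of the general idea of reading a bottleneck signal from frequency scaling. It assumes a simple elasticity measure: the observed slowdown when the CPU clock is lowered, divided by the slowdown a purely CPU-bound job would show. The function name `cpu_sensitivity` and all numbers are illustrative assumptions, not the paper's formula or results.

```python
def cpu_sensitivity(runtime_low, runtime_high, freq_low, freq_high):
    """Hypothetical frequency-scaling indicator (an assumption, not the paper's).

    runtime_low  -- job runtime measured at the lower CPU frequency
    runtime_high -- job runtime measured at the higher CPU frequency
    freq_low, freq_high -- the two CPU frequencies, in the same unit

    Returns about 1.0 when runtime scales inversely with frequency
    (CPU-bound) and about 0.0 when runtime ignores the frequency
    (bound by something else, such as disk or network).
    """
    if not (0 < freq_low < freq_high):
        raise ValueError("expected 0 < freq_low < freq_high")
    if runtime_low <= 0 or runtime_high <= 0:
        raise ValueError("runtimes must be positive")

    # Slowdown a perfectly CPU-bound job would show: runtime ~ 1 / frequency.
    ideal_slowdown = freq_high / freq_low - 1.0
    # Slowdown actually measured between the two runs.
    observed_slowdown = runtime_low / runtime_high - 1.0
    return observed_slowdown / ideal_slowdown


if __name__ == "__main__":
    # Made-up measurements: 100 s at 2.4 GHz, 150 s at 1.2 GHz.
    print(cpu_sensitivity(150.0, 100.0, 1.2, 2.4))
```

Because a measure of this kind is a dimensionless ratio, values from different queries or stages can be placed on the same scale, which is the sort of comparability that raw utilization percentages lack.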
Acknowledgement
This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000602, 2016YFB1000603); the Natural Science Foundation of China (No. 91646203, 61532016, 61532010, 61379050, 61762082); the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University (No. 11XNL010); and the Science and Technology Opening up Cooperation project of Henan Province (172106000077).
Author information
Authors and Affiliations
School of Information, Renmin University, Beijing, China
Chen Yang, Xiaofeng Meng, Yongjie Du & Zhiqiang Duan
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Zhihui Du
School of Software, Zhengzhou University of Light Industry, Zhengzhou, China
Chen Yang
Corresponding author
Correspondence to Xiaofeng Meng.
Editor information
Editors and Affiliations
Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, C., Du, Z., Meng, X., Du, Y., Duan, Z. (2019). A Frequency Scaling Based Performance Indicator Framework for Big Data Systems. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18575-6
Online ISBN: 978-3-030-18576-3
eBook Packages: Computer Science, Computer Science (R0)