Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12392)
Abstract
Data Wrangling (DW) is an essential component of any big data analytics job, encompassing a large variety of complex operations to transform, integrate and clean sets of unrefined data. The inherent complexity and execution cost of DW workflows make provisioning resources from a cloud provider a sensible way to execute them in a reasonable amount of time. However, without detailed profiles of the input data and of the operations composing these workflows, selecting cloud resources is a hard task: the search space is large, and resource interactions, dependencies, trade-offs and prices all need to be considered. In this paper, we investigate the complex problem of provisioning cloud resources for DW workflows through a case study of a specific Traffic DW workflow from the Smart Cities domain. We carry out a number of simulations in which we vary resource provisioning, focusing on what may impact the execution of the DW workflow most. Our results suggest that fine-grained cloud resource provisioning based on the workflow execution profile and input data properties has the potential to improve resource utilization and prevent significant over- and under-provisioning.
Notes
- 1. Using workflow-profiler (https://github.com/intel/workflow-profiler).
Acknowledgement
Partial support from the H2020 I-BiDaaS project (grant agreement No. 780787) is gratefully acknowledged.
Author information
Authors and Affiliations
Department of Computer Science, The University of Manchester, Manchester, UK
Abdullah Khalid A. Almasaud, Agresh Bharadwaj, Sandra Sampaio & Rizos Sakellariou
Corresponding author
Correspondence to Sandra Sampaio.
Editor information
Editors and Affiliations
Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
Johannes Kepler University of Linz, Linz, Austria
Gabriele Kotsis
IFS, Vienna University of Technology, Vienna, Wien, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Almasaud, A.K.A., Bharadwaj, A., Sampaio, S., Sakellariou, R. (2020). Challenges in Resource Provisioning for the Execution of Data Wrangling Workflows on the Cloud: A Case Study. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science, vol 12392. Springer, Cham. https://doi.org/10.1007/978-3-030-59051-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59050-5
Online ISBN: 978-3-030-59051-2
eBook Packages: Computer Science, Computer Science (R0)