Part of the book series: Communications in Computer and Information Science (CCIS, volume 613)
Abstract
In this paper we investigate the problem of providing timely results for the Extraction, Transformation and Load (ETL) process and automatic scalability for the entire pipeline, including the data warehouse. In general, data loading, transformation and integration are heavy tasks that are performed only periodically, during specific offline time windows. Parallel architectures and mechanisms can optimize the ETL process by speeding up each part of the pipeline as more performance is needed. However, none of them allows the user to specify the ETL time and have the framework scale automatically to ensure it.
We propose an approach that enables time-bounded automatic scalability and freshness for any data warehouse and ETL process, suitable for both small-data and big-data scenarios. A general framework for testing and implementing the system was developed to provide solutions for each part of time-bounded automatic ETL scalability. The results show that the proposed system is capable of scaling to provide the desired processing speed for near-real-time ETL results.
Acknowledgement
This project is part of a larger software prototype, partially financed by the CISUC research group of the University of Coimbra, Portugal, and by the Foundation for Science and Technology.
Author information
Authors and Affiliations
Department of Computer Sciences, University of Coimbra, Coimbra, Portugal
Pedro Martins, Maryam Abbasi & Pedro Furtado
Corresponding author
Correspondence to Pedro Martins.
Editor information
Editors and Affiliations
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Martins, P., Abbasi, M., Furtado, P. (2016). AScale: Big/Small Data ETL and Real-Time Data Freshness. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)