AScale: Big/Small Data ETL and Real-Time Data Freshness

  • Conference paper

Abstract

In this paper we investigate how to deliver timely results from the Extraction, Transformation and Load (ETL) process and how to provide automatic scalability for the entire pipeline, including the data warehouse. In general, data loading, transformation and integration are heavy tasks that are performed only periodically, during specific offline time windows. Parallel architectures and mechanisms can optimize the ETL process by speeding up each part of the pipeline as more performance is needed. However, none of them lets the user specify a target ETL time and have the framework scale automatically to meet it.

We propose an approach that automatically scales any data warehouse and ETL process so that data stays fresh within a specified time, suitable for both small-data and big-data scenarios. A general framework for testing and implementing the system was developed to provide solutions for each part of time-bounded automatic ETL scalability. The results show that the proposed system can scale to provide the desired processing speed for near-real-time ETL processing.
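The idea of scaling to meet a user-specified ETL time can be illustrated with a minimal sketch. It is not the AScale implementation: the function names, the linear speed-up assumption, and the throughput figures are hypothetical and chosen only to show the kind of calculation a time-driven scaler might perform.

```python
import math


def nodes_needed(rows_pending: int,
                 rows_per_node_per_sec: float,
                 target_seconds: float) -> int:
    """Smallest node count that clears the backlog within the target time.

    Assumption (not from the paper): throughput grows linearly with the
    number of nodes, so n nodes process n * rows_per_node_per_sec rows/s.
    """
    if rows_per_node_per_sec <= 0 or target_seconds <= 0:
        raise ValueError("throughput and target time must be positive")
    if rows_pending <= 0:
        return 1  # keep at least one node running even with no backlog
    capacity_per_node = rows_per_node_per_sec * target_seconds
    return max(1, math.ceil(rows_pending / capacity_per_node))


def scaling_decision(current_nodes: int,
                     rows_pending: int,
                     rows_per_node_per_sec: float,
                     target_seconds: float) -> int:
    """Positive result: nodes to add. Negative result: nodes that can be released."""
    required = nodes_needed(rows_pending, rows_per_node_per_sec, target_seconds)
    return required - current_nodes


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 pending rows, 500 rows/s per node,
    # and a user-specified ETL time of 10 minutes.
    delta = scaling_decision(current_nodes=2,
                             rows_pending=1_000_000,
                             rows_per_node_per_sec=500.0,
                             target_seconds=600.0)
    print(f"nodes to add (negative means release): {delta}")
```

In practice speed-up is rarely perfectly linear, so a real scaler would presumably measure throughput after each scaling step and repeat the decision rather than trust a single estimate.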

Acknowledgement

This project is part of a larger software prototype, partially financed by the CISUC research group of the University of Coimbra, Portugal, and by the Foundation for Science and Technology.

Author information

Authors and Affiliations

  1. Department of Computer Sciences, University of Coimbra, Coimbra, Portugal

    Pedro Martins, Maryam Abbasi & Pedro Furtado

Corresponding author

Correspondence to Pedro Martins.

Editor information

Editors and Affiliations

  1. Institute of Informatics, Silesian University of Technology, Gliwice, Poland

    Stanisław Kozielski

  2. Institute of Informatics, Silesian University of Technology, Gliwice, Poland

    Dariusz Mrozek

  3. Institute of Informatics, Silesian University of Technology, Gliwice, Poland

    Paweł Kasprowski

  4. Institute of Informatics, Silesian University of Technology, Gliwice, Poland

    Bożena Małysiak-Mrozek

  5. Institute of Informatics, Silesian University of Technology, Gliwice, Poland

    Daniel Kostrzewa

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Martins, P., Abbasi, M., Furtado, P. (2016). AScale: Big/Small Data ETL and Real-Time Data Freshness. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_25
