MonetDB

From Wikipedia, the free encyclopedia

Open source column-oriented relational database management system

MonetDB

Developer(s)	MonetDB Foundation

Stable release	Aug2024 (11.51)^[1]

Repository	www.monetdb.org/hg/MonetDB/file/
Written in	C
Operating system	Cross-platform
Type	Column-oriented DBMS, RDBMS
License	Mozilla Public License, version 2.0
Website	www.monetdb.org

MonetDB is an open-source column-oriented relational database management system (RDBMS) originally developed at the Centrum Wiskunde & Informatica (CWI) in the Netherlands. It is designed to provide high performance on complex queries against large databases, such as combining tables with hundreds of columns and millions of rows. MonetDB has been applied in high-performance applications for online analytical processing, data mining, geographic information system (GIS),^[2] Resource Description Framework (RDF),^[3] text retrieval and sequence alignment processing.^[4]

History

Data mining projects in the 1990s required improved analytical database support. This resulted in a CWI spin-off called Data Distilleries, which used early MonetDB implementations in its analytical suite. Data Distilleries eventually became a subsidiary of SPSS in 2003, which in turn was acquired by IBM in 2009.^[5]

MonetDB in its current form was first created in 2002 by doctoral student Peter Boncz and professor Martin L. Kersten as part of the 1990s' MAGNUM research project at the University of Amsterdam.^[6] It was initially called simply Monet, after the French impressionist painter Claude Monet. The first version under an open-source software license (a modified version of the Mozilla Public License) was released on September 30, 2004. When MonetDB version 4 was released into the open-source domain, many extensions to the code base were added by the MonetDB/CWI team, including a new SQL front end, supporting the SQL:2003 standard.^[7]

MonetDB introduced innovations in all layers of the DBMS: a storage model based on vertical fragmentation and a modern CPU-tuned query execution architecture that often gave MonetDB a speed advantage over the same algorithm running on a typical interpreter-based RDBMS. It was one of the first database systems to tune query optimization for CPU caches. MonetDB includes automatic and self-tuning indexes, run-time query optimization, and a modular software architecture.^[8]^[9]

By 2008, a follow-on project called X100 (MonetDB/X100) started, which evolved into the VectorWise technology. VectorWise was acquired by Actian Corporation, integrated with the Ingres database and sold as a commercial product.^[10]^[11]

In 2011 a major effort to renovate the MonetDB codebase was started. As part of it, the code for the MonetDB 4 kernel and its XQuery components was frozen. In MonetDB 5, parts of the SQL layer were pushed into the kernel.^[7] The resulting changes created a difference in internal APIs, as it transitioned from MonetDB Instruction Language (MIL) to MonetDB Assembly Language (MAL). Older, no-longer maintained top-level query interfaces were also removed. First was XQuery, which relied on MonetDB 4 and was never ported to version 5.^[12] The experimental Jaql interface support was removed with the October 2014 release.^[13] With the July 2015 release, MonetDB gained support for read-only data sharding and persistent indices. In this release the deprecated streaming data module DataCell was also removed from the main codebase in an effort to streamline the code.^[14] In addition, the license was changed to the Mozilla Public License, version 2.0.

Architecture

The MonetDB architecture is represented in three layers, each with its own set of optimizers.^[15] The front end is the top layer, providing a query interface for SQL, with SciQL and SPARQL interfaces under development. Queries are parsed into domain-specific representations, like relational algebra for SQL, and optimized. The generated logical execution plans are then translated into MonetDB Assembly Language (MAL) instructions, which are passed to the next layer. The middle or back-end layer provides a number of cost-based optimizers for the MAL. The bottom layer is the database kernel, which provides access to the data stored in Binary Association Tables (BATs). Each BAT is a table consisting of an object-identifier column and a value column, representing a single column in the database.^[15]
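The BAT layout described above can be illustrated with a minimal Python sketch. It is not MonetDB code: the names `decompose` and `reconstruct` and the list-of-pairs representation are assumptions made for illustration, and the real kernel stores BATs very differently.

```python
from typing import Any, Dict, List, Tuple

# A BAT is modelled here as a list of (oid, value) pairs. The names below
# are illustrative only and are not MonetDB identifiers.
BAT = List[Tuple[int, Any]]


def decompose(rows: List[Dict[str, Any]]) -> Dict[str, BAT]:
    """Split row-oriented records into one BAT per column."""
    bats: Dict[str, BAT] = {}
    for oid, row in enumerate(rows):
        for column, value in row.items():
            bats.setdefault(column, []).append((oid, value))
    return bats


def reconstruct(bats: Dict[str, BAT], oid: int) -> Dict[str, Any]:
    """Rebuild one row by looking up the same oid in every BAT."""
    return {column: dict(bat)[oid] for column, bat in bats.items()}


rows = [
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 27},
]
bats = decompose(rows)

# A predicate on "age" only touches the "age" BAT; the matching oids can
# then be used to fetch values from other BATs.
adult_oids = [oid for oid, age in bats["age"] if age > 30]
```

In this sketch the shared object identifier is what ties the separate column tables back into rows.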

MonetDB's internal data representation also relies on the memory addressing ranges of contemporary CPUs, using demand paging of memory-mapped files, and thus departs from traditional DBMS designs involving complex management of large data stores in limited memory.
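As a rough illustration of the memory-mapping idea, the following Python sketch stores a column as a flat file of integers and reads one value through a memory map, leaving it to the operating system to page in only the parts of the file that are touched. The file layout (little-endian 32-bit integers) and the helper names are assumptions for this example and do not reflect MonetDB's on-disk format.

```python
import mmap
import os
import struct
import tempfile

INT_SIZE = 4  # assumed layout: little-endian 32-bit integers


def write_column(path, values):
    """Persist a column as a flat array of integers."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_value(path, index):
    """Read one value through a memory map instead of loading the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Only the page holding this offset needs to be brought into memory.
            (value,) = struct.unpack_from("<i", mapped, index * INT_SIZE)
            return value


with tempfile.TemporaryDirectory() as tmp:
    column_path = os.path.join(tmp, "age.col")  # hypothetical file name
    write_column(column_path, [34, 27, 45, 19])
    third = read_value(column_path, 2)
```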

Query Recycling

Query recycling is an architecture for reusing the byproducts of the operator-at-a-time paradigm in a column store DBMS. Recycling makes use of the generic idea of storing and reusing the results of expensive computations. Unlike low-level instruction caches, query recycling uses an optimizer to pre-select instructions to cache. The technique is designed to improve query response times and throughput, while working in a self-organizing fashion.^[16] The authors from the CWI Database Architectures group, composed of Milena Ivanova, Martin Kersten, Niels Nes and Romulo Goncalves, won the "Best Paper Runner Up" at the ACM SIGMOD 2009 conference for their work on Query Recycling.^[17]^[18]
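The idea of caching pre-selected intermediates can be sketched in a few lines of Python. This is a simplified illustration, not the published design: the `Recycler` class, its fixed set of cacheable operators and the absence of any eviction policy are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, Hashable, Set, Tuple


class Recycler:
    """Toy recycle pool: keeps intermediates of pre-selected operators."""

    def __init__(self, cacheable_ops: Set[str]) -> None:
        # Stand-in for the optimizer's choice of instructions worth caching.
        self.cacheable_ops = set(cacheable_ops)
        self.pool: Dict[Tuple[str, Hashable], Any] = {}
        self.hits = 0

    def run(self, op: str, args: Hashable, compute: Callable[[], Any]) -> Any:
        key = (op, args)
        if key in self.pool:
            self.hits += 1
            return self.pool[key]  # reuse the stored intermediate
        result = compute()
        if op in self.cacheable_ops:
            self.pool[key] = result
        return result


ages = [34, 27, 45, 19, 52]


def select_older_than(limit):
    """Operator-at-a-time selection returning matching positions."""
    return [oid for oid, age in enumerate(ages) if age > limit]


recycler = Recycler(cacheable_ops={"select"})

# Query 1: how many rows have age > 30?
q1 = len(recycler.run("select", ("age", ">", 30), lambda: select_older_than(30)))

# Query 2: the highest age among rows with age > 30. The selection is the
# same intermediate as in query 1, so it is served from the pool.
oids = recycler.run("select", ("age", ">", 30), lambda: select_older_than(30))
q2 = max(ages[oid] for oid in oids)
```

Because each operator materializes its full result, two different queries can share the same intermediate without any change to the queries themselves.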

Database Cracking

MonetDB was one of the first databases to introduce database cracking. Database cracking is an incremental partial indexing and/or sorting of the data. It directly exploits the columnar nature of MonetDB. Cracking is a technique that shifts the cost of index maintenance from updates to query processing. The query pipeline optimizers are used to massage the query plans to crack and to propagate this information. The technique allows for improved access times and self-organized behavior.^[19] The doctoral dissertation on database cracking received the ACM SIGMOD 2011 Jim Gray Doctoral Dissertation Award.^[20]
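A minimal sketch of the cracking idea, assuming a single integer column and half-open range queries, is shown below. The class and method names are hypothetical, and the partitioning step builds new lists for clarity where a real implementation would swap elements in place.

```python
import bisect


class CrackedColumn:
    """Toy cracker column: each range query partitions the data a bit more."""

    def __init__(self, values):
        self.data = list(values)  # copy that gets physically reorganized
        self.pivots = []          # sorted pivot values seen so far
        self.positions = []       # positions[i]: index of first value >= pivots[i]

    def crack(self, pivot):
        """Ensure values < pivot precede the returned position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # this boundary is already known
        # Only the piece that contains the pivot is touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high (requires low <= high)."""
        start = self.crack(low)
        end = self.crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
first = column.range_query(5, 13)   # cracks the column at 5 and at 13
second = column.range_query(5, 13)  # both boundaries are now index lookups
```

The first query pays for partitioning the column; later queries on the same or nearby ranges touch smaller pieces, which is how the indexing cost is spread across query processing.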

Components

A number of extensions exist for MonetDB that extend the functionality of the database engine. Due to the three-layer architecture, top-level query interfaces can benefit from optimizations done in the backend and kernel layers.

SQL

MonetDB/SQL is a top-level extension, which provides complete support for transactions in compliance with the SQL:2003 standard.^[15]

GIS

MonetDB/GIS is an extension to MonetDB/SQL with support for the Simple Features Access standard of the Open Geospatial Consortium (OGC).^[2]

SciQL

SciQL is an SQL-based query language for science applications with arrays as first-class citizens. SciQL allows MonetDB to effectively function as an array database. SciQL is used in the European Union PlanetData and TELEIOS projects, together with the Data Vault technology, providing transparent access to large scientific data repositories.^[21] Data Vaults map the data from the distributed repositories to SciQL arrays, allowing for improved handling of spatio-temporal data in MonetDB.^[22] SciQL will be further extended for the Human Brain Project.^[23]

Data Vaults

Data Vault is a database-attached external file repository for MonetDB, similar to the SQL/MED standard. The Data Vault technology allows for transparent integration with distributed/remote file repositories. It is designed for scientific data exploration and mining, specifically for remote sensing data.^[22] There is support for the GeoTIFF (Earth observation), FITS (astronomy), MiniSEED (seismology) and NetCDF formats.^[22]^[24] The data is stored in the file repository in the original format, and loaded into the database in a lazy fashion, only when needed. The system can also process the data upon ingestion, if the data format requires it.^[25] As a result, even very large file repositories can be efficiently analyzed, as only the required data is processed in the database. The data can be accessed through either the MonetDB SQL or SciQL interfaces. The Data Vault technology was used in the European Union's TELEIOS project, which was aimed at building a virtual observatory for Earth observation data.^[24] Data Vaults for FITS files have also been used for processing astronomical survey data for The INT Photometric H-Alpha Survey (IPHAS).^[26]^[27]
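The lazy loading behaviour described above can be sketched as follows. The example is only an illustration under simplifying assumptions: the `DataVault` class and its methods are hypothetical, and plain text files with one number per line stand in for the scientific formats named above.

```python
import os
import tempfile


class DataVault:
    """Toy vault: files are registered up front but parsed only when queried."""

    def __init__(self):
        self.catalog = {}  # name -> path of the file in the repository
        self.loaded = {}   # name -> parsed values, filled on first access

    def attach(self, name, path):
        """Register a file without reading its contents."""
        self.catalog[name] = path

    def scan(self, name):
        """Load the file on first use and serve later scans from memory."""
        if name not in self.loaded:
            with open(self.catalog[name], encoding="utf-8") as f:
                self.loaded[name] = [float(line) for line in f if line.strip()]
        return self.loaded[name]


with tempfile.TemporaryDirectory() as tmp:
    vault = DataVault()
    # Hypothetical repository with two small files.
    for name, readings in {"station_a": [1.5, 2.5], "station_b": [9.0]}.items():
        path = os.path.join(tmp, name + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(str(r) for r in readings))
        vault.attach(name, path)

    loaded_before = sorted(vault.loaded)  # nothing has been parsed yet
    values = vault.scan("station_a")      # only this file is read
    mean_a = sum(values) / len(values)
    loaded_after = sorted(vault.loaded)
```

A query over one file leaves the other untouched, which is why a large repository can be attached without paying the cost of ingesting all of it.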

SAM/BAM

MonetDB has a SAM/BAM module for efficient processing of sequence alignment data. Aimed at bioinformatics research, the module has a SAM/BAM data loader and a set of SQL UDFs for working with DNA data.^[4] The module uses the popular SAMtools library.^[28]

RDF/SPARQL

MonetDB/RDF is a SPARQL-based extension for working with linked data, which adds support for RDF and allows MonetDB to function as a triplestore. It is under development for the Linked Open Data 2 project.^[3]

R integration

The MonetDB/R module allows UDFs written in R to be executed in the SQL layer of the system. This is done using the native R support for running embedded in another application, inside the RDBMS in this case. Previously, the MonetDB.R connector allowed MonetDB data sources to be used and processed in an R session. The newer R integration feature of MonetDB does not require data to be transferred between the RDBMS and the R session, reducing overhead and improving performance. The feature is intended to give users access to functions of the R statistical software for in-line analysis of data stored in the RDBMS. It complements the existing support for C UDFs and is intended to be used for in-database processing.^[29]

Python integration

Similarly to the embedded R UDFs in MonetDB, the database now has support for UDFs written in Python/NumPy. The implementation uses NumPy arrays (themselves Python wrappers for C arrays); as a result there is limited overhead, providing a functional Python integration with speed matching native SQL functions. The embedded Python functions also support mapped operations, allowing users to execute Python functions in parallel within SQL queries. In practice, the feature gives users access to the Python/NumPy/SciPy libraries, which provide a large selection of statistical and analytical functions.^[30]

MonetDB embedded

Following the release of an embedded driver for R and R UDFs in MonetDB (MonetDB/R), the authors created an embedded version of MonetDB in R called MonetDBLite; embedded versions for Python and Java followed. They are distributed as embeddable packages, removing the need to manage a database server, which was required for the previous API integrations. The DBMS runs within the process itself, eliminating socket communication and serialisation overhead and greatly improving efficiency. The idea behind it is to easily embed an SQLite-like package with the performance of an in-memory optimized columnar store.^[31]

Former extensions

A number of former extensions have been deprecated and removed from the stable code base over time. Some notable examples include an XQuery extension removed in MonetDB version 5, a JAQL extension, and a streaming data extension called DataCell.^[15]^[32]^[33]

MonetDB Foundation

The MonetDB Foundation is the independent non-profit organisation behind MonetDB. The foundation holds the intellectual property (IP) of MonetDB and is dedicated to advancing the development and long-term maintenance of MonetDB. The foundation is funded by charitable donations.^[34]

References

[1] "MonetDB Release Notes". 19 August 2024.
[2] "GeoSpatial - MonetDB". 25 July 2023.
[3] "MonetDB - LOD2 - Creating Knowledge out of Interlinked Data". 6 March 2014.
[4] "Life Sciences in MonetDB". 25 July 2023.
[5] "A short history about us - MonetDB". 6 March 2014.
[6] Boncz, Peter (May 2002). Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications (PDF) (Ph.D. Thesis). Universiteit van Amsterdam. Archived from the original (PDF) on 13 August 2011.
[7] MonetDB historic background
[8] Stefan Manegold (June 2006). "An Empirical Evaluation of XQuery Processors" (PDF). Proceedings of the International Workshop on Performance and Evaluation of Data Management Systems (ExpDB). 33 (2). ACM: 203–220. doi:10.1016/j.is.2007.05.004. Retrieved December 11, 2013.
[9] P. A. Boncz, T. Grust, M. van Keulen, S. Manegold, J. Rittinger, J. Teubner. MonetDB/XQuery: A Fast XQuery Processor Powered by a Relational Engine. Archived 2008-05-19 at the Wayback Machine. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, June 2006.
[10] Marcin Zukowski; Peter Boncz (May 20, 2012). "From x100 to vectorwise". Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM. pp. 861–862. doi:10.1145/2213836.2213967. ISBN 978-1-4503-1247-9. S2CID 9187072.
[11] Inkster, D.; Zukowski, M.; Boncz, P. A. (September 20, 2011). "Integration of VectorWise with Ingres" (PDF). ACM SIGMOD Record. 40 (3). ACM: 45. CiteSeerX 10.1.1.297.4985. doi:10.1145/2070736.2070747. S2CID 6372175.
[12] "XQuery". 12 December 2014.
[13] "MonetDB Oct2014 Release Notes". 12 December 2014.
[14] "MonetDB July 2015 Released". 31 August 2015.
[15] Idreos, S.; Groffen, F. E.; Nes, N. J.; Manegold, S.; Mullender, K. S.; Kersten, M. L. (March 2012). "MonetDB: Two Decades of Research in Column-oriented Database Architectures" (PDF). IEEE Data Engineering Bulletin. IEEE: 40–45. Retrieved March 6, 2014.
[16] Ivanova, Milena G; Kersten, Martin L; Nes, Niels J; Goncalves, Romulo AP (2010). "An architecture for recycling intermediates in a column-store". ACM Transactions on Database Systems. 35 (4). ACM: 24. doi:10.1145/1862919.1862921. S2CID 52811192.
[17] "CWI database team wins Best Paper Runner Up at SIGMOD 2009". CWI Amsterdam. Retrieved 2009-07-01.
[18] "SIGMOD Awards". ACM SIGMOD. Retrieved 2014-07-01.
[19] Idreos, Stratos; Kersten, Martin L; Manegold, Stefan (2007). Database cracking. Proceedings of CIDR.
[20] "SIGMOD Awards". ACM SIGMOD. Retrieved 2014-12-12.
[21] Zhang, Y.; Scheers, L. H. A.; Kersten, M. L.; Ivanova, M.; Nes, N. J. (2011). "Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data". Astronomical Data Analysis Software and Systems.
[22] Ivanova, Milena; Kersten, Martin; Manegold, Stefan (2012). "Data vaults: a symbiosis between database technology and scientific file repositories". Scientific and Statistical Database Management. SSDBM 2012. Springer Berlin Heidelberg. pp. 485–494.
[23] "SciQL". 4 March 2014.
[24] Ivanova, Milena; Kargin, Yagiz; Kersten, Martin; Manegold, Stefan; Zhang, Ying; Datcu, Mihai; Molina, Daniela Espinoza (2013). "Data Vaults: A Database Welcome to Scientific File Repositories". Proceedings of the 25th International Conference on Scientific and Statistical Database Management. SSDBM. ACM. doi:10.1145/2484838.2484876. ISBN 978-1-4503-1921-8.
[25] Kargin, Yagiz; Ivanova, Milena; Zhang, Ying; Manegold, Stefan; Kersten, Martin (August 2013). "Lazy ETL in Action: ETL Technology Dates Scientific Data" (PDF). Proceedings of the VLDB Endowment. 6 (12): 1286–1289. doi:10.14778/2536274.2536297. ISSN 2150-8097.
[26] "Astronomical data analysis with MonetDB Data Vaults". 2015-09-09.
[27] "Data Vaults". 2015-09-09.
[28] "SAM/BAM installation". 24 November 2014.
[29] "Embedded R in MonetDB". 13 November 2014. Archived from the original on 13 November 2014. Retrieved 12 November 2014.
[30] "Embedded Python/NumPy in MonetDB". 11 January 2015.
[31] "MonetDBLite for R". 25 November 2015.
[32] "Xquery (obsolete)". MonetDB. Retrieved 2015-05-26.
[33] "Announcement: New Oct2014 Feature release of MonetDB suite". MonetDB. Retrieved 2015-05-26.
[34] "MonetDB Foundation". Retrieved 2025-01-23.

Bibliography

Boncz, Peter; Manegold, Stefan; Kersten, Martin (1999). Database architecture optimized for the new bottleneck: Memory access. Proceedings of International Conference on Very Large Data Bases. pp. 54–65.
Schmidt, Albrecht; Kersten, Martin; Windhouwer, Menzo; Waas, Florian (2001). "Efficient Relational Storage and Retrieval of XML Documents". The World Wide Web and Databases. Lecture Notes in Computer Science. Vol. 1997. Springer. pp. 137–150. doi:10.1007/3-540-45271-0_9. ISBN 978-3-540-41826-9.
Idreos, Stratos; Kersten, Martin L; Manegold, Stefan (2007). Database cracking. Proceedings of CIDR.
Boncz, Peter A; Kersten, Martin L; Manegold, Stefan (2008). "Breaking the memory wall in MonetDB". Communications of the ACM. 51 (12). ACM: 77–85. doi:10.1145/1409360.1409380. S2CID 5633935.
Sidirourgos, Lefteris; Goncalves, Romulo; Kersten, Martin; Nes, Niels; Manegold, Stefan (2008). "Column-store support for RDF data management: not all swans are white". Proceedings of the VLDB Endowment. 1 (2): 1553–1563. doi:10.14778/1454159.1454227.
Ivanova, Milena G.; Kersten, Martin L.; Nes, Niels J.; Goncalves, Romulo A.P. (2009). "An Architecture for Recycling Intermediates in a Column-store". Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data. SIGMOD '09. ACM. pp. 309–320. doi:10.1145/1559845.1559879. ISBN 978-1-60558-551-2.
Manegold, Stefan; Boncz, Peter A.; Kersten, Martin L. (Dec 2000). "Optimizing Database Architecture for the New Bottleneck: Memory Access". The VLDB Journal. 9 (3). Springer-Verlag New York, Inc.: 231–246. doi:10.1007/s007780000031. ISSN 1066-8888. S2CID 1688757.
Ivanova, Milena G; Kersten, Martin L; Nes, Niels J; Goncalves, Romulo AP (2010). "An architecture for recycling intermediates in a column-store". ACM Transactions on Database Systems. 35 (4). ACM: 24. doi:10.1145/1862919.1862921. S2CID 52811192.
Goncalves, Romulo & Kersten, Martin (2011). "The data cyclotron query processing scheme". ACM Transactions on Database Systems. 36 (4). ACM: 27. doi:10.1145/2043652.2043660. S2CID 6707515.
Kersten, Martin L; Idreos, Stratos; Manegold, Stefan; Liarou, Erietta (2011). "The researcher's guide to the data deluge: Querying a scientific database in just a few seconds". PVLDB Challenges and Visions.
Kersten, M; Zhang, Ying; Ivanova, Milena; Nes, Niels (2011). "SciQL, a query language for science applications". Proceedings of the EDBT/ICDT 2011 Workshop on Array Databases. ACM. pp. 1–12.
Sidirourgos, Lefteris; Kersten, Martin; Boncz, Peter (2011). SciBORQ: Scientific data management with Bounds On Runtime and Quality. CIDR 2011: 5th Biennial Conference on Innovative Data Systems Research. Creative Commons.
Liarou, Erietta; Idreos, Stratos; Manegold, Stefan; Kersten, Martin (2012). "MonetDB/DataCell: online analytics in a streaming column-store". Proceedings of the VLDB Endowment. 5 (12): 1910–1913. doi:10.14778/2367502.2367535. S2CID 545154.
Ivanova, Milena; Kersten, Martin; Manegold, Stefan (2012). "Data vaults: a symbiosis between database technology and scientific file repositories". Scientific and Statistical Database Management. SSDBM 2012. Springer Berlin Heidelberg. pp. 485–494.
Kargin, Yagiz; Ivanova, Milena; Zhang, Ying; Manegold, Stefan; Kersten, Martin (August 2013). "Lazy ETL in Action: ETL Technology Dates Scientific Data" (PDF). Proceedings of the VLDB Endowment. 6 (12): 1286–1289. doi:10.14778/2536274.2536297. ISSN 2150-8097.
Sidirourgos, Lefteris & Kersten, Martin (2013). "Column imprints: a secondary index structure". Proceedings of the 2013 international conference on Management of data. ACM. pp. 893–904.
Ivanova, Milena; Kargin, Yagiz; Kersten, Martin; Manegold, Stefan; Zhang, Ying; Datcu, Mihai; Molina, Daniela Espinoza (2013). "Data Vaults: A Database Welcome to Scientific File Repositories". Proceedings of the 25th International Conference on Scientific and Statistical Database Management. SSDBM. ACM. doi:10.1145/2484838.2484876. ISBN 978-1-4503-1921-8.
