Data orientation

From Wikipedia, the free encyclopedia

Tabular data representation in memory

Data orientation is the representation of tabular data in a linear memory model, whether on disk or in memory. The two most common representations are column-oriented (columnar format) and row-oriented (row format).^[1]^[2]

The choice of data orientation is a trade-off and an architectural decision in databases, query engines, and numerical simulations.^[1] As a result of these tradeoffs, row-oriented formats are more commonly used in online transaction processing (OLTP) and column-oriented formats are more commonly used in online analytical processing (OLAP).^[2]

Examples of column-oriented formats include Apache ORC,^[3] Apache Parquet,^[4] Apache Arrow,^[5] and the formats used by BigQuery, Amazon Redshift and Snowflake. Predominant examples of row-oriented formats include CSV, the formats used in most relational databases (Oracle, MySQL, etc.), the in-memory format of Apache Spark, and Apache Avro.^[6]

Description

Tabular data is two-dimensional: data is modeled as rows and columns. However, computer systems represent data in a linear memory model, both on disk and in memory.^[7]^[8]^[9] Therefore, storing a table in a linear memory model requires mapping its two-dimensional structure into a one-dimensional space. Data orientation is the decision taken in this mapping. There are two prominent mappings: row-oriented and column-oriented.^[1]^[2]

Row-oriented

In a row-oriented database, also known as a rowstore, the elements of the table

column 1	column 2	column 3
item 11	item 12	item 13
item 21	item 22	item 23

are stored linearly as

item 11	item 12	item 13	item 21	item 22	item 23

That is, the rows of the table are stored one after another. In this orientation, values in the same row are close together in space (for example, at nearby addresses in an addressable space).
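
The row-oriented mapping can be sketched in a few lines of Python. This is only an illustration of the layout described above, not the storage code of any particular system; the helper names `flatten_row_major` and `row_major_offset` are hypothetical.

```python
from typing import List, Sequence


def flatten_row_major(table: Sequence[Sequence[str]]) -> List[str]:
    """Store a table row after row in one flat list."""
    return [value for row in table for value in row]


def row_major_offset(row: int, col: int, n_cols: int) -> int:
    """Linear position of cell (row, col) in a row-major layout."""
    return row * n_cols + col


table = [
    ["item 11", "item 12", "item 13"],
    ["item 21", "item 22", "item 23"],
]
flat = flatten_row_major(table)
print(flat)
# Cell in the second row, third column (zero-based indices 1 and 2).
print(flat[row_major_offset(1, 2, n_cols=3)])  # item 23
```

Under this mapping, two cells of the same row differ in position by at most `n_cols - 1`, which is what keeps a row's values close together.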

Examples

CSV
PostgreSQL on-disk and in-memory formats
Apache Spark in-memory format
Apache Avro
MySQL

Column-oriented

In a column-oriented database, also known as a columnstore, the elements of the table

column 1	column 2	column 3
item 11	item 12	item 13
item 21	item 22	item 23

are stored linearly as

item 11	item 21	item 12	item 22	item 13	item 23

That is, the columns of the table are stored one after another. In this orientation, values in the same column are close together in space (for example, at nearby addresses in an addressable space).
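
The column-oriented mapping can be sketched the same way. This is a minimal illustration that assumes a single flat buffer and a rectangular table; the helper names `flatten_column_major` and `column_major_offset` are hypothetical, and real columnar formats typically add per-column buffers, chunking, and metadata.

```python
from typing import List, Sequence


def flatten_column_major(table: Sequence[Sequence[str]]) -> List[str]:
    """Store a table column after column in one flat list."""
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    return [table[r][c] for c in range(n_cols) for r in range(n_rows)]


def column_major_offset(row: int, col: int, n_rows: int) -> int:
    """Linear position of cell (row, col) in a column-major layout."""
    return col * n_rows + row


table = [
    ["item 11", "item 12", "item 13"],
    ["item 21", "item 22", "item 23"],
]
flat = flatten_column_major(table)
print(flat)
# Cell in the first row, second column (zero-based indices 0 and 1).
print(flat[column_major_offset(0, 1, n_rows=2)])  # item 12
```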

Examples

BigQuery's in-memory and storage formats
Apache Parquet
Apache ORC
Apache Arrow
DuckDB in-memory format
Pandas in-memory format
R dataframes^[10]

See list of column-oriented DBMSes for more examples.

Tradeoff

Data orientation is an important architectural decision of systems handling data because it results in important tradeoffs in performance and storage.^[8] Below are selected dimensions of this tradeoff.

Random access

Row-oriented benefits from fast random access of rows. Column-oriented benefits from fast random access of columns. In both cases, this is the result of fewer page or cache misses when accessing the data.^[8]
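
One way to see why is to list the linear offsets that a read touches under each layout. The sketch below assumes a single flat buffer of fixed-size cells; the function names and the `layout` parameter are hypothetical. It models only address locality, not actual page or cache behavior.

```python
from typing import List


def offsets_for_row(row: int, n_rows: int, n_cols: int, layout: str) -> List[int]:
    """Offsets touched when reading every cell of one row."""
    if layout == "row":
        return [row * n_cols + c for c in range(n_cols)]
    if layout == "column":
        return [c * n_rows + row for c in range(n_cols)]
    raise ValueError(f"unknown layout: {layout}")


def offsets_for_column(col: int, n_rows: int, n_cols: int, layout: str) -> List[int]:
    """Offsets touched when reading every cell of one column."""
    if layout == "row":
        return [r * n_cols + col for r in range(n_rows)]
    if layout == "column":
        return [col * n_rows + r for r in range(n_rows)]
    raise ValueError(f"unknown layout: {layout}")


def is_contiguous(offsets: List[int]) -> bool:
    """True when the offsets form one unbroken run of addresses."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


# A table with 1000 rows and 4 columns.
print(offsets_for_row(2, 1000, 4, "row"))     # [8, 9, 10, 11]
print(offsets_for_row(2, 1000, 4, "column"))  # [2, 1002, 2002, 3002]
print(is_contiguous(offsets_for_column(1, 1000, 4, "column")))  # True
print(is_contiguous(offsets_for_column(1, 1000, 4, "row")))     # False
```

A contiguous run of offsets tends to fall on few pages or cache lines, whereas a strided run spreads the same read over many of them.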

Insert

Row-oriented benefits from fast insertion of a new row. Column-oriented benefits from fast insertion of a new column.

This dimension is an important reason why row-oriented formats are more commonly used in online transaction processing (OLTP), as it results in faster transactions in comparison to column-oriented.^[2]
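
The difference in insertion cost can be sketched with the single flat buffer used in the illustrations above. The function names are hypothetical, and the flat buffer is an assumption: systems that keep each column in its own buffer or chunk append once per column instead of shifting elements, so this sketch shows the idea rather than any real engine's behavior.

```python
from typing import List


def append_row_row_major(flat: List[str], new_row: List[str]) -> None:
    """Row-major: the new row is one contiguous write at the end."""
    flat.extend(new_row)


def append_row_column_major(flat: List[str], new_row: List[str], n_rows: int) -> None:
    """Column-major: each value goes at the end of its own column.

    Inserting from the last column backwards keeps the earlier
    insertion positions valid while the buffer grows.
    """
    for c in range(len(new_row) - 1, -1, -1):
        flat.insert((c + 1) * n_rows, new_row[c])


row_major = ["item 11", "item 12", "item 13", "item 21", "item 22", "item 23"]
col_major = ["item 11", "item 21", "item 12", "item 22", "item 13", "item 23"]
new_row = ["item 31", "item 32", "item 33"]

append_row_row_major(row_major, new_row)
append_row_column_major(col_major, new_row, n_rows=2)
print(row_major)
print(col_major)
```

In the row-major case nothing already stored has to move; in the column-major case one write is needed per column, and in a single contiguous buffer every later column is shifted.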

Conditional access

Row-oriented benefits from fast access under a filter, which selects the rows that match a condition. Column-oriented benefits from fast access under a projection, which selects a subset of the columns.^[4]^[3]
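
The two access patterns can be sketched as follows. The sample data and the helper names `filter_rows` and `project_columns` are hypothetical, and the sketch only shows which values each operation needs to touch; it does not measure performance.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[int, str, int]  # (id, city, amount)

# The same table in both orientations.
rows: List[Row] = [
    (1, "Lisbon", 10),
    (2, "Porto", 25),
    (3, "Lisbon", 40),
]
columns: Dict[str, list] = {
    "id": [1, 2, 3],
    "city": ["Lisbon", "Porto", "Lisbon"],
    "amount": [10, 25, 40],
}


def filter_rows(rows: Sequence[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Filter: each matching row is returned whole, from one contiguous record."""
    return [row for row in rows if predicate(row)]


def project_columns(columns: Dict[str, list], names: Sequence[str]) -> Dict[str, list]:
    """Projection: only the requested columns are read; the others are never touched."""
    return {name: columns[name] for name in names}


lisbon_rows = filter_rows(rows, lambda row: row[1] == "Lisbon")
amounts_only = project_columns(columns, ["amount"])
print(lisbon_rows)
print(amounts_only)
```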

Compute performance

Column-oriented benefits from fast analytics operations. This is the result of being able to leverage SIMD instructions.^[5]

Uncompressed size

Column-oriented benefits from smaller uncompressed size. This is the result of the possibility that this orientation offers to represent certain data types with dedicated encodings.^[4]^[3]

For example, a table of 128 rows with a Boolean column requires 128 bytes in a row-oriented format (one byte per Boolean) but 128 bits (16 bytes) in a column-oriented format (via a bitmap). Another example is the use of run-length encoding to encode a column.
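
Both encodings can be sketched in plain Python. The function names are hypothetical, and the least-significant-bit-first order within each byte is an assumption made for this sketch; specific formats define their own bit order and run representation.

```python
from typing import List, Sequence, Tuple


def pack_bits(flags: Sequence[bool]) -> bytes:
    """Pack Booleans into a bitmap, eight values per byte."""
    packed = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def get_bit(packed: bytes, i: int) -> bool:
    """Read value number i back out of the bitmap."""
    return bool((packed[i // 8] >> (i % 8)) & 1)


def run_length_encode(values: Sequence[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    runs: List[Tuple[str, int]] = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def run_length_decode(runs: Sequence[Tuple[str, int]]) -> List[str]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


flags = [i % 3 == 0 for i in range(128)]
packed = pack_bits(flags)
print(len(flags), "values ->", len(packed), "bytes")  # 128 values -> 16 bytes

column = ["a", "a", "a", "b", "b", "a"]
print(run_length_encode(column))  # [('a', 3), ('b', 2), ('a', 1)]
```

Both encodings rely on the values of one column being adjacent; in a row-oriented layout the Booleans or repeated values would be interleaved with the other columns of each row.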

Compressed size

Column-oriented benefits from smaller compressed size. This is the result of a higher homogeneity within a column than within multiple rows.^[4]^[3]
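
The effect can be explored by serializing the same table in both orientations and compressing each with a general-purpose codec. The sketch below uses Python's standard `zlib` module; the sample data, the text serialization, and the function names are assumptions made for illustration. The outcome depends on the data and the codec, so no particular sizes are claimed here.

```python
import zlib
from typing import List, Sequence, Tuple

Row = Tuple[int, str, bool]


def serialize_rows(rows: Sequence[Row]) -> bytes:
    """Row-oriented text: one line per row, values of a row side by side."""
    return "\n".join(",".join(str(v) for v in row) for row in rows).encode()


def serialize_columns(rows: Sequence[Row]) -> bytes:
    """Column-oriented text: one line per column, values of a column side by side."""
    columns = zip(*rows)
    return "\n".join(",".join(str(v) for v in col) for col in columns).encode()


# Hypothetical table: an id, a low-cardinality country code, and a flag.
rows: List[Row] = [(i, ("DE", "FR", "US")[i % 3], i % 2 == 0) for i in range(1000)]

row_bytes = serialize_rows(rows)
col_bytes = serialize_columns(rows)

# Both serializations hold the same values and the same number of separators.
print("uncompressed:", len(row_bytes), len(col_bytes))
print("compressed:  ", len(zlib.compress(row_bytes)), len(zlib.compress(col_bytes)))
```

Running the sketch on one's own data is a quick way to check how much the homogeneity of each column matters for a given codec.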

Conversion and interchange

Because both orientations represent the same data, it is possible to convert a row-oriented dataset to a column-oriented dataset and vice versa at the expense of compute. In particular, advanced query engines often leverage each orientation's advantages, and convert from one orientation to the other as part of their execution. As an example, an Apache Spark query may:

Read data from Apache Parquet (column-oriented)
Load it into the Spark internal in-memory format (row-oriented)
Convert it to Apache Arrow for a specific computation (column-oriented)
Write it to Apache Avro for streaming (row-oriented)
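
The conversion itself amounts to a transpose. The sketch below uses plain Python lists and dictionaries; it is not how Spark, Parquet, Arrow, or Avro implement the conversion, and the function names and sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rows_to_columns(rows: Sequence[Tuple], names: Sequence[str]) -> Dict[str, list]:
    """Row-oriented to column-oriented: gather the i-th value of every row."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def columns_to_rows(columns: Dict[str, list]) -> List[Tuple]:
    """Column-oriented to row-oriented: zip the columns back into records."""
    return list(zip(*columns.values()))


rows = [(1, "Lisbon", 10), (2, "Porto", 25), (3, "Lisbon", 40)]
columns = rows_to_columns(rows, ["id", "city", "amount"])
print(columns)
print(columns_to_rows(columns) == rows)  # True
```

Each direction visits every cell once, which is the compute cost referred to above.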

References

1. Abadi, Daniel J.; Madden, Samuel R.; Hachem, Nabil (2008). "Column-stores vs. Row-stores: How different are they really?". Proceedings of the 2008 ACM SIGMOD international conference on Management of data. pp. 967–980. doi:10.1145/1376616.1376712. ISBN 978-1-60558-102-6.
2. Funke, Florian; Kemper, Alfons; Neumann, Thomas (2012). "Compacting Transactional Data in Hybrid OLTP&OLAP Databases". Proceedings of the VLDB Endowment. 5 (11): 1424–1435. doi:10.14778/2350229.2350258.
3. "Apache ORC". Retrieved 2024-05-21.
4. "Apache Parquet". Retrieved 2024-05-21.
5. "Apache Arrow". Retrieved 2024-05-21.
6. "Apache Avro". Retrieved 2024-05-21.
7. Richard, Golden G.; Case, Andrew (2014). "In lieu of swap: Analyzing compressed RAM in Mac OS X and Linux". Digital Investigation. 11: S3–S12. doi:10.1016/j.diin.2014.05.011.
8. M. Frans Kaashoek, Jerome H. Saltzer (2009). Principles of Computer System Design. Morgan Kaufmann. ISBN 978-0-12-374957-4.
9. "Chapter 4 Process Address Space (Linux kernel documentation)". Retrieved 2024-05-21.
10. "R Coding Basics - 9 Data Frames". www.gastonsanchez.com. Retrieved 2025-01-19.
