Shard (database architecture)

From Wikipedia, the free encyclopedia

Horizontal partition of data in a database or search engine

A database shard, or simply a shard, is a horizontal partition of data in a database or search engine. Each shard may be held on a separate database server instance, to spread load.

Some data in a database remains present in all shards,^[a] but some appears only in a single shard. Each shard acts as the single source for this subset of data.^[1]

Database architecture

Horizontal partitioning is a database design principle whereby rows of a database table are held separately, rather than being split into columns (which is what normalization and vertical partitioning do, to differing extents). Each partition forms part of a shard, which may in turn be located on a separate database server or physical location.

There are numerous advantages to the horizontal partitioning of data. Since tables are divided and distributed across multiple servers, the total number of rows in each table in each database is reduced. This reduces index size, which generally improves search performance. A database shard can be placed on separate hardware, and multiple shards can be placed on multiple machines. This distributes the database over a large number of machines, which can greatly improve performance. In addition, if the database shard is based on some real-world segmentation of the data (e.g., European customers versus American customers), then it may be possible to infer the appropriate shard membership easily and automatically, and query only the relevant shard.^[2]
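
The idea of inferring shard membership from a real-world segmentation can be sketched as follows. This is a minimal illustration, not a description of any particular product: the region codes, shard names, and helper functions (`shard_for`, `insert_customer`, `find_customer`) are all hypothetical, and each shard is modeled as an in-memory dictionary rather than a real database server.

```python
# Hypothetical mapping from a customer's region to the shard holding its rows.
REGION_TO_SHARD = {
    "EU": "shard_europe",
    "US": "shard_america",
}

# Each shard is modeled as a dict of customer_id -> record. In a real
# deployment these would be separate database server instances.
SHARDS = {
    "shard_europe": {},
    "shard_america": {},
}


def shard_for(region: str) -> str:
    """Infer the shard name from the region alone, without an index lookup."""
    try:
        return REGION_TO_SHARD[region]
    except KeyError:
        raise ValueError(f"no shard configured for region {region!r}") from None


def insert_customer(customer_id: int, region: str, name: str) -> None:
    """Store the row on the single shard that owns this region."""
    SHARDS[shard_for(region)][customer_id] = {"region": region, "name": name}


def find_customer(customer_id: int, region: str):
    """Query only the relevant shard; the other shards are never touched."""
    return SHARDS[shard_for(region)].get(customer_id)


insert_customer(1, "EU", "Anna")
insert_customer(2, "US", "Bob")
print(find_customer(1, "EU"))
print(len(SHARDS["shard_europe"]), len(SHARDS["shard_america"]))
```

Because the region is known before the query is issued, a lookup touches exactly one shard. The trade-off of this kind of hand-coded mapping is that regions of very different sizes produce shards of very different sizes.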

In practice, sharding is complex. Although it has long been done by hand-coding (especially where rows have an obvious grouping, as in the customer region example above), this is often inflexible. There is a desire to support sharding automatically, both by adding code support for it and by identifying candidates to be sharded separately. Consistent hashing is a technique used in sharding to spread large loads across multiple smaller services and servers.^[3]
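
A minimal sketch of consistent hashing is given below, to show how keys can be assigned to shards so that adding a shard moves only a fraction of the keys. The class name `HashRing`, the shard names, the use of MD5 as the hash function, and the choice of 100 virtual nodes per shard are assumptions made for illustration; production systems differ in all of these details.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used here only as a stable, well-distributed hash, not for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # hash collision; extremely unlikely with 64 bits
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("shard-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved to the new shard")
```

Every key that changes owner moves to the newly added shard; keys never shuffle between the existing shards. A plain `hash(key) % number_of_shards` scheme, by contrast, would reassign most keys whenever the shard count changes.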

Where distributed computing is used to separate load between multiple servers (either for performance or reliability reasons), a shard approach may also be useful. In the 2010s, sharding of execution capacity, as well as the more traditional sharding of data, has emerged as a potential approach to overcome performance and scalability problems in blockchains.^[4]^[5]

To mitigate the latency associated with cross-shard locking, novel consensus architectures have been proposed. One such approach, detailed in the Proceedings of the VLDB Endowment, utilizes a method known as "braided synchronization". This technique, implemented in the Cerberus protocol, couples consensus instances to specific transaction sets rather than a linear ledger, allowing for atomic composability across multiple shards without a global lock.^[6]

Compared to horizontal partitioning

Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort), provided that there is some obvious, robust, implicit way to identify the partition in which a particular row will be found without first needing to search the index. The classic example is a pair of 'CustomersEast' and 'CustomersWest' tables, where a customer's ZIP code already indicates in which table the row will be found.
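
The 'CustomersEast' and 'CustomersWest' example can be sketched with two tables in a single SQLite database, which stands in for one schema instance on one server. The rule used to split the ZIP codes (a first digit of 0 to 4 counts as "East", 5 to 9 as "West") is an assumption chosen for illustration, as are the column names and helper functions.

```python
import sqlite3

# One database instance holding both partitions of the customer table.
conn = sqlite3.connect(":memory:")
for table in ("CustomersEast", "CustomersWest"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, zip TEXT)"
    )


def partition_for(zip_code: str) -> str:
    """Pick the partition from the ZIP code alone, without searching an index.

    Assumption for illustration: first digit 0-4 is 'East', 5-9 is 'West'.
    """
    if not zip_code or zip_code[0] not in "0123456789":
        raise ValueError(f"invalid ZIP code: {zip_code!r}")
    return "CustomersEast" if zip_code[0] in "01234" else "CustomersWest"


def add_customer(cust_id: int, name: str, zip_code: str) -> None:
    table = partition_for(zip_code)  # table name comes from a fixed set
    conn.execute(
        f"INSERT INTO {table} (id, name, zip) VALUES (?, ?, ?)",
        (cust_id, name, zip_code),
    )


def customers_in_zip(zip_code: str):
    """Search only the one partition that can contain this ZIP code."""
    table = partition_for(zip_code)
    cursor = conn.execute(
        f"SELECT id, name FROM {table} WHERE zip = ? ORDER BY id", (zip_code,)
    )
    return cursor.fetchall()


add_customer(1, "Alice", "02139")
add_customer(2, "Bob", "94105")
print(customers_in_zip("02139"))
```

Both tables still live on the same server here, so each lookup searches a smaller table and index, but the total load is not spread across machines.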

Sharding goes beyond this. It partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

Splitting shards across multiple isolated instances requires more than simple horizontal partitioning. The hoped-for gains in efficiency would be lost if querying the database required multiple instances to be queried just to retrieve a simple dimension table. Beyond partitioning, sharding thus splits large partitionable tables across the servers, while smaller tables are replicated as complete units.^[clarification needed]
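
The split between partitioned and replicated tables can be sketched as follows. Two in-memory SQLite databases stand in for two separate shard servers; the `orders` table is partitioned by customer, while the small `countries` dimension table is copied in full to every shard. The table names, the modulo-based placement rule, and the helper functions are assumptions for illustration only.

```python
import sqlite3

# Small dimension table that is replicated as a complete unit.
COUNTRIES = [("DE", "Germany"), ("US", "United States")]


def make_shard() -> sqlite3.Connection:
    """Create one shard: an empty orders partition plus a full countries copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, country TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO countries VALUES (?, ?)", COUNTRIES)
    return conn


shards = [make_shard(), make_shard()]


def shard_of(customer_id: int) -> sqlite3.Connection:
    # Hypothetical placement rule: customer id modulo the number of shards.
    return shards[customer_id % len(shards)]


def add_order(order_id: int, customer_id: int, country: str, total: float) -> None:
    shard_of(customer_id).execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        (order_id, customer_id, country, total),
    )


def orders_with_country(customer_id: int):
    """Join orders to countries entirely on the one shard owning the customer."""
    return shard_of(customer_id).execute(
        "SELECT o.id, c.name, o.total FROM orders o "
        "JOIN countries c ON c.code = o.country "
        "WHERE o.customer_id = ? ORDER BY o.id",
        (customer_id,),
    ).fetchall()


add_order(1, 10, "DE", 25.0)
add_order(2, 11, "US", 40.0)
add_order(3, 10, "US", 5.0)
print(orders_with_country(10))
```

Because every shard carries its own copy of `countries`, the join never has to contact a second instance. The cost is that any change to the dimension table must be propagated to all shards.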

This is also why sharding is related to a shared-nothing architecture: once sharded, each shard can live in a totally separate logical schema instance, physical database server, data center, or continent. There is no ongoing need to retain shared access between shards to the unpartitioned tables in other shards.^[7]

This makes replication across multiple servers easy (simple horizontal partitioning does not). It is also useful for worldwide distribution of applications, where communications links between data centers would otherwise be a bottleneck.^[citation needed]

There is also a requirement for some notification and replication mechanism between schema instances, so that the unpartitioned tables remain as closely synchronized as the application demands. This is a complex choice in the architecture of sharded systems: approaches range from making these effectively read-only (updates are rare and batched), to dynamically replicated tables (at the cost of reducing some of the distribution benefits of sharding) and many options in between.^[citation needed]

Implementations

Altibase provides a combined (client-side and server-side) sharding architecture that is transparent to client applications.
Apache HBase can shard automatically.^[8]
Azure SQL Database Elastic Database tools shard to scale the data tier of an application out and in.^[9]
ClickHouse, a fast open-source OLAP database management system, shards.
Couchbase shards automatically and transparently.
CUBRID shards since version 9.0.
Db2 Data Partitioning Feature (MPP), a shared-nothing database whose partitions run on separate nodes.
DRDS (Distributed Relational Database Service) of Alibaba Cloud does database/table sharding,^[10] and supports Singles' Day.^[11]
Elasticsearch enterprise search server shards.^[12]
eXtreme Scale is a cross-process in-memory key/value data store (a NoSQL data store). It uses sharding to achieve scalability across processes for both data and MapReduce-style parallel processing.^[13]
Hibernate shards, but has had little development since 2007.^[14]^[15]
IBM Informix shards since version 12.1 xC1 as part of the MACH11 technology. Informix 12.10 xC2 added full compatibility with MongoDB drivers, allowing the mix of regular relational tables with NoSQL collections, while still allowing sharding, fail-over and ACID properties.^[16]^[17]
Kdb+ shards since version 2.0.
MariaDB Spider, a storage engine that supports table federation, table sharding, XA transactions, and ODBC data sources. The MariaDB Spider engine has been bundled with MariaDB Server since version 10.0.4.^[18]
MonetDB, an open-source column-store, does read-only sharding as of its July 2015 release.^[19]
MongoDB shards since version 1.6.
MySQL Cluster automatically and transparently shards across low-cost commodity nodes, allowing scale-out of read and write queries, without requiring changes to the application.^[20]
MySQL Fabric (part of MySQL Utilities) shards.^[21]
Oracle Database shards since 12c Release 2, combining the advantages of sharding with the well-known capabilities of the enterprise-ready, multi-model Oracle Database.^[22]
Oracle NoSQL Database has automatic sharding and elastic, online expansion of the cluster (adding more shards).
OrientDB shards since version 1.7.
Solr enterprise search server shards.^[23]
ScyllaDB runs sharded on each core in a server, across all the servers in a cluster.
Spanner, Google's global-scale distributed database, shards across multiple Paxos state machines to scale to "millions of machines across hundreds of data centers and trillions of database rows".^[24]
SQLAlchemy ORM, a data mapper for the Python programming language, shards.^[25]
SQL Server, since SQL Server 2005, shards with the help of third-party tools.^[26]
Teradata markets a massively parallel database management system as a "data warehouse".
Vault, a cryptocurrency, shards to drastically reduce the data that users need to join the network and verify transactions, which allows the network to scale further.^[27]
Vitess, an open-source database clustering system, shards MySQL. It is a Cloud Native Computing Foundation project.^[28]
ShardingSphere is a database clustering system that provides data sharding, distributed transactions, and distributed database management. It is an Apache Software Foundation (ASF) project.^[29]

Disadvantages

Sharding a database table before it has been optimized locally introduces complexity prematurely. Sharding should be used only when all other options for optimization are inadequate.^[according to whom?] The complexity that sharding introduces can cause the following problems:^[citation needed]

SQL complexity - More bugs, because developers have to write more complicated SQL to handle sharding logic.
Additional software - The software that partitions, balances, coordinates, and ensures integrity can itself fail.
Single point of failure - Corruption of one shard due to network, hardware, or systems problems causes failure of the entire table.
Fail-over server complexity - Fail-over servers must have copies of the fleets of database shards.
Backup complexity - Database backups of the individual shards must be coordinated with the backups of the other shards.
Operational complexity - Adding or removing indexes, adding or deleting columns, and modifying the schema become much more difficult.
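
The extra query logic mentioned under "SQL complexity" can be illustrated with a simple aggregate. On a single database, an average is one `AVG` query; across shards, the application has to collect partial sums and counts from every shard and merge them itself. The sketch below uses two in-memory SQLite databases as stand-in shards, and the table layout, sample values, and function names are assumptions for illustration.

```python
import sqlite3


def make_shard(rows) -> sqlite3.Connection:
    """Create a stand-in shard holding one partition of the orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


# Deliberately uneven partitions: three orders on one shard, one on the other.
shards = [
    make_shard([(1, 10.0), (2, 20.0), (3, 30.0)]),
    make_shard([(4, 100.0)]),
]


def average_order_total(shards):
    """Scatter the query to every shard, then merge the partial sums and counts."""
    total_sum, total_count = 0.0, 0
    for conn in shards:
        partial_sum, partial_count = conn.execute(
            "SELECT COALESCE(SUM(total), 0), COUNT(*) FROM orders"
        ).fetchone()
        total_sum += partial_sum
        total_count += partial_count
    return total_sum / total_count if total_count else None


def naive_average(shards):
    """Incorrect on purpose: averaging per-shard averages ignores shard sizes."""
    per_shard = [
        conn.execute("SELECT AVG(total) FROM orders").fetchone()[0]
        for conn in shards
    ]
    return sum(per_shard) / len(per_shard)


print(average_order_total(shards))  # 40.0
print(naive_average(shards))        # 60.0
```

The two results differ because the shards hold different numbers of rows. This is the kind of subtle error that sharding logic makes possible and that a single-database query would not encounter.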

Etymology

In a database context, the term "shard" is most likely derived from one of two sources: Computer Corporation of America's "A System for Highly Available Replicated Data",^[30] which utilized redundant hardware to facilitate data replication (as opposed to horizontal partitioning), or the 1997 MMORPG video game Ultima Online.^[31]^[32]

Richard Garriott, creator of Ultima Online, recollects the term being coined during the production phase, when the team attempted to create a self-regulating virtual ecology system in which players could use newly available internet access (a revolutionary technology at the time) to interact and harvest in-game resources.^[32] Although the virtual ecology functioned as intended during in-house testing, its natural balance failed "almost instantaneously" because players killed off all wildlife across the playable area faster than the spawning system could operate. Garriott's production team attempted to mitigate this issue by separating the global player base into separate sessions, and by rewriting part of Ultima Online's fictional connection to the end of Ultima I: The First Age of Darkness, where the defeat of its antagonist Mondain also led to the creation of multiverse "shards". This modification provided Garriott's team with the fictional basis needed to justify creating copies of the virtual environment. However, the game's sharp rise to critical acclaim also meant that the new multiverse virtual ecology system was quickly overwhelmed as well. After several months of testing, Garriott's team decided to abandon the feature altogether and stripped it from the game.^[32]

Today, the term "shard" refers to the deployment and use of redundant hardware across database systems.^[citation needed]

Notes

[a] Typically 'supporting' data such as dimension tables

References

[1] Sadalage, Pramod J.; Fowler, Martin (2012). "4: Distribution Models". NoSQL Distilled. Pearson Education. ISBN 978-0321826626.
[2] Rahul Roy (July 28, 2008). "Shard - A Database Design".
[3] Ries, Eric. "Sharding for Startups".
[4] Wang, Gang; Shi, Zhijie Jerry; Nixon, Mark; Han, Song (21 October 2019). "SoK". Proceedings of the 1st ACM Conference on Advances in Financial Technologies. pp. 41–61. doi:10.1145/3318041.3355457. ISBN 9781450367325. S2CID 204749727.
[5] Yu, Mingchao; Sahraei, Saeid; Nixon, Mark; Han, Song (18 July 2020). "SoK: Sharding on Blockchain". Proceedings of the 1st ACM Conference on Advances in Financial Technologies. pp. 114–134. doi:10.1145/3318041.3355457. ISBN 9781450367325. S2CID 204749727.
[6] Hellings, Jelle; Sadoghi, Mohammad (2021). "Cerberus: Minimalistic Multi-shard Byzantine-resilient Transaction Processing" (PDF). Proceedings of the VLDB Endowment. 14 (11): 2230–2243. doi:10.14778/3476249.3476274.
[7] "Understanding Database Sharding". DigitalOcean Community Tutorials. 2022-03-16. Retrieved 2025-10-09. Quote: "Database shards exemplify a shared-nothing architecture. This means that the shards are autonomous; they don't share any of the same data or resources."
[8] "Apache HBase – Apache HBase™ Home". hbase.apache.org.
[9] "Introducing Elastic Scale preview for Azure SQL Database". azure.microsoft.com. 2 October 2014.
[10] "Alibaba Cloud Help Center - Cloud Definition and Explanation of Cloud Based Services - Alibaba Cloud". www.alibabacloud.com.
[11] "Focuses on Large-Scale Online Databases - Alibaba Cloud". www.alibabacloud.com.
[12] "Index Shard Allocation | Elasticsearch Guide [7.13] | Elastic". www.elastic.co.
[13] "IBM Docs".
[14] "Hibernate Shards". 2007-02-08.
[15] "Hibernate Shards". Archived from the original on 2008-12-16. Retrieved 2011-03-30.
[16] "New Grid queries for Informix".
[17] "NoSQL support in Informix (JSON storage, Mongo DB API)". September 24, 2013.
[18] "Spider". MariaDB KnowledgeBase. Retrieved 2022-12-20.
[19] "MonetDB July2015 Released". 31 August 2015.
[20] "MySQL Cluster Features & Benefits". 2012-11-23.
[21] "MySQL Fabric sharding quick start guide".
[22] "Oracle Sharding". Oracle. 2018-05-24. Retrieved 2021-07-10.
[23] "DistributedSearch - SOLR - Apache Software Foundation". cwiki.apache.org.
[24] Corbett, James C; Dean, Jeffrey; Epstein, Michael; Fikes, Andrew; Frost, Christopher; Furman, JJ; Ghemawat, Sanjay; Gubarev, Andrey; Heiser, Christopher; Hochschild, Peter; Hsieh, Wilson; Kanthak, Sebastian; Kogan, Eugene; Li, Hongyi; Lloyd, Alexander; Melnik, Sergey; Mwaura, David; Nagle, David; Quinlan, Sean; Rao, Rajesh; Rolig, Lindsay; Saito, Yasushi; Szymaniak, Michal; Taylor, Christopher; Wang, Ruth; Woodford, Dale. "Spanner: Google's Globally-Distributed Database" (PDF). Proceedings of OSDI 2012. Retrieved 24 February 2014.
[25] "sqlalchemy/sqlalchemy". July 9, 2021 – via GitHub.
[26] "Partitioning and Sharding Options for SQL Server and SQL Azure". infoq.com.
[27] "A faster, more efficient cryptocurrency". MIT News. 24 January 2019. Retrieved 2019-01-30.
[28] "Vitess". vitess.io.
[29] "ShardingSphere". shardingsphere.apache.org.
[30] Sarin, DeWitt & Rosenberg, Overview of SHARD: A System for Highly Available Replicated Data, Technical Report CCA-88-01, Computer Corporation of America, May 1988.
[31] Koster, Raph (2009-01-08). "Database "sharding" came from UO?". Raph Koster's Website. Retrieved 2015-01-17.
[32] "Ultima Online: The Virtual Ecology | War Stories". Ars Technica Videos. 21 December 2017.

External links

Informix JSON data sharding
