Movatterモバイル変換

[0]ホーム

Jump to content

Snowflake schema

Edit links

From Wikipedia, the free encyclopedia

Logical arrangement of computing tables in a multidimensional database

Not to be confused withSnowflake Inc. orSnowflake ID.

This articleneeds additional citations forverification. Please helpimprove this article byadding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Snowflake schema" – news ·newspapers ·books ·scholar ·JSTOR(October 2012) (Learn how and when to remove this message)

The snowflake schema is a variation of the star schema, featuring normalization of dimension tables.

Incomputing, asnowflake schema orsnowflake model is alogical arrangement of tables in amultidimensional database such that theentity relationship diagram resembles asnowflake shape. The snowflake schema is represented by centralizedfact tables which are connected to multipledimensions. "Snowflaking" is a method of normalizing the dimension tables in astar schema. When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with thefact table in the middle. The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.^[1]

The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions arenormalized into multiple related tables, whereas the star schema's dimensions are denormalized with each dimension represented by a single table. A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables ("forks in the road").

Common uses

[edit]

Star and snowflake schemas are most commonly found in dimensionaldata warehouses anddata marts where speed of data retrieval is more important than the efficiency of data manipulations. As such, the tables in these schemas are not normalized much, and are frequently designed at a level of normalization short ofthird normal form.^[2]

Data normalization and storage

[edit]

Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into new tables. Normalization therefore tends to increase the number of tables that need to be joined in order to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes.

From a space storage point of view, dimensional tables are typically small compared to fact tables. This often negates the potential storage-space benefits of the snowflake schema as compared to the star schema. Example: One million sales transactions in 300 shops in 220 countries would result in 1,000,300 records in a star schema (1,000,000 records in the fact table and 300 records in the dimensional table where each country would be listed explicitly for each shop in that country). A more normalized snowflake schema with country keys referring to a country table would consist of the same 1,000,000 record fact table, a 300 record shop table with references to a country table with 220 records. In this case, the star schema, although further denormalized, would only reduce the number or records by a (negligible) ~0.02% (=[1,000,000+300] instead of [1,000,000+300+220])

Some database developers compromise by creating an underlying snowflake schema withviews built on top of it that perform many of the necessary joins to simulate a star schema. This provides the storage benefits achieved through the normalization of dimensions with the ease of querying that the star schema provides. The tradeoff is that requiring the server to perform the underlying joins automatically can result in a performance hit when querying as well as extra joins to tables that may not be necessary to fulfill certain queries.^{[citation needed]}

Benefits

[edit]

In dimensional modeling, asnowflake schema normalizes dimension hierarchies into multiple related tables. In some situations it can offer benefits; however, they come with notable trade-offs.

Potential benefits

Storage efficiency for very large hierarchies. Normalizing low-cardinality rollups (e.g., product category/brand, geography) removes repeated text and can reduce redundancy and update effort in large dimensions.^[3]
Accurate representation of hierarchical relationships. A snowflake reflects the hierarchy explicitly by linking secondary tables to the base dimension.^[4]

Trade-offs and limitations

Ease-of-use and query performance. Snowflaked designs are harder for business users to understand and navigate and can negatively impact query performance due to additional joins.^[4]^[5]
ETL and maintenance complexity. Managing keys and slowly changing hierarchies across multiple linked dimension tables increases ETL complexity; any space savings are often negligible relative to the added complexity.^[5]
More joins required. While normalization reduces redundancy, it typically increases the number of joins needed at query time.^[3]

Disadvantages

[edit]

The primary disadvantage of the snowflake schema is that the additional levels of attribute normalization adds complexity to source query joins, when compared to thestar schema.

Snowflake schemas, in contrast to flat single table dimensions, have been heavily criticised. Their goal is assumed to be an efficient and compact storage of normalised data but this is at the significant cost of poor performance when browsing the joins required in this dimension.^[6] This disadvantage may have reduced in the years since it was first recognized, owing to better query performance within the browsing tools.

Examples

[edit]

The example schema shown to the right is a snowflaked version of the star schema example provided in thestar schema article.

The following example query is the snowflake schema equivalent of the star schema example code which returns the total number of television units sold by brand and by country for 1997. Notice that the snowflake schema query requires many more joins than the star schema version in order to fulfill even a simple query. The benefit of using the snowflake schema in this example is that the storage requirements are lower since the snowflake schema eliminates many duplicate values from the dimensions themselves.

SELECTB.Brand,G.Country,SUM(F.Units_Sold)FROMFact_SalesFINNERJOINDim_DateDONF.Date_Id=D.IdINNERJOINDim_StoreSONF.Store_Id=S.IdINNERJOINDim_GeographyGONS.Geography_Id=G.IdINNERJOINDim_ProductPONF.Product_Id=P.IdINNERJOINDim_BrandBONP.Brand_Id=B.IdINNERJOINDim_Product_CategoryCONP.Product_Category_Id=C.IdWHERED.Year=1997ANDC.Product_Category='tv'GROUPBYB.Brand,G.Country

References

[edit]

^Paulraj Ponniah.Data Warehousing Fundamentals for IT Professionals. Wiley, 2010, pp. 29–32.ISBN 0470462078.
^Han, Jiawei (2012).Data Mining - Concepts and Techniques. Massachusetts, USA: Morgan Kauffmann Publishers.ISBN 9780123814791.
^^a ^b"Star and Snowflake Schemas".Oracle. Oracle. Retrieved2025-08-15.
^^a ^b"Snowflaked Dimensions".Kimball Group. Retrieved2025-08-15.
^^a ^bRoss, Margy (2008-09-03)."Design Tip #105: Snowflakes, Outriggers, and Bridges".Kimball Group. Retrieved2025-08-15.
^Kimball, Ralph (1996)."6: The Big Dimensions".The Data Warehouse Toolkit (1st ed.). Wiley. pp. 95–98.ISBN 0-471-15337-0.Do not snowflake your dimensions, even if they are large

Bibliography

[edit]

Anahory, S.; D. Murray.Data Warehousing in the Real World: A Practical Guide for Building Decision Support Systems. Addison Wesley Professional.
Kimball, Ralph (1996).The Data Warehousing Toolkit. John Wiley.ISBN 978-0-471-15337-5.

External links

[edit]

Data warehouses

Creating a data warehouse

Concepts	Database Dimension Dimensional modeling Fact OLAP Star schema Snowflake schema Reverse star schema Aggregate Single version of the truth
Variants	Column-oriented DBMS Data hub Data mesh Ensemble modeling patterns Anchor modeling Data vault modeling Focal point modeling HOLAP MOLAP ROLAP Operational data store
Elements	Data dictionary/Metadata Data mart Sixth normal form Surrogate key
Fact	Fact table Early-arriving fact Measure
Dimension	Dimension table Degenerate Slowly changing
Filling	Extract, transform, load (ETL) Extract, load, transform (ELT) Extract Transform Load

Using a data warehouse

Concepts	Business intelligence Dashboard Data mining Decision support system (DSS) OLAP cube Data warehouse automation
Languages	Data Mining Extensions (DMX) MultiDimensional eXpressions (MDX) XML for Analysis (XMLA)
Tools	Business intelligence software Reporting software Spreadsheet

People	Bill Inmon Information factory Ralph Kimball Enterprise bus Dan Linstedt
Products	Comparison of OLAP servers Data warehousing products and their producers

Retrieved from "https://en.wikipedia.org/w/index.php?title=Snowflake_schema&oldid=1319360812"

Categories:

Hidden categories:

[8]ページ先頭