Incomputing, thestar schema orstar model is the simplest style ofdata martschema and is the approach most widely used to develop data warehouses and dimensional data marts.[1] The star schema consists of one or morefact tables referencing any number ofdimension tables. The star schema is an important special case of thesnowflake schema, and is more effective for handling simpler queries.[2]
The star schema gets its name from thephysical model's[3] resemblance to astar shape with a fact table at its center and the dimension tables surrounding it representing the star's points.
The star schema separates business process data into facts, which hold the measurable, quantitative data about a business, and dimensions which are descriptive attributes related to fact data. Examples of fact data include sales price, sale quantity, and time, distance, speed and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.
A star schema that has many dimensions is sometimes called acentipede schema.[4] Having dimensions of only a few attributes, while simpler to maintain, results in queries with many table joins and makes the star schema less easy to use.
Fact tables record measurements or metrics for a specific event. Fact tables generally consist of numeric values, and foreign keys to dimensional data where descriptive information is kept.[4]Fact tables are designed to a low level of uniform detail (referred to as "granularity" or "grain"), meaning facts can record events at a very atomic level. This can result in the accumulation of a large number of records in a fact table over time. Fact tables are defined as one of three types:
Fact tables are generally assigned asurrogate key to ensure each row can be uniquely identified.This key is a simple primary key.
Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a very large number of attributes to describe the fact data. Dimensions can define a wide variety of characteristics, but some of the most common attributes defined by dimension tables include:
Dimension tables are generally assigned asurrogate primary key, usually a single-column integer data type, mapped to the combination of dimension attributes that form the natural key.
Star schemas aredenormalized, meaning the typical rules of normalization applied to transactional relational databases are relaxed during star-schema design and implementation. The benefits of star-schema denormalization are:
Consider a database of sales, perhaps from a store chain, classified by date, store and product. The image of the schema to the right is a star schema version of the sample schema provided in thesnowflake schema article.
Fact_Sales
is the fact table and there are three dimension tablesDim_Date
,Dim_Store
andDim_Product
.
Each dimension table has a primary key on itsId
column, relating to one of the columns (viewed as rows in the example schema) of theFact_Sales
table's three-column (compound) primary key (Date_Id
,Store_Id
,Product_Id
). The non-primary keyUnits_Sold
column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as theYear
of theDim_Date
dimension).
For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:
SELECTP.Brand,S.CountryASCountries,SUM(F.Units_Sold)FROMFact_SalesFINNERJOINDim_DateDON(F.Date_Id=D.Id)INNERJOINDim_StoreSON(F.Store_Id=S.Id)INNERJOINDim_ProductPON(F.Product_Id=P.Id)WHERED.Year=1997ANDP.Product_Category='tv'GROUPBYP.Brand,S.Country