Dimensional modeling is part of the Business Dimensional Lifecycle methodology developed by Ralph Kimball, which includes a set of methods, techniques and concepts for use in data warehouse design.[1]: 1258–1260 [2] The approach focuses on identifying the key business processes within a business and modelling and implementing these first before adding additional business processes, as a bottom-up approach.[1]: 1258–1260 An alternative approach from Inmon advocates a top-down design of the model of all the enterprise data using tools such as entity-relationship modeling (ER).[1]: 1258–1260
Dimensional modeling always uses the concepts of facts (measures) and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc. Because the different business process areas share some but not all dimensions, efficiency in design and operation, as well as consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across subject areas.[citation needed]
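To make the fact/dimension split and the idea of a conformed dimension concrete, the following is a minimal sketch in plain Python. It assumes two hypothetical business processes (store sales and inventory) that both reference one shared product dimension; all table names, keys and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical conformed product dimension: one copy shared by two business processes.
product_dim = {
    1: {"name": "Espresso beans", "category": "Coffee"},
    2: {"name": "Green tea", "category": "Tea"},
    3: {"name": "Decaf beans", "category": "Coffee"},
}

# Store sales facts: each row holds keys into dimensions plus a numeric measure.
sales_facts = [
    {"product_key": 1, "store_no": 10, "sales_amount": 12.50},
    {"product_key": 2, "store_no": 10, "sales_amount": 4.00},
    {"product_key": 3, "store_no": 11, "sales_amount": 11.00},
    {"product_key": 1, "store_no": 11, "sales_amount": 12.50},
]

# Inventory facts: a different business process that reuses the same product keys.
inventory_facts = [
    {"product_key": 1, "store_no": 10, "units_on_hand": 40},
    {"product_key": 2, "store_no": 10, "units_on_hand": 25},
    {"product_key": 3, "store_no": 11, "units_on_hand": 15},
]

def total_by_category(facts, measure):
    """Aggregate a numeric fact, grouped by an attribute of the product dimension."""
    totals = defaultdict(float)
    for row in facts:
        category = product_dim[row["product_key"]]["category"]
        totals[category] += row[measure]
    return dict(totals)

print(total_by_category(sales_facts, "sales_amount"))      # {'Coffee': 36.0, 'Tea': 4.0}
print(total_by_category(inventory_facts, "units_on_hand"))  # {'Coffee': 55.0, 'Tea': 25.0}
```

Because both fact sets point at the same dimension rows, "Coffee" means exactly the same thing in the sales report and in the inventory report, which is the consistency benefit conformed dimensions are meant to provide.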
Dimensional modeling does not necessarily involve a relational database. The same modeling approach, at the logical level, can be used for any physical form, such as a multidimensional database or even flat files. It is oriented around understandability and performance.[citation needed]
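As a rough illustration that the logical model does not depend on a relational engine, the sketch below keeps a hypothetical store dimension and sales fact table as CSV text (standing in for flat files) and performs the fact-to-dimension lookup by hand with Python's standard `csv` module. The file layouts and column names are assumptions made for the example.

```python
import csv
import io

# Hypothetical flat files; in practice these could be separate CSV files on disk.
STORE_DIM_CSV = """store_key,city,region
10,Oslo,North
11,Bergen,West
12,Trondheim,North
"""

SALES_FACT_CSV = """date_key,store_key,sales_amount
20240101,10,100.0
20240101,11,80.0
20240102,12,50.0
20240102,10,25.0
"""

def load(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

# Index the dimension by its key so each fact row can look up its context.
stores = {row["store_key"]: row for row in load(STORE_DIM_CSV)}

sales_by_region = {}
for fact in load(SALES_FACT_CSV):
    region = stores[fact["store_key"]]["region"]  # the "join" from fact to dimension
    sales_by_region[region] = sales_by_region.get(region, 0.0) + float(fact["sales_amount"])

print(sales_by_region)  # {'North': 175.0, 'West': 80.0}
```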
The dimensional model is built on a star-like schema or snowflake schema, with dimensions surrounding the fact table.[3][4] To build the schema, the following design model is used:
The process of dimensional modeling builds on a 4-step design method that helps to ensure the usability of the dimensional model and the use of the data warehouse. The design is grounded in the actual business process that the data warehouse should cover. Therefore, the first step in the model is to describe the business process on which the model builds. This could, for instance, be a sales situation in a retail store. To describe the business process, one can use plain text, basic Business Process Model and Notation (BPMN), or other design guides such as the Unified Modeling Language (UML).
After describing the business process, the next step in the design is to declare the grain of the model. The grain of the model is the exact description of what the dimensional model should be focusing on, that is, what a single row of the fact table represents. This could, for instance, be “An individual line item on a customer slip from a retail store”. To clarify what the grain means, pick the central process and describe it in one sentence. The grain (sentence) is what the dimensions and fact table are then built from. It may be necessary to return to this step and alter the grain as new information is gained about what the model is supposed to deliver.
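One practical way to hold a model to its declared grain is to check that the columns identifying the grain are unique across fact rows. The sketch below assumes the line-item grain from the example above and treats `(receipt_id, line_number)` as the hypothetical grain key; the helper `check_grain` is likewise invented for illustration.

```python
from collections import Counter

# Declared grain (hypothetical): one row per line item on a customer receipt,
# identified by (receipt_id, line_number).
line_items = [
    {"receipt_id": "R1", "line_number": 1, "product": "Milk", "quantity": 2, "amount": 3.00},
    {"receipt_id": "R1", "line_number": 2, "product": "Bread", "quantity": 1, "amount": 2.50},
    {"receipt_id": "R2", "line_number": 1, "product": "Milk", "quantity": 1, "amount": 1.50},
]

def check_grain(rows, grain_columns):
    """Return grain keys that occur more than once; an empty list means the grain holds."""
    counts = Counter(tuple(row[c] for c in grain_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

assert check_grain(line_items, ("receipt_id", "line_number")) == []

# Coarser views such as receipt totals can be rolled up from the line-item grain,
# whereas line-item detail cannot be recovered from receipt totals.
receipt_totals = {}
for row in line_items:
    receipt_totals[row["receipt_id"]] = receipt_totals.get(row["receipt_id"], 0.0) + row["amount"]

print(receipt_totals)  # {'R1': 5.5, 'R2': 1.5}
```

The roll-up at the end is the main reason a fine (atomic) grain is usually preferred: it keeps coarser questions answerable without closing off detailed ones.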
The third step in the design process is to define the dimensions of the model. The dimensions must be defined within the grain from the second step of the 4-step process. Dimensions are the foundation of the fact table and are where the data for the fact table is collected. Typically, dimensions are nouns like date, store, inventory, etc. These dimensions store the descriptive data that gives the facts their context. For example, the date dimension could contain data such as year, month and weekday.
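The date dimension mentioned above is commonly generated rather than loaded from a source system. The following is a minimal sketch, assuming a hypothetical integer key in `YYYYMMDD` form and a small set of attributes (year, month, weekday, weekend flag); a real date dimension would usually carry many more attributes, such as fiscal periods or holidays.

```python
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
WEEKDAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

def build_date_dimension(start, end):
    """Return one date-dimension row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240301
            "full_date": day.isoformat(),
            "year": day.year,
            "month": day.month,
            "month_name": MONTH_NAMES[day.month - 1],
            "weekday": WEEKDAY_NAMES[day.weekday()],  # date.weekday(): Monday=0 ... Sunday=6
            "is_weekend": day.weekday() >= 5,
        })
        day += timedelta(days=1)
    return rows

for row in build_date_dimension(date(2024, 3, 1), date(2024, 3, 3)):
    print(row["date_key"], row["weekday"], row["is_weekend"])
# 20240301 Friday False
# 20240302 Saturday True
# 20240303 Sunday True
```

Precomputing attributes like `weekday` and `is_weekend` once in the dimension means queries can filter or group on them directly instead of deriving them from raw timestamps in every report.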
After defining the dimensions, the final step in the process is to build the fact table: its rows reference the dimensions through keys, and this step identifies the numeric facts that will populate each fact table row. This step is closely related to the business users of the system, since this is where they get access to the data stored in the data warehouse. Therefore, most facts are numerical, additive figures such as quantity or total cost. Per-unit figures such as cost per unit are not additive and should not be summed across rows.
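Putting the four steps together, the sketch below builds a small star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The grain is the retail line item from the second step; the table and column names (`fact_sales`, `dim_date`, `extended_amount`, and so on) are hypothetical. It also shows the additivity point: quantity and extended amount are summed, while an average unit price is derived from those sums instead of by summing `unit_price`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Grain: one row per line item on a receipt.
-- Key columns point at the dimensions; the remaining columns are the facts.
CREATE TABLE fact_sales (
    date_key        INTEGER REFERENCES dim_date(date_key),
    store_key       INTEGER REFERENCES dim_store(store_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    quantity        INTEGER,   -- additive
    unit_price      REAL,      -- non-additive: do not SUM across rows
    extended_amount REAL       -- additive: quantity * unit_price
);

INSERT INTO dim_date    VALUES (20240301, 2024, 3), (20240302, 2024, 3);
INSERT INTO dim_store   VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO dim_product VALUES (1, 'Coffee'), (2, 'Tea');

INSERT INTO fact_sales VALUES
    (20240301, 1, 1, 2, 3.00, 6.00),
    (20240301, 2, 1, 1, 3.50, 3.50),
    (20240302, 1, 2, 4, 2.00, 8.00);
""")

# Additive facts are summed; the average unit price is a ratio of two sums.
query = """
SELECT p.product_name,
       SUM(f.quantity)                          AS units,
       SUM(f.extended_amount)                   AS revenue,
       SUM(f.extended_amount) / SUM(f.quantity) AS avg_unit_price
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d    ON d.date_key = f.date_key
WHERE d.year = 2024 AND d.month = 3
GROUP BY p.product_name
ORDER BY p.product_name
"""
for row in conn.execute(query):
    print(row)
```

Summing `unit_price` for Coffee would give 6.50, which is not a meaningful price; dividing total revenue by total units gives the weighted average instead.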
Dimensional normalization, or snowflaking, removes redundant attributes that are present in ordinary flattened, denormalized dimensions. The removed attributes are placed in sub-dimensions, which are joined to the parent dimension.
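The difference between a flattened dimension and a snowflaked one can be sketched with Python's `sqlite3` module. In this hypothetical example, the category and department attributes are repeated for every product in the flat version, while the snowflaked version moves them into a `dim_category` sub-dimension; all names and values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flattened (denormalized) product dimension: category attributes repeat per product.
CREATE TABLE dim_product_flat (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category_name TEXT, department_name TEXT
);
INSERT INTO dim_product_flat VALUES
    (1, 'Espresso',  'Coffee', 'Beverages'),
    (2, 'Latte',     'Coffee', 'Beverages'),
    (3, 'Green tea', 'Tea',    'Beverages');

-- Snowflaked version: category attributes move into a sub-dimension.
CREATE TABLE dim_category (
    category_key INTEGER PRIMARY KEY, category_name TEXT, department_name TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
INSERT INTO dim_category VALUES (1, 'Coffee', 'Beverages'), (2, 'Tea', 'Beverages');
INSERT INTO dim_product  VALUES (1, 'Espresso', 1), (2, 'Latte', 1), (3, 'Green tea', 2);
""")

# Both designs answer the same question; the snowflake needs one extra join.
flat = conn.execute(
    "SELECT product_name, category_name FROM dim_product_flat ORDER BY product_key"
).fetchall()
snow = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM dim_product p JOIN dim_category c ON c.category_key = p.category_key
    ORDER BY p.product_key
""").fetchall()

assert flat == snow
print(snow)  # [('Espresso', 'Coffee'), ('Latte', 'Coffee'), ('Green tea', 'Tea')]
```

The trade-off is visible even at this size: the snowflake stores each category once but adds a join to every query that needs category attributes.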
Snowflaking affects the data structure in a way that differs from many data warehouse philosophies.[4]
[Figure: Single data (fact) table surrounded by multiple descriptive (dimension) tables]
Developers often don't normalize dimensions, for several reasons:[5]
There are some arguments for why normalization can be useful.[4] It can be an advantage when part of a hierarchy is common to more than one dimension. For example, a geographic dimension may be reusable because both the customer and supplier dimensions use it.
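The shared-geography case can be sketched as a single sub-dimension referenced by two different dimensions. The example below uses Python's `sqlite3` module; the tables `dim_geography`, `dim_customer` and `dim_supplier`, and all of their rows, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One geography sub-dimension, reused by two different dimensions.
CREATE TABLE dim_geography (
    geography_key INTEGER PRIMARY KEY, city TEXT, region TEXT, country TEXT
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY, customer_name TEXT,
    geography_key INTEGER REFERENCES dim_geography(geography_key)
);
CREATE TABLE dim_supplier (
    supplier_key INTEGER PRIMARY KEY, supplier_name TEXT,
    geography_key INTEGER REFERENCES dim_geography(geography_key)
);
INSERT INTO dim_geography VALUES
    (1, 'Chicago', 'Illinois', 'USA'),
    (2, 'Toronto', 'Ontario',  'Canada');
INSERT INTO dim_customer VALUES (1, 'Acme Retail', 1), (2, 'Bolt Stores', 2);
INSERT INTO dim_supplier VALUES (1, 'Fresh Farms', 1);
""")

# Pair customers and suppliers that share the same geography row.
rows = conn.execute("""
    SELECT c.customer_name, s.supplier_name, g.country
    FROM dim_customer c
    JOIN dim_geography g ON g.geography_key = c.geography_key
    JOIN dim_supplier s  ON s.geography_key = g.geography_key
""").fetchall()

print(rows)  # [('Acme Retail', 'Fresh Farms', 'USA')]
```

Because the city, region and country hierarchy lives in one table, a correction such as renaming a region is made once and is seen consistently by both the customer and supplier dimensions.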
Commonly cited benefits of dimensional modeling include:[6]
The benefits of dimensional models still apply on Hadoop and similar big data frameworks. However, some features of Hadoop require slight adaptations to the standard approach to dimensional modelling.[citation needed]