A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.[2][3] For instance, a data model may specify that the data element representing a car be composed of a number of other elements which, in turn, represent the color and size of the car and define its owner.
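As a minimal sketch, using Python dataclasses as the notation, the car example can be written as a record type composed of other elements. The names `Car`, `Person`, and `length_m` are hypothetical and chosen only for illustration, and "size" is reduced to a single length for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical entity representing a car owner."""
    name: str


@dataclass(frozen=True)
class Car:
    """Hypothetical data element for a car, composed of other elements."""
    color: str       # element representing the car's color
    length_m: float  # element representing the car's size (length in meters)
    owner: Person    # element that defines the car's owner


car = Car(color="red", length_m=4.3, owner=Person(name="Alice"))
print(car.owner.name)  # Alice
```

The model says nothing about how the car is stored; it only fixes which elements make up a car and how they relate.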
The corresponding professional activity is generally called data modeling or, more specifically, database design. Data models are typically specified by a data expert, data specialist, data scientist, data librarian, or a data scholar. A data modeling language and notation are often represented in graphical form as diagrams.[4]
A data model can sometimes be referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.
A data model explicitly determines the structure of data; conversely, structured data is data organized according to an explicit data model or data structure. Structured data is in contrast to unstructured data and semi-structured data.
The term data model can refer to two distinct but closely related concepts. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain: for example the customers, products, and orders found in a manufacturing organization. At other times it refers to the set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the "data model" of a banking application may be defined using the entity–relationship "data model". This article uses the term in both senses.
Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe the structure, manipulation, and integrity aspects of the data stored in data management systems such as relational databases. They may also describe data with a looser structure, such as word processing documents, email messages, pictures, digital audio, and video: XDM, for example, provides a data model for XML documents.
The main aim of data models is to support the development of information systems by providing the definition and format of data. According to West and Fowler (1999) "if this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor".[5]
The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.[5]
A data model explicitly determines the structure of data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually, data models are specified in a data modeling language.[3]
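A data model instance may be one of three kinds according to ANSI in 1975:[6] a conceptual data model, which describes the semantics of a domain in terms of entity classes and the relationships between them; a logical data model, which describes those semantics as represented by a particular data manipulation technology, such as tables and columns or object-oriented classes; and a physical data model, which describes the physical means by which data are stored, such as partitions and tablespaces.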
The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly.
One of the earliest pioneering works in modeling information systems was done by Young and Kent (1958),[7][8] who argued for "a precise and abstract way of specifying the informational and time characteristics of a data processing problem". They wanted to create "a notation that should enable the analyst to organize the problem around any piece of hardware". Their work was the first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. The next step in IS modeling was taken by CODASYL, an IT industry consortium formed in 1959, which essentially aimed at the same thing as Young and Kent: the development of "a proper structure for machine-independent problem definition language, at the system level of data processing". This led to the development of a specific IS information algebra.[8]
In the 1960s data modeling gained more significance with the initiation of the management information system (MIS) concept. According to Leondes (2002), "during that time, the information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, were proposed during this period of time".[9] Towards the end of the 1960s, Edgar F. Codd worked out his theories of data arrangement, and proposed the relational model for database management based on first-order predicate logic.[10]
In the 1970s entity–relationship modeling emerged as a new type of conceptual data modeling, originally formalized in 1976 by Peter Chen. Entity–relationship models were being used in the first stage of information system design during the requirements analysis to describe information needs or the type of information that is to be stored in a database. This technique can describe any ontology, i.e., an overview and classification of concepts and their relationships, for a certain area of interest.
In the 1970s G.M. Nijssen developed the "Natural Language Information Analysis Method" (NIAM), and in the 1980s he developed it further, in cooperation with Terry Halpin, into Object–Role Modeling (ORM). However, it was Terry Halpin's 1989 PhD thesis that created the formal foundation on which Object–Role Modeling is based.
Bill Kent, in his 1978 book Data and Reality,[11] compared a data model to a map of a territory, emphasizing that in the real world, "highways are not painted red, rivers don't have county lines running down the middle, and you can't see contour lines on a mountain". In contrast to other researchers who tried to create models that were mathematically clean and elegant, Kent emphasized the essential messiness of the real world, and the task of the data modeler to create order out of chaos without excessively distorting the truth.
In the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately: the data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data."[12]
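The contrast Harrington describes can be sketched in Python. In the first style a plain record and the procedure that updates it are kept apart; in the object-oriented style the procedure is bundled with the entity's data. The `Account` entity and `deposit` procedure are hypothetical examples, not taken from the cited source.

```python
from dataclasses import dataclass

# Traditional style: the data sits in a plain record (as it might in a database)...
account_record = {"owner": "Alice", "balance": 100}


# ...and the procedure lives separately, in the application program.
def deposit(record: dict, amount: int) -> None:
    record["balance"] += amount


# Object-oriented style: the entity's procedure is combined with its data.
@dataclass
class Account:
    owner: str
    balance: int

    def deposit(self, amount: int) -> None:
        self.balance += amount


deposit(account_record, 50)
acct = Account("Alice", 100)
acct.deposit(50)
print(account_record["balance"], acct.balance)  # 150 150
```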
During the early 1990s, three Dutch mathematicians, Guido Bakema, Harm van der Lek, and JanPieter Zwart, continued the development of the work of G.M. Nijssen. They focused more on the communication part of the semantics. In 1997 they formalized the method Fully Communication Oriented Information Modeling (FCO-IM).
A database model is a specification describing how a database is structured and used.
Several such models have been suggested. Common models include the hierarchical, network, relational, and object-oriented models.
A data structure diagram (DSD) is a diagram and data model used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them. The basic graphic elements of DSDs are boxes, representing entities, and arrows, representing relationships. Data structure diagrams are most useful for documenting complex data entities.
Data structure diagrams are an extension of the entity–relationship model (ER model). In DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as boxes composed of attributes which specify the constraints that bind entities together. DSDs differ from the ER model in that the ER model focuses on the relationships between different entities, whereas DSDs focus on the relationships of the elements within an entity and enable users to fully see the links and relationships between each entity.
There are several styles for representing data structure diagrams, with a notable difference in the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality.
An entity–relationship model (ERM), sometimes referred to as an entity–relationship diagram (ERD), can be used to represent an abstract conceptual data model (or semantic data model or physical data model) used in software engineering to represent structured data. There are several notations used for ERMs. As in DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as lines, with the relationship constraints as descriptions on the line. The E-R model, while robust, can become visually cumbersome when representing entities with several attributes.
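A data model in geographic information systems (GIS) is a mathematical construct for representing geographic objects or surfaces as data. For example, the vector data model represents geography as collections of points, lines, and polygons, while the raster data model represents it as a grid of cells that store numeric values.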
Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solving some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements that are to be rendered more concretely, in order to make the differences less significant.
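To make "standardized general relation types" concrete, here is a minimal sketch in which every fact must use one of a small, fixed set of relation types, each restricted to the kinds of things it may relate. The relation types (`is_classified_as`, `is_part_of`), the kinds (`individual`, `class`), and the `make_fact` helper are hypothetical illustrations, not drawn from any particular generic data model standard.

```python
from dataclasses import dataclass

# Hypothetical standardized relation types: each names the kind of thing
# allowed on its left-hand side (domain) and right-hand side (range).
RELATION_TYPES = {
    "is_classified_as": ("individual", "class"),
    "is_part_of": ("individual", "individual"),
}


@dataclass(frozen=True)
class Thing:
    name: str
    kind: str  # "individual" or "class" in this sketch


@dataclass(frozen=True)
class Fact:
    subject: Thing
    relation: str
    obj: Thing


def make_fact(subject: Thing, relation: str, obj: Thing) -> Fact:
    """Create a fact, rejecting relation types or kinds the model does not allow."""
    if relation not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation}")
    domain, range_ = RELATION_TYPES[relation]
    if subject.kind != domain or obj.kind != range_:
        raise ValueError(f"{relation} cannot relate {subject.kind} to {obj.kind}")
    return Fact(subject, relation, obj)


pump = Thing("P-101", "individual")
pump_class = Thing("centrifugal pump", "class")
plant = Thing("Plant A", "individual")

facts = [
    make_fact(pump, "is_classified_as", pump_class),
    make_fact(pump, "is_part_of", plant),
]
```

Because every modeler records facts through the same relation types, two such fact lists can be merged without first reconciling two different schemas; a call such as `make_fact(pump_class, "is_part_of", plant)` is rejected with a `ValueError`.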
A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction that defines how the stored symbols relate to the real world.[13] A semantic data model is sometimes called a conceptual data model.
The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction that defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.[13]
Data architecture is the design of data for use in defining the target state and the subsequent planning needed to hit the target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.
A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups, and data items; and mappings of those data artifacts to data qualities, applications, locations, etc.
Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system.
Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database.[16]
The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the starting point for interface or database design.[5]
Some important properties of data for which requirements need to be met are:
Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.
While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). {Presumably we call ourselves systems analysts because no one can say systems synthesists.} Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships.
A different approach is to use adaptive systems such as artificial neural networks that can autonomously create implicit models of data.
A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.
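A small illustration of how the choice of data structure determines which algorithm can be used: an unsorted list only supports linear search, while keeping the same values in a sorted list makes binary search (here via the standard `bisect` module) applicable. The function names are hypothetical; this is a sketch, not a benchmark.

```python
import bisect


def contains_linear(items: list[int], target: int) -> bool:
    """Linear search: works on any list, O(n) comparisons."""
    for item in items:
        if item == target:
            return True
    return False


def contains_sorted(sorted_items: list[int], target: int) -> bool:
    """Binary search: valid only because the list is kept sorted, O(log n)."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


data = [42, 7, 19, 3, 88, 54]
sorted_data = sorted(data)  # choosing a sorted array is what enables binary search

print(contains_linear(data, 19), contains_sorted(sorted_data, 19))  # True True
print(contains_sorted(sorted_data, 20))  # False
```

The trade-off is that insertions into the sorted list must preserve its order, which is the kind of usage-pattern consideration that guides the choice of structure.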
A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated grammar for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system.
The entities represented by a data model can be tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people.
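A minimal sketch of the "Person" abstraction, assuming roles are recorded as simple string values (the role names and people are hypothetical). With separate "Vendor" and "Employee" entity classes, a person who is both would need two records, and each new role would need a new entity class; here a new role is just a new value.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Abstract entity class: anyone who interacts with the organization."""
    name: str
    roles: set[str] = field(default_factory=set)  # e.g. {"employee", "vendor"}


people = [
    Person("Alice", {"employee"}),
    Person("Bob", {"vendor"}),
    Person("Carol", {"employee", "vendor"}),  # one person, two roles
]

vendors = [p.name for p in people if "vendor" in p.roles]
print(vendors)  # ['Bob', 'Carol']
```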
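The term data model can have two meanings:[17] a data model theory, that is, a formal description of how data may be structured and used; and a data model instance, that is, the application of a data model theory to create a practical model for a particular application.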
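A data model theory has three main components:[17] a structural part, consisting of the data structures used to create databases representing the modeled entities or objects; an integrity part, consisting of the rules that govern the constraints placed on these data structures to ensure structural integrity; and a manipulation part, consisting of the operators that can be applied to the data structures to update and query the data they contain.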
For example, in the relational model, the structural part is based on a modified concept of the mathematical relation; the integrity part is expressed in first-order logic and the manipulation part is expressed using the relational algebra, tuple calculus and domain calculus.
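The structural and manipulation parts of the relational model can be sketched in a few lines of Python: relations as sets of tuples, plus three relational algebra operators (selection, projection, natural join). This is a teaching sketch under the stated representation, not how a DBMS implements them; the helper names and the sample relations are hypothetical, and tuple and domain calculus are not shown.

```python
from typing import Callable

# Each relation is a frozenset of tuples, and each tuple is a frozenset of
# (attribute, value) pairs. Using sets means duplicate tuples collapse,
# as they do in the set-based relational model.


def relation(*rows: dict) -> frozenset:
    """Build a relation from plain dicts (hypothetical helper)."""
    return frozenset(frozenset(row.items()) for row in rows)


def select(r: frozenset, predicate: Callable[[dict], bool]) -> frozenset:
    """Selection: keep only the tuples that satisfy the predicate."""
    return frozenset(t for t in r if predicate(dict(t)))


def project(r: frozenset, attrs: set) -> frozenset:
    """Projection: keep only the named attributes; duplicates merge."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)


def natural_join(r: frozenset, s: frozenset) -> frozenset:
    """Natural join: combine tuples that agree on all shared attributes."""
    result = set()
    for x in r:
        dx = dict(x)
        for y in s:
            dy = dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                result.add(frozenset({**dx, **dy}.items()))
    return frozenset(result)


customers = relation({"cust": 1, "name": "Acme"}, {"cust": 2, "name": "Globex"})
orders = relation({"order": 10, "cust": 1}, {"order": 11, "cust": 1},
                  {"order": 12, "cust": 2})

# "Which orders belong to Acme?" expressed as project(select(join(...))).
acme_orders = project(
    select(natural_join(customers, orders), lambda t: t["name"] == "Acme"),
    {"order"},
)
print(sorted(dict(t)["order"] for t in acme_orders))  # [10, 11]
```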
A data model instance is created by applying a data model theory. This is typically done to solve some business enterprise requirement. Business requirements are normally captured by a semantic logical data model. This is transformed into a physical data model instance, from which a physical database is generated. For example, a data modeler may use a data modeling tool to create an entity–relationship model of the corporate data repository of some business enterprise. This model is transformed into a relational model, which in turn generates a relational database.
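A minimal sketch of the last step of that pipeline, using Python's built-in `sqlite3` module: a hypothetical entity–relationship model (a Customer places zero or more Orders) is mapped to relational tables, with each entity type becoming a table and the one-to-many relationship becoming a foreign key. The table and column names are illustrative assumptions, and the index stands in for a physical-level choice driven by expected usage rather than by the conceptual model.

```python
import sqlite3

# Hypothetical ER model: Customer 1 --- 0..* Order.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (               -- "order" is an SQL keyword
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    placed_on   TEXT NOT NULL
);
-- Physical-level choice driven by expected queries, not by the conceptual model:
CREATE INDEX idx_order_customer ON customer_order (customer_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(DDL)

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, '2024-01-15')")

# The relationship's integrity rule is now enforced by the database itself.
rejected = False
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 99, '2024-01-16')")
except sqlite3.IntegrityError:
    rejected = True  # there is no customer 99 to reference
print("order for unknown customer rejected:", rejected)
```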
Patterns[18] are common data modeling structures that occur in many data models.
A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program. A data-flow diagram can also be used for the visualization of data processing (structured design). Data-flow diagrams were invented by Larry Constantine, the original developer of structured design,[20] based on Martin and Estrin's "data-flow graph" model of computation.
It is common practice to draw a context-level data-flow diagram first, which shows the interaction between the system and outside entities. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. This context-level data-flow diagram is then "exploded" to show more detail of the system being modeled.
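As a sketch of what "exploding" a diagram means, the DFDs below are represented as sets of (source, destination, data) flows. The process numbers, the `Customer` external entity, and the `D1 Orders` data store are hypothetical. The check reflects the common DFD convention of balancing: the exploded diagram should exchange the same data with outside entities as its context-level parent.

```python
# Each data flow is (source, destination, data). "0" stands for the whole
# system in the hypothetical context-level diagram.
context_level = {
    ("Customer", "0", "order"),
    ("0", "Customer", "invoice"),
}

# Level-1 diagram: process 0 "exploded" into sub-processes 1 and 2, plus a
# data store D1 that only appears inside the system boundary.
level_1 = {
    ("Customer", "1", "order"),
    ("1", "D1 Orders", "order record"),
    ("D1 Orders", "2", "order record"),
    ("2", "Customer", "invoice"),
}

EXTERNAL = {"Customer"}


def boundary_flows(flows):
    """Flows that cross the system boundary, with internal ends collapsed to '0'."""
    result = set()
    for src, dst, data in flows:
        if (src in EXTERNAL) != (dst in EXTERNAL):  # exactly one end is external
            result.add((src if src in EXTERNAL else "0",
                        dst if dst in EXTERNAL else "0",
                        data))
    return result


# A balanced explosion keeps the same boundary flows as its parent diagram.
print(boundary_flows(level_1) == context_level)  # True
```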
An information model is not a type of data model, but more or less an alternative model. Within the field of software engineering, both a data model and an information model can be abstract, formal representations of entity types that include their properties, relationships and the operations that can be performed on them. The entity types in the model may be kinds of real-world objects, such as devices in a network, or they may themselves be abstract, such as for the entities used in a billing system. Typically, they are used to model a constrained domain that can be described by a closed set of entity types, properties, relationships and operations.
According to Lee (1999)[21] an information model is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide a sharable, stable, and organized structure of information requirements for the domain context.[21] More generally, the term information model is used for models of individual things, such as facilities, buildings, process plants, etc. In those cases the concept is specialised to Facility Information Model, Building Information Model, Plant Information Model, etc. Such an information model is an integration of a model of the facility with the data and documents about the facility.
An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity–relationship models or XML schemas.
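To illustrate one information model with several mappings, the sketch below takes a hypothetical concept ("a building has a name and a number of floors") and maps it both to an object model (a Python class) and to an XML representation using the standard `xml.etree.ElementTree` module. The element and attribute names are assumptions chosen for the example, not part of any published schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical information-model concept: a Building has a name and a floor count.


# Mapping 1: an object model.
@dataclass
class Building:
    name: str
    floors: int


# Mapping 2: an XML representation of the same concept.
def building_to_xml(b: Building) -> str:
    elem = ET.Element("building", name=b.name)
    ET.SubElement(elem, "floors").text = str(b.floors)
    return ET.tostring(elem, encoding="unicode")


def building_from_xml(text: str) -> Building:
    elem = ET.fromstring(text)
    return Building(name=elem.get("name"), floors=int(elem.findtext("floors")))


hq = Building("HQ", 12)
xml_text = building_to_xml(hq)
print(xml_text)  # <building name="HQ"><floors>12</floors></building>
```

Converting `xml_text` back with `building_from_xml` yields an object equal to `hq`, which is the sense in which both data models carry the same information.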
An object model in computer science is a collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, it is the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system. For example, the Document Object Model (DOM)[1] is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model[22] for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver[23] is an object model for controlling an astronomical telescope.
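Browser scripts usually reach the DOM through JavaScript, but the same object model idea can be sketched with Python's standard `xml.dom.minidom`, a minimal implementation of the W3C DOM interfaces. The page markup below is a made-up example; the point is that the program examines and changes the document only through objects such as elements and text nodes.

```python
from xml.dom.minidom import parseString

# A tiny XHTML-like page; the markup is a made-up example.
page = parseString("<html><body><h1 id='title'>Draft</h1></body></html>")

# Examine the page through the object model...
heading = page.getElementsByTagName("h1")[0]
print(heading.getAttribute("id"), heading.firstChild.data)  # title Draft

# ...and change it dynamically, as a browser script would.
heading.firstChild.data = "Published"
print(page.documentElement.toxml())
# <html><body><h1 id="title">Published</h1></body></html>
```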
In computing, the term object model has a distinct second meaning: the general properties of objects in a specific computer programming language, technology, notation or methodology that uses them. Examples include the Java object model, the COM object model, and the object model of OMT. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.
Object–Role Modeling (ORM) is a method for conceptual modeling, and can be used as a tool for information and rules analysis.[25]
Object–Role Modeling is a fact-oriented method for performing systems analysis at the conceptual level. The quality of a database application depends critically on its design. To help ensure correctness, clarity, adaptability and productivity, information systems are best specified first at the conceptual level, using concepts and language that people can readily understand.
The conceptual design may include data, process and behavioral perspectives, and the actual DBMS used to implement the design might be based on one of many logical data models (relational, hierarchic, network, object-oriented, etc.).[26]
The Unified Modeling Language (UML) is a standardized general-purpose modeling language in the field of software engineering. It is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. The Unified Modeling Language offers a standard way to write a system's blueprints, including:[27]
UML offers a mix of functional models, data models, and database models.