
Data modeling in software engineering is the process of creating a data model for an information system by applying certain formal techniques. It may be applied as part of the broader Model-driven engineering (MDE) concept.
Data modeling is a process used to define and analyze the data requirements needed to support the business processes within the scope of corresponding information systems in organizations. The process of data modeling therefore involves professional data modelers working closely with business stakeholders, as well as potential users of the information system.
There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system.[2] The data requirements are initially recorded as a conceptual data model, which is essentially a set of technology-independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step in data modeling is transforming the logical data model to a physical data model that organizes the data into tables and accounts for access, performance, and storage details. Data modeling defines not just data elements, but also their structures and the relationships between them.[3]
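The conceptual-to-logical-to-physical progression can be illustrated in code. The following Python sketch is purely illustrative; the "Customer" entity, the attribute names, and the mapping rules are invented examples, not part of any standard.

```python
# Illustrative sketch: one conceptual entity refined into a logical
# relation and finally a physical (SQL DDL) definition.

# Conceptual model: technology-independent statement of the data.
conceptual = {
    "entity": "Customer",
    "attributes": ["name", "email"],   # what the business wants recorded
    "identifier": "customer id",       # how instances are told apart
}

# Logical model: relational structure, still independent of any vendor.
logical = {
    "relation": "customer",
    "columns": {"customer_id": "integer", "name": "text", "email": "text"},
    "primary_key": "customer_id",
}

# Physical model: concrete DDL, where storage and access details appear.
def to_ddl(table):
    cols = ",\n  ".join(f"{c} {t.upper()}" for c, t in table["columns"].items())
    return (f"CREATE TABLE {table['relation']} (\n  {cols},\n"
            f"  PRIMARY KEY ({table['primary_key']})\n);")

print(to_ddl(logical))
```

Note how each step narrows the technology choices: the conceptual model names things in business terms, the logical model commits to the relational structure, and only the physical model fixes concrete SQL types and keys.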
Data modeling techniques and methodologies are used to model data in a standard, consistent, and predictable manner so that it can be managed as a resource. The use of data modeling standards is strongly recommended for all projects requiring a standard means of defining and analyzing data within an organization.
Data modeling may be performed during various types of projects and in multiple phases of a project. Data models are progressive; there is no such thing as a final data model for a business or application. Instead, a data model should be considered a living document that changes in response to a changing business. Data models should ideally be stored in a repository so that they can be retrieved, expanded, and edited over time. Whitten et al. (2004) identified two types of data modeling.[4]
Data modeling is also used as a technique for detailing business requirements for specific databases. It is sometimes called database modeling because a data model is eventually implemented in a database.[4]

Data models provide a framework for data to be used within information systems by providing specific definitions and formats. If a data model is used consistently across systems, then compatibility of data can be achieved. If the same data structures are used to store and access data, then different applications can share data seamlessly. The results of this are indicated in the diagram. However, systems and interfaces are often expensive to build, operate, and maintain. They may also constrain the business rather than support it. This may occur when the quality of the data models implemented in systems and interfaces is poor.[1]
A number of common problems are found in data models of poor quality.

In 1975, ANSI described three kinds of data-model instance: the conceptual schema, the logical schema, and the physical schema.[5]
According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model.

In the context of business process integration (see figure), data modeling complements business process modeling, and ultimately results in database generation.[6]
The process of designing a database involves producing the previously described three types of schemas – conceptual, logical, and physical. The database design documented in these schemas is converted through a Data Definition Language, which can then be used to generate a database. A fully attributed data model contains detailed attributes (descriptions) for every entity within it. The term "database design" can describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term "database design" could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the Database Management System or DBMS.
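As a minimal sketch of how a documented design can be converted through a data definition language, the Python snippet below (table, view, and column names are all hypothetical) emits DDL for both base tables and a derived view, the two kinds of relational structure named above:

```python
# Hypothetical sketch: a small documented design converted into DDL
# statements, covering both base tables and a derived view.
design = {
    "tables": {
        "orders": {"order_id": "INTEGER", "placed_on": "DATE", "total": "NUMERIC"},
    },
    "views": {
        # Views belong to the overall database design, not the base structures.
        "recent_orders": "SELECT order_id, total FROM orders "
                         "WHERE placed_on > DATE '2023-01-01'",
    },
}

def generate_ddl(design):
    statements = []
    for name, columns in design["tables"].items():
        body = ", ".join(f"{c} {t}" for c, t in columns.items())
        statements.append(f"CREATE TABLE {name} ({body});")
    for name, query in design["views"].items():
        statements.append(f"CREATE VIEW {name} AS {query};")
    return statements

for stmt in generate_ddl(design):
    print(stmt)
```

A real generator would also emit constraints, indexes, and storage clauses; the point here is only the design-to-DDL conversion step.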
In the process, system interfaces account for 25% to 70% of the development and support costs of current systems. The primary reason for this cost is that these systems do not share a common data model. If data models are developed on a system-by-system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. Most systems within an organization contain the same basic data, redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can minimize rework with minimal modifications for the purposes of different systems within the organization.[1]
Data models represent information areas of interest. While there are many ways to create data models, according to Len Silverston (1997)[7] only two modeling methodologies stand out: top-down and bottom-up.
Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model. In many environments, the distinction between a logical data model and a physical data model is blurred. In addition, some CASE tools don't make a distinction between logical and physical data models.[7]

There are several notations for data modeling. The actual model is frequently called an "entity–relationship model", because it depicts data in terms of the entities and relationships described in the data.[4] An entity–relationship model (ERM) is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema database modeling method, used in software engineering to produce a type of conceptual data model (or semantic data model) of a system, often a relational database, and its requirements in a top-down fashion.
These models are used in the first stage of information system design, during the requirements analysis, to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e., an overview and classification of used terms and their relationships) for a certain universe of discourse, i.e., the area of interest.
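An entity–relationship model can itself be represented as data. The sketch below is a minimal ER metamodel in Python; the "Customer"/"Order" entities and the "places" relationship are invented for illustration:

```python
from dataclasses import dataclass, field

# Minimal ER metamodel: entities carry attributes; a relationship
# connects two entities with a cardinality such as "1:N".
@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Relationship:
    name: str
    source: Entity
    target: Entity
    cardinality: str  # e.g. "1:1", "1:N", "M:N"

customer = Entity("Customer", ["customer_id", "name"])
order = Entity("Order", ["order_id", "placed_on"])
places = Relationship("places", customer, order, "1:N")

# A diagramming tool or schema generator would walk these objects.
print(f"{places.source.name} --{places.name} ({places.cardinality})--> "
      f"{places.target.name}")
```

The value of such a representation is that the same objects can drive both a diagram and, later, the translation into a logical schema.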
Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often arrive at very different results.

Generic data models are generalizations of conventional data models. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. The definition of a generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', being a binary relation between an individual thing and a kind of thing (a class), and a 'part-whole relation', being a binary relation between two things, one with the role of part and the other with the role of whole, regardless of the kinds of things that are related.
Given an extensible list of classes, this allows the classification of any individual thing and the specification of part-whole relations for any individual object. By standardizing an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and approaches the capability of natural language. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model allows expression of only those kinds of facts that are predefined in the model.
The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot fully satisfy the requirements for a conceptual definition of data, because it is limited in scope and biased toward the implementation strategy employed by the DBMS. This holds unless the semantic data model is deliberately implemented in the database, a choice which may slightly impact performance but generally improves productivity greatly.

Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined by its description within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.[8]
The purpose of semantic data modeling is to create a structural model of a piece of the real world, called the "universe of discourse". For this, three fundamental structural relations are considered.
A semantic data model can serve many purposes.[8]
The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful abstraction concepts known from the artificial intelligence field. The idea is to provide high-level modeling primitives as integral parts of a data model in order to facilitate the representation of real-world situations.[10]
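One widely used abstraction primitive is generalization, the "is-a" relationship. As a hedged sketch, the Python class hierarchy below shows how facts stated about a general concept automatically apply to its specializations; the concept names (Resource, Equipment, Pump) are illustrative only:

```python
# Sketch of the generalization abstraction: attributes defined on a
# general concept are inherited by its specializations, mirroring how
# a semantic data model captures "is-a" relationships.
class Resource:                 # general concept in the universe of discourse
    def __init__(self, identifier):
        self.identifier = identifier

class Equipment(Resource):      # Equipment is-a Resource
    def __init__(self, identifier, location):
        super().__init__(identifier)
        self.location = location

class Pump(Equipment):          # Pump is-a Equipment is-a Resource
    pass

p = Pump("P-101", "plant 2")
# Facts modeled at the Resource level hold for every specialization.
print(isinstance(p, Resource), p.identifier, p.location)
```

In a semantic data model, such hierarchies are first-class modeling constructs rather than an artifact of a particular programming language; the code merely demonstrates the inheritance of meaning that the abstraction is meant to capture.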