A federated database system (FDBS) is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is an alternative to the (sometimes daunting) task of merging several disparate databases. A federated database, or virtual database, is a composite of all constituent databases in a federated database system. Data federation does not physically integrate the data held in the constituent databases.
Through data abstraction, federated database systems can provide a uniform user interface, enabling users and clients to store and retrieve data from multiple noncontiguous databases with a single query, even if the constituent databases are heterogeneous. To this end, a federated database system must be able to decompose the query into subqueries for submission to the relevant constituent DBMSs, after which the system must combine the result sets of the subqueries. Because various database management systems employ different query languages, federated database systems can apply wrappers to the subqueries to translate them into the appropriate query languages.
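The decompose, translate and combine steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a real FDBS: the two in-memory SQLite databases, their table and column names, and the `Wrapper` class are all hypothetical, and both components happen to speak the same SQL dialect, so the wrapper only translates schema names.

```python
import sqlite3

# Two hypothetical, autonomous component databases with different schemas.
def make_component_a():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, full_name TEXT, city TEXT)")
    db.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada Lovelace", "London"), (2, "Alan Turing", "Manchester")],
    )
    return db

def make_component_b():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clients (client_no INTEGER, name TEXT, town TEXT)")
    db.executemany(
        "INSERT INTO clients VALUES (?, ?, ?)",
        [(7, "Grace Hopper", "New York"), (8, "Edsger Dijkstra", "London")],
    )
    return db

class Wrapper:
    """Hypothetical wrapper: turns a federated request into a local subquery."""

    def __init__(self, db, table, name_col, city_col):
        self.db = db
        self.table = table
        self.name_col = name_col
        self.city_col = city_col

    def people_in(self, city):
        # Identifiers come from trusted configuration; the value is bound safely.
        sql = (
            f"SELECT {self.name_col}, {self.city_col} "
            f"FROM {self.table} WHERE {self.city_col} = ?"
        )
        return self.db.execute(sql, (city,)).fetchall()

def federated_people_in(wrappers, city):
    """Send one subquery per component and combine the result sets."""
    results = []
    for wrapper in wrappers:
        results.extend(wrapper.people_in(city))
    return sorted(results)

wrappers = [
    Wrapper(make_component_a(), "customers", "full_name", "city"),
    Wrapper(make_component_b(), "clients", "name", "town"),
]
london = federated_people_in(wrappers, "London")
print(london)
```

The caller issues a single request and never sees the component schemas. A production system would additionally need query planning, dialect translation and error handling per component, none of which is shown here.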
McLeod and Heimbigner[1] were among the first to define a federated database system in the mid-1980s.
An FDBS is one which "define[s] the architecture and interconnect[s] databases that minimize central authority yet support partial sharing and coordination among database systems".[1] This description might not accurately reflect the McLeod/Heimbigner[1] definition of a federated database. Rather, this description fits what McLeod/Heimbigner called a composite database. McLeod/Heimbigner's federated database is a collection of autonomous components that make their data available to other members of the federation through the publication of an export schema and access operations; there is no unified, central schema that encompasses the information available from the members of the federation.
Among other surveys,[2] practitioners define a federated database as a collection of cooperating component systems which are autonomous and are possibly heterogeneous.
The three important components of an FDBS are autonomy, heterogeneity and distribution.[2] Other dimensions that have also been considered are the networking environment (e.g., many DBSs over a LAN or many DBSs over a WAN) and the update-related functions of participating DBSs (e.g., no updates, nonatomic transitions, atomic updates).
A DBMS can be classified as either centralized or distributed. A centralized system manages a single database, while a distributed system manages multiple databases. A component DBS in a DBMS may be centralized or distributed. A multiple DBS (MDBS) can be classified into two types, federated and nonfederated, depending on the autonomy of the component DBSs. A nonfederated database system is an integration of component DBMSs that are not autonomous. A federated database system consists of component DBSs that are autonomous yet participate in a federation to allow partial and controlled sharing of their data.
Federated architectures differ based on levels of integration with the component database systems and the extent of services offered by the federation. An FDBS can be categorized as either loosely coupled or tightly coupled.
Multiple DBSs, of which FDBSs are a specific type, can be characterized along three dimensions: distribution, heterogeneity and autonomy. Another characterization could be based on the dimension of networking, for example single databases or multiple databases in a LAN or WAN.
Distribution of data in an FDBS is due to the existence of multiple DBSs before the FDBS is built. Data can be distributed among multiple databases, which could be stored on a single computer or on multiple computers. These computers could be located in different places but interconnected by a network. Data distribution can increase availability and reliability and improve access times.
Heterogeneities in databases arise due to factors such as differences in structures, semantics of data, the constraints supported or query language. Differences in structure occur when two data models provide different primitives, such as object-oriented (OO) models that support specialization and inheritance and relational models that do not. Differences due to constraints occur when two models support different constraints. For example, the set type in a CODASYL schema may be partially modeled as a referential integrity constraint in a relational schema. CODASYL supports insertion and retention constraints that are not captured by referential integrity alone. The query language supported by one DBMS can also contribute to heterogeneity with respect to other component DBMSs. For example, differences in query languages for the same data model, or different versions of the same query language, could contribute to heterogeneity.
Semantic heterogeneities arise when there is a disagreement about the meaning, interpretation or intended use of data. At the schema and data level, classifications of possible heterogeneities include:
In creating a federated schema, one has to resolve such heterogeneities before integrating the component DB schemas.
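As a rough illustration of resolving heterogeneities before integration, the following Python sketch maps records from two components onto one federated representation. Everything here is an assumption made for the example: the records, the attribute names, and the idea that one component stores weights in kilograms while the other stores the same quantity in grams under a different name.

```python
# Hypothetical records from two autonomous components.
COMPONENT_A = [{"part": "bolt", "weight_kg": 0.5}]
COMPONENT_B = [{"item_name": "nut", "mass_g": 120.0}]

def grams_to_kg(grams):
    return grams / 1000.0

# Per-component mapping: federated attribute -> (local attribute, conversion).
# Naming differences are handled by the local attribute name,
# unit differences by the conversion function.
MAPPINGS = {
    "A": {"name": ("part", str), "weight_kg": ("weight_kg", float)},
    "B": {"name": ("item_name", str), "weight_kg": ("mass_g", grams_to_kg)},
}

def to_federated(record, mapping):
    """Rewrite one local record into the federated attribute names and units."""
    return {
        fed_attr: convert(record[local_attr])
        for fed_attr, (local_attr, convert) in mapping.items()
    }

federated = [to_federated(r, MAPPINGS["A"]) for r in COMPONENT_A]
federated += [to_federated(r, MAPPINGS["B"]) for r in COMPONENT_B]
print(federated)
```

Keeping the mapping as data rather than code means each component can change its local names without touching the integration logic, although real conflicts (for example, disagreement about what a value means) often cannot be resolved by a simple conversion function.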
Dealing with incompatible data types or query syntax is not the only obstacle to a concrete implementation of an FDBS. In systems that are not planned top-down, a generic problem lies in matching semantically equivalent, but differently named parts (tables, attributes) from different schemas (data models). A pairwise mapping between n attributes would result in n(n - 1)/2 mapping rules (given equivalence mappings), a number that quickly gets too large for practical purposes. A common way out is to provide a global schema that comprises the relevant parts of all member schemas and to provide mappings in the form of database views. Two principal approaches depend on the direction of the mapping:
Both are examples of data integration and involve what is called the schema matching problem.
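The difference between pairwise mappings and mappings to a global schema can be made concrete with a short Python sketch. The attribute names, tables and the `global_person` view are hypothetical, and the view shows only one possible direction of mapping (a global relation defined over member tables), using a single in-memory SQLite database to stand in for the member systems.

```python
import sqlite3
from itertools import combinations

def pairwise_rule_count(n):
    """Equivalence rules needed to map n attributes to each other pairwise."""
    return n * (n - 1) // 2

def global_schema_rule_count(n):
    """Rules needed when each attribute is mapped once to a global schema."""
    return n

# Four hypothetical attributes that all mean "customer name".
attrs = ["a.full_name", "b.name", "c.customer", "d.cust_nm"]
pairs = list(combinations(attrs, 2))
print(len(pairs), pairwise_rule_count(len(attrs)))
print(pairwise_rule_count(100), global_schema_rule_count(100))

# A global relation expressed as a database view over two member tables.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE a_customers (full_name TEXT);
CREATE TABLE b_clients (name TEXT);
INSERT INTO a_customers VALUES ('Ada Lovelace');
INSERT INTO b_clients VALUES ('Grace Hopper');
CREATE VIEW global_person AS
    SELECT full_name AS person_name, 'a' AS source FROM a_customers
    UNION ALL
    SELECT name AS person_name, 'b' AS source FROM b_clients;
""")
rows = db.execute(
    "SELECT person_name, source FROM global_person ORDER BY person_name"
).fetchall()
print(rows)
```

With 100 attributes the pairwise approach needs 4950 rules while the global schema needs 100, which is why the view-based approach scales better as members are added.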
Fundamental to the difference between an MDBS and an FDBS is the concept of autonomy. It is important to understand the aspects of autonomy for component databases and how they can be addressed when a component DBS participates in an FDBS. Four kinds of autonomy are addressed:
Heterogeneities in an FDBS are primarily due to design autonomy.
The ANSI/X3/SPARC Study Group outlined a three-level data description architecture, the components of which are the conceptual schema, internal schema and external schema of databases. The three-level architecture is, however, inadequate for describing the architecture of an FDBS. It was therefore extended to support the three dimensions of the FDBS, namely distribution, autonomy and heterogeneity. The five-level schema architecture is explained below.
The heterogeneity and autonomy requirements pose special challenges concerning concurrency control in an FDBS, which is crucial for the correct execution of its concurrent transactions (see also Global concurrency control). Achieving global serializability, the major correctness criterion, under these requirements has been characterized as very difficult and unsolved.[2]
The five level schema architecture includes the following:
While accurately representing the state of the art in data integration, the five-level schema architecture above does suffer from a major drawback, namely an IT-imposed look and feel. Modern data users demand control over how data is presented; their needs are somewhat in conflict with such bottom-up approaches to data integration.