











These notes introduce distributed databases, the reasons for using them, and the difference between distributed and decentralized databases. They cover the options for distributed databases, such as homogeneous and heterogeneous environments, along with their major objectives and significant trade-offs. They also compare distributed databases with centralized databases and discuss the advantages and disadvantages of distributing a database through data replication and partitioning.
Typology: Study notes
Distributed and Centralized computing
Distributed Database: A single logical database spread physically across computers in multiple locations that are connected by a data communications link
Decentralized Database: A collection of independent databases on non-networked computers
They are NOT the same thing!
Figure 13-1 Distributed database environments (adapted from Bell and Grimson, 1992)
- Homogeneous: same DBMS at each node
  - Autonomous: independent DBMSs
  - Non-autonomous: central, coordinating DBMS
  - Easy to manage, difficult to enforce
- Heterogeneous: different DBMSs at different nodes
  - Systems: with full or partial DBMS functionality
  - Gateways: simple paths are created to other databases without the benefits of one logical database
  - Difficult to manage, preferred by independent organizations
Figure 13-2 Homogeneous distributed database environment, with identical DBMSs at each node (Source: adapted from Bell and Grimson, 1992)
Typical heterogeneous environment:
- Data distributed across all the nodes
- Different DBMSs may be used at each node
- Local access is done using the local DBMS and schema
- Remote access is done using the global schema
- Location Transparency
  - User does not have to know the location of the data
  - Data requests automatically forwarded to appropriate sites
- Local Autonomy
  - Local site can operate with its database when network connections fail
  - Each site controls its own data, security, logging, recovery
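The two objectives above can be illustrated with a minimal sketch. All names here (`Site`, `DistributedCatalog`, the sample records) are hypothetical and only model the idea; they are not taken from the source or from any real DBMS.

```python
# Illustrative sketch only: Site and DistributedCatalog are hypothetical names.

class Site:
    """One node holding its own local data."""

    def __init__(self, name):
        self.name = name
        self.rows = {}        # key -> record stored at this site
        self.online = True    # whether the network link to this site is up

    def local_read(self, key):
        # Local autonomy: a site can read its own data
        # even when links to other sites are down.
        return self.rows.get(key)


class DistributedCatalog:
    """Remembers which site stores each key (location transparency)."""

    def __init__(self, sites):
        self.sites = {site.name: site for site in sites}
        self.location = {}    # key -> name of the site holding it

    def insert(self, site_name, key, record):
        self.sites[site_name].rows[key] = record
        self.location[key] = site_name

    def read(self, key):
        # The caller never names a site; the request is forwarded
        # to whichever site the catalog says holds the key.
        site = self.sites[self.location[key]]
        if not site.online:
            raise ConnectionError(f"site {site.name} is unreachable")
        return site.local_read(key)


boston, denver = Site("boston"), Site("denver")
catalog = DistributedCatalog([boston, denver])
catalog.insert("boston", 101, {"name": "Ada"})
catalog.insert("denver", 202, {"name": "Lin"})
```

Here `catalog.read(202)` finds the record without the caller knowing it lives in Denver. If the link to Denver fails (`denver.online = False`), remote reads through the catalog raise an error, but `denver.local_read(202)` still works at the site itself.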
- Synchronous Distributed Database
  - All copies of the same data are always identical
  - Data updates are immediately applied to all copies throughout network
  - Good for data integrity
  - High overhead, slow response times
- Asynchronous Distributed Database
  - Some data inconsistency is tolerated
  - Data update propagation is delayed
  - Lower data integrity
  - Less overhead, faster response time
NOTE: all this assumes replicated data (to be discussed later)
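The trade-off between the two modes can be seen in a small sketch. The class `ReplicatedItem` and its method names are assumptions made for illustration; a real DBMS would coordinate copies over a network rather than in one list.

```python
from collections import deque


class ReplicatedItem:
    """One data item with several copies (hypothetical model, not a real API)."""

    def __init__(self, n_copies, value=0):
        self.copies = [value] * n_copies
        self.pending = deque()    # (copy_index, value) updates not yet applied

    def write_sync(self, value):
        # Synchronous: every copy is updated before the write returns,
        # so the cost grows with the number of copies.
        for i in range(len(self.copies)):
            self.copies[i] = value

    def write_async(self, value):
        # Asynchronous: update the first copy now and queue the rest,
        # so the write returns quickly but copies may disagree for a while.
        self.copies[0] = value
        for i in range(1, len(self.copies)):
            self.pending.append((i, value))

    def propagate(self):
        # Delayed propagation: apply the queued updates in order.
        while self.pending:
            i, value = self.pending.popleft()
            self.copies[i] = value

    def consistent(self):
        return len(set(self.copies)) == 1
```

After `write_sync` the copies always agree. After `write_async` they disagree until `propagate()` runs, which is the tolerated inconsistency described above.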
Disadvantages of a distributed database compared to a centralized one:
- Software cost and complexity
- Processing overhead
- Data integrity exposure
- Slower response for certain queries
- Data replication
  - Copies of data distributed to different sites
- Horizontal partitioning
  - Different rows of a table distributed to different sites
- Vertical partitioning
  - Different columns of a table distributed to different sites
- Combinations of the above
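A minimal sketch of the two partitioning options, using a small in-memory table. The `customers` rows, the column names, and the helper functions are made up for illustration. Keeping the key column in every vertical fragment is an assumption of this sketch that lets the fragments be rejoined.

```python
# Hypothetical sample table: each dict is one row.
customers = [
    {"id": 1, "name": "Ada", "region": "east", "balance": 120.0},
    {"id": 2, "name": "Lin", "region": "west", "balance": 75.5},
    {"id": 3, "name": "Sam", "region": "east", "balance": 0.0},
]


def horizontal_partition(rows, column):
    """Send whole rows to different sites, grouped by one column's value."""
    parts = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts


def vertical_partition(rows, key, column_groups):
    """Send different columns to different sites; each fragment keeps the key."""
    return [
        [{c: row[c] for c in (key, *cols)} for row in rows]
        for cols in column_groups
    ]


def rejoin(fragments, key):
    """Rebuild full rows from vertical fragments by matching on the key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


by_region = horizontal_partition(customers, "region")
fragments = vertical_partition(customers, "id", [("name", "region"), ("balance",)])
```

`by_region["east"]` holds the complete rows for ids 1 and 3, while `fragments[1]` holds only `id` and `balance` for every row. Calling `rejoin(fragments, "id")` reconstructs the original rows.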
Disadvantages of data replication:
- Additional requirements for storage space
- Additional time for update operations
- Complexity and cost of updating
- Integrity exposure of getting incorrect data if replicated data is not updated simultaneously
Therefore, better when used for non-volatile (read-only) data
- Push Replication: updating site sends changes to other sites
- Pull Replication: receiving sites control when update messages will be processed
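The difference between the two styles is who decides when a change takes effect at the receiving site. The sketch below models that with hypothetical `Primary` and `Replica` classes; it is one possible reading of the definitions above, not a description of any particular product.

```python
class Replica:
    """A receiving site (hypothetical model)."""

    def __init__(self):
        self.data = {}
        self.inbox = []    # update messages received but not yet processed

    def receive(self, key, value, apply_now):
        if apply_now:
            # Push: the updating site's change takes effect immediately.
            self.data[key] = value
        else:
            # Pull: the message waits until this site chooses to process it.
            self.inbox.append((key, value))

    def pull(self):
        """Process all waiting update messages; return how many were applied."""
        for key, value in self.inbox:
            self.data[key] = value
        applied = len(self.inbox)
        self.inbox.clear()
        return applied


class Primary:
    """The updating site (hypothetical model)."""

    def __init__(self, push_replicas, pull_replicas):
        self.data = {}
        self.push_replicas = push_replicas
        self.pull_replicas = pull_replicas

    def update(self, key, value):
        self.data[key] = value
        for replica in self.push_replicas:
            replica.receive(key, value, apply_now=True)
        for replica in self.pull_replicas:
            replica.receive(key, value, apply_now=False)
```

After `Primary.update("x", 1)`, a push replica already holds `x`, while a pull replica holds it only after its own call to `pull()`.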
Factors in choosing a distribution strategy:
- Funding, autonomy, security
- Site data referencing patterns
- Growth and expansion needs
- Technological capabilities
- Costs of managing complex technologies
- Need for reliable service