Functions of Distributed Database System in DBMS

A distributed database is essentially a database that is dispersed across numerous sites, i.e., on various computers or over a network of computers, and is not restricted to a single system. A distributed database system is spread across several locations with distinct physical components. This can be necessary when different people from all over the world need to access a certain database. It must be handled such that, to users, it seems to be a single database.

Types:

1. Homogeneous Database: A homogeneous database stores data uniformly across all locations. All sites utilize the same operating system, database management system, and data structures. They are therefore simple to handle.

2. Heterogeneous Database: In a heterogeneous distributed database, different sites may use different software and schemas, which can cause problems with queries and transactions. One site might not even be aware of the existence of the others. Different machines may run different operating systems and database applications, and may even use different data models. Translation is therefore required for communication between sites.

Distributed Data Storage

Distributed data storage can place data at several sites in two ways:

1. Replication - With this strategy, the entire relation is stored redundantly at two or more sites. If the entire database is available at every site, it is a fully redundant database. With replication, systems therefore maintain copies of the data.

This has advantages since it makes more data accessible at many locations. Moreover, query requests can now be handled in parallel.

However, there are drawbacks as well. The data must be kept up to date at all times. Every change made at one site must be recorded at every site where that relation is stored; otherwise the copies become inconsistent. This is a ton of overhead. Moreover, concurrency control becomes far more complicated, since concurrent access must now be checked across several sites.

2. Fragmentation - In this method, the relations are broken up into smaller pieces (fragments), and each fragment is stored at the sites where it is needed. The fragments must be constructed so that the original relation can be rebuilt from them, which ensures that no data is lost. Because fragmentation does not create copies of data, consistency is not a concern.

Relations can be fragmented in one of two ways:

Horizontal fragmentation (splitting by rows): the relation is divided into groups of tuples, and each tuple is assigned to at least one fragment.
Vertical fragmentation (splitting by columns): the relation's schema is divided into smaller schemas. Each fragment must contain a common candidate key so that the join is lossless.
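
The two fragmentation schemes above can be sketched in a few lines of Python. This is only an illustration, not how a real DDBMS stores fragments: the employees relation, its column names, and the helper functions are all made up for the example. It shows that a relation split by rows is rebuilt by a union of the fragments, and a relation split by columns is rebuilt by joining on the shared key.

```python
# Hypothetical relation; "id" plays the role of the candidate key.
employees = [
    {"id": 1, "name": "Asha", "city": "Delhi", "salary": 50000},
    {"id": 2, "name": "Ben", "city": "Paris", "salary": 62000},
    {"id": 3, "name": "Chen", "city": "Delhi", "salary": 58000},
]


def horizontal_fragments(rows, column):
    """Group whole rows by the value of one column (one fragment per value)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, key, columns):
    """Project each row onto the key plus the chosen columns."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def join_on_key(left, right, key):
    """Rebuild full rows by matching the shared key in both fragments."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal: one fragment per city; the union of fragments is the relation.
by_city = horizontal_fragments(employees, "city")
rebuilt_from_rows = sorted(
    (row for fragment in by_city.values() for row in fragment),
    key=lambda row: row["id"],
)

# Vertical: both fragments keep "id", so the join loses nothing.
personal = vertical_fragment(employees, "id", ["name", "city"])
payroll = vertical_fragment(employees, "id", ["salary"])
rebuilt_from_columns = join_on_key(personal, payroll, "id")

print(rebuilt_from_rows == employees)
print(rebuilt_from_columns == employees)
```

If the key column were left out of either vertical fragment, join_on_key would have nothing to match on, which is why each fragment must carry the key.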

Sometimes a strategy that combines fragmentation and replication is employed.
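
The replication trade-off described earlier (reads can be answered by any copy, but every write has to reach every copy) can be sketched as follows. This is a toy model under simplifying assumptions: the class name, site names, and the write-all policy are chosen for illustration, and failures and concurrent writers are ignored.

```python
class ReplicatedRelation:
    """Toy fully replicated relation: every site holds a complete copy."""

    def __init__(self, site_names):
        # One dictionary per site, keyed by primary key.
        self.copies = {name: {} for name in site_names}

    def write(self, key, row):
        # Write-all: apply the update at every site so no copy goes stale.
        for copy in self.copies.values():
            copy[key] = dict(row)
        # Return how many site updates this single logical write cost.
        return len(self.copies)

    def read(self, key, site):
        # Read-one: any single copy can answer the query.
        return self.copies[site].get(key)


accounts = ReplicatedRelation(["london", "tokyo", "nairobi"])
cost = accounts.write(1, {"owner": "Asha", "balance": 120})
print(cost)  # 3: one logical write became three site updates
print(accounts.read(1, "tokyo"))
```

The write cost grows with the number of copies, which is the overhead the text refers to; the read can be served by whichever site is closest.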

Uses for distributed databases

It is used in corporate management information systems.
It is used in multimedia applications.
It is used in hotel chains, military command systems, etc.
It is also used in production control systems.

A distributed database system is a type of database management system that stores data across several computers or sites linked by a network. Each site has its own database, and these databases are linked together to form a single, integrated system.

A distributed database system's key benefit is that it can offer more availability and dependability than a centralized database system. As the data is spread over numerous locations, the system can still operate even if one or more of the locations fail. Also, by dispersing the data and processing burden across several sites, a distributed database system can offer superior performance.

For distributed database systems, there are several possible architectures, including:

Client-server architecture: Users connect to a central server, which controls a distributed database system. The server is in charge of maintaining data storage, controlling access, and organizing transactions.

Peer-to-peer architecture: In this design, every site in the distributed database system is connected to every other site. Each site is responsible for managing its own data and for coordinating transactions with the other sites.

Federated architecture: In this architecture, each site in the distributed database system maintains a separate, independent database. Nevertheless, the databases are connected via a middleware layer that offers a standard interface for accessing and querying the data.

Applications for distributed database systems include e-commerce, financial services, and telecommunications. Careful thought must be given to issues such as data distribution, replication, and consistency when designing and administering a distributed database system.

Distributed consensus is a procedure by which the participants of a distributed or decentralized multi-agent platform reach a common agreement. It is important for message-passing systems.

Example:

A number of processes in a network decide to elect a leader. Each process begins with a bid for leadership. In traditional distributed systems, consensus is used to provide reliability and fault tolerance. In a decentralized environment, each party is independent and makes its own decisions, so some nodes or parties may behave maliciously or faultily. In those situations it is crucial to reach a common decision or viewpoint. The main difficulty is reaching that common viewpoint in an environment where participants may behave maliciously or may crash. Hence, in this type of distributed system, the goal is to ensure reliability, which means ensuring correct operation even in the presence of faulty participants.
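
As a minimal sketch of the leader election example, assume that no process fails and that every bid reaches every process (both are strong assumptions that real protocols cannot make). Each process then applies the same deterministic rule to the same set of bids, so all of them pick the same leader. The function name, the process ids, and the tie-breaking rule are hypothetical.

```python
def elect_leader(bids):
    """bids maps process id -> bid. Returns each process's choice of leader."""
    decisions = {}
    for process in bids:
        # Every process sees all bids (reliable broadcast is assumed) and
        # applies the same rule: highest bid wins, ties go to the larger id.
        decisions[process] = max(bids, key=lambda p: (bids[p], p))
    return decisions


bids = {"p1": 17, "p2": 42, "p3": 42, "p4": 5}
decisions = elect_leader(bids)
print(decisions)
```

The tie between p2 and p3 is broken by process id, so every process decides on p3. Once processes can crash or lie about their bids, this simple rule is no longer enough, which is the difficulty the paragraph above describes.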

Ways to reach distributed consensus:

In order to reach distributed consensus, several requirements must be met:

Termination - Every non-faulty process must eventually decide.
Agreement - Every non-faulty process must reach the same final decision.
Validity - If every non-faulty process starts with the same initial value, that value must be the one decided.
Integrity - Every correct process decides on at most one value, and that value must have been proposed by some process.

In short, the agreed value must be the initial value of some process, because it makes no sense to reach a decision that reflects no participant's initial preference.
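
The requirements above can be turned into a small checker that inspects one finished run of a protocol. This is a sketch, not part of any real consensus library: the function name and the data layout are assumptions. Validity is read in the usual sense (if all non-faulty processes start with the same value, that value must be decided), and the "at most one value" part of integrity is modelled simply by storing a single decision per process.

```python
def check_consensus(proposals, decisions, faulty=frozenset()):
    """Check one run of a consensus protocol.

    proposals: process -> proposed (initial) value
    decisions: process -> decided value; a missing key means "never decided"
    faulty:    processes whose behaviour is not constrained
    """
    correct = [p for p in proposals if p not in faulty]
    decided = [decisions[p] for p in correct if p in decisions]
    initial = {proposals[p] for p in correct}
    return {
        # Every non-faulty process reached a decision.
        "termination": len(decided) == len(correct),
        # All decisions of non-faulty processes are identical.
        "agreement": len(set(decided)) <= 1,
        # If all non-faulty processes started alike, that value was decided.
        "validity": len(initial) != 1 or all(v in initial for v in decided),
        # Every decided value was proposed by some process.
        "integrity": all(v in proposals.values() for v in decided),
    }


proposals = {"a": 1, "b": 1, "c": 0}
good_run = check_consensus(proposals, {"a": 1, "b": 1, "c": 1}, faulty={"c"})
bad_run = check_consensus(proposals, {"a": 1, "b": 0}, faulty={"c"})
print(good_run)
print(bad_run)
```

In the second run the two non-faulty processes decide differently, so agreement and validity fail even though both terminated with a value that somebody proposed.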

Correctness of a Distributed Consensus Protocol

It can be characterized by the following two properties.

Safety property: The correct processes in a network never converge on an incorrect value.
Liveness property: Every correct value must eventually be accepted; in other words, something good eventually happens.

Applications of distributed consensus:

Electing a leader in a fault-tolerant environment in order to initiate a global action without introducing a single point of failure.
Maintaining consistency in a distributed network. Suppose several nodes are monitoring the same environment. If one of the nodes crashes, a consensus protocol provides robustness against such faults.

In essence, distribution makes the design and implementation of the system more complex. This added complexity is accepted in order to achieve the following potential benefits:

Network transparency
Increased reliability
Improved performance
Easier expansion

Functions of Centralized DBMS:

The fundamental purpose of a centralized DBMS is to give a complete view of the data. For instance, we can run a query to count how many customers worldwide are willing to buy.
The second fundamental feature of a centralized DBMS is that it is easier to manage than a distributed system.

In addition to centralized DBMS functions, the distributed database must be able to do the following tasks.

Distributed database system features:

Keeping track of data - A DDBMS must keep track of data distribution, fragmentation, and replication by extending the DDBMS catalog.
Distributed query processing - A DDBMS must be able to access remote sites and to transmit queries and data among the sites over a communication network.
Replicated data management - A DDBMS must decide which copy of a replicated data item to access and must keep the copies consistent.
Distributed database recovery - A DDBMS must be able to recover from individual site crashes and from new types of failures, such as failed communication links.
Security - A DDBMS must execute distributed transactions with proper management of data security and user authorization.
Distributed directory management - A directory holds information about the data in the database. The directory may be global for the whole DDB or local to each site. Its placement and distribution raise design and policy issues.
Distributed transaction management - A DDBMS must devise execution strategies for queries and transactions that access data at more than one site, synchronize access to distributed data, and maintain the integrity of the whole database.

However, these functions make a DDBMS more complex than a centralized DBMS.
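
Several of these functions can be seen together in a small sketch: a catalog that records which sites hold each fragment, a query that visits every fragment, and a simple rule for choosing one live copy of replicated data. The catalog layout, the site names, and the "first live replica" policy are hypothetical simplifications; a real DDBMS would also weigh cost and consistency.

```python
# Hypothetical catalog: fragment name -> sites that hold a copy of it.
catalog = {
    "customers_eu": ["paris", "berlin"],
    "customers_asia": ["tokyo"],
}

# Hypothetical per-site storage: site -> fragment -> rows.
eu_rows = [{"id": 1, "name": "Lea"}, {"id": 2, "name": "Otto"}]
sites = {
    "paris": {"customers_eu": list(eu_rows)},
    "berlin": {"customers_eu": list(eu_rows)},
    "tokyo": {"customers_asia": [{"id": 3, "name": "Yuki"}]},
}


def query_all(predicate, down=frozenset()):
    """Evaluate predicate over every fragment, using one live copy of each."""
    result = []
    for fragment, holders in catalog.items():
        live = [site for site in holders if site not in down]
        if not live:
            raise RuntimeError(f"no live replica of {fragment}")
        # Replicated data management: pick one copy (here the first live one).
        rows = sites[live[0]][fragment]
        result.extend(row for row in rows if predicate(row))
    return result


# Paris is down, but Berlin still holds a copy of the European fragment.
names = [row["name"] for row in query_all(lambda row: True, down={"paris"})]
print(names)
```

The query still succeeds with one site down because the European fragment is replicated; the Asian fragment has a single copy, so losing Tokyo would make the same query fail.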

Several interconnected databases that are geographically dispersed across many sites make up a distributed database system. A distributed database system performs the following tasks:

Data distribution: Distributing data among several sites is one of a distributed database system's main tasks. This minimizes the quantity of data that needs to be sent across the network and ensures that data is stored close to where it is needed.
Data replication: A distributed database system can replicate data across different sites. Replication keeps data available even if one of the sites fails, which increases system availability and dependability.
Data fragmentation: Data fragmentation is the process of dividing a large database into smaller pieces and distributing them to several sites. By reducing the quantity of data that needs to be carried over the network, this can help improve system performance.
Query processing: Query processing means handling user requests and retrieving information from the distributed database system. Answering a request may require combining data from several sites, which makes this a challenging process.
Transaction management: In a distributed database system, transactions may span several sites. These transactions must be coordinated to guarantee that they complete correctly and efficiently.
Security and access control: Data in a distributed database system must be secure, and access to it must be restricted. To protect data from unauthorized access or alteration, adequate security measures and access control mechanisms must be put in place.
Performance optimization: A distributed database system must be tuned so that it can handle large amounts of data and user requests efficiently. This might involve improving network performance, optimizing query processing algorithms, or adjusting database parameters.
System administration: System administration is the process of overseeing and maintaining the distributed database system. This may involve backing up data, monitoring system performance, and fixing system problems.
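
The list above says that transactions spanning several sites must be coordinated but does not name a protocol. One widely used approach is two-phase commit, sketched below purely as an illustration: the class and function names are made up, and timeouts, logging, and coordinator failure are left out.

```python
class Participant:
    """One site taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every site.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if every vote was yes, else abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok_sites = [Participant("paris"), Participant("tokyo")]
mixed_sites = [Participant("paris"), Participant("tokyo", can_commit=False)]
print(two_phase_commit(ok_sites))     # committed
print(two_phase_commit(mixed_sites))  # aborted
```

The point of the two phases is that no site commits until all sites have said they can, so the transaction takes effect either at every site or at none.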

Need for a Distributed Database in DBMS

Let's begin with databases and their many kinds.

An organized collection of information is called a database. In a database, the data may be conveniently accessed, managed, amended, updated, controlled, and organized.

The two main categories of databases are distributed and centralized databases. The question at hand is why we need a distributed database in DBMS at all. For the moment, let's imagine that we only have centralized databases.

All of the data will be stored in a single database, which grows to the point that querying even a single entry takes a long time.
As we only have one database, if a problem happens, we are no longer able to fulfill user requests.
Even if we wanted to, there is no way to scale, and availability is also reduced, which lowers throughput.

Distributed databases address throughput, latency, scalability, availability, fault tolerance, and many other problems that arise when relying on a single system and a single database.
