Types of Data Warehouses
There are different types of data warehouses, which are as follows:
Host-Based Data Warehouses
There are two types of host-based data warehouses which can be implemented:
Data Extraction and transformation tools allow the automated extraction and cleaning of data from production systems. It is not applicable to enable direct access by query tools to these categories of methods for the following reasons:
Host-Based (MVS) Data Warehouses
Those data warehouse uses that reside on large volume databases on MVS are the host-based types of data warehouses. Often the DBMS is DB2 with a huge variety of original source for legacy information, including VSAM, DB2, flat files, and Information Management System (IMS).
Before embarking on designing, building and implementing such a warehouse, some further considerations must be given because
To make such data warehouses building successful, the following phases are generally followed:
An integrated Metadata repository is central to any data warehouse environment. Such a facility is required for documenting data sources, data translation rules, and user areas to the warehouse. It provides a dynamic network between the multiple data source databases and the DB2 of the conditional data warehouses.
A metadata repository is necessary to design, build, and maintain data warehouse processes. It should be capable of providing data as to what data exists in both the operational system and data warehouse, where the data is located. The mapping of the operational data to the warehouse fields and end-user access techniques. Query, reporting, and maintenance are another indispensable method of such a data warehouse. An MVS-based query and reporting tool for DB2.
Host-Based (UNIX) Data Warehouses
Oracle and Informix RDBMSs support the facilities for such data warehouses. Both of these databases can extract information from MVS¬ based databases as well as a higher number of other UNIX¬ based databases. These types of warehouses follow the same stage as the host-based MVS data warehouses. Also, the data from different network servers can be created. Since file attribute consistency is frequent across the inter-network.
LAN-Based Workgroup Data Warehouses
A LAN based workgroup warehouse is an integrated structure for building and maintaining a data warehouse in a LAN environment. In this warehouse, we can extract information from a variety of sources and support multiple LAN based warehouses, generally chosen warehouse databases to include DB2 family, Oracle, Sybase, and Informix. Other databases that can also be contained through infrequently are IMS, VSAM, Flat File, MVS, and VH.
Designed for the workgroup environment, a LAN based workgroup warehouse is optimal for any business organization that wants to build a data warehouse often called a data mart. This type of data warehouse generally requires a minimal initial investment and technical training.
Data Delivery: With a LAN based workgroup warehouse, customer needs minimal technical knowledge to create and maintain a store of data that customized for use at the department, business unit, or workgroup level. A LAN based workgroup warehouse ensures the delivery of information from corporate resources by providing transport access to the data in the warehouse.
Host-Based Single Stage (LAN) Data Warehouses
Within a LAN based data warehouse, data delivery can be handled either centrally or from the workgroup environment so business groups can meet process their data needed without burdening centralized IT resources, enjoying the autonomy of their data mart without comprising overall data integrity and security in the enterprise.
Both DBMS and hardware scalability methods generally limit LAN based warehousing solutions.
Many LAN based enterprises have not implemented adequate job scheduling, recovery management, organized maintenance, and performance monitoring methods to provide robust warehousing solutions.
Often these warehouses are dependent on other platforms for source record. Building an environment that has data integrity, recoverability, and security require careful design, planning, and implementation. Otherwise, synchronization of transformation and loads from sources to the server could cause innumerable problems.
A LAN based warehouse provides data from many sources requiring a minimal initial investment and technical knowledge. A LAN based warehouse can also work replication tools for populating and updating the data warehouse. This type of warehouse can include business views, histories, aggregation, versions in, and heterogeneous source support, such as
A single store frequently drives a LAN based warehouse and provides existing DSS applications, enabling the business user to locate data in their data warehouse. The LAN based warehouse can support business users with complete data to information solution. The LAN based warehouse can also share metadata with the ability to catalog business data and make it feasible for anyone who needs it.
Multi-Stage Data Warehouses
It refers to multiple stages in transforming methods for analyzing data through aggregations. In other words, staging of the data multiple times before the loading operation into the data warehouse, data gets extracted form source systems to staging area first, then gets loaded to data warehouse after the change and then finally to departmentalized data marts.
This configuration is well suitable to environments where end-clients in numerous capacities require access to both summarized information for up to the minute tactical decisions as well as summarized, a commutative record for long-term strategic decisions. Both the Operational Data Store (ODS) and the data warehouse may reside on host-based or LAN Based databases, depending on volume and custom requirements. These contain DB2, Oracle, Informix, IMS, Flat Files, and Sybase.
Usually, the ODS stores only the most up-to-date records. The data warehouse stores the historical calculation of the files. At first, the information in both databases will be very similar. For example, the records for a new client will look the same. As changes to the user record occur, the ODs will be refreshed to reflect only the most current data, whereas the data warehouse will contain both the historical data and the new information. Thus the volume requirement of the data warehouse will exceed the volume requirements of the ODS overtime. It is not familiar to reach a ratio of 4 to 1 in practice.
Stationary Data Warehouses
In this type of data warehouses, the data is not changed from the sources, as shown in fig:
Instead, the customer is given direct access to the data. For many organizations, infrequent access, volume issues, or corporate necessities dictate such as approach. This schema does generate several problems for the customer such as
Such a warehouse will need highly specialized and sophisticated 'middleware' possibly with a single interaction with the client. This may also be essential for a facility to display the extracted record for the user before report generation. An integrated metadata repository becomes an absolute essential under this environment.
Distributed Data Warehouses
The concept of a distributed data warehouse suggests that there are two types of distributed data warehouses and their modifications for the local enterprise warehouses which are distributed throughout the enterprise and a global warehouses as shown in fig:
Characteristics of Local data warehouses
Virtual Data Warehouses
Virtual Data Warehouses is created in the following stages:
This strategy defines that end users are allowed to get at operational databases directly using whatever tools are implemented to the data access network. This method provides ultimate flexibility as well as the minimum amount of redundant information that must be loaded and maintained. The data warehouse is a great idea, but it is difficult to build and requires investment. Why not use a cheap and fast method by eliminating the transformation phase of repositories for metadata and another database. This method is termed the 'virtual data warehouse.'
To accomplish this, there is a need to define four kinds of data: