Javatpoint Logo
Javatpoint Logo

Data Warehouse Interview Questions

A list of frequently asked Data Warehouse Interview Questions and Answers are given below.

1) What is a data warehouse?

A data warehouse is a huge store of data accumulated from a broad range of sources within an organization and used to guide business decisions.

Click here for more information

2) What is a dimensional table?

Dimensional tables include textual attributes of measurement saved in the fact tables. A dimensional table is a group of hierarchies, categories, and logic which can be used for the customer to traverse in hierarchy nodes.

3) What is a fact table?

The fact table includes the measurement of the business process. Fact table includes the foreign keys for the dimension tables.

Example: If we are business phase is "paper production," "normal production of paper by one device, "or "weekly production of paper" will be treated as a measurement of the business process.

4) What are the different methods of loading dimension tables?

There are two different methods to load data in dimension tables:

  • Conventional (slow): All the constraints and keys are validated against the information before, it is loaded, and this method data integrity is maintained.
  • Direct (fast): All the constraints and keys are disabled before the information is loaded. Once the information is loaded, it is validated against all the constraints and keys. If the data is found invalid, it is not contained in the index, and all the future processes are skipped in this data.

5) Describe the foreign key columns in fact tables and dimension tables?

Foreign keys of dimension tables are the primary key of entity tables.

Foreign keys of fact tables are the primary key of dimension tables.

6) What is Data Mining?

Data mining is the phase of analysing data from several perspectives and summarizing it into useful data.

7) What is Business Intelligence?

Business Intelligence defines the technologies, functions, and systems for the collection, integration, analysis, and demonstration of business data and sometimes to the data itself. The purpose of business intelligence is to provide better business decision making. Thus, BI is also defined as a decision support system (DSS).

8) What is OLTP?

OLTP stands for online transaction processing. This system is a function that modifies data the instance it receives and has a huge number of concurrent users.

9) What is OLAP?

OLAP stands for online analytical processing. This system is a function that collects, manages, processes, and presents multidimensional data for analysis and management process.

10) What is the difference between OLTP and OLAP?

Meaning OLTP stands for online transaction processing. OLAP stands for online analytical processing.
Data Source Operational data is initial data source of data. Consolidation data is from different sources.
Process Goal Snapshot of business processes which does essential business tasks Multidimensional view of business events of planning and decision making
Queries and Process scripts Simple quick running queries ran by customers. Complex long-running queries by a scheme to update the aggregate data.
Database Design Normalize small databases. Speed will not be an issue due to the smaller database, and normalization will not degrade performance. This adopts an entity-relationship (ER) model and function-oriented database design. De-normalized large databases. Speed is an issue due to the larger databases, and de-normalizing will improve performance as there will be lesser tables to scan while performing tasks. This adopts star, snowflake, or fact constellation mode of subject-oriented database design.
Back-up and system administration Regular database back-up and system administration can do the job. Reloading the OLTP data is well treated as a good backup option.
Click here for more information

11) What is ODS?

ODS stands for Operational data store. A database architecture that is a repository for near real-time operational records rather than long term trend data. The ODS may further become the enterprise shared an operational database, allowing operational functions that are being re-engineered to use the ODS as there operational databases.

Click here for more information

12) What is ETL?

ETL stands for extraction, transformation, and loading process. ETL is software that allows the business to develop their disparate records while moving it from place to place, and it doesn't really matter that data is in several forms or formats. The data can come from any source. ETL is powerful enough to manage such data disparities.

First, the extract function reads data from a particular source database and extracts a desired subset of data.

Second, the transform function works with the acquired record using rules or lookup tables, or creating a combination with other records to convert it to the desired state.

Finally, the load function is used to write the resulting information to a target database.

Click here for more information

13) What is VLDB?

VLDB stands for a Very large database. A one terabyte database would generally be considered to be a VLDB. Typically, there are decision support applications or transaction processing applications serving a huge number of users.

14) What is real-time data warehousing?

Data warehousing capture business event data. Real-time data warehousing capture business event data as it occurs. As soon as the business event is complete, and there is data about it, the completed event data flows into the data warehouse and becomes feasible instantly.

15) What are conformed dimensions?

Conformed dimension defines the exact same thing with every possible fact table to which they are joined. They are simple to the cubes.

16) What are non-additive facts?

Non-additive facts are the facts that cannot be examined for any of the dimensions present in the fact table. They are not treated as useless. If there is a transformation in dimensions, the same facts can be useful.

17) What is Star Schema?

Star schema is a type of organizing the tables such that we can fetch the result from the database instantly in the warehouse environment.

Click here for more information

18) What is a Snowflake Schema?

Any dimension with extended dimensions is called snowflake schema, the dimension may be interlinked or may have one too many relationships with other tables. This schema is normalized and outcome in complex join and very complex queries as well as slower results.

Click here for more information

19) What is a surrogate key?

A surrogate key is a substitution for the essential primary key. It is just a unique identifier or statistic for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the essential primary key can change, and this makes updates more difficult. Surrogate keys are always integer or numeric.

20) What is a junk dimension?

A number of very small dimension may be lumped together to form a single dimension, a junk dimension is the attributes are not closely related. Grouping of random flags and text attributes in dimensions and changing them to a separate subdimension is called the junk dimension.

21) What is dimensional modeling?

Dimensional data model concept contains two types of tables, and it is different from the third normal form. This concept uses a fact table, which includes the measurement of the business and Dimensional tables, which includes the context (dimension of the calculations) of the dimensions.

Click here for more information

22) What is BUS Schema?

BUS schema is collected from a master suite of confirmed size and a standardized description of facts.

23) What is active data warehousing?

An active data warehouses provide data that allow decision-makers within an organization to handle customer relationships efficiently and proactively.

24) What is the difference between data warehousing and Business Intelligence?

Data warehousing handle with all methods of managing the development, implementation and applications of a data warehouse or data mart containing metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational documenting, analytical documenting, security management, backup/recovery planning, etc.

Business Intelligence is a set of software tools that allows an organization to analyse measurable methods of their business, such as sales performance, profitability, operational efficiency, effectiveness, of marketing campaigns, market penetration among certain user groups, cost trends, anomalies and exceptions, etc. Business Intelligence is used to encompass OLAP, data visualization, data mining, and query document tools.

25) Which one is faster, Multidimensional OLAP, or Relational OLAP?

Multidimensional OLAP (MOLAP) is faster than Relational OLAP (ROLAP).

  • MOLAP: Here, data is saved in a multidimensional cube. The storage is not in the relational database but in a proprietary plan (one example is PowerOLAP's .olp file). MOLAP products are compatible with Excel, which can make record interactions easy to learn.
  • ROLAP: ROLAP products approach a relational database by using SQL (structured query language), which is the standard language that is used to describe and manipulate data in an RDBMS. Subsequent processing may occur in the RDBMS or within a middle-tier server, which accepts requests from users, translates them into SQL statements, and passes them on to the RDBMS.