ETL Testing Interview Questions
A list of frequently asked ETL testing interview questions and answers is given below.
1) What is ETL? Explain it.
ETL stands for Extraction, Transformation, and Loading. It is an essential concept in data warehousing systems and comprises the three basic steps of the data integration process. Extraction means pulling data from different data sources, such as transactional systems or applications. Transformation means applying conversion rules to the data so that it becomes suitable for analytical reporting. Loading means moving the data into the target system, i.e., the data warehouse.
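The three steps can be sketched in a few lines of Python. This is only a minimal illustration: the sample rows, the field names, and the in-memory list standing in for the warehouse are all assumptions made for the example.

```python
# Hypothetical source rows, standing in for a transactional system.
SOURCE_ROWS = [
    {"order_id": "1", "amount": " 19.90 ", "country": "de"},
    {"order_id": "2", "amount": "5.00", "country": "US"},
]


def extract(rows):
    """Extraction: read the raw records from the source."""
    return list(rows)


def transform(rows):
    """Transformation: apply conversion rules (types, trimming, casing)."""
    return [
        {
            "order_id": int(r["order_id"]),
            "amount": float(r["amount"].strip()),
            "country": r["country"].upper(),
        }
        for r in rows
    ]


def load(rows, target):
    """Loading: append the converted records to the target store."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for the target data warehouse table
loaded = load(transform(extract(SOURCE_ROWS)), warehouse)
print(loaded, warehouse[0])
```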
2) Explain the concept of Extraction, Transformation, and Loading.
Extraction: Data is extracted from an external source and moved to the data warehouse pre-processor database.
Transformation: The transform data task allows point-to-point generation, modification, and transformation of the data.
Loading: In this task, the data is added to the database table in the warehouse.
3) What is the three-layer architecture of an ETL cycle?
The three layers in the ETL are:
4) What is BI?
Business Intelligence is the process of collecting raw business data and transforming it into meaningful insight that is more useful for the business.
5) What are the differences between ETL and BI tools?
6) What are the ETL tools available in the market?
The popular ETL tools available in the market are:
7) When do we need the staging area in the ETL process?
The staging area is a central area that sits between the data sources and the data warehouse/data mart systems. It is a place where data is stored temporarily during the data integration process. In the staging area, data is cleansed and checked for duplication. The staging area provides many benefits, but its primary goals are to increase efficiency, ensure data integrity, and support data quality operations.
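As a rough sketch of the cleansing and duplicate check done in staging, the Python below trims string values and keeps the first row per key. The `stage` function, the `customer_id` key, and the sample rows are hypothetical and exist only for this example.

```python
def stage(rows, key):
    """Cleanse rows and drop duplicates by `key`, keeping the first occurrence."""
    seen = set()
    staged = []
    for row in rows:
        # Cleansing rule assumed here: strip surrounding whitespace from strings.
        cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if cleaned[key] in seen:
            continue  # duplicate of a row that is already staged
        seen.add(cleaned[key])
        staged.append(cleaned)
    return staged


raw = [
    {"customer_id": "C1", "name": " Ann "},
    {"customer_id": "C1 ", "name": "Ann"},
    {"customer_id": "C2", "name": "Bob"},
]
staged = stage(raw, key="customer_id")
print(staged)
```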
8) What is the difference between the data warehouse and data mining?
Data warehousing is a broader concept than data mining. Data mining involves extracting hidden information from the data and interpreting it for future forecasting. In contrast, data warehousing includes operations such as analytical reporting to generate detailed and ad-hoc reports, and information processing to generate interactive dashboards and charts.
9) What are the differences between data warehousing and data mining?
10) What is a dimension table, and how is it different from the fact table?
Here, we take an example to describe how the dimension table differs from the fact table.
Suppose a company sells its products to its customers. Every sale is a fact that occurs within the company, and the fact table is used to record these facts. Each fact table stores the keys that join it to the dimension tables, together with the measures (facts).
Example: Fact Units
A dimension table stores the attributes, or dimensions, that describe the objects in a fact table. Dimension tables are a set of companion tables to a fact table.
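The relationship can be illustrated with a small in-memory SQLite database. The table names `fact_sales` and `dim_product`, their columns, and the sample rows are assumptions for this sketch, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: descriptive attributes of each product.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    -- Fact table: one row per sale, a key into the dimension, and a measure.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        units INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
    INSERT INTO fact_sales VALUES (100, 1, 3), (101, 2, 1), (102, 1, 2);
    """
)

# Join facts to their dimension to report units sold per product name.
units_by_product = conn.execute(
    """
    SELECT d.product_name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.product_name
    ORDER BY d.product_name
    """
).fetchall()
print(units_by_product)
```

The fact table holds only keys and numbers; everything a report needs to label those numbers comes from the dimension table.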
11) What is a Data Mart?
A Data Mart is a simple form of Data Warehouse that is focused on a single functional area. It gets its data from only a few sources.
For example: In an organization, data marts may exist for marketing, finance, human resources, and other individual departments, each storing the data related to its specific functions.
12) What is the difference between Manual Testing and ETL Testing?
The difference between Manual testing and ETL testing is:
13) What is ETL Testing?
ETL stands for Extraction, Transformation, and Loading. ETL testing is done to ensure that data is loaded from the different sources to the destination after accurate business transformation. It involves verifying the data at the multiple stages that lie between the source and the destination.
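A source-to-target check of this kind might look like the Python sketch below. The business rule in `transform_rule`, the `validate` helper, and the sample rows are all hypothetical. Real ETL tests usually run such comparisons as SQL against the source and target databases.

```python
def transform_rule(row):
    """Hypothetical business rule: full_name is first and last name joined."""
    return {"id": row["id"], "full_name": f'{row["first"]} {row["last"]}'}


def validate(source_rows, target_rows):
    """Return a list of mismatches between the expected and actual target rows."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(f"row count: source={len(source_rows)} target={len(target_rows)}")
    target_by_id = {r["id"]: r for r in target_rows}
    for src in source_rows:
        expected = transform_rule(src)
        actual = target_by_id.get(src["id"])
        if actual is None:
            issues.append(f"missing id {src['id']}")
        elif actual != expected:
            issues.append(f"mismatch for id {src['id']}")
    return issues


source = [
    {"id": 1, "first": "Ann", "last": "Lee"},
    {"id": 2, "first": "Bob", "last": "Ray"},
]
target = [
    {"id": 1, "full_name": "Ann Lee"},
    {"id": 2, "full_name": "Bob R."},  # deliberately wrong to show a finding
]
issues = validate(source, target)
print(issues)
```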
14) What is the responsibility of ETL tester?
The responsibility of ETL Tester is divided into three major categories:
Responsibilities of ETL tester are:
15) What is the need for ETL Testing?
Nowadays, many systems are being migrated from old technology to new technology. During migration, the data also needs to be moved from the old DBMS to the latest DBMS. So there is a strong need to test whether the data is correct on the target side.
Here are some important points where the need for ETL testing arises:
16) Where do users apply ETL concepts? Explain.
Here are some examples where ETL is used:
Example: Data Warehousing
ETL is used in data warehousing. Here, we need to fetch the data from multiple different systems and load it into the data warehouse database. The ETL concept is used to extract the data from the source, transform it, and load it into the target system.
Example: Data Migration
Data migration is a difficult task if we do it in PL/SQL. To migrate the data in a simpler way, we can use ETL tools.
Example: Mergers and Acquisitions
Nowadays, many companies are being merged into multinational corporations (MNCs). The need for ETL arises when data has to be moved from one company to another.
17) How do we use ETL in third-party management?
Large organizations usually assign the development of different applications to different vendors, because a single vendor cannot manage everything. Consider a telecommunication project in which one company handles billing and another manages CRM. If the CRM company needs data from the company that manages billing, it receives a data feed from that company. An ETL process is used to load that data feed.
18) How do we use ETL in Data Warehousing?
ETL is most commonly used in data warehousing. To build the data warehouse, the user fetches historical as well as current data, so the data in the data warehouse is a combination of historical and transactional data. The data sources of a data warehouse may differ. We need to fetch the data from multiple different systems and load it into a single target system, which is the data warehouse.
19) What is the difference between ETL Testing and Database Testing?
The differences between the ETL testing and Database testing are:
20) What are the characteristics of Data Warehouse?
21) What are the types of Data Warehouse systems?
22) What are the steps followed in ETL testing process?
The different steps followed in ETL testing process are:
Step 1. Requirement Analysis
In this step, we understand the business structure and the requirements.
Step 2. Validation and Test Estimation
An estimation of the time and expertise required is made in this step.
Step 3. Test Planning and Designing the Testing Environment
This step is based on the validation and test estimation. The ETL environment is planned according to the inputs used in the test estimation, and the work proceeds according to that plan.
Step 4. Test Data Preparation and Execution
Test data is prepared and the tests are executed as per the requirements.
Step 5. Summary Report
On completion of the test run, a summary report is prepared for drawing conclusions and making improvements.
23) How is ETL used in a Data Migration Project? Explain.
ETL tools are generally used in data migration projects. If an organization previously managed its data in Oracle 10g and now wants to use a SQL Server cloud database, the data must be moved from the source to the target. ETL tools are very useful for this kind of movement. Writing the ETL code by hand is very time-consuming; an ETL tool simplifies the work compared with hand-written PL/SQL or T-SQL code. So the ETL process is useful in data migration projects.
24) What are the steps followed to choose the ETL tool?
Choosing an ETL tool is difficult. To select the right one, we need to weigh many factors specific to the project. The choice is a strategic decision, even for a small project.
Here are some points that help in choosing the ETL tool.
25) What are the ETL bugs?
The following are the ETL bugs:
26) What is Operation Data Source?
27) What is the data extraction phase in ETL?
Data extraction is the process of extracting data from multiple different sources using ETL tools.
There are two types of data extraction.
Source System Performance: The extraction strategies of data should not affect the performance of the source system.
28) What are the ETL Tools?
The popular tools are:
1. Enterprise ETL tools
2. Open Source ETL tools
29) What is partitioning in ETL?
For better performance, transactions often need to be subdivided. This process is known as partitioning. It ensures that the server can directly access the sources through multiple connections.
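The idea can be sketched in Python with a thread pool, where each worker handles one partition. The round-robin split and the `load_partition` stand-in are assumptions for illustration; ETL tools offer their own partitioning schemes and manage the connections themselves.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin split of rows into n partitions."""
    return [rows[i::n] for i in range(n)]


def load_partition(rows):
    """Stand-in for processing one partition over its own connection."""
    return sum(r["amount"] for r in rows)


rows = [{"id": i, "amount": i * 10} for i in range(1, 7)]
parts = partition(rows, 3)

# One worker per partition; map() returns results in partition order.
with ThreadPoolExecutor(max_workers=3) as pool:
    totals = list(pool.map(load_partition, parts))

print(totals)
```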
30) What is ETL Pipeline?
An ETL pipeline is a set of processes that extract data from one system, transform it, and load it into a database or data warehouse. ETL pipelines are built for data warehousing applications, which include both enterprise data warehouses and subject-specific data marts. ETL pipelines are also used for data migration solutions. Data warehouse/business intelligence engineers build ETL pipelines.
31) What is the Data Pipeline?
A data pipeline is any set of processing elements that move data from one system to another. A data pipeline can be built for any kind of application that uses data to deliver value. It can be used for integrating data across applications, building data-driven web products, and carrying out data mining activities. Data engineers build data pipelines.
32) What is the staging place in the ETL Testing?
Staging place is the temporary storage area that is used during the data integration process. In this place, data is analyzed carefully for redundancy and duplication.
33) What is ETL mapping sheet? Define its significance.
An ETL mapping sheet contains all the necessary information from the source file and stores the details in rows and columns. Mapping sheets help in writing the SQL queries and so speed up the testing process.
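To show how a mapping sheet can drive test queries, the sketch below generates one NULL-count query per mapped target column. The sheet layout, the table and column names, and the `null_check_queries` helper are hypothetical. Real mapping sheets typically also record the transformation rule for each column.

```python
# Hypothetical mapping sheet rows:
# (source table, source column, target table, target column)
MAPPING_SHEET = [
    ("src_customer", "cust_id", "dw_customer", "customer_id"),
    ("src_customer", "cust_nm", "dw_customer", "customer_name"),
]


def null_check_queries(mapping):
    """Build one NULL-count query per mapped target column."""
    return [
        f"SELECT COUNT(*) FROM {tgt_table} WHERE {tgt_col} IS NULL"
        for _, _, tgt_table, tgt_col in mapping
    ]


queries = null_check_queries(MAPPING_SHEET)
for q in queries:
    print(q)
```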
34) What is the transformation in ETL Testing?
35) What is the use of dynamic cache and static cache in transformation?
A dynamic cache is used when updating a master table or a slowly changing dimension. A static cache is used for flat files.
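The difference between the two caches can be illustrated with a simplified Python model. This is not any ETL tool's API: the two functions and the sample rows are assumptions, and the cache is reduced to a set of keys already present in the target.

```python
def run_with_static_cache(source_rows, target_keys):
    """Static cache: built once from the target and never modified during the run."""
    cache = set(target_keys)
    return [r for r in source_rows if r["id"] not in cache]


def run_with_dynamic_cache(source_rows, target_keys):
    """Dynamic cache: updated as rows pass, so later rows see earlier inserts."""
    cache = set(target_keys)
    inserts = []
    for r in source_rows:
        if r["id"] not in cache:
            inserts.append(r)
            cache.add(r["id"])
    return inserts


rows = [{"id": 1}, {"id": 2}, {"id": 2}]
static_inserts = run_with_static_cache(rows, target_keys={1})
dynamic_inserts = run_with_dynamic_cache(rows, target_keys={1})
print(len(static_inserts), len(dynamic_inserts))
```

In this model the static cache lets the repeated key through twice, while the dynamic cache inserts it only once. That is why a dynamic cache suits targets that change during the load.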
36) What is a mapping, Session, Worklet, and Mapplet?
37) What is full load and incremental or refresh load?
Full Load: A full load completely erases the contents of one or more tables and reloads them with fresh data.
Incremental Load: Here, ongoing changes are applied to one or more tables on a predefined schedule.
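A minimal Python sketch of the two load styles follows. The `updated_at` column used as a change marker, the watermark handling, and the dictionary standing in for the target table are assumptions for illustration.

```python
def full_load(target, source_rows):
    """Erase the target and reload it with fresh data."""
    target.clear()
    target.update({r["id"]: r for r in source_rows})


def incremental_load(target, source_rows, last_loaded_at):
    """Apply only rows changed after the previous load; return the new watermark."""
    changed = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    for r in changed:
        target[r["id"]] = r  # insert new rows, overwrite changed ones
    return max((r["updated_at"] for r in changed), default=last_loaded_at)


source = [
    {"id": 1, "name": "Ann", "updated_at": 1},
    {"id": 2, "name": "Bob", "updated_at": 2},
]
target = {}
full_load(target, source)
watermark = 2

# Later, one row changes and one new row arrives in the source.
source[0] = {"id": 1, "name": "Anna", "updated_at": 3}
source.append({"id": 3, "name": "Cy", "updated_at": 4})
watermark = incremental_load(target, source, watermark)
print(watermark, sorted(target))
```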
38) What are joiner and lookup?
The joiner is used to join two or more tables to retrieve the data from tables.
Lookup is used to check and compare the source table and the target table.
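The contrast can be sketched in plain Python. Both functions are simplified, hypothetical stand-ins and not an ETL tool's actual transformations: here the joiner keeps only matching rows, while the lookup keeps every row and fetches a value from a reference table.

```python
orders = [{"order_id": 1, "cust_id": "C1"}, {"order_id": 2, "cust_id": "C9"}]
customers = [{"cust_id": "C1", "name": "Ann"}]


def joiner(left, right, key):
    """Inner join: keep only rows whose key exists on both sides."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


def lookup(rows, reference, key, field, default=None):
    """Lookup: keep every row and attach `field` from the reference, or a default."""
    ref_by_key = {r[key]: r[field] for r in reference}
    return [{**row, field: ref_by_key.get(row[key], default)} for row in rows]


joined = joiner(orders, customers, "cust_id")
looked_up = lookup(orders, customers, "cust_id", "name")
print(joined)
print(looked_up)
```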
39) What is data purging?
Data purging is a term commonly used for methods that remove and permanently erase data from a storage space. In other words, deleting data from the data warehouse is known as data purging. Usually, we have to clean up junk data, such as rows that have null values or spaces. Purging such junk values is part of this process.
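Cleaning out junk rows can be sketched as follows. The rule that any null or blank value makes a row junk is an assumption for this example; real purge rules depend on the table and on retention policy.

```python
def purge_junk(rows):
    """Drop rows in which any value is None or a blank string."""

    def is_junk(value):
        return value is None or (isinstance(value, str) and not value.strip())

    return [row for row in rows if not any(is_junk(v) for v in row.values())]


rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "   "},
    {"id": 3, "name": None},
]
clean = purge_junk(rows)
print(clean)
```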
40) What is the difference between ETL tools and OLAP tools?
ETL Tools: They are meant for extracting data from the legacy system and loading it into the specified database, with some data cleansing along the way.
For example: Informatica, DataStage, etc.
OLAP Tools: They are used for reporting purposes; in OLAP, the data is available in a multidimensional model. We can write a simple query to extract the data from the database.
Example: BusinessObjects, Cognos, etc.