ETL (Extraction, Transformation and Loading) Testing
ETL testing is done before data is moved to production data warehouse systems. It is also called as table balancing or product reconciliation. ETL testing is different from Database testing in terms of its scope and the steps followed during this testing.
ETL testing is to ensure that the data which has been loaded from a source to destination after transformation is accurate. It involves the verification of data at various stages, which is used between source and destination.
Process for ETL testing
Like other testing process, ETL testing also go through some testing processes.
ETL testing performed in five stages.
Types of ETL testing
The types of ETL testing are:
1. New Data Warehouse Testing: It is built and verified from the core. In this testing, the input is taken from the customer's requirement and different data sources. However, the new data warehouse is built and verified with the help of ETL tools.
Here are the responsibilities which are played by different groups:
2. Production Validation Testing: This testing is done on data when data is moved to production systems. Informatica Data Validation option provides the automation of ETL testing and management capabilities to ensure that the data do not compromise production systems.
3. Source to Target Testing (Validation): This type of testing is done to validate the data values transformed the expected data values.
4. Application Upgrade: This type of ETL testing is automatically generated, which saves the test development time. This type of testing checks the extracted data from an older application are precisely same as the data in a new application.
5. Metadata Testing: Metadata testing includes the measurement of types of data, length of data, and check index/constraint.
6. Data Accuracy Testing: This testing is done to ensure that the data is accurately loaded and transformed as expected.
7. Data Transformation Testing: Data transformation testing done in many cases. It cannot be achieved by writing one source SQL query and comparing the output with the target. Multiple SQL queries need to be run for each row to verify the transformation rules.
8. Data Quality Testing: Data Quality Tests includes syntax and reference test. To avoid any error due to date or order number during business process data quality is done. Syntax tests: It will report dirty data, based on invalid character, character pattern, incorrect upper or lower case order, etc. Reference Tests: It will check the data according to the data model.
For Example, Customer ID data quality testing includes number check, date check, precision check, date check, etc.
9. Incremental ETL Testing: This testing is done to check the data integrity of old and new data when the new data added. Incremental testing verifies that the system processes correctly even after the insertion and updating the data during an incremental ETL process.
10. GUI/Navigation Testing: This testing is done to check the navigation or GUI aspects of the front end reports.
11. Migration Testing: In this testing, the customer has an existing data warehouse, and ETL is performing the job. But customers are looking for tools to improve efficiency. It includes these steps:
12. Change Requests: In this case, data added to an existing data warehouse. There might be condition arises where customers require to change the present business rule, or they can integrate new rule.
13. Report Testing: The final result of the data warehouse, reported testing. Repots should test by validating the data, layout in the report. Reports are an essential resource for creating vital business decisions.
Tasks performed in ETL Testing
Tasks involved in ETL testing are:
Differences between the ETL and the Database Testing
ETL and database testing involve data validation, but both are not same. ETL testing is usually performed on data in a data warehouse, whereas, database testing is performed on transactional systems. Data comes into the transactional database from different applications.
Operations performed in ETL Testing
ETL testing involves the following operations:
The operation performed in Database Testing
Database testing focuses on data accuracy, the correctness of data, and valid values.
Database testing performs the following operations:
For example, the Number of month's column shouldn't have a value greater than 12.
ETL performance Testing
ETL performance testing is used to ensure if an ETL system can handle an expected load of multiple users and transactions. Performance testing involves server-side workload on the ETL system.
How to perform ETL testing performance?
Here are the following steps which are followed to test the performance of ETL testing:
Step 1: Find the load which transformed in production.
Step 2: New data will be created of the same load or move it from production data to a local server.
Step 3: Now, we will disable the ETL until the required code is generated.
Step 4: We will count the needed data from the database table.
Step 5: We will note down the last run of ETL and enable the ETL. It will get enough stress to transform the entire load which has created and run it.
Step 6: After the completion of ETL, we will count the created data.
Essential performance that should be noted:
Data Accuracy in ETL Testing
In ETL Testing, we focus on data accuracy to ensure whether the data is accurately loaded to the target system as per our expectations.
Here are the steps which should be followed to perform the data accuracy are:
Value Comparison: In value comparison, we compare the data in the source and target system with minimum or no transformation. ETL testing can be possible by using various ETL tools. For example, Source Qualifier Transformation in Informatica.
Expression Transformation can also be performed in data accuracy testing. Set of operators can be used in SQL statements to check the data accuracy in the source and the target systems.
Check the columns of critical data: Critical Data columns can be checked by comparing the distinct values in the source and the target system.
ETL testing in data transformation
It is quite complex to perform the data transformation because it cannot be achieved by writing a single SQL query and comparing the output with the target. To do the ETL testing for Data Transformation, we have to write multiple SQL queries for each row to verify the transformation rules.
To perform the successful ETL testing for data transformation, we have to pick the sufficient and sample data from the source system to apply the transformation rule.
The significant steps to perform ETL testing for data transformation are:
Step 1. The first step is to create a scenario for input data and the expected results. Now we will validate ETL testing with the business customer. ETL testing is the best approach to gather the requirements during designs and can be used as a part of testing.
Step 2. The second step is to create the test data according to the scenario. ETL developer will automate the entire process of populating the datasets with the scenario spreadsheet permit versatility and mobility for the reason that the situation is changed.
Step 3. Utilize the data profiling and the results will compare the range and submission of values in each field between the source and the target data.
Step 4. We will validate the accurate processing of ETL generated fields. For example, Surrogate keys.
Step 5. We will validate the data types within the warehouse that are the same as specified in the data model or design.
Step 6. Scenarios of data will be created between tables which test the referential integrity.
Step 7. We will validate the parent to child relationship in the data.
Step 8. And at the end, we will perform lookup transformation. Lookup query should be straight without any data gathering and expected to return only one value as per the source table. We can directly join the lookup table in the source qualifier. If this is not a case, we will write a query which will join the lookup table with the main table in the source and will compare the data in the corresponding column in the target.
ETL Test Cases
The objective of ETL testing is to assure that the loaded data from source to destination after business transformation is accurate.
ETL testing applies to different tools and databases in the information management industry.
During the ETL testing performance, two documents always used by the ETL tester which are:
1. ETL mapping sheets: ETL mapping sheets contain all the information of the source and destination tables, which includes every column and their lookup in the reference table. ETL tester needs to be comfortable with SQL queries as ETL testing may involve writing big queries with multiple joins to validate the data at any stage of ETL. ETL mapping sheets provide significant help when we write queries for data verification.
2. DB Schema of Source (Target): It should be kept accessible to verify any detail in mapping sheet.
ETL Test Scenarios and Test Cases:
Responsibility of ETL tester
ETL tester is responsible for validating the data sources, applying transformation logic, and loading the data in the target table, extraction of data.
The responsibilities of ETL tester are:
Verify the table in the source system. It involves the following types of operation:
Apply Transformation Logic
Transformation logic is applied before loading the data. It involves the following operations:
Data is loaded from the staging area to the target systems. It involves the following operations:
We will check if the aggregate values and calculated measures loaded in the fact table.
Testing of ETL Tools
ETL testers are required to test the test cases and tools as well. It involves the following operations:
Advantages of ETL Testing
Benefits of ETL testing are given below:
Disadvantages of ETL Testing
Disadvantages of ETL testing are given below:
ETL testers are required to test the test cases and tools as well. It involves the following operation:
Future Scope of ETL Testing
The scope of ETL testing is very bright. ETL tools like Informatica PowerCenter, Oracle Data Integrator, Microsoft SQL server integrated service, SAS, IBM infosphere information server, etc. all are in huge demand in the industry because of its demand. The scope of ETL testing will increase in the future.
ETL testing is a type of business testing in which developers, business analyst, end-users, and DBAs are involved. ETL testing requires the knowledge of SDLC and ETL policies, and the tester should know how to write the SQL queries. Many businesses consider ETL as a challenge, but the fact is that it is beneficial for the business. It is essential to protect the data from loss, and it is necessary to update the data to meet the requirements of the market.