Teradata Join Index
The Teradata Join index stores pre-joined tables. Join indexes are comparable to materialized views.
Join indexes are designed to permit queries to be resolved by accessing the index instead of accessing, and possibly joining, their underlying base tables.
A join index can contain columns from one or more tables and can also provide pre-aggregated data. Join indexes are mainly used to improve performance.
Despite the name, join indexes are not really an index structure. They are better understood as an added layer of permanent tables, pre-joined or pre-aggregated, each with its own row-level partitioning, Primary Index, and Secondary Indexes.
Even if a join index does not entirely cover a query, the optimizer can use it to join its underlying base tables in a way that provides better query optimization than scanning the base tables for all the columns specified in the request.
Join indexes are useful for queries that aggregate columns from tables with large cardinalities. These indexes play the role of pre-join and summary tables without denormalizing the logical design of the database, yet they deliver the performance gains that denormalization often brings to particular queries.
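There are three types of join indexes available in Teradata: single-table, multi-table, and aggregate join indexes.
Some essential rules apply to all of them: Teradata maintains a join index automatically whenever its base tables change; users cannot reference a join index directly in a query, because the optimizer alone decides whether to use it; a join index occupies additional permanent space; and FastLoad and MultiLoad cannot load a table that has a join index defined on it.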
Single Table Join Index
Single-table join indexes are created from exactly one base table. Their purpose is to make the same table available with a different primary index or partitioning, or as a smaller table (the join index table) with fewer columns to be spooled.
This improves join performance, because rows that are already hashed on the join column need no redistribution or duplication. Users query the base table, and the Parsing Engine (PE) decides whether to access the base table or the single-table join index.
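A join index is created with the CREATE JOIN INDEX statement, which combines a SELECT query with an index definition such as a PRIMARY INDEX clause.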
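As a rough guide, the statement has the general shape below. This is a simplified sketch rather than the full Teradata grammar: the angle-bracket items are placeholders, and options such as FALLBACK or partitioning clauses are omitted.

```sql
-- Simplified general form of a Teradata join index definition.
-- Angle-bracket items are placeholders, not literal syntax.
CREATE JOIN INDEX <index_name>
AS
<SELECT query over one or more base tables>
<index definition, for example PRIMARY INDEX (<column>)>;
```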
The examples in this section use two tables, Employee and Salary, which are joined on the employee identifier.
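The table definitions below are a hypothetical sketch for following along. Only Employee_Id and Employee_Name come from this section; the Department_No and Net_Pay columns, their data types, and the choice of Employee_Id as the unique primary index of Salary are assumptions.

```sql
-- Hypothetical Employee table: unique primary index on Employee_Id.
CREATE TABLE Employee (
    Employee_Id   INTEGER NOT NULL,
    Employee_Name VARCHAR(50),
    Department_No INTEGER            -- assumed column
)
UNIQUE PRIMARY INDEX (Employee_Id);

-- Hypothetical Salary table, joined to Employee on Employee_Id.
CREATE TABLE Salary (
    Employee_Id INTEGER NOT NULL,
    Net_Pay     DECIMAL(10,2)        -- assumed column
)
UNIQUE PRIMARY INDEX (Employee_Id);
```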
To support lookups by name, we can create a single-table join index named Employee_JI on the Employee table, hashed on Employee_Name.
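A minimal sketch of that definition, assuming the Employee columns Employee_Id and Employee_Name plus a hypothetical Department_No column. The PRIMARY INDEX clause is what makes the join index rows hash on Employee_Name rather than on Employee_Id.

```sql
-- Single-table join index: same rows as Employee, redistributed by name.
CREATE JOIN INDEX Employee_JI
AS
SELECT Employee_Id, Employee_Name, Department_No   -- Department_No is assumed
FROM Employee
PRIMARY INDEX (Employee_Name);
```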
If the user submits a query with a WHERE clause on Employee_Id, then the system will query the Employee table using the unique primary index.
If the user queries the employee table using employee_name, then the system can access the join index Employee_JI using employee_name.
The rows of the join index are hashed on the Employee_Name column. Without the join index or a secondary index on Employee_Name, the system would have to perform a full-table scan to find the rows, which is time-consuming.
We can run EXPLAIN on a query to verify the optimizer's plan. When the Employee table is queried using the Employee_Name column, the optimizer uses the join index instead of the base Employee table.
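The check can be sketched as follows, with a hypothetical name value and the same assumed columns as Employee_JI. EXPLAIN wording differs between Teradata releases, so no output is reproduced here; if the join index is chosen, the plan steps would be expected to name Employee_JI instead of describing an all-rows scan of Employee.

```sql
-- Ask the optimizer for its plan; the query itself is not executed.
-- 'Mike' is a hypothetical value; all selected columns exist in Employee_JI.
EXPLAIN
SELECT Employee_Id, Employee_Name, Department_No
FROM Employee
WHERE Employee_Name = 'Mike';
```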
Multi-Table Join Index
A multi-table join index is created by joining more than one table. It stores the result set of frequently joined tables to improve performance.
Because it holds a pre-joined result set from two or more tables, the PE may decide during join processing to read data from the multi-table join index rather than joining the underlying base tables again. Every join index costs additional space and maintenance, so a multi-table join index should be defined only after careful analysis of how often the join is performed and how expensive it is.
Multi-table join indexes allow us to move resource-intensive joins from the online window to the batch window: the join work is done when the base tables are loaded or updated rather than when users query them.
Shifting the workload does not reduce the total work, but it moves it to a point in time that benefits the system's overall performance.
For example, we can create a join index named Employee_Salary_JI by joining the Employee and Salary tables.
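A minimal sketch of Employee_Salary_JI, assuming both tables share an Employee_Id column; Department_No and Net_Pay are hypothetical column names. Choosing Employee_Id as the primary index is also an assumption; a real design would pick the column that most queries join or filter on.

```sql
-- Multi-table join index: stores the pre-joined Employee/Salary rows.
CREATE JOIN INDEX Employee_Salary_JI
AS
SELECT e.Employee_Id, e.Employee_Name, e.Department_No, s.Net_Pay
FROM Employee e
INNER JOIN Salary s
  ON e.Employee_Id = s.Employee_Id
PRIMARY INDEX (Employee_Id);   -- assumed distribution column
```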
When the base tables Employee or Salary are updated, the Join index Employee_Salary_JI is also automatically updated.
If we run a query joining the Employee and Salary tables, the optimizer may choose to read the data directly from Employee_Salary_JI instead of joining the tables. Running EXPLAIN on the query shows whether the optimizer chose the base tables or the join index.
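For instance, EXPLAIN can be run on a join like the one below, which uses the same hypothetical columns as the Employee_Salary_JI sketch. Because every selected column is stored in that join index, the optimizer is free to answer the query from the index alone; the plan text shows which path it actually took.

```sql
-- Join query covered by the columns of Employee_Salary_JI.
EXPLAIN
SELECT e.Employee_Id, e.Employee_Name, s.Net_Pay
FROM Employee e
INNER JOIN Salary s
  ON e.Employee_Id = s.Employee_Id;
```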
Aggregate Join Index
Aggregate join indexes are used to resolve queries that frequently specify the same aggregate operations on the same column or columns.
When an aggregate join index is available, the system does not have to repeat the aggregate calculation for every query. An aggregate join index can be defined on a single table or on two or more tables.
If a table is consistently aggregated on specific columns, an aggregate join index can be defined on it to improve performance. One limitation of the aggregate join index is that it supports only a restricted set of aggregate functions, chiefly SUM and COUNT; the exact list depends on the Teradata release.
For example, an aggregate join index can join the Employee and Salary tables to precompute the total salary per department.
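A sketch of such an aggregate join index, using the hypothetical Department_No and Net_Pay columns and a hypothetical index name, Dept_Salary_AJI. It stores one precomputed SUM per department, so a query with the same join, grouping, and SUM can be satisfied from the index instead of re-aggregating every salary row.

```sql
-- Hypothetical aggregate join index: total salary per department.
CREATE JOIN INDEX Dept_Salary_AJI
AS
SELECT e.Department_No, SUM(s.Net_Pay) AS Total_Salary
FROM Employee e
INNER JOIN Salary s
  ON e.Employee_Id = s.Employee_Id
GROUP BY e.Department_No
PRIMARY INDEX (Department_No);

-- A query with the same join, grouping, and SUM that the optimizer
-- may answer from Dept_Salary_AJI instead of aggregating the base rows.
SELECT e.Department_No, SUM(s.Net_Pay)
FROM Employee e
INNER JOIN Salary s
  ON e.Employee_Id = s.Employee_Id
GROUP BY e.Department_No;
```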