Teradata Join Index

The Teradata Join index stores pre-joined tables. Join indexes are comparable to materialized views.

Join indexes are designed to permit queries to be resolved by accessing the index instead of accessing, and possibly joining, their underlying base tables.

A join index can draw on one or more tables and can also store pre-aggregated data. Join indexes are used mainly to improve query performance.

Despite the name, join indexes are not indexes in the usual sense. They are better regarded as an added layer of pre-aggregated, pre-joined, or permanent tables, each with its own row-level partitioning, Primary Index, and Secondary Indexes.

Even if a join index does not entirely cover a query, the optimizer can use it to join its underlying base tables in a way that provides better query optimization than scanning the base tables for all the columns specified in the request.

Join indexes are useful for queries that aggregate columns from tables with large cardinalities. These indexes play the role of pre-join and summary tables without denormalizing the logical design of the database. Denormalization can also improve the performance of a particular query or set of queries, but join indexes deliver a comparable benefit while the logical design stays normalized.

There are three types of join indexes available in Teradata.

Single Table Join Index (STJI)
Multi-Table Join Index (MTJI)
Aggregate Join Index (AJI)

Rules

Here are some essential rules for join indexes:

FULL OUTER JOIN is not allowed.
At least one non-null column must be selected from the inner side of a LEFT or RIGHT JOIN.
An OUTER JOIN makes it more likely that the optimizer can use the join index, but it is not allowed in an aggregate join index.
HAVING and QUALIFY keywords are not allowed.
Set Operations are not allowed, such as UNION, INTERSECT, and MINUS.
No Subqueries are allowed.
Inequality conditions in ON clauses of join index definitions are not allowed unless they are ANDed with at least one equality condition.
In such inequality conditions, only the <, <=, >=, and > comparison operators are allowed.
During the restoration of a base table or database, the join index is marked as invalid.
A maximum of 64 columns per table per Join Index is allowed.
A maximum of 32 indexes can be defined on a table, including join indexes.
A UPI is allowed only for a single-table join index.
Value ordered NUSI and Partition Expression on the same Join Index columns are not allowed.

Single Table Join Index

A single-table join index is created from exactly one base table. Its purpose is to make the same data available with a different primary index or different partitioning, or as a smaller table (the join index table) with fewer columns to be spooled.

This can improve join performance because the rows no longer need to be redistributed or duplicated across AMPs. The user still queries the base table; the Parsing Engine (PE) decides whether to access the base table or the single-table join index.

Syntax

Following is the syntax of a JOIN INDEX.

CREATE JOIN INDEX <index name> 
AS 
<SELECT Query> 
<Index Definition>;

Example

Consider the following Employee and Salary tables.

CREATE SET TABLE EMPLOYEE, FALLBACK 
( 
   Employee_Id INTEGER, 
   First_Name VARCHAR(10), 
   Last_Name VARCHAR(10), 
   BirthDate DATE FORMAT 'YYYY-MM-DD', 
   DepartmentNo BYTEINT 
) 
UNIQUE PRIMARY INDEX (Employee_Id);

CREATE SET TABLE SALARY, FALLBACK 
( 
   Employee_Id INTEGER, 
   Gross INTEGER, 
   Deduction INTEGER, 
   NetPay INTEGER 
) 
PRIMARY INDEX (Employee_Id) 
UNIQUE INDEX (Employee_Id);

Following is an example of creating a Join index named Employee_JI on the Employee table.

CREATE JOIN INDEX Employee_JI 
AS 
SELECT Employee_Id, First_Name, Last_Name, 
BirthDate, DepartmentNo 
FROM Employee 
PRIMARY INDEX (First_Name);

If the user submits a query with a WHERE clause on Employee_Id, then the system will query the Employee table using the unique primary index.

If the user queries the Employee table using First_Name, then the system can access the join index Employee_JI, whose primary index is First_Name.

The rows of the join index are hashed on the First_Name column. If the join index did not exist and First_Name were not defined as a secondary index, the system would have to perform a full table scan to find the rows, which is time-consuming.
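The two access paths described above can be sketched with a pair of queries. This is only an illustration: the Employee_Id value 101 is a made-up sample value, and whether the second query is actually resolved through Employee_JI is decided by the optimizer.

```sql
-- Equality condition on the unique primary index of the base table:
-- a candidate for single-AMP access to EMPLOYEE itself.
SELECT Employee_Id, First_Name, Last_Name
FROM Employee
WHERE Employee_Id = 101;   -- 101 is a hypothetical sample value

-- Equality condition on First_Name, the primary index of Employee_JI:
-- a candidate for single-AMP access through the join index.
SELECT Employee_Id, First_Name, Last_Name
FROM Employee
WHERE First_Name = 'Alexa';
```

Both queries name only the base table; the join index is never referenced directly in the request.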

We can run the following EXPLAIN request to verify the optimizer's plan. In this example, the optimizer uses the join index instead of the base Employee table when the table is queried using the First_Name column.

EXPLAIN SELECT * FROM EMPLOYEE WHERE First_Name='Alexa'; 

 *** Help information returned. 8 rows. 
 *** Total elapsed time was 1 second. 

Explanation

1. First, we do a single-AMP RETRIEVE operation from EMPLOYEE_JI by using the primary index "EMPLOYEE_JI.First_Name = 'Alexa'" with no residual conditions into Spool 1 (one-amp) built locally on that AMP. The size of Spool 1 is estimated with low confidence to be 2 rows (232 bytes). The estimated time for this step is 0.01 seconds.

-> The contents of Spool 1 are sent back to the user as the result of statement 1.
-> The total estimated time is 0.01 seconds.
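As noted at the start of this section, a single-table join index can also serve as a narrower copy of the base table with a different primary index. The following is a minimal sketch of that idea; the index name Employee_Dept_JI is hypothetical and does not appear elsewhere in this article.

```sql
-- Hypothetical narrow join index: only two columns of EMPLOYEE,
-- hashed on DepartmentNo instead of Employee_Id.
CREATE JOIN INDEX Employee_Dept_JI
AS
SELECT Employee_Id, DepartmentNo
FROM Employee
PRIMARY INDEX (DepartmentNo);
```

Because its rows are distributed by DepartmentNo, such an index may help queries that filter or join on DepartmentNo and need only these two columns. Whether it is used is still up to the optimizer, so an EXPLAIN on the target query remains the way to check.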

Multi-Table Join Index

A multi-table join index is created by joining more than one table. A multi-table join index can be used to store the result set of frequently joined tables to improve the performance.

A multi-table join index holds a pre-joined result set from two or more tables. During join processing, the PE may then decide to read the data from the multi-table join index rather than join the underlying base tables again. A multi-table join index should be defined only after careful analysis of how often the join runs and what it costs.

Multi-Table Join Indexes allow us to move resource-intensive joins from the online to the batch window.

Shifting the workload does not reduce the total amount of work, but it moves that work to a point in time that is better for the system's overall performance.

Example

The following example creates a JOIN INDEX named Employee_Salary_JI by joining Employee and Salary tables.

CREATE JOIN INDEX Employee_Salary_JI 
AS 
SELECT a.Employee_Id,a.First_Name,a.Last_Name, 
a.BirthDate,a.DepartmentNo,b.Gross,b.Deduction,b.NetPay 
FROM Employee a 
INNER JOIN Salary b 
ON(a.Employee_Id = b.Employee_Id) 
PRIMARY INDEX(First_Name);

When the base tables Employee or Salary are updated, the Join index Employee_Salary_JI is also automatically updated.

If we run a query that joins the Employee and Salary tables, the optimizer may choose to read the data directly from Employee_Salary_JI instead of joining the tables. An EXPLAIN on the query shows whether the optimizer will use the base tables or the join index.
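A check of this kind could look like the following sketch. The filter value 'Alexa' is reused from the earlier example, and the plan that comes back depends on the system, its statistics, and the release, so no output is shown here.

```sql
-- Ask the optimizer for its plan without running the query.
EXPLAIN
SELECT a.Employee_Id, a.First_Name, b.NetPay
FROM Employee a
INNER JOIN Salary b
ON (a.Employee_Id = b.Employee_Id)
WHERE a.First_Name = 'Alexa';
```

If the plan retrieves rows from EMPLOYEE_SALARY_JI rather than joining EMPLOYEE and SALARY, the join index is being used for this query.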

Aggregate Join Index

An aggregate join index is used to resolve queries that frequently specify the same aggregate operations on the same column or columns.

When aggregate join indexes are available, then the system does not repeat aggregate calculations for every query. An aggregate join index can be defined on two or more tables or a single table.

If a table is consistently aggregated on specific columns, then an aggregate join index can be defined on the table to improve performance. One limitation of the aggregate join index is that it supports only the following aggregate functions, used together with a GROUP BY clause.

SUM function
COUNT function
GROUP BY clause

Example

In the following example, the Employee and Salary are joined to identify the total salary per Department.

CREATE JOIN INDEX Employee_Salary_AJI 
AS 
SELECT a.DepartmentNo,SUM(b.NetPay) AS TotalPay 
FROM Employee a 
INNER JOIN Salary b 
ON(a.Employee_Id = b.Employee_Id)
GROUP BY a.DepartmentNo 
PRIMARY INDEX(DepartmentNo); 
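A query that the aggregate join index above is meant to serve might look like the following sketch. It repeats the same join, grouping column, and SUM as the index definition; the EXPLAIN prefix is there only to inspect the plan, and the actual plan will vary by system.

```sql
-- Total net pay per department; same aggregation as the index definition.
EXPLAIN
SELECT a.DepartmentNo, SUM(b.NetPay) AS TotalPay
FROM Employee a
INNER JOIN Salary b
ON (a.Employee_Id = b.Employee_Id)
GROUP BY a.DepartmentNo;
```

When the optimizer chooses the aggregate join index, the plan reads the pre-computed totals from the index instead of joining and aggregating the base tables again.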
