Teradata Partitioned Primary Index
Partitioned Primary Index (PPI) is one of the powerful features of Teradata, which allows a user to access a part of the table rather than full table access. PPI is an indexing mechanism that is useful in improving the performance of specific queries. PPI works the same as Primary Index for the data distribution but creates partitions according to range or case as specified in the table.
A partitioned primary index (PPI) allows rows to be assigned to user-defined data partitions on the AMPs, enabling enhanced performance for range queries predicated on primary index values.
When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. And when a table is defined with PPI, then the rows are sorted by their partition number.
Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined. Below are the following significant points of the PPI, such as:
Commonly queried rows are stored together in partitions so that we can restrict data retrieval to those partitions that contain the rows required by your SQL statement. Other partitions that do not provide requested rows are not read.
Incidentally, other database systems, such as Netezza, go the other way round by storing the information about the data blocks where the requested rows are not stored. Netezza called this method "Data Skipping".
Partitioning will not help to avoid table scans, but it will reduce the number of data blocks that each AMP must move to its local memory (FSG cache) because only relevant partitions need to be accessed.
Characteristics of PPI
Here are some characteristics of the Partitioned Primary Index, such as:
Advantages of PPI
Following are the advantages of the Partitioned Primary Index:
Partitioned Primary Index also has the following disadvantages, such as:
Types of PPI
There are four types of Partitioned Primary Index:
Consider the following Orders table with Primary Index on OrderNo.
Assume that the records are distributed between AMPs, as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.
If we run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. We define the order date as Partitioned Primary Index to avoid the full table scan. When rows are inserted into the orders table, they are partitioned by the order date. Within each partition, they will be ordered by their row hash.
The following data shows how the records will be stored in AMPs if they are partitioned by Order Date. If a query is run to access the records by Order Date, only the partition that contains the records for that particular order will be obtained.
Following is an example of creating a table with a partition primary Index. PARTITION BY clause is used to define the partition.
In the above example, the table is partitioned by the OrderDate column. There will be one separate partition for each day.