Teradata Partitioned Primary Index

Partitioned Primary Index (PPI) is one of the powerful features of Teradata, which allows a user to access a part of the table rather than full table access. PPI is an indexing mechanism that is useful in improving the performance of specific queries. PPI works the same as Primary Index for the data distribution but creates partitions according to range or case as specified in the table.

A partitioned primary index (PPI) allows rows to be assigned to user-defined data partitions on the AMPs, enabling enhanced performance for range queries predicated on primary index values.

When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. And when a table is defined with PPI, then the rows are sorted by their partition number.

Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined. Below are the following significant points of the PPI, such as:

PPI is used to improve performance for large tables when we submit queries that specify a range constraint.
PPI allows reducing the number of rows to be processed by using partition elimination.
PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints

Commonly queried rows are stored together in partitions so that we can restrict data retrieval to those partitions that contain the rows required by your SQL statement. Other partitions that do not provide requested rows are not read.

Incidentally, other database systems, such as Netezza, go the other way round by storing the information about the data blocks where the requested rows are not stored. Netezza called this method "Data Skipping".

Partitioning will not help to avoid table scans, but it will reduce the number of data blocks that each AMP must move to its local memory (FSG cache) because only relevant partitions need to be accessed.

Characteristics of PPI

Here are some characteristics of the Partitioned Primary Index, such as:

Partitioning can be done on volatile, global temporary, and permanent tables.
The partitioning column doesn't have to be part of the non-unique Primary Index.
The partitioning column must be part of a unique Primary Index.
The partitioned column will not decide which AMP data should go and dependent on the Primary Index.
Rows within AMP will be sorted by partitioned column and not by row Id.
PPI can be defined at multi-level.
Teradata uses a partition elimination strategy to avoid a Full Table Scan (FTS) for partitioned tables.
Partitioned tables may reduce input for range queries by reducing the number of rows to be processed.
Details about the partitions, constraints & conditions can be retrieved from dbc.PartitioningConstraintsV view.

Advantages of PPI

Following are the advantages of the Partitioned Primary Index:

It avoids a full table scan for specific queries.
PPI does not use a secondary index that requires additional physical structure and other input maintenance.
It can access a subset of a large table quickly.
PPI drop the old data quickly and add new data.
It can be created on a global temp table, volatile table, and non-compressed join indexes.
For range-based queries, we can remove SIs and use PPI, which will reduce the space of an overhead SI Subtable.

Disadvantages

Partitioned Primary Index also has the following disadvantages, such as:

Partitioning can make a single row or primary index accesses to the table slower if the partitioning column is not a part of the primary index column.
An additional of 2 bytes is added to each row and hence increases the perm space.
While joining a PPI table with a non-partitioned table takes a long time.
Access other than the PPI column takes more time.

Types of PPI

There are four types of Partitioned Primary Index:

Case partitioning
Range-based partitioning
Multi-level partitioning
Character-based partitioning

Example

Consider the following Orders table with Primary Index on OrderNo.

StoreNo	OrderNo	OrderDate	OrderTotal
2001	75	2017-11-02	900
2001	99	2017-12-10	1,200
2002	77	2017-06-09	3,000
2002	18	2017-09-05	2,454
2001	71	2017-11-03	1201
2003	70	2017-18-04	2,454
2001	80	2017-10-08	1201
2001	65	2017-15-05	1201

Assume that the records are distributed between AMPs, as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.

AMP 1

RowHash	OrderNo	OrderDate
1	71	2017-11-03
2	18	2017-09-05
3	75	2017-11-02
4	65	2017-15-05

AMP 2

RowHash	OrderNo	OrderDate
1	80	2017-10-08
2	99	2017-12-10
3	70	2017-18-04
4	77	2017-06-09

If we run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. We define the order date as Partitioned Primary Index to avoid the full table scan. When rows are inserted into the orders table, they are partitioned by the order date. Within each partition, they will be ordered by their row hash.

The following data shows how the records will be stored in AMPs if they are partitioned by Order Date. If a query is run to access the records by Order Date, only the partition that contains the records for that particular order will be obtained.

AMP 1

Partition	RowHash	OrderNo	OrderDate
0	3	75	2017-11-02
1	1	71	2017-11-03
1	2	18	2017-09-05
2	4	65	2017-15-05

AMP 2

Partition	RowHash	OrderNo	OrderDate
0	2	99	2017-12-10
0	4	77	2017-06-09
1	3	70	2017-18-04
2	1	80	2017-10-08

Following is an example of creating a table with a partition primary Index. PARTITION BY clause is used to define the partition.

CREATE SET TABLE Orders 
(
   StoreNo SMALLINT, 
   OrderNo INTEGER, 
   OrderDate DATE FORMAT 'YYYY-MM-DD', 
   OrderTotal INTEGER 
) 
PRIMARY INDEX(OrderNo) 
PARTITION BY RANGE_N  
(
   OrderDate BETWEEN DATE '2015-01-01' AND '2020-12-31' EACH INTERVAL '1' DAY
);

In the above example, the table is partitioned by the OrderDate column. There will be one separate partition for each day.

Next TopicTeradata Secondary Indexes

← prev next →