Javatpoint Logo
Javatpoint Logo

Teradata Partitioned Primary Index

Partitioned Primary Index (PPI) is one of the powerful features of Teradata, which allows a user to access a part of the table rather than full table access. PPI is an indexing mechanism that is useful in improving the performance of specific queries. PPI works the same as Primary Index for the data distribution but creates partitions according to range or case as specified in the table.

A partitioned primary index (PPI) allows rows to be assigned to user-defined data partitions on the AMPs, enabling enhanced performance for range queries predicated on primary index values.

When rows are inserted into a table, they are stored in an AMP and arranged by their row hash order. And when a table is defined with PPI, then the rows are sorted by their partition number.

Within each partition, they are arranged by their row hash. Rows are assigned to a partition based on the partition expression defined. Below are the following significant points of the PPI, such as:

  • PPI is used to improve performance for large tables when we submit queries that specify a range constraint.
  • PPI allows reducing the number of rows to be processed by using partition elimination.
  • PPI will increase performance for incremental data loads, deletes, and data access when working with large tables with range constraints

Commonly queried rows are stored together in partitions so that we can restrict data retrieval to those partitions that contain the rows required by your SQL statement. Other partitions that do not provide requested rows are not read.

Incidentally, other database systems, such as Netezza, go the other way round by storing the information about the data blocks where the requested rows are not stored. Netezza called this method "Data Skipping".

Partitioning will not help to avoid table scans, but it will reduce the number of data blocks that each AMP must move to its local memory (FSG cache) because only relevant partitions need to be accessed.

Characteristics of PPI

Here are some characteristics of the Partitioned Primary Index, such as:

  1. Partitioning can be done on volatile, global temporary, and permanent tables.
  2. The partitioning column doesn't have to be part of the non-unique Primary Index.
  3. The partitioning column must be part of a unique Primary Index.
  4. The partitioned column will not decide which AMP data should go and dependent on the Primary Index.
  5. Rows within AMP will be sorted by partitioned column and not by row Id.
  6. PPI can be defined at multi-level.
  7. Teradata uses a partition elimination strategy to avoid a Full Table Scan (FTS) for partitioned tables.
  8. Partitioned tables may reduce input for range queries by reducing the number of rows to be processed.
  9. Details about the partitions, constraints & conditions can be retrieved from dbc.PartitioningConstraintsV view.

Advantages of PPI

Following are the advantages of the Partitioned Primary Index:

  • It avoids a full table scan for specific queries.
  • PPI does not use a secondary index that requires additional physical structure and other input maintenance.
  • It can access a subset of a large table quickly.
  • PPI drop the old data quickly and add new data.
  • It can be created on a global temp table, volatile table, and non-compressed join indexes.
  • For range-based queries, we can remove SIs and use PPI, which will reduce the space of an overhead SI Subtable.

Disadvantages

Partitioned Primary Index also has the following disadvantages, such as:

  • Partitioning can make a single row or primary index accesses to the table slower if the partitioning column is not a part of the primary index column.
  • An additional of 2 bytes is added to each row and hence increases the perm space.
  • While joining a PPI table with a non-partitioned table takes a long time.
  • Access other than the PPI column takes more time.

Types of PPI

There are four types of Partitioned Primary Index:

  1. Case partitioning
  2. Range-based partitioning
  3. Multi-level partitioning
  4. Character-based partitioning

Example

Consider the following Orders table with Primary Index on OrderNo.

StoreNo OrderNo OrderDate OrderTotal
2001 75 2017-11-02 900
2001 99 2017-12-10 1,200
2002 77 2017-06-09 3,000
2002 18 2017-09-05 2,454
2001 71 2017-11-03 1201
2003 70 2017-18-04 2,454
2001 80 2017-10-08 1201
2001 65 2017-15-05 1201

Assume that the records are distributed between AMPs, as shown in the following tables. Recorded are stored in AMPs, sorted based on their row hash.

AMP 1

RowHash OrderNo OrderDate
1 71 2017-11-03
2 18 2017-09-05
3 75 2017-11-02
4 65 2017-15-05

AMP 2

RowHash OrderNo OrderDate
1 80 2017-10-08
2 99 2017-12-10
3 70 2017-18-04
4 77 2017-06-09

If we run a query to extract the orders for a particular date, then the optimizer may choose to use Full Table Scan, then all the records within the AMP may be accessed. We define the order date as Partitioned Primary Index to avoid the full table scan. When rows are inserted into the orders table, they are partitioned by the order date. Within each partition, they will be ordered by their row hash.

The following data shows how the records will be stored in AMPs if they are partitioned by Order Date. If a query is run to access the records by Order Date, only the partition that contains the records for that particular order will be obtained.

AMP 1

Partition RowHash OrderNo OrderDate
0 3 75 2017-11-02
1 1 71 2017-11-03
1 2 18 2017-09-05
2 4 65 2017-15-05

AMP 2

Partition RowHash OrderNo OrderDate
0 2 99 2017-12-10
0 4 77 2017-06-09
1 3 70 2017-18-04
2 1 80 2017-10-08

Following is an example of creating a table with a partition primary Index. PARTITION BY clause is used to define the partition.

In the above example, the table is partitioned by the OrderDate column. There will be one separate partition for each day.






Help Others, Please Share

facebook twitter pinterest

Learn Latest Tutorials


Preparation


Trending Technologies


B.Tech / MCA