Teradata Statistics

Teradata optimizer gives an execution strategy for every SQL query. This execution strategy is based on the statistics collected on the tables used within the SQL query. Statistics on the table are collected using COLLECT STATISTICS command.

The COLLECT STATISTICS (Optimizer form) statement collects demographic data for one or more columns of a base table, hash index, or join index, computes a statistical profile of the collected data, and stores the synopsis in the Data Dictionary.

The Optimizer uses the synopsis data when it generates its table access and joins plans.

Environment Information

Teradata Statistics environment needs the following:

  • Number of Nodes, AMPs, and CPUs
  • Amount of memory

Data Demographics

Data Demographics consider the following:

  • Number of rows
  • Row size
  • Range of values in the table
  • Number of rows per value
  • Number of Nulls

Usage

We should collect statistics on newly created, empty data tables. An empty collection defines the columns, indexes, and synoptic data structure for loaded groups.

We can easily collect statistics again after the table is populated for prototyping, and back when it is in production.

We can collect statistics in the following ways.

  1. A unique index, which can be:
    • Primary or secondary
    • Single or multiple columns
    • Partitioned or non-partitioned
  2. A non-unique index, which can be:
    • Primary or secondary
    • Single or multiple columns
    • Partitioned or non-partitioned
    • With or without COMPRESS fields
  3. A non-indexed column or set of columns, which can be:
    • Partitioned or non-partitioned
    • With or without COMPRESS fields
  4. A temporary table
    • If we specify the TEMPORARY keyword but a materialized table does not exist, the system first materializes an instance based on the specified column names and indexes.
      This means that after a valid instance is created, we can re-collect statistics on the columns by entering COLLECT STATISTICS and the TEMPORARY keyword without having to specify the desired columns and index.
    • If we omit the TEMPORARY keyword, but the table is temporary, statistics are collected for an empty base table rather than the materialized instance.
  5. Sample (system-selected percentage) of the rows of a data table or index, to detect data skew and dynamically increase the sample size when found.
    • The system does not store both sampled and defined statistics for the same index or column set. Once sampled statistics have been collected, implicit recollection hits the same columns and indexes and operates in the same mode.
  6. Join index
  7. Hash index
  8. NoPI table

How to approach Collect Statistics

There are three approaches to collect statistics on the table.

  • Random AMP Sampling
  • Full statistics collection
  • Using the SAMPLE option

Collecting Statistics on Table

COLLECT STATISTICS command is used to collect statistics on a table.

Syntax

Following is the basic syntax to collect statistics on a table.

Example

Consider an Employee table with the following records, such as:

Emp_IdFirst_NameLast_NameDepartment_No
202001MikeRichard1
202002RobertWilliams2
202003PeterCollin2
202004AlexaStuart1
202005RobertPeterson1

We are going to run the following query to collect statistics for the Emp_Id, First_Name columns of the Employee table.

When the above query is executed, it produces the following output.

Update completed. 2 rows changed.

Viewing Statistics

We can view the collected statistics using the HELP STATISTICS command.

Syntax

Following is the syntax to view the statistics collected.

Example

Following is an example to view the statistics collected on the Employee table.

When the above query is executed, it produces the following table with updated columns and their values.

Date        Time        Unique Values    Column Names
6/2/20     10:05:02      5                  *
6/2/20     10:05:02      5                  Emp_Id, First_Name





Latest Courses