Teradata Compression

Compression reduces the physical size of the stored information. The goal of compression is to represent information accurately using the fewest number of bits.

The Compression methods are either logical or physical. Physical data compression re-encodes information independently of its meaning, and logical data compression substitutes one set of data with another, more compact set.

In Teradata, compression can compress up to 255 distinct values, including NULL. Since the storage is reduced, Teradata can store more records in a block. This results in improved query response time since any input operation can process more rows per block.

Compression can be added at table creation using CREATE TABLE or after table creation using ALTER TABLE command.

Compression has the following essential reasons, such as:

To reduce storage costs.
To enhance system performance.

Compression reduces storage costs by storing more logical data per unit of physical capacity. Compression produces smaller rows, resulting in more rows stored per data block and fewer data blocks.

Compression enhances system performance because there is less physical data to retrieve per row for queries. And compressed data remains compressed while in memory, the FSG cache can hold more rows, reducing the size of disk input.

Rules

Teradata Compression method has the following rules to compress stored data, such as:

Only 255 values can be compressed per column.
The Primary Index column cannot be compressed.
Volatile tables cannot be compressed.

Types of Compression

Teradata Database uses several types of compression.

Database element	Explanation
Column values	The storage of those values one time only in the table header, not in the row itself, and pointing to them using an array of presence bits in the row header. It applies to: Multi-value compression Algorithmic compression We cannot apply either multi-value compression or algorithmic compression to row-level security constraint columns.
Hash and Join indexes	A logical row compression in which multiple sets of non-repeating column values are appended to a single set of repeating column values. This allows the system to store the repeating value set only once, while any non-repeating column values are stored as logical segmental extensions of the base repeating set.
Data blocks	The storage of primary table data, or join or hash index subtable data.
Partition containers	The auto-compression method set determined by Teradata Database to apply to a container of a column-partitioned table or join index when we have not specified the NO AUTO COMPRESS option at the time the object was created.

Multi-Value Compression (MVC)

Multi-value compression (MVC) compresses repeating values in a column when we specify the value in a compression list in the column definition.

When data in the column matches a value specified in the compression list, the database stores it only once in the table header, regardless of how many times it occurs as a field value for the column. The database then stores a smaller substitute value, often as small as 2 bits, in each row where the value occurs.

MVC provides the best cost and benefit ratio compared to other methods because it requires minimal resources to uncompress the data during query processing. We can use MVC for frequently used data without compromising query and load performance.

MVC is also considered the easiest to implement of all the compression methods. MVC is a logical data compression form.

Multi-value compression has the following performance impacts, such as:

It improves table scan response times for most configurations and workloads.
It provides moderate to little CPU savings.

Example

The following example compresses the field Depatment_No for values 1, 2, and 3. When compression is applied on a column, the values for this column are not stored with the row.

Instead, the values are stored in the Table header in each AMP. And only presence bits are added to the row to indicate the value.

CREATE SET TABLE Employee 
( 
   Employee_No integer, 
   First_Name CHAR(10), 
   Last_Name CHAR(10), 
   BirthDate DATE FORMAT 'YYYY-MM-DD-', 
   
   Department_No CHAR(02) COMPRESS(1,2,3) 
) 
UNIQUE PRIMARY INDEX(Employee_No);

The system automatically compresses the specified values when data moves into the table and uncompressed when the containing rows are accessed.

NOTE

The system automatically compresses NULLs when specifying COMPRESS.
And cannot be applying MVC to row-level security columns.

Algorithmic Compression

Teradata Database software includes several standard compression algorithms, in the form of UDFs, which is used to compress many types of data by table column. We can also create custom compression and decompression algorithms in the UDF format.

When column values are unique, then algorithmic compression (ALC) provides better compression results than MVC.

If columns have repeated values, then we can use ALC and MVC concurrently on the same column, but the system does not apply ALC to any value covered by MVC.

NOTE

Specify only one set of compression or decompression UDFs for a particular column.
The system automatically compresses nulls when specifying COMPRESS.

Row Compression

Row compression is a form of logical data compression in which Teradata Database stores a repeating column value set only once.

And other non-repeating column values that belong to that set are stored as logical segmental extensions of the base repeating set. Row compression is a lossless method.

Like multi-value compression, there is no decompression necessary to access row compressed data values.

Block-Level Compression

A data block is a physical unit of input that defines how Teradata handles data when we specify block-level compression (BLC), Teradata stores data blocks in a compressed format to save storage space.

The goals of block-level compression are to save storage space and to reduce disk input bandwidth. Block-level compression can use significantly more CPU to compress and decompress data dynamically. Thus, whether query performance is enhanced with block-level compression depends on whether performance is more limited by disk input bandwidth or CPU usage.

Teradata Database offers software-based and hardware-based BLC. We select the BLC method when we specify the compression algorithm in the Compression Algorithm field in DBS Control.

Hardware-Based Block-Level Compression
Hardware-based BLC creates little or no CPU resource contention with other database operations.
Hardware-based block-level compression provides the same functions as software-based block-level compression.
Software-Based Block-Level Compression
Software-based BLC is designed for use on large volumes of seldom-accessed data. BLC can also achieve relatively good compression on unknown and unclassified data across different column types, including the internal data structures and the row header.
Block-level compression can be enabled at the system level using the DBS Control utility and at the table level using SQL DDL.

Auto Compression

When we create a column-partitioned table or join index, Teradata Database attempts to use one or more methods to compress the data. That inserts into the physical rows of the object unless we specify the NO AUTO COMPRESS option when we create it, or NO AUTO COMPRESS is set as the default.

The process of selecting and applying appropriate compression methods to the physical containers of a column-partitioned table or join index is referred to as Auto Compression.

Auto Compression is most useful for a column partition with a single column and COLUMN format.

Teradata Database Auto Compression only applies on column partitions with COLUMN format, and then if it reduces the size of a container. Or it is set as default with the following requirements, such as:

Minimal CPU resources are required to decompress the data for reading.
Teradata Database does not need to decompress many values to find a single value.

For some values, there are no applicable compression techniques that can reduce the size of the physical row, so Teradata Database does not compress the values for that physical row. Still, the system attempts to compress physical row values using one of the Auto Compression methods available to it.

Teradata Database automatically decompresses any compressed column partition values when we retrieve rows from a column-partitioned table.

Next TopicTeradata Explain

← prev next →