Compression reduces the physical size of the stored information. The goal of compression is to represent information accurately using the fewest number of bits.
The Compression methods are either logical or physical. Physical data compression re-encodes information independently of its meaning, and logical data compression substitutes one set of data with another, more compact set.
In Teradata, compression can compress up to 255 distinct values, including NULL. Since the storage is reduced, Teradata can store more records in a block. This results in improved query response time since any input operation can process more rows per block.
Compression can be added at table creation using CREATE TABLE or after table creation using ALTER TABLE command.
Compression has the following essential reasons, such as:
Compression reduces storage costs by storing more logical data per unit of physical capacity. Compression produces smaller rows, resulting in more rows stored per data block and fewer data blocks.
Compression enhances system performance because there is less physical data to retrieve per row for queries. And compressed data remains compressed while in memory, the FSG cache can hold more rows, reducing the size of disk input.
Teradata Compression method has the following rules to compress stored data, such as:
Types of Compression
Teradata Database uses several types of compression.
Multi-Value Compression (MVC)
Multi-value compression (MVC) compresses repeating values in a column when we specify the value in a compression list in the column definition.
When data in the column matches a value specified in the compression list, the database stores it only once in the table header, regardless of how many times it occurs as a field value for the column. The database then stores a smaller substitute value, often as small as 2 bits, in each row where the value occurs.
MVC provides the best cost and benefit ratio compared to other methods because it requires minimal resources to uncompress the data during query processing. We can use MVC for frequently used data without compromising query and load performance.
MVC is also considered the easiest to implement of all the compression methods. MVC is a logical data compression form.
Multi-value compression has the following performance impacts, such as:
The following example compresses the field Depatment_No for values 1, 2, and 3. When compression is applied on a column, the values for this column are not stored with the row.
Instead, the values are stored in the Table header in each AMP. And only presence bits are added to the row to indicate the value.
The system automatically compresses the specified values when data moves into the table and uncompressed when the containing rows are accessed.
Teradata Database software includes several standard compression algorithms, in the form of UDFs, which is used to compress many types of data by table column. We can also create custom compression and decompression algorithms in the UDF format.
When column values are unique, then algorithmic compression (ALC) provides better compression results than MVC.
If columns have repeated values, then we can use ALC and MVC concurrently on the same column, but the system does not apply ALC to any value covered by MVC.
Row compression is a form of logical data compression in which Teradata Database stores a repeating column value set only once.
And other non-repeating column values that belong to that set are stored as logical segmental extensions of the base repeating set. Row compression is a lossless method.
Like multi-value compression, there is no decompression necessary to access row compressed data values.
A data block is a physical unit of input that defines how Teradata handles data when we specify block-level compression (BLC), Teradata stores data blocks in a compressed format to save storage space.
The goals of block-level compression are to save storage space and to reduce disk input bandwidth. Block-level compression can use significantly more CPU to compress and decompress data dynamically. Thus, whether query performance is enhanced with block-level compression depends on whether performance is more limited by disk input bandwidth or CPU usage.
Teradata Database offers software-based and hardware-based BLC. We select the BLC method when we specify the compression algorithm in the Compression Algorithm field in DBS Control.
When we create a column-partitioned table or join index, Teradata Database attempts to use one or more methods to compress the data. That inserts into the physical rows of the object unless we specify the NO AUTO COMPRESS option when we create it, or NO AUTO COMPRESS is set as the default.
The process of selecting and applying appropriate compression methods to the physical containers of a column-partitioned table or join index is referred to as Auto Compression.
Auto Compression is most useful for a column partition with a single column and COLUMN format.
Teradata Database Auto Compression only applies on column partitions with COLUMN format, and then if it reduces the size of a container. Or it is set as default with the following requirements, such as:
For some values, there are no applicable compression techniques that can reduce the size of the physical row, so Teradata Database does not compress the values for that physical row. Still, the system attempts to compress physical row values using one of the Auto Compression methods available to it.
Teradata Database automatically decompresses any compressed column partition values when we retrieve rows from a column-partitioned table.