Teradata FastLoad

Teradata FastLoad is used to load a large amount of data in an empty table on a Teradata System or load data from the client to the Database system.

It allows fast loading of a single empty table with no SIs, JIs, HIs, or RI (FKs) or column partitioning.

FastLoad loads large amounts of data into an empty table, with or without a primary index, on Teradata Database. However, row size and the number of columns, more than any other factor, affect performance.

Teradata FastLoad is a command-driven utility which can be invoked either in batch or interactive mode.

Teradata FastLoad utility establishes multiple session i.e., one session per AMP by default, does not use any transient journal during operation, and transfer data using block rather than row to load data into the table. That's why it is so fast.

It loads only one table at a time. If we want to load multiple tables simultaneously, invoke multiple FastLoad jobs. FastLoad can load data from:

Input files on a network-attached workstation.
Disk or tape files from channel attached workstation.
Individual input module (INMOD) routines written to select, validate, and preprocess input data.
Any other device is providing correctly formatted source data.

Teradata FastLoad Prerequisite

There are three key components specified to run any Teradata FastLoad.

1. Log Table: One log table is needed to track the status of every FastLoad session running on the system.

Teradata maintains one table called fastlog under the database SYSADMIN. To use this table, we need to INSERT, DELETE, and UPDATE privilege on this table.

2. Empty Target Table:FastLoad needs one empty target table before inserting data into it. FastLoad does not care about how this has been accomplished.

We can achieve this by deleting the data from the target table or dropping-recreate the target table before starting the loading.

3. Two ERROR Tables:Two error tables are required by FastLoad to capture if an exception occurs during the FastLoad process. These tables will be created automatically.

The First ERROR Table:It is used to capture any translation errors or constraint violation.
For example, a row with column wrong data type would be reported to the first error table.
The Second ERROR Table:It is used for error caused by duplicates values for Unique Primary Index (UPI).
FastLoad will load only one instance for every UPI and stores the duplicate occurrence in the second table. However, if the entire row is a duplicate, FastLoad counts it but does not store the row.

NOTE
Teradata will allow a maximum of 15 FastLoads, Multiloads, or FastExports at the same time. This protects the system from giving all the resources to only load utilities.

Phases of FastLoad

FastLoad divides the whole process into two phases:

Phase 1: Acquisition

This phase's main objective is to transfer the data from the Host computer to the Teradata environment for Access Module Processor (AMP) as quickly as possible.
Teradata Parsing Engine reads the records from the input file and sends a block to each AMP.
Then PE opens a session from the FastLoad client to the AMPs directly. By default, it will create one session per AMP.
One of the client sessions will pack the raw data into 64K blocks.
Then this large block of data will be sent to the AMP randomly without any concern for which AMPs get the data.
After receipt of each data block, every AMP hashes its rows based on the primary index and redistributes them to the proper AMP.
After redistribution of rows, data will be written to an internal Worktable on the AMPs but remain unsorted.
At the end of Phase 1, each AMP has its rows, but they are not in row hash sequence.

Phase 2: Application

The main objective of this phase is to write the data to the actual table space.
Phase 2 starts when FastLoad receives the END LOADING statement.
Each AMP will start sorting of rows in the second error table.
After sorting, these rows will be written to the actual data block permanently in the disk.
Locks on the target table are released, and the error tables are dropped.

Example

Create a text file with the following records and name the file as emp.txt.

202001, James, 1980/01/05, 2010/03/01, 1  
202002, Robert, 1983/03/05, 2010/09/01, 1 
202003, Peter, 1983/04/01, 2009/02/12, 2 
202004, Mike, 1984/11/06, 2014/01/01, 2 
202005, Robert, 1984/12/01, 2015/03/09, 3

Following is a sample FastLoad script to load the above file into the Emp_Stg table.

LOGON 192.168.1.102/dbc,dbc;  
   DATABASE tduser;  
   BEGIN LOADING tduser.Emp_Stg  
      ERRORFILES Emp_ET, Emp_UV  
      CHECKPOINT 10;  
      SET RECORD VARTEXT ",";  
     DEFINE 
         in_EmpId (VARCHAR(10)), 
         in_FirstName (VARCHAR(10)), 
         in_BirthDate (VARCHAR(10)), 
         in_JoinedDate (VARCHAR(10)), 
         in_DepartmentNo (VARCHAR(05)), 
         FILE = emp.txt;
      INSERT INTO Emp_Stg 
    (
         EmpId,
         FirstName,
         BirthDate,
         JoinedDate, 
         DepartmentNo
     ) 

      VALUES 
    (  
         :in_EmpId, 
         :in_FirstName, 
         :in_BirthDate (FORMAT 'YYYY/MM/DD'), 
         :in_JoinedDate (FORMAT 'YYYY/MM/DD'),
         :in_DepartmentNo
      ); 
   END LOADING;  
LOGOFF;

Executing a FastLoad Script

Once the input file emp.txt is created, and the FastLoad script is named as EmpLoad.fl, we can run the FastLoad script using the following command in UNIX and Windows.

Once the above command is executed, the FastLoad script will run and produce the log. In the log, here is the number of records processed by FastLoad and status code.

/* 03:19:14 END LOADING COMPLETE 
   Total Records Read = 5 
   Total Error Table 1 = 0 (Table has been dropped)
   Total Error Table 2 = 0 (Table has been dropped)
   Total Inserts Applied = 5 
   Total Duplicate Rows = 0 
   Start: Sat June 27 05:20:14 2020 
   End: Sat June 27 05:20:15 2020 

/* 05:20:15 Application Phase statistics: 
              Elapsed time: 00:00:01 (in hh:mm:ss) 
0008  LOGOFF; 
/* 05:20:16 Logging off all sessions

Next TopicTeradata MultiLoad

← prev next →