Talend Data Integration Components and Connectors:

In this section, we are going to learn about the data integration components and connectors, which are used while creating a job.

Components and connectors perform all the operations in Talend, which provides 800+ connectors and components for a wide range of tasks.

The components are available in the palette panel, where they are grouped into 21 main categories.

We choose components by dragging them from the palette and dropping them into the designer panel, and Talend automatically generates the corresponding Java code.

After that, save the Talend job and execute it.

The image below lists the components available in the palette panel:

[Image: Talend Data Integration Components and Connectors]

The components in the above list are widely used in Talend data integration.

Let us see some commonly used data integration components in Talend Studio:

Components for Data Integration	Description
tMysqlConnection	It opens a connection to the MySQL database specified in the component.
tMysqlInput	It runs a database query to read a database and extract fields (from tables, views, etc.) depending on the query.
tMysqlOutput	It is used to write, update, and modify data in the MySQL database.
tFileInputDelimited	It reads a delimited file row by row, splits each row into separate fields, and passes them to the next component.
tFileOutputDelimited	It writes the incoming data to a delimited file based on the defined schema.
tFileInputExcel	It reads an Excel file row by row, splits each row into separate fields, and passes them to the next component.
tFileOutputExcel	It is used to write an MS Excel file with different data values based on a defined schema.
tFileList	It is used to get all the files and directories that match a given file mask pattern.
tFileArchive	It is used to compress a set of files or folders into a zip, gzip, or tar.gz archive file.
tRowGenerator	It provides an editor where we can write functions or choose expressions to generate our sample data.
tMsgBox	It returns a dialog box with the message specified and an OK button.
tLogRow	It is used to monitor the data being processed by displaying the data/output in the run console.
tPreJob	It defines the subjobs that will run before our actual job starts.
tMap	tMap is used to transform and route the data from single or multiple sources to single or multiple destinations.
tJoin	It is used to join two tables by performing inner and outer joins between the main flow and the lookup flow.
tJava	It enables you to use custom Java code in a Talend job.
tRunJob	It is used to manage the complex job systems by running one Talend job after another.
tCloudStart	It is used to start instances on Amazon EC2 (Amazon Elastic Compute Cloud).
tCloudStop	It is used to change the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
tDotNETInstantiate	It is used to invoke the constructor of a .NET object, which is intended for later reuse.
tDotNETRow	It helps us to transform the data by utilizing custom or built-in .NET classes.
tDB2Connection	It is used to open a connection to a specified database, which can be reused in the subsequent subjob or subjobs.
tFileFetch	It is used to retrieve a file through the given protocol (HTTP, HTTPS, FTP, or SMB).
tFTPClose	It helps us to close an active FTP connection to release the taken resources.
tFTPConnection	It is used to open an FTP connection so that files can be transferred in a single transaction.
tFTPDelete	It is used to delete the files or folders in a specified directory on the FTP server.
tFileInputJSON	It is used to extract JSON data from a file and transfer the data to a file, database table, etc.
tFileOutputJSON	It receives data and rewrites it as a JSON structured data block in an output file.
tFileInputXML	It reads an XML structured file row by row, splits the data into fields, and sends the fields defined in the schema to the next component.
tFileOutputXML	It writes an XML file with separated data values based on a defined schema.
tReplicate	It is used to duplicate the incoming schema into two identical output flows.
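
To make the description of tFileInputDelimited above more concrete, here is a minimal plain-Java sketch of reading delimited rows and splitting them into fields. It is only an analogy, not the code Talend generates: the class name `DelimitedReadSketch`, the method `parse`, and the sample rows are hypothetical, and quoting, escaping, and header rows are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not Talend-generated code.
class DelimitedReadSketch {
    // Split every line into fields on the given separator (no quoting or escaping handled).
    static List<String[]> parse(List<String> lines, String separator) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Pattern.quote treats the separator literally; -1 keeps trailing empty fields.
            rows.add(line.split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("1;Alice;Paris");
        lines.add("2;Bob;");
        // Each parsed row would be passed on to the next component in a job.
        for (String[] fields : parse(lines, ";")) {
            System.out.println(String.join(" | ", fields));
        }
    }
}
```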

Connectors:

Row
Iterate
Triggers
Link

Row:

The Row connector carries the actual data flow. The row connection types are as follows:

Main
Lookup
Filter
Rejects
ErrorRejects
Output
Unique/Duplicate
Multiple input/output
Combine

Main:

Main is the most commonly used row connection. It passes the data flow from one component to the next, iterating on each row and reading the input data according to the component's property settings.

Note:
We cannot connect two input components with the help of the Main row connection.
Only one incoming Main row connection is possible per component; we cannot link the same target component twice using the Main row connection.

A second row connection to the same component will be called Lookup.

To connect two components with a Main row connection:

Right-click the input component and select Row → Main in the connection list, as shown in the image below.

Or,

Click the component to highlight it, then right-click it or click the O icon visible at its side, and drag the cursor towards the destination component. This automatically creates a Row → Main connection.

Lookup:

The Lookup row connection is used when we want to connect multiple input flows.

It connects a sub-flow component to a main-flow component, which is therefore allowed to receive more than one incoming flow.

A Lookup row can be changed into a Main row: right-click the row that needs to be changed and, in the pop-up menu that opens, click Set this connection as Main, as shown in the image below.
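
The relationship between a Main flow and a Lookup flow can be pictured with a small plain-Java sketch. This is an analogy under stated assumptions, not Talend-generated code: the class `MainLookupSketch`, the method `join`, and the sample order and customer data are all hypothetical. The main rows are processed one by one, and each is matched against the lookup data, keeping matches only (inner-join behavior).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not Talend-generated code.
class MainLookupSketch {
    // Enrich each main-flow row {orderId, customerId} with a name from the lookup flow.
    static List<String> join(List<String[]> mainRows, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] row : mainRows) {          // iterate over the main flow, row by row
            String name = lookup.get(row[1]);    // match against the lookup flow
            if (name != null) {                  // inner join: keep matching rows only
                out.add(row[0] + ";" + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> mainRows = new ArrayList<>();
        mainRows.add(new String[] {"o1", "c1"});
        mainRows.add(new String[] {"o2", "c9"}); // no matching customer in the lookup
        Map<String, String> lookup = new HashMap<>();
        lookup.put("c1", "Alice");
        System.out.println(join(mainRows, lookup));
    }
}
```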

Filter:

The Filter row connection specifically connects a tFilterRow component to an output component. It collects the data that matches the filtering criteria.

Rejects:

The Rejects row connection connects a processing component to an output component.

It collects the data that does not match the filter or is not valid for the expected output.

On some components, it also allows us to track the data that cannot be processed for reasons such as a wrong type or an undefined null value.

The Rejects connection is enabled when the Die on error option is deactivated.
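
As a rough picture of how the Filter and Rejects connections divide a flow, the following plain-Java sketch routes every row to exactly one of two outputs. It is only an analogy, not Talend-generated code; the class `FilterRejectSketch`, the method `route`, and the sample condition are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration; not Talend-generated code.
class FilterRejectSketch {
    final List<Integer> filtered = new ArrayList<>(); // rows matching the condition
    final List<Integer> rejected = new ArrayList<>(); // rows that do not match

    // Send each row either to the "filter" output or to the "rejects" output.
    static FilterRejectSketch route(List<Integer> rows, Predicate<Integer> condition) {
        FilterRejectSketch result = new FilterRejectSketch();
        for (Integer row : rows) {
            if (condition.test(row)) {
                result.filtered.add(row);
            } else {
                result.rejected.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FilterRejectSketch r = route(Arrays.asList(5, 20, 12, 3), value -> value >= 10);
        System.out.println("filter:  " + r.filtered);
        System.out.println("rejects: " + r.rejected);
    }
}
```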

ErrorRejects:

The ErrorRejects connection connects a tMap component to an output component.

It is enabled when we clear the Die on error checkbox in the tMap editor, and it collects the data that could not be processed.

Output:

The output row connection is used to connect a tMap component to one or more output components.

Unique/Duplicate:

The Unique and Duplicate row connections connect a tUniqRow component to output components.

The Unique row connection collects the rows that are found first in the incoming flow. This flow of unique data is directed to the relevant output component or to another processing subjob.

The Duplicate row connection collects the possible duplicates of those first-encountered rows.
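
The split between the Unique and Duplicate flows can be sketched in plain Java as follows. This is an analogy, not Talend-generated code; the class `UniqueDuplicateSketch`, the method `split`, and the sample keys are hypothetical. The first occurrence of each key goes to the unique output, and every later occurrence goes to the duplicate output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not Talend-generated code.
class UniqueDuplicateSketch {
    final List<String> uniques = new ArrayList<>();    // first occurrence of each key
    final List<String> duplicates = new ArrayList<>(); // every later occurrence

    static UniqueDuplicateSketch split(List<String> keys) {
        UniqueDuplicateSketch result = new UniqueDuplicateSketch();
        Set<String> seen = new HashSet<>();
        for (String key : keys) {
            // Set.add returns true only the first time a key is encountered.
            if (seen.add(key)) {
                result.uniques.add(key);
            } else {
                result.duplicates.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UniqueDuplicateSketch r = split(Arrays.asList("a", "b", "a", "c", "b", "a"));
        System.out.println("unique:    " + r.uniques);
        System.out.println("duplicate: " + r.duplicates);
    }
}
```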

Multiple input/output:

This type of row connection is used to handle the data through various inputs and outputs.

Combine:

A combine row connection is used to connect one CombinedSQL component to another.

Iterate:

The Iterate connector is used to loop over the files contained in a directory, the rows contained in a file, or the entries in a database.

It is mainly connected to the start component of a flow (in a subjob).
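
Looping over the files in a directory, as an Iterate connection does, can be pictured with the plain-Java sketch below. It is an analogy only, not Talend-generated code; the class `IterateSketch`, the method `forEachFile`, and the file names are hypothetical. The action passed in stands for the subjob that would run once per matching file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration; not Talend-generated code.
class IterateSketch {
    // Run the action once per directory entry matching the glob, in name order.
    static int forEachFile(Path dir, String glob, Consumer<Path> action) {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                files.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(files);
        for (Path file : files) {
            action.accept(file); // one iteration per file
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("iterate-sketch");
        Files.createFile(dir.resolve("a.csv"));
        Files.createFile(dir.resolve("b.csv"));
        Files.createFile(dir.resolve("notes.txt")); // does not match the mask
        forEachFile(dir, "*.csv", file -> System.out.println("processing " + file.getFileName()));
    }
}
```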

Triggers:

Trigger connectors create a dependency between jobs or subjobs, which are then triggered one after the other according to the trigger's nature.

There are two types of triggers available in Talend:

Subjob triggers
Component triggers

Subjob triggers	Description
OnSubjobOk	It triggers the next subjob on the condition that the current subjob completes without any error.
OnSubjobError	It is used to trigger the next subjob when the first (Main) subjob is not completed correctly.
Run if	It triggers a subjob or a component when the condition is met.

Component triggers	Description
OnComponentOk	This type of connection is used to trigger the target component once the execution of the source component is completed without any error.
OnComponentError	It will trigger the subjob or a component as soon as an error is encountered in the primary job.
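
The ok/error branching of these triggers loosely resembles a try/catch in plain Java. The sketch below is an analogy only, not Talend-generated code; the class `TriggerSketch`, the method `run`, and the logged labels are hypothetical. It runs a first subjob and records which branch would be followed next.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not Talend-generated code.
class TriggerSketch {
    // Run the first subjob, then follow either the "ok" branch or the "error" branch.
    static List<String> run(Runnable firstSubjob) {
        List<String> log = new ArrayList<>();
        try {
            firstSubjob.run();
            log.add("OnSubjobOk");    // completed without error: next subjob would start
        } catch (RuntimeException e) {
            log.add("OnSubjobError"); // failed: error-handling subjob would start
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(() -> { }));
        System.out.println(run(() -> { throw new IllegalStateException("boom"); }));
    }
}
```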

Link:

The Link connector is used only with ELT components. This type of connection does not carry the actual data, only the metadata of the table being operated on.
