
Talend Data Integration Components and Connectors:

In this section, we are going to learn about the data integration components and connectors, which are used while creating a job.

Components and connectors perform all the operations in Talend, which provides more than 800 of them for a wide range of tasks.

The components are available in the Palette panel, where they are grouped into 21 main categories.

To use a component, drag it from the palette and drop it in the designer panel; Talend automatically generates the Java code for it.

After that, save the job and run it.

The image below shows the list of components available in the Palette panel:

Talend Data Integration Components and Connectors

The components and connectors listed above are widely used in Talend data integration.

Let us look at some commonly used data integration components in Talend Studio:

Component: Description
tMysqlConnection: Opens a connection to the MySQL database specified in the component.
tMysqlInput: Runs a database query to read a MySQL database and extracts fields (from tables, views, etc.) according to the query.
tMysqlOutput: Writes, updates, and modifies data in a MySQL database.
tFileInputDelimited: Reads a delimited file row by row, splits each row into separate fields, and passes them to the next component.
tFileOutputDelimited: Writes the incoming data to a delimited file based on the defined schema.
tFileInputExcel: Reads an Excel file row by row, splits each row into separate fields, and passes them to the next component.
tFileOutputExcel: Writes an MS Excel file with separate data values based on a defined schema.
tFileList: Retrieves all the files and directories that match a given file mask pattern.
tFileArchive: Compresses a set of files or folders into a zip, gzip, or tar.gz archive file.
tRowGenerator: Provides an editor in which we can write functions or choose expressions to generate sample data.
tMsgBox: Displays a dialog box with the specified message and an OK button.
tLogRow: Monitors the data being processed by displaying it in the Run console.
tPreJob: Defines the subjobs that run before the actual job starts.
tMap: Transforms and routes data from one or more sources to one or more destinations.
tJoin: Joins two tables by performing an inner or outer join between the main flow and the lookup flow.
tJava: Lets you use custom Java code in a Talend job.
tRunJob: Manages complex job systems by running one Talend job after another.
tCloudStart: Starts instances on Amazon EC2 (Amazon Elastic Compute Cloud).
tCloudStop: Changes the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
tDotNETInstantiate: Invokes the constructor of a .NET object, which is intended for later reuse.
tDotNETRow: Transforms data by using custom or built-in .NET classes.
tDB2Connection: Opens a connection to the specified database, which can be reused in the subsequent subjob or subjobs.
tFileFetch: Retrieves a file through the given protocol (HTTP, HTTPS, FTP, or SMB).
tFTPClose: Closes an active FTP connection to release the resources it holds.
tFTPConnection: Opens an FTP connection so that files can be transferred in a single transaction.
tFTPDelete: Deletes files or folders in a specified directory on the FTP server.
tFileInputJSON: Extracts JSON data from a file and passes it on to a file, a database table, etc.
tFileOutputJSON: Receives data and rewrites it as a JSON-structured data block in an output file.
tFileInputXML: Reads an XML-structured file row by row, splits the rows into fields, and sends the fields defined in the schema to the next component.
tFileOutputXML: Writes an XML file with separated data values based on a defined schema.
tReplicate: Duplicates the incoming schema into two identical output flows.
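
To make the row-by-row behavior of a component such as tFileInputDelimited more concrete, here is a minimal plain-Java sketch. It is not the code that Talend generates; the class name DelimitedRowSketch and the method splitRows are hypothetical names chosen for illustration, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not code generated by Talend.
final class DelimitedRowSketch {

    // Splits each line into fields on a literal delimiter, one row at a time.
    static List<String[]> splitRows(List<String> lines, String delimiter) {
        String regex = Pattern.quote(delimiter); // treat ";" or "|" literally
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(regex, -1)); // -1 keeps trailing empty fields
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1;Alice;Paris", "2;Bob;Berlin");
        for (String[] row : splitRows(lines, ";")) {
            // Comparable in spirit to tLogRow: show every row that flows through.
            System.out.println(String.join("|", row));
        }
    }
}
```

In a real job, the field separator and the schema are set in the component's properties rather than passed as method arguments.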

Connectors:

  • Row
  • Iterate
  • Triggers
  • Link

Row:

The row connector carries the actual data flow. Some of the row connections are listed below:

  • Main
  • Lookup
  • Filter
  • Rejects
  • ErrorRejects
  • Output
  • Unique/duplicates
  • Multiple input/output

Main:

Main is the most commonly used row connection. It passes the data flow from one component to the next, iterating over each row and reading the input data according to the component's property settings.

Note:
We cannot connect two input components using a Main row connection.
Only one incoming Main row connection is possible per component, because the same target component cannot be linked twice using Main row connections.

A second row connection to the same component is called a Lookup.

To connect two components using a Main row connection:

Right-click the input component and select Row → Main from the connection list, as shown in the image below:

Talend Data Integration Components and Connectors

Alternatively, click the component to highlight it, then right-click it or click the O icon that appears beside it, and drag the cursor to the destination component. This automatically creates a Row → Main connection.

Lookup:

The Lookup row connection is used when we want to connect multiple input flows.

It connects a sub-flow component to a main-flow component, which must be allowed to receive more than one incoming flow.

To turn a Lookup row into a Main row, right-click the row that needs to be changed and, in the pop-up menu that opens, click Set this connection as Main, as shown in the image below:

Talend Data Integration Components and Connectors
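
The relationship between a main flow and a lookup flow can be sketched in plain Java as follows. This is only an illustration under simple assumptions, not Talend's generated code: the class MainLookupSketch, the method join, and the sample data are hypothetical, and the sketch mimics an inner join in which unmatched main rows are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not code generated by Talend.
final class MainLookupSketch {

    // Each main row is {name, countryCode}; the lookup maps countryCode to country.
    // Main rows without a matching lookup entry are dropped (inner-join behavior).
    static List<String> join(List<String[]> mainRows, Map<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String[] row : mainRows) {
            String country = lookup.get(row[1]);
            if (country != null) {
                out.add(row[0] + "," + country);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> mainRows = List.of(
                new String[] {"Alice", "FR"},
                new String[] {"Bob", "XX"});
        Map<String, String> lookup = Map.of("FR", "France", "DE", "Germany");
        System.out.println(join(mainRows, lookup));
    }
}
```

The main flow drives the loop, while the lookup flow is only consulted for each main row, which is why a component accepts a single Main input and treats further inputs as Lookups.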

Filter:

The Filter row connection specifically connects a tFilterRow component to an output component. It collects the data that matches the filtering criteria.

Rejects:

The Rejects row connection connects a processing component to an output component.

It collects the data that does not match the filter or is not valid for the expected output.

On some components, it also lets us track data that could not be processed for reasons such as a wrong type or an undefined null value.

The Rejects connection is enabled when the Die on error option is deactivated.
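
The idea of routing invalid rows to a reject flow instead of stopping the job can be sketched in plain Java. This is a simplified illustration, not Talend's implementation; the class RejectFlowSketch, its fields, and the method process are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not code generated by Talend.
final class RejectFlowSketch {
    final List<Integer> accepted = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    // Sends each value to the accepted flow, or to the reject flow when it
    // cannot be parsed, instead of aborting on the first bad row.
    void process(List<String> rawValues) {
        for (String raw : rawValues) {
            try {
                accepted.add(Integer.parseInt(raw.trim()));
            } catch (NumberFormatException e) {
                rejected.add(raw); // wrong type: keep the row for inspection
            }
        }
    }

    public static void main(String[] args) {
        RejectFlowSketch sketch = new RejectFlowSketch();
        sketch.process(List.of("10", "abc", " 30"));
        System.out.println("accepted=" + sketch.accepted);
        System.out.println("rejected=" + sketch.rejected);
    }
}
```

With a die-on-error style of processing, the first unparsable value would end the run; collecting rejects lets the remaining rows continue.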

ErrorRejects:

The ErrorRejects connection connects a tMap component to an output component.

It is enabled when we clear the Die on error checkbox in the tMap editor, and it collects the data that could not be processed.

Output:

The output row connection is used to connect a tMap component to one or more output components.

Unique/Duplicate:

The unique/duplicate row connection is used for connecting a tUniqRow to the output components.

The Unique row connection collects the rows that are found first in the incoming flow. This flow of unique data is directed to the relevant output component or to another processing subjob.

The Duplicate row connection collects the possible duplicates of those first-encountered rows.
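
The split between the Unique and Duplicate flows can be sketched in plain Java as below. This is an assumption-laden illustration, not tUniqRow's actual code; the class UniqueDuplicateSketch and its members are hypothetical, and each row is reduced to a single string key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not code generated by Talend.
final class UniqueDuplicateSketch {
    final List<String> uniques = new ArrayList<>();
    final List<String> duplicates = new ArrayList<>();

    // The first occurrence of a key goes to the unique flow;
    // every later occurrence goes to the duplicate flow.
    void process(List<String> keys) {
        Set<String> seen = new HashSet<>();
        for (String key : keys) {
            if (seen.add(key)) {
                uniques.add(key);
            } else {
                duplicates.add(key);
            }
        }
    }

    public static void main(String[] args) {
        UniqueDuplicateSketch sketch = new UniqueDuplicateSketch();
        sketch.process(List.of("a", "b", "a", "c", "b"));
        System.out.println("uniques=" + sketch.uniques);
        System.out.println("duplicates=" + sketch.duplicates);
    }
}
```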

Multiple input/output:

This type of row connection is used by components that handle data through multiple inputs, multiple outputs, or both.

Combine:

A combine row connection is used to connect one CombinedSQL component to another.

Iterate:

Iterate connectors are used to loop over the files contained in a directory, the rows of a file, or the entries of a database.

An Iterate connection is mainly used to connect the start component of a flow (in a subjob).
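
Looping over the files of a directory, as a tFileList component does when it drives an Iterate connection, can be approximated in plain Java. The sketch below is only an analogy under stated assumptions: the class IterateSketch and the method matchingFileNames are hypothetical, and the glob pattern plays the role of the file mask.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; this is not code generated by Talend.
final class IterateSketch {

    // Visits every directory entry whose file name matches the glob,
    // one entry per iteration, and returns the names in sorted order.
    static List<String> matchingFileNames(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path file : stream) {
                // In a job, the subjob linked by Iterate would run here once per file.
                names.add(file.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : matchingFileNames(dir, "*.csv")) {
            System.out.println(name);
        }
    }
}
```

Unlike a Row connection, nothing in this loop carries row data from one step to the next; each pass simply starts the downstream work again for the current file.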

Triggers:

Trigger connectors create dependencies between jobs and subjobs, which are triggered one after the other according to the nature of the trigger.


There are two types of triggers available in Talend:

  • Subjob triggers
  • Component triggers
Subjob trigger: Description
OnSubjobOk: Triggers the next subjob when the current subjob completes without any error.
OnSubjobError: Triggers the next subjob when the first (main) subjob does not complete correctly.
Run if: Triggers a subjob or a component when the defined condition is met.

Component trigger: Description
OnComponentOk: Triggers the target component once the source component has finished executing without any error.
OnComponentError: Triggers the subjob or component as soon as an error is encountered in the primary job.
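
The dependency created by the OnSubjobOk and OnSubjobError triggers resembles a try/catch around a unit of work. The plain-Java sketch below illustrates that analogy only; the class TriggerSketch, the method run, and the log messages are hypothetical and do not reflect Talend's generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not code generated by Talend.
final class TriggerSketch {

    // Runs a subjob, then records which follow-up would be triggered:
    // the "ok" branch on success, the "error" branch if the subjob fails.
    static List<String> run(Runnable subjob) {
        List<String> log = new ArrayList<>();
        try {
            subjob.run();
            log.add("OnSubjobOk: run next subjob");
        } catch (RuntimeException e) {
            log.add("OnSubjobError: run error-handling subjob");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(() -> { /* subjob finishes normally */ }));
        System.out.println(run(() -> {
            throw new IllegalStateException("subjob failed");
        }));
    }
}
```

A Run if trigger would correspond to wrapping the follow-up in an ordinary if statement on a condition evaluated after the source has run.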

Link:

The Link connector is used only with ELT components. This type of connection does not carry actual data, only the metadata of the table being operated on.





