Talend Data Integration Components and Connectors:
In this section, we will learn about the data integration components and connectors that are used while creating a job.
Connectors and components perform all the operations in Talend, which provides 800+ connectors and components for a wide range of tasks.
The components are available in the palette panel, where they are grouped into 21 main categories.
By dragging and dropping components into the designer panel, we build the job, and Talend automatically generates the Java code for it.
After that, save the job and execute it.
The image below shows the list of components available in the palette panel:
The components and connectors in the above list are widely used for Talend data integration.
Let us see some commonly used components for data integration in Talend Studio:
|Components for Data Integration
||It is used to connect the MySQL database, which is defined in the component.
||It is used to run the database query to read a database and extract fields (tables, views, etc.) depending on the query.
||It is used to write, update, and modify data in the MySQL database.
||It reads a delimited file row by row, splits each row into separate fields, and passes them to the next component.
||It is used to write the input data to a delimited file based on the defined schema.
||It reads an Excel file row by row, splits each row into separate fields, and passes them to the next component.
||It is used to write an MS Excel file with different data values based on a defined schema.
||It is used to get all the files and directories from a given file mask pattern.
||It is used to compress a set of files or folders into a zip, gzip, or tar.gz archive file.
||It provides an editor where we can write functions or choose expressions to generate our sample data.
||It returns a dialog box with the message specified and an OK button.
||It is used to monitor the data being processed, and it displays the data/output in the run console.
||It defines the subjobs that will run before our actual job starts.
||tMap is used to transform and route the data from one or more sources to one or more destinations.
||It is used to join two tables by performing inner and outer joins between the main flow and the lookup flow.
||It enables you to use personalized java code in the Talend program.
||It is used to manage the complex job systems by running one Talend job after another.
||It is used to start instances on Amazon EC2 (Amazon Elastic Compute Cloud).
||It is used to change the status of a launched instance on Amazon EC2 (Amazon Elastic Compute Cloud).
||It is used to invoke the constructor of a .NET object, which is intended for later reuse.
||It helps us to transform the data by utilizing custom or built-in .NET classes.
||It is used to open a connection in a specified database, which can be reused in the subsequent subjob or subjobs.
||It is used to retrieve a file through the given protocol (HTTP, HTTPS, FTP, or SMB).
||It helps us to close an active FTP connection to release the taken resources.
||It is used to open the FTP connection to transfer the file in a single transaction.
||It is used to delete the files or folders in a specified directory on the FTP server.
||It is used to extract JSON data from a file and transfer the data to a file, database table, etc.
||It helps us to receive the data and rewrites it in a JSON structured data block in an output file.
||It reads an XML structured file row by row, splits each row into fields, and sends the fields defined in the schema to the next component.
||It writes an XML file with separated data values based on a defined schema.
||It is used to duplicate the incoming schema into two identical output flows.
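Several of the file components above read a file row by row and split each row into fields. A minimal Java sketch of that idea follows. It is not the code Talend generates; the class name `DelimitedRowSketch`, the method `parse`, and the sample data are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of "read row by row and split into fields".
// This is NOT the code that Talend Studio generates.
class DelimitedRowSketch {

    // Split every line into fields on a single-character delimiter.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String[]> parse(List<String> lines, char delimiter) {
        String regex = Pattern.quote(String.valueOf(delimiter));
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(regex, -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample rows; the second row has an empty last field.
        List<String> lines = Arrays.asList("1;Alice;Paris", "2;Bob;");
        for (String[] row : parse(lines, ';')) {
            // Hand each row to the "next component"; here we only print it.
            System.out.println(String.join("|", row));
        }
    }
}
```

In a real job, the schema defined on the component decides how many fields are expected and what type each one has; the sketch treats every field as a string.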
The row connector handles the actual data flow. The row connectors described in this section are as follows:
- Main
- Lookup
- Filter
- Rejects
- ErrorRejects
- Output
- Uniques/Duplicates
- Multiple input/output
- Combine
The most commonly used row connection is Main, because it passes the data flow from one component to the next, iterating on each row and reading the input data according to the component's properties settings.
We cannot connect two input components using a Main row connection.
Only one incoming Main row connection is possible per component, so we cannot link the same target component twice using the Main row connection.
The second row connection will be called Lookup.
To connect two components with a Main row connection,
right-click the input component and select Row → Main from the connection list, as we can see in the below image:
Alternatively, click the component to highlight it, then right-click it or click the O icon that appears beside it, and drag the cursor to the destination component. This automatically creates a Row → Main connection.
The Lookup row connection is used when we want to connect multiple input flows.
It connects a sub-flow component to a main-flow component, which must be allowed to receive more than one incoming flow.
To turn a Lookup row into a Main row, right-click the row that needs to be changed and, in the pop-up menu that opens, click Set this connection as Main, as we can see in the below image:
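The relationship between a Main flow and a Lookup flow can be sketched as a keyed lookup: each row of the main flow is matched against rows loaded from the lookup flow. The Java example below is a simplified illustration under that assumption, not Talend's generated code; `LookupJoinSketch`, `innerJoin`, and the sample data are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a Main flow joined with a Lookup flow.
class LookupJoinSketch {

    // Inner join: a main row is kept only if its key (column 1)
    // is present in the lookup table.
    static List<String> innerJoin(List<String[]> mainRows, Map<String, String> lookup) {
        List<String> joined = new ArrayList<>();
        for (String[] row : mainRows) {
            String countryName = lookup.get(row[1]);
            if (countryName != null) {
                joined.add(row[0] + "," + countryName);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Main flow: (name, country code)
        List<String[]> mainRows = Arrays.asList(
                new String[] {"Alice", "FR"},
                new String[] {"Bob", "XX"});

        // Lookup flow: country code -> country name
        Map<String, String> lookup = new HashMap<>();
        lookup.put("FR", "France");

        // "Bob" has no match in the lookup, so an inner join drops that row.
        System.out.println(innerJoin(mainRows, lookup)); // prints [Alice,France]
    }
}
```

An outer join would keep the unmatched main row and leave the looked-up field empty instead of dropping the row.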
The Filter row connection is used specifically to connect the tFilterRow component to an output component. It collects the data that matches the filtering criteria.
The Rejects row connection is used to connect a processing component to an output component.
It collects the data that does not match the filter or is not valid for the expected output.
On some components, it also allows us to track the data that cannot be processed for reasons such as a wrong type or an undefined null value.
The Rejects connection is enabled when the Die on error option is deactivated.
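The Filter and Rejects flows can be thought of as the two halves of one partition: rows that satisfy the condition go one way, and the remaining rows go the other. A minimal Java sketch of that behaviour, assuming integer rows and a hypothetical `FilterRejectSketch` class (neither comes from Talend), might look like this:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration of splitting one flow into Filter and Rejects.
class FilterRejectSketch {

    // Key true  -> rows matching the condition (the "Filter" flow).
    // Key false -> rows not matching it (the "Rejects" flow).
    static Map<Boolean, List<Integer>> split(List<Integer> rows, Predicate<Integer> condition) {
        return rows.stream().collect(Collectors.partitioningBy(condition));
    }

    public static void main(String[] args) {
        List<Integer> amounts = Arrays.asList(5, 12, 7, 30);
        Map<Boolean, List<Integer>> flows = split(amounts, amount -> amount >= 10);

        System.out.println("Filter:  " + flows.get(true));  // prints Filter:  [12, 30]
        System.out.println("Rejects: " + flows.get(false)); // prints Rejects: [5, 7]
    }
}
```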
The ErrorRejects connection is used to connect a tMap component to an output component.
It is enabled when we clear the Die on Error checkbox in the tMap editor, and it collects the data that could not be processed.
The output row connection is used to connect a tMap component to one or more output components.
The Uniques/Duplicates row connections are used to connect a tUniqRow component to output components.
The Uniques row connection collects the rows that are found first in the incoming flow, and this flow of unique data is directed to the related output component or to another processing subjob.
The Duplicates row connection collects the possible duplicates of those first-encountered rows.
The Multiple input/output row connection is used to handle data that passes through several inputs and outputs.
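The split between first occurrences and later repeats can be sketched in Java as follows. This is an illustrative simplification that compares whole rows as strings; the class `UniqueDuplicateSketch` and its fields are hypothetical and do not come from Talend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of separating first occurrences from repeats.
class UniqueDuplicateSketch {
    final List<String> uniques = new ArrayList<>();
    final List<String> duplicates = new ArrayList<>();

    static UniqueDuplicateSketch split(List<String> rows) {
        UniqueDuplicateSketch result = new UniqueDuplicateSketch();
        Set<String> seen = new HashSet<>();
        for (String row : rows) {
            // Set.add returns true only the first time a value is seen.
            if (seen.add(row)) {
                result.uniques.add(row);
            } else {
                result.duplicates.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        UniqueDuplicateSketch result = split(Arrays.asList("a", "b", "a", "c", "b"));
        System.out.println("Uniques:    " + result.uniques);    // prints Uniques:    [a, b, c]
        System.out.println("Duplicates: " + result.duplicates); // prints Duplicates: [a, b]
    }
}
```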
A combine row connection is used to connect one CombinedSQL component to another.
The iterate connector is used to perform a loop on the files contained in a directory, the rows available in a file, or the entries in a database.
It is mainly used to connect to the start component of a flow (in a subjob).
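Looping over the files of a directory, one iteration per file, can be sketched with the standard `java.nio.file` API. The example below is only an analogy for what an iterate connection drives; `IterateSketch`, `filterByMask`, `processFile`, and the `*.csv` mask are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of iterating over files that match a mask.
class IterateSketch {

    // Keep only the file names that match a glob-style mask such as "*.csv".
    static List<String> filterByMask(List<String> fileNames, String mask) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + mask);
        List<String> matched = new ArrayList<>();
        for (String name : fileNames) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        return matched;
    }

    // Stand-in for the work the connected flow would do for one file.
    static void processFile(String name) {
        System.out.println("Processing " + name);
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan: first argument, or the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        // One iteration per matching file.
        for (String name : filterByMask(names, "*.csv")) {
            processFile(name);
        }
    }
}
```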
The trigger connectors are used to create a dependency between jobs and subjobs, which are triggered one after the other according to the nature of the trigger.
There are two types of triggers available in Talend:
- Subjob triggers
- Component triggers
|OnSubjobOk|It is used to trigger the next subjob when the main subjob has completed without any error.
|OnSubjobError|It is used to trigger the next subjob when the first (main) subjob does not complete correctly.
|Run if|It triggers a subjob or a component when the defined condition is met.
|OnComponentOk|It is used to trigger the target component once the execution of the source component has completed without any error.
|OnComponentError|It triggers the subjob or component as soon as an error is encountered in the primary job.
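The success and error triggers behave much like the two branches around a guarded call: one follow-up runs only when the first step finishes cleanly, the other only when it fails. The Java sketch below illustrates that control flow under this analogy; it is not how Talend implements triggers, and `TriggerSketch`, `runWithTriggers`, `onSuccess`, and `onError` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "run next step on success or on error".
class TriggerSketch {

    // Runs the subjob, then exactly one of the two follow-up steps.
    // Returns "ok" or "error" so the caller can see which branch fired.
    static String runWithTriggers(Runnable subjob, Runnable onSuccess, Runnable onError) {
        try {
            subjob.run();
        } catch (RuntimeException e) {
            onError.run();
            return "error";
        }
        onSuccess.run();
        return "ok";
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        // First subjob completes without error, so the success step runs.
        runWithTriggers(
                () -> log.add("load file"),
                () -> log.add("archive file"),
                () -> log.add("send alert"));

        // Second subjob fails, so the error step runs instead.
        runWithTriggers(
                () -> { throw new IllegalStateException("connection lost"); },
                () -> log.add("archive file"),
                () -> log.add("send alert"));

        System.out.println(log); // prints [load file, archive file, send alert]
    }
}
```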
The link connector is used only with ELT components. This type of connection does not carry the actual data but only the metadata about the table being operated on.