Centralizing XML File Metadata

In this section, we will learn how to centralize XML file metadata in Talend Studio for Data Integration.

Before going further, let us first understand why XML file metadata is useful.

The metadata can define the properties of both input and output connections. Components such as tFileInputXML and tExtractXMLField use the input connection to read XML files.

The tAdvancedFileOutputXML component uses the output schema to write a new XML file or to update an existing one.

If we often need to connect to the same XML file, we can centralize the connection and schema information in the Repository for reuse.

To create the XML File connection from the beginning:

  • Go to the Repository panel.
  • Expand the Metadata node, right-click on File XML, and select the Create File XML option from the popup menu, as we can see in the below image:

Repository → Metadata → File XML → Create file XML


Note: To centralize an XML file connection that is already defined in a job, go to the Basic settings view of the relevant component, with its Property Type set to Built-In, and open the file metadata setup window from there.

Setting up XML metadata for an input file:

In this section, we will understand how to describe a file connection and upload the XML schema for an input file.

The New XML File window opens, where both the file connection and the schema definition are completed in five steps:

  • Define General Properties
  • Setting the type of metadata (input)
  • Uploading the XML file
  • Define the schema
  • Finalizing the End schema

Step 1: Defining General Properties

In the first step, we will define the general properties of the schema.

In the New XML File window, fill in the necessary details such as Name, Purpose, and Description.

We can also manage the version and status fields of a Repository item in the project settings dialog box.

Click on the Select button next to the Path field for selecting a folder under the File XML node to hold our newly created File connection.

After filling all the details of general properties, click on the Next button to select the type of metadata.


Step 2: Setting the type of metadata (input)

Now, in this step, we will set the metadata as either input or output.

In the below dialog box, select Input XML to create XML input metadata.


And, click on the Next button to proceed further.

Step 3: Uploading XML file

In the next step, we will upload the XML file.

To upload the XML file, follow the below process:

  • Click on the Browse button and browse our directory for uploading the XML file from our local system.
  • For example, we will select the following XML file from our system:

    <employeeDeatils>
      <employee>
        <empid>101</empid>
        <firstName>Naina</firstName>
        <lastName>Rai</lastName>
        <company>Talend</company>
        <city>Mumbai</city>
        <phone>5554</phone>
      </employee>
      <employee>
        <empid>102</empid>
        <firstName>Kapil</firstName>
        <lastName>Singh</lastName>
        <company>Talend</company>
        <city>Kanpur</city>
        <phone>9900</phone>
      </employee>
      …...
    </employeeDeatils>

  • We can also change the Encoding type based on our file format if the system does not find it automatically.
  • The Limit field is used to enter the number of the columns on which the XPath query is to be executed, or we can put 0 to run it against all of the columns.
  • The Schema Viewer section is used to show the preview of the XML structure. We can expand and see every level of the file’s XML tree structure as shown in the below image:
  • After that, click on the Next button to proceed.
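
To get a feel for what the Schema Viewer preview shows, the tree structure of the sample file can be approximated outside Talend. The following is a minimal Python sketch, not Talend's own code: it uses the two records shown in the tutorial (the elided part is left out), and the `outline` helper is a hypothetical name introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Two records from the tutorial's sample file; the elided part is left out.
SAMPLE = """<employeeDeatils>
  <employee>
    <empid>101</empid>
    <firstName>Naina</firstName>
    <lastName>Rai</lastName>
    <company>Talend</company>
    <city>Mumbai</city>
    <phone>5554</phone>
  </employee>
  <employee>
    <empid>102</empid>
    <firstName>Kapil</firstName>
    <lastName>Singh</lastName>
    <company>Talend</company>
    <city>Kanpur</city>
    <phone>9900</phone>
  </employee>
</employeeDeatils>"""


def outline(element, depth=0, seen=None, path=""):
    """Return indented tag names, listing each distinct element path once."""
    seen = set() if seen is None else seen
    current = f"{path}/{element.tag}"
    lines = []
    if current not in seen:
        seen.add(current)
        lines.append("  " * depth + element.tag)
    for child in element:
        lines.extend(outline(child, depth + 1, seen, current))
    return lines


structure = outline(ET.fromstring(SAMPLE))
print("\n".join(structure))
```

Repeated `employee` records collapse into a single entry, which is roughly how a schema tree differs from the raw file content.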

 

Step 4: Define the schema

In this step, we will define the parameters used to parse the file.

As we can see in the below image, the window has four parts in which the schema is defined:

Source schema: it displays the tree view of the XML file.

Target Schema: it shows the Extraction and iteration information.

Preview: it displays a preview of the target schema, together with the input data of the selected columns in the defined order.

File Viewer: it displays a preview of the XML file.

  • To define the file parameters, we first set the XPath loop expression and the maximum number of times the loop can run.
  • There are two ways to fill in the Xpath loop expression field with an absolute XPath expression:

First: Enter the absolute XPath expression of the node to be iterated upon.

Second: Drag the node from the Source Schema and drop it onto the Absolute XPath expression field under the Target Schema.

  • The orange arrow represents a connection between the node and the corresponding expression.

Note: The Xpath loop expression is a mandatory field.

  • The Loop limit field defines the maximum number of times the selected node can be iterated; enter -1 to run it against all of the rows.
  • To define the fields to extract, drag nodes from the Source Schema onto the Fields to extract table. We can select several nodes at once by holding the Ctrl or Shift key while clicking them.
  • A blue arrow links the currently selected node in the Source Schema to the Fields to extract table; the links of the nodes that are not selected are shown in gray.
  • Using the toolbar, we can add columns to the Fields to extract table, delete columns, and change their order.
  • To add a column, click on the [+] button, and to delete one, click on the [X] button on the toolbar.
  • To change the order of the columns, use the up and down arrow buttons on the toolbar, as we can see in the below image:
  • To see the preview of the Target, click on the Refresh Preview button as we can see in the below image:

Note: The preview function is not available if we load an XSD file.

  • And, to verify and edit the end schema, click on the Next button.
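
The relationship between the loop expression, the loop limit, and the fields to extract can be sketched in Python. This is a rough illustration under stated assumptions, not the code Talend generates: `extract_rows` and `FIELDS` are hypothetical names, the sample data is the two records from Step 3, and because Python's `xml.etree.ElementTree` only accepts relative paths, the absolute loop expression is converted into a path relative to the root element.

```python
import xml.etree.ElementTree as ET

# Two records from the tutorial's sample file; the elided part is left out.
SAMPLE = """<employeeDeatils>
  <employee>
    <empid>101</empid><firstName>Naina</firstName><lastName>Rai</lastName>
    <company>Talend</company><city>Mumbai</city><phone>5554</phone>
  </employee>
  <employee>
    <empid>102</empid><firstName>Kapil</firstName><lastName>Singh</lastName>
    <company>Talend</company><city>Kanpur</city><phone>9900</phone>
  </employee>
</employeeDeatils>"""

# Hypothetical "fields to extract": column name -> XPath relative to the loop node.
FIELDS = {"empid": "empid", "firstName": "firstName", "city": "city"}


def extract_rows(xml_text, loop_xpath, fields, loop_limit=-1):
    """Iterate over the loop node and read one value per field for each match."""
    root = ET.fromstring(xml_text)
    prefix = "/" + root.tag
    if loop_xpath != prefix and not loop_xpath.startswith(prefix + "/"):
        raise ValueError(f"loop expression must start with {prefix}")
    # ElementTree needs a relative path, so drop the leading "/<root tag>".
    relative_loop = "." + loop_xpath[len(prefix):]
    nodes = root.findall(relative_loop)
    # A negative limit means "all rows" in this sketch.
    if loop_limit >= 0:
        nodes = nodes[:loop_limit]
    return [
        {column: node.findtext(xpath) for column, xpath in fields.items()}
        for node in nodes
    ]


rows = extract_rows(SAMPLE, "/employeeDeatils/employee", FIELDS)
first_only = extract_rows(SAMPLE, "/employeeDeatils/employee", FIELDS, loop_limit=1)
print(rows)
```

Each matched `employee` node becomes one row, and each entry in `FIELDS` becomes one column, which mirrors what the Preview area displays after a refresh.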

Step 5: Finalizing the end schema

In the last step, we will be finalizing the end schema.

  • To customize the file schema, check that the data type in the Type column is correct for each column.
  • If the XML file on which the schema is based has changed, click on the Guess button to generate the schema again.

Note: If we have customized the schema, the Guess feature does not keep these changes.

  • After that, click on the Finish button to complete the process, as we can see in the below image:
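
When checking the Type column, it helps to compare the proposed type with the values actually present in the file. The sketch below shows one simple way to reason about this in Python. It is an assumption-laden illustration and not how Talend's Guess feature is implemented: the `guess_type` helper and the three type labels are chosen here only for the example.

```python
def guess_type(values):
    """Suggest a column type label from a list of string values."""

    def all_parse(cast):
        # True when every value can be converted with the given function.
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "String"
    if all_parse(int):
        return "Integer"
    if all_parse(float):
        return "Double"
    return "String"


# Values taken from the sample records in Step 3.
empid_type = guess_type(["101", "102"])
name_type = guess_type(["Naina", "Kapil"])
print(empid_type, name_type)
```

A purely numeric column such as `phone` would also be suggested as an integer here, which is a good reason to review the Type column by hand: identifiers and phone numbers are often better kept as strings.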

To see the newly created Metadata in the Talend studio:

  • Go to the Repository panel, then go to Metadata.
  • After that, expand the File XML node, and select the New_XML_input metadata, as we can see in the below screenshot:

Repository → Metadata → File XML → New_XML_input


To reuse the metadata, simply drag the file connection or schema from the Repository's Metadata node and drop it onto the design workspace to create a new component, or onto an existing component.

For modifying the existing File connection:

  • Go to the Repository panel, then go to the Metadata node.
  • After that, expand the File XML, and right-click on the New_XML_input schema and select Edit File XML as shown in the below image:

For adding a new schema to an existing File connection:

  • Go to the Repository panel, and right-click on the New_XML_input connection under File XML.
  • Select Retrieve Schema from the popup menu, as we can see in the below image:

Setting up XML metadata for an output file:

In this section, we will understand how to describe a file connection and upload the XML schema for an output file.

The New XML File window opens, where both the file connection and the schema definition are completed in five steps:

  • Define General properties
  • Setting the type of metadata (output)
  • Defining the output file
  • Define the schema
  • Finalizing the End schema

Step 1: Defining General Properties

In the first step, we will be defining the general properties of the schema.

  • Fill the necessary details like Name, Purpose, and Description.
  • We can also manage the version and status fields of a Repository item in the project settings dialog box.
  • Click on the Select button next to the Path field for selecting a folder under the File XML node to hold our newly created File connection.
  • After filling all the details of general properties, click on the Next button to select the type of metadata as we can see in the below image:

Step 2: Setting the type of metadata (output)

Now, we will be setting the type of metadata as output.

In the below dialog box, select Output XML to create XML metadata.


And, click on the Next button to proceed further.

Step 3: Defining the output file

In the next step, we will be defining the output file.

  • To define the output file, we choose either to create the file manually or to create it from an existing XML or XSD file.
  • If we select the Create manually option, we have to configure the schema, source, and target columns ourselves.
  • The file is then created in a job with the help of an XML output component.

For creating an output XML structure from an Xml file, follow the below process:

  • In the Output Setting area, select the Create from a file option.
  • Click on the Browse button next to the XML or XSD File field, browse to the XML file on our local system, and double-click on the file.

For example, we will select the carr.xml File from our system.

  • We can also change the Encoding type based on our file format if the system does not find it automatically.
  • The Limit field is used to enter the number of the columns on which the Xpath query is to be executed or we can put 0 to run it against all of the columns.
  • The File Viewer section displays a preview of the XML structure, and the File Content section shows at most the first 50 rows of the file.
  • After that, in the Output File Path area, browse to the path of the output file in the Output File field. If the file does not exist yet, it will be created when a job runs with the tAdvancedFileOutputXML component; if it already exists, it will be overwritten.
  • Click on the Next button to proceed.
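
The create-or-overwrite behaviour of the output file can be pictured with a short Python sketch. It only illustrates the file semantics described above and is not Talend's generated code: the `write_xml` helper, the `records` and `record` element names, and the `out.xml` file name are all hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_xml(path, root, encoding="UTF-8"):
    """Create the file if it is missing, otherwise replace its content."""
    ET.ElementTree(root).write(path, encoding=encoding, xml_declaration=True)


def make_root(text):
    """Build a tiny document with a single record (hypothetical structure)."""
    root = ET.Element("records")
    ET.SubElement(root, "record").text = text
    return root


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "out.xml")
    existed_before = os.path.exists(path)

    write_xml(path, make_root("first run"))   # the file is created
    write_xml(path, make_root("second run"))  # the file is overwritten

    final_texts = [record.text for record in ET.parse(path).getroot()]

print(existed_before, final_texts)
```

After the second write, only the content of the second run remains, so anything that must be preserved should be written to a different path.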

Step 4: Define the schema

In this step, we will be defining the schema.

  • After the output file is defined in the previous step, the columns in the Linker Source section are automatically mapped to the corresponding ones in the Linker Target section, as indicated by blue arrow links.
  • To define the output schema, we can do the following:
  • In the Linker Source section, click on the Schema Management button to open the schema editor, where we can create a schema from scratch or edit the source schema columns that are passed to the output schema.
  • In the Linker Target section, right-click on the element on which we want to run a loop, and select Set As Loop Element from the popup menu, as we can see in the below image:

Note: Defining the element to run a loop on is mandatory.

  • We can select and drop several fields at a time, using the Ctrl and Shift keys to make a multiple selection.

This makes mapping faster. We can also make a multiple selection for right-click operations such as:

  • Create as a sub-element of the target node
  • Create an attribute of a target node
  • Add linker to the target node

As we can see in the below image, we selected the second option, Create an attribute of the target node, and clicked on the OK button.


And, to verify and edit the end schema, click on the Next button.
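
The effect of the loop element, sub-element, and attribute choices on the resulting file can be sketched in Python. This is a simplified illustration, not what tAdvancedFileOutputXML does internally: `build_output`, the `employees` root name, and the choice of `empid` as an attribute are assumptions made for the example, and the rows reuse values from the input sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming rows, using values from the sample input file.
ROWS = [
    {"empid": "101", "firstName": "Naina", "city": "Mumbai"},
    {"empid": "102", "firstName": "Kapil", "city": "Kanpur"},
]


def build_output(rows, root_tag, loop_tag, attribute_columns, element_columns):
    """Repeat the loop element once per row, mapping columns to attributes or sub-elements."""
    root = ET.Element(root_tag)
    for row in rows:
        loop_element = ET.SubElement(root, loop_tag)
        # Columns mapped as attributes of the loop element.
        for column in attribute_columns:
            loop_element.set(column, str(row[column]))
        # Columns mapped as sub-elements of the loop element.
        for column in element_columns:
            ET.SubElement(loop_element, column).text = str(row[column])
    return root


output_root = build_output(
    ROWS,
    root_tag="employees",
    loop_tag="employee",
    attribute_columns=["empid"],
    element_columns=["firstName", "city"],
)
print(ET.tostring(output_root, encoding="unicode"))
```

The loop element is the part of the structure that repeats for every incoming row, which is why it has to be defined before the mapping is complete.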

Step 5: Finalizing the end schema

In the last step, we will be finalizing the end schema.

  • To customize the XML file schema, check that the data type in the Type column is correct for each column.
  • If the XML file on which the schema is based has changed, click on the Guess button to generate the schema again.

Note: If we have customized the schema, the Guess feature does not keep these changes.

  • For adding the columns, click on the [+] button, and for deletion, click on the [X] button, which is present on the toolbar.
  • To change the order of the columns, use the upward and downward arrows buttons in the toolbar.
  • After that, click on the Finish button to complete the process, as we can see in the below image:

To see the newly created Metadata in the Talend studio:

  • Go to the Repository panel, then go to Metadata.
  • After that, expand the File xml node, and select the New_XML_output metadata, as we can see in the below screenshot:

Repository → Metadata → File xml → New_XML_output


To reuse the metadata, simply drag the file connection or schema from the Repository's Metadata node and drop it onto the design workspace to create a new component, or onto an existing component.

For modifying the existing File connection:

  • Go to the Repository panel, then go to the Metadata node.
  • After that, expand the File xml, and right-click on the New_XML_output schema and select Edit File xml as we can see in the below image:

For adding a new schema to an existing File connection:

  • Go to the Repository panel, and right-click on the New_XML_output schema under File xml.
  • Select Retrieve Schema from the popup menu in the Metadata, as we can see in the below image: