SSIS Tutorial

SSIS Tutorial

SSIS tutorial provides basic and advanced concepts of SQL Server Integration Services. Our SSIS tutorial is designed for beginners and professionals.

SQL Server Integration Service is a fast and flexible data warehousing tool used for data extraction, transformation, and data loading. It makes it easy to load the data from one database to another database such as SQL Server, Oracle, Excel file, etc.

In this tutorial, we will discuss the following topics:


What is SSIS?

  • SSIS stands for SQL Server Integration Services.
  • It is a component available in the Microsoft SQL Server database software used to perform a wide range of integration tasks.
  • It is a data warehousing tool used for data extraction, loading the data into another database, transformations such as cleaning, aggregating, merging data, etc.
  • SSIS tool also contains the graphical tools and window wizards workflow functions such as sending email messages, ftp operations, data sources.
  • SSIS is used to perform a wide range of transformation and integration tasks. As a whole, the SSIS tool is used in data migration.

SSIS is a tool mainly used to perform two functionalities:

  • Data Integration
    SSIS performs data integration by combining the data from multiple sources and provides unified data to the users.
  • Workflow
    Workflow can be used to perform several things. Sometimes we need to execute some specific steps or a particular path which is either based on the time period or the parameter passed to the package or the data queried from the database. It can be used to automate the maintenance of SQL Server databases and provides the update to the multidimensional analytical data.

What is Data Integration?

Data Integration is a process that you follow to integrate the data from multiple sources. The data can be either heterogeneous data or homogeneous data. The data can be structured, semi-structured, or unstructured. In Data Integration, the data from different dissimilar data sources integrate to form some meaningful data.

Some methods are used to achieve data integration:

What is Data Integration
  • Data Modelling: In Data Modelling, you need first to create the data model and perform operations on it.
  • Data Profiling: Data Profiling is a process which is used to check the errors, inconsistency, or variations in the available data. Data Profiling ensures the data quality where data quality refers to the accuracy, consistency, and completeness of data.

Advantages of Data Integration:

What is Data Integration
  • Reduce data complexity
    It reduces data complexity which means that the data can be delivered to any system. Data Integration maintains the complexity, streamlined connections, and making it easy to deliver the data to any system.
  • Data integrity
    Data integrity plays a major role in data integration. It deals with cleansing and validating the data. Everyone wants high quality and robust data, so to achieve this data integration concept is used. Data integration is helpful in removing errors, inconsistency, and duplication.
  • Easy data collaboration
    Accessibility comes under data collaboration. Accessibility means that the data can be easily transformed, and people can easily integrate the data into projects, share their results, and keep the data up-to-date.
  • Smarter business decisions
    It also provides you to make smarter decisions. An integrated data refers to the transmit process within a company so that we can understand the information more easily. An integrated data is much easier and informative.

Why SSIS?

SSIS is used because of the following reasons:

Why SSIS
  • Data can be loaded in parallel to many varied destinations
    SSIS is used to combine the data from multiple data sources to generate a single structure in a unified view. Basically, it is responsible for collecting the data, extracting the data from multiple data sources, and merging into a single data source.
  • Removes the need of hard core programmers
    SSSIS is a platform that has the capability to load a large amount of data from excel to a SQL Server database.
  • Integration with other products
    SSSIS tool provides tight integration with other products of Microsoft.
  • Cheaper than other ETL tools
    SSSIS tool is cheaper than most of the other tools. It can resist with other base products, their manageability, business intelligence, etc.
  • Complex error handling within dataflows
    SSSIS allows you to handle the complex error within a dataflow. You can start and stop the dataflow based on the severity of the error. You can even send an email to admin when some error occurs. When an error is resolved, then you can pick the path in between the workflow.

How SSIS works?

We know that SSIS is a platform for two functions, i.e., Data Integration and workflow. Both the tasks Data transformations and workflow creation are carried by using the SSIS package. SSIS package consists of three components:

How SSIS works

Operational data

Operational data is a database used to integrate the data from multiple data sources to perform additional operations on the data. It is the place where the data is housed for current operation before sending to the data warehouse for storing, reporting, or archiving.

ETL

  • ETL is the most important process in SSIS tool. ETL is used to Extract, Transform, and Load the data into a data warehouse.
  • ETL is a process responsible for pulling out the data multiple data sources, transforming the data into useful data, and then storing the data into a data warehouse. The data can be in any format xml file, flat file, or any database file.
  • It also ensures that the data stored in the data warehouse is relevant, accurate, high quality, and useful to the business users.
  • It can be easily accessed so that the data warehouse can be used effectively and efficiently.
  • It also helps the organization to make data-driven decisions by retrieving the structured and unstructured data from multiple data sources.

An ETL is a three-word concept, but it is divided into four phases:

How SSIS works

Capture: Capture phase is also known as Extract phase. In this phase, it picks the source data or metadata, and the data can be in any format such as xml file, flat file, or any database file.

Scrub: In this phase, the original data is checked. It checks the data, whether it consists of any errors or not. It checks for the errors or inconsistency of data by using some artificial intelligence techniques. In short, it verifies whether the quality of the product is met or not.

Transform: It is the third phase in ETL. Transformation is the process in which the original format is converted into a required format that you want. Transformation is modelling or changing the data according to the user requirements. The changes can be either change in the number of columns or rows.

Load and index: The fourth phase is Load and index. It loads the data and validates the number of rows that have been processed. Once the loading of data is completed, the indexing is used. Indexing helps you to track the number of rows that are loaded in the data warehouse. Indexing also helps to identify the data, whether it is in the correct format or not.

Data Warehouse

Data warehouse is a single, complete, and consistent store of data which is formulated by combining the data from multiple data sources.

Difference between Database and Data warehouse

The answer can be yes as well as no. Both the database and data warehouse have a large unit of data and similar physical representation but the response time of complex queries in Data warehouse is faster than the database.


Requirements for SQL Server Integration Services

The following are the requirements to install the SQL Server Integration Services:

  • Install the SQL Server
  • Install the SQL Server Data Tools

Follow the below steps to install the SQL Server Data tools:

Step 1: Click on the link https://docs.microsoft.com/en-us/sql/ssdt/previous-releases-of-sql-server-data-tools-ssdt-and-ssdt-bi?view=sql-server-2017 to download the SQL Server data tools.

Step 2: When you click on the above link, the screen appears shown below:

Requirements for SQL Server Integration Services

In the above screen, select the version of SSDT that you want to install.

Step 3: Once the downloading is completed, run the downloaded file. When you run the downloaded file, the screen appears which is shown below:

Requirements for SQL Server Integration Services

Step 4: Click on the Next button.

Step 5: Select the visual studio instance and the tools that you want to install in the visual studio 2017.

Requirements for SQL Server Integration Services

Step 6: Click on the Install button.


What is the SSIS Package?

The Package is a fundamental block where you code in SSIS. Here, code does not mean that you are coding in some programming language; it means the development that you do. The development is done in the SSIS package. SSIS is mainly used for the ETL process, and the ETL process is performed inside the SSIS package.

SSIS package is composed of three parts:

What is The SSIS Package
  • Connections
    SSIS package will have some connections, and these connections are used to connect to various data sources.
  • Control flow elements
    SSIS package is composed of two elements, i.e., control flow elements and data flow elements. Control flow elements handle workflows. Workflow means that we are performing some tasks in steps, so the sequence is done through control flow.
  • Data flow elements
    The data flow elements perform transformations.

SSIS Tasks

In SSIS package, we can add the tasks. A task is a unit of work and we will have different types of tasks to perform different kinds of work. There are various types of tasks, but we will discuss the most common tasks used in SSIS:

  • Execute SQL Task
    It is used to execute the SQL statements against a relational database.
  • Data Flow Task
    It is mainly used to read the data from one or multiple data sources, transform the data and can also load the data to one or more destinations.
  • Analysis Services Processing Task
    It is used to process objects of an SSAS cube or Tabular model.
  • Execute Package Task
    This task is used to call the other packages within the same project. You can even pass the variable values to the called package.
  • Execute Process Task
    It allows you to run an application or batch scripts as a SQL Server Integration Services. It can be used to open the standard application such as Microsoft Excel, Microsoft Word, etc. It is also used to unzip the compressed file.
  • File System Task
    It can be used to perform the manipulations in the file system such as moving files, deleting files, renaming the files, changing the directory, etc.
  • FTP Tasks
    IFTP tasks are used to perform the operations on files and folders. For example, if you want to send or receive the file from the FTP server to the local directory, then the SSIS FTP task is used.
  • Script Task
    This task allows you to write the .Net code that you want to perform.
  • Send Mail Task
    This task is used to send an email. It is mainly used when you want to notify the users regarding the state of the task whether it is in a running state or some error has occurred.

Let's see some working example of a Data integration service.


Example of Data Flow Task

Step 1: First, we create an excel file in Microsoft Excel. Suppose I create the student database in Microsoft Excel, and fields in student table are student_id, student_name, marks, and gender.

SSIS Tasks

Step 2: Open the visual studio (SQL Server Data Tools).

SSIS Tasks

Step 3: Click on the File option and then move to New->Project.

SSIS Tasks

Step 4: When you click on the Project, the screen appears, which is shown as below:

SSIS Tasks

In the above screen, click on the Integration Services appearing at the leftmost side of the panel and then click on the Integration Services Project.

Step 5: After clicking on the Integration Services Project option, the screen appears shown as below:

SSIS Tasks

The above screen consists of five tabs such as Control flow, Data flow, Parameters, Event Handler, and Package Explorer. We have already discussed the Control flow and Data flow, both the control flow and data flow exist inside the SSIS package.

step 6: To import the data, we need first to create the database in which we create the table in SQL Server database. As we know that student data is available inside the Excel file, and we want to import the data in SQL Server database. In order to achieve this, we need first to create the table in SQL Server. Open the SQL Server Management studio.

Step 7: Create the student database. Right-click on the database, and then click on the New Database option.

SSIS Tasks

Step 8: Enter the database name in the below screenshot:

SSIS Tasks

I have given the StudentWarehouse as a database name.

Step 9: Click on the OK button.

Step 10: Now, we create the table in a StudentWarehouse database. Right-click on the Tables, and then click on the New Table.

SSIS Tasks

Step 11: Now we add the fields in a table such as student_id, student_name, marks, and gender. We set the student_id as a primary key which uniquely identifies each row.

SSIS Tasks

The above screen shows that we add four fields in a table. After adding all the fields in a table, we save the table by providing some useful name. Suppose I have provided the student as a table name.

Step 12: Move back to the visual studio. Control flow contains the Data Flow Task, drag and drop the Data Flow Task on the main window.

SSIS Tasks

Step 13: When we double-click on the Data Flow Task, the control automatically moves to the Data Flow from the Control Flow, so we can say that Control Flow is a container of Data Flow.

Step 14: Now, we can perform the transformations in Data Flow. We want to extract the data from Excel file that we created earlier, so drag and drop the Excel source from the other sources appearing at the leftmost side of the panel.

SSIS Tasks

From the above screen, we observe that the red cross appears inside the Excel Source, it means that the component is configured with the Excel Source.

Step 15: To configure the component, right-click on the Excel source and click on the Edit option as shown in the below screenshot:

SSIS Tasks

Step 16: Fill the details shown in the below screenshot:

SSIS Tasks

Step 17: After entering all the details, the screen appears shown as below:

SSIS Tasks

Step 18: Click on the Excel Souce.

SSIS Tasks

The above screen shows two arrows, red and blue arrow. The blue arrow defines the actual data that we receive from the Excel file, and the red arrow denotes the errors that we have. Now we have to perform the transformations, and before performing the transformations, we need to load the database.

Step 19: To load the database, click on the Other Destinations and then click on the Destination.

SSIS Tasks

Step 20: After adding the destination, connect the source to the Destination.

SSIS Tasks

Step 21: In the above step, we noticed the red cross in ADO NET Destination, which means that it is not configured with a component. Right-click on the ADO NET Destination and then click on the Edit option as shown in the below screenshot:

SSIS Tasks

Step 22: You need to specify the configuration manager.

SSIS Tasks

Step 23: Click on the New button to add the connection manager. When you click on the New button, the screen appears shown below:

SSIS Tasks

Step 24: Add the Server name and select the database name from the SQL Server.

SSIS Tasks

Step 25: Click on the OK button.

Step 26: Add the table name in which you want to load the data.

SSIS Tasks

Step 27: In the below screen, map the field of input source to the field of the destination source.

SSIS Tasks

Step 28: After adding a mapping, the source is connected to the Destination, as shown in the below screenshot:

SSIS Tasks

Step 29: Click on the Start button to run the Integration Services Project. When you run the project, the data gets loaded to the destination, i.e., SQL Server.


Example of Execute SQL Task

  • Follow the steps from step 2 to step 5 of the previous example.
  • Drag and drop the Execute SQL Task appearing at the leftmost panel to the main window.
SSIS Tasks

In the above screen, we observe that the red-cross appears in the Execute SQL Task, which means that the Connection Manager is not configured with the Execute SQL Task. To configure it, we need to add the Connection Manager.

  • To add the Connection Manager, Right-click on the Connection Manager and then click on the New Connection Manager as shown in the below screenshot:
SSIS Tasks
  • Select the OLEDB as a Connection Manager type and then click on the Add button as shown in the below screenshot:
SSIS Tasks
  • After clicking on the Add button, the screen appears shown in the below screenshot:
SSIS Tasks

In the above screen, we observe that the data connection of StudentWarehouse database is already available as we created in the previous step. Click on the OK button.

  • When you click on the OK button, the screen appears as shown below:
SSIS Tasks

In the above screen, you need to enter the details such as the name of your task, description, SQL statement (SQL task that you need to perform on the table).

  • I entered the following details in the above screenshot:
SSIS Tasks

In the above case, I have provided the insert command in SQLStatement.

  • Click on the OK button.
  • When you configure the component with the Execute SQL Task, the red cross disappears as shown in the below screenshot:
SSIS Tasks
  • Run the project by clicking on the Start.
SSIS Tasks
  • When you click on the Start, the screen appears shown as below:
SSIS Tasks

In the above screen, the tick mark shows that your SQL Task has been successfully completed. The insert statement which we added in the previous steps has been executed, and the data is added in table. i.e., student1.

  • Output of this task is shown below
SSIS Tasks

Prerequisite

Before learning SSIS, you must have the basic knowledge of SQL Server.

Audience

Our SSIS tutorial is designed to help beginners and professionals.

Problem

We assure that you will not find any problem in this SSIS tutorial. But if there is any mistake or error, please post the error in the contact form.


Next Topic#