Why we need data integration and what can we do?

The first question is why we need data integration?

Let me give you an example here to answer this question.

Image for post
Image for post

Every company has many departments, and different departments use different tools to store their data. For example, marketing team may use hubspot tool.

Now, we have different departments which store different types of data in a company.

However, insightful information is needed to make business decisions through those large amount of data.

What can we do?

Maybe we can connect all the databases everytime to generate reports. However, it will cost us large amount of time, then the term of data integration is raised.

What is data integration?

Data integration is a process in which heterogeneous data is retrieved and combined as an incorporated form and structure.

There are some advantages of data integration:

  • Reduce data complexity
  • Data integrity
  • Easy data collaboration
  • Smarter business decisions

There are also some well-known tools can do data integration, e.g., Microsoft SQL Server Integration Services (SSIS), Informatica, Oracle Data Integrator etc.

This blog we will talk about SSIS, which is one of the most popular ETL tool in New Zealand.

Why SSIS?

  • Data can be loaded in parallel to many varied destinations
  • SSIS removes the need of hard core programmers
  • Tight integration with other products of Microsoft
  • SSIS is cheaper than most other ETL tools

SSIS stands for SQL Server Integration Services, which is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data integration and data transformation tasks.

What can SSIS do?

It combines the data residing in different sources and provides users with an unified view of these data.

Also, it can also be used to automate maintenance of SQL Server databases and update to multidimensional analytical data.

How is works?

These tasks of data transformation and workflow creation is carried out using SSIS package:

Operational Data->ETL->Data Warehouse

In the first place, operational data store (ODS) is a database designed to integrate data from multiple sources for additional operations on the data. This is the place where most of the data used in current operation is housed before it’s tranferred to the data warehouse for longer tem storage or achiiving.

Next step is ETL(Extract, Transform and Load), the process of extracting the data from various sources, tranforming this data to meet your requirement and then loading into a target data warehouse.

Image for post
Image for post

The third step is data warehouse, which is a large set of data accumulated which is used for assembling and managing data from various sources of answering business questions. Hence, helps in making decisions.

What is SSIS package?

It is an object that implements integration services functionality to extract, transform and load data. A package is composed of:

  • connection
  • control flow elements(handle workflow)
  • data flow elements(handle data transform)

If you want to investigate more about SSIS, feel free to check it out on Microsoft official documents.

If you are interested in or have any problems with SSIS, feel free to contact me.

Or you can connect with me through my LinkedIn.

Originally published at http://jacquiwu.com on July 18, 2020.

Written by

A current BI Analyst in a subsidiary under Webjet, with experience in applying data science techniques to business.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store