The term data integration is used constantly these days, by many companies and for many different reasons. Since data is the lifeblood of virtually every modern business, and putting it to work often requires integrating data from many sources, the term is ubiquitous in the data and analytics market.
However, depending on the use case and the desired end result, it can mean very different things.
Automation vs. Analytics
A very popular use case for “integrating” data is automating business processes between applications within a company. For example, you may use HubSpot to manage your sales process and customer quotes. But when you sell something, that data needs to flow into your accounting system so you can process the order (quote-to-order automation). Use cases like this abound, and they are very important for streamlining a business. In this sense you are “integrating” data from your CRM into your ERP or accounting system. However, this is not the same as the analytics use case. For analytics you might need to “integrate” data from multiple sources, both internal and external, and then load it into a data warehouse, database, or BI platform to analyze it further and gain business insights. In both cases the term “integration” is used liberally, but the use cases and end results are completely different, and a solution focused on one will work poorly, if at all, for the other.
Application Integration (Automation)
Application automation is all about creating efficiency across your business stack: sales, operations, and finance, all connected. Use cases typically involve moving a small number of individual fields (fewer than 50) from one application to one or more others. In most cases the update is driven by a business trigger (for example, a quote moves to a sale); this is the “automation” part of the description. What typically does not happen in this process is large-scale movement of data or significant transformation of it. Examples of companies providing data “automation” are tray.io, funnel.io, celigo.com and workato.com. These solutions are generally cloud based and focus on real-time integration, meaning they update the target systems quickly whenever new data is available, which can be very frequently.
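To make the trigger-driven pattern concrete, here is a minimal sketch of a quote-to-order automation. Everything in it is hypothetical: the field names, the `Event` shape, and the `create_order` callback stand in for whatever your CRM and accounting system actually expose, and it is not modeled on any particular vendor's API.

```python
from dataclasses import dataclass

# Hypothetical mapping: CRM quote field -> accounting order field.
# Automation flows usually move a handful of fields like these.
FIELD_MAP = {
    "quote_id": "source_quote_id",
    "customer_email": "billing_email",
    "total": "order_total",
    "currency": "currency",
}


@dataclass
class Event:
    """Assumed shape of a change notification sent by the CRM."""
    object_type: str
    new_stage: str
    record: dict


def on_crm_event(event, create_order):
    """Copy the mapped fields to the target system when the trigger fires."""
    # The business trigger: a quote has moved to the "sale" stage.
    if event.object_type != "quote" or event.new_stage != "sale":
        return None
    # Only mapped fields are copied; everything else stays in the CRM.
    order = {target: event.record[source] for source, target in FIELD_MAP.items()}
    create_order(order)
    return order


# Usage: a list stands in for the accounting system's "create order" call.
sent = []
quote = {
    "quote_id": "Q-1001",
    "customer_email": "buyer@example.com",
    "total": 1250.0,
    "currency": "USD",
    "internal_notes": "not synced",
}
result = on_crm_event(Event("quote", "sale", quote), sent.append)
ignored = on_crm_event(Event("quote", "draft", quote), sent.append)
```

The point of the sketch is the shape of the work: one event, a few fields, no bulk movement and almost no transformation.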
In the analytics use case, it’s common to load entire data sets from many sources (both internal and external), including SaaS applications (via APIs), databases, IoT devices, streaming queues (such as Kafka) and more. These data sets can be extremely large, containing thousands of fields and millions of rows, and they often require significant transformation before loading into an analytic destination like a data warehouse or BI/ML platform. The data is loaded as SQL-ready tables so any analyst can slice and dice it to create reports, insights and predictions. In contrast to automation, this data might only be updated once per day, or maybe once per hour, rather than in “real time” as new data comes in.
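As a rough illustration of the analytics pattern, the sketch below extracts rows from two sources, transforms them into one shape, and loads them as a SQL-ready table. The sample rows, column names, and the `transform` helper are invented for the example, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical extracts from two sources with different shapes.
crm_rows = [
    {"Customer": "Acme", "Amount": "1,250.00", "Closed": "2024-01-15"},
    {"Customer": "Globex", "Amount": "300.50", "Closed": "2024-01-16"},
]
web_rows = [
    {"customer": "acme", "amount_cents": 9900, "date": "2024-01-15"},
]


def transform(crm, web):
    """Normalize both sources into (customer, amount, sale_date, source)."""
    for r in crm:
        amount = float(r["Amount"].replace(",", ""))
        yield (r["Customer"].lower(), amount, r["Closed"], "crm")
    for r in web:
        yield (r["customer"].lower(), r["amount_cents"] / 100, r["date"], "web")


# Load: SQLite is only a stand-in for the analytic destination.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, amount REAL, sale_date TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)", transform(crm_rows, web_rows)
)
conn.commit()

# Once loaded, any analyst can query the table with plain SQL.
totals = conn.execute(
    "SELECT customer, ROUND(SUM(amount), 2) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

A real pipeline would run a job like this on a schedule (hourly or daily) over far larger data sets, but the extract, transform, load, then query sequence is the same.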
Which Do You Need?
In theory, a solution built for either “integration” use case could be used for the other, but in practice the technologies, approaches, architecture and even costs don’t support interchangeability. Trying to load 10 terabytes of data with an automation solution will be slow and very expensive, and it probably won’t work at all: capabilities like CDC (change data capture, or delta loads) and large-scale schema transformation are usually missing or weak, and transfer speeds will be unacceptable. The reverse is also true. Using a solution built for large-scale analytic pipelines to process frequent, small business automations will be inefficient.
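For readers unfamiliar with delta loads, the sketch below shows one simple way to do them: a timestamp watermark, where each run copies only the rows changed since the previous run. This is an illustrative assumption, not how any specific product works; log-based CDC tools read the database's change log instead. The `delta_load` function, the `updated_at` column, and the sample rows are all hypothetical.

```python
def delta_load(source_rows, target, watermark):
    """Upsert only rows changed after `watermark`.

    Returns (new_watermark, rows_loaded). Timestamps are ISO 8601 strings
    in one fixed format, so plain string comparison orders them correctly.
    """
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    for row in changed:
        target[row["id"]] = row  # insert new rows, overwrite changed ones
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return new_watermark, len(changed)


source = [
    {"id": 1, "name": "Acme", "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "name": "Globex", "updated_at": "2024-01-02T00:00:00"},
]
target = {}

# First run: an empty watermark means everything is loaded.
watermark, first_count = delta_load(source, target, "")

# One row changes in the source; the next run moves only that row.
source[0] = {"id": 1, "name": "Acme Corp", "updated_at": "2024-01-03T00:00:00"}
watermark, second_count = delta_load(source, target, watermark)
```

On a multi-terabyte table, the difference between reloading everything and moving only the changed rows is what makes a scheduled analytic pipeline affordable.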
So you can start to see that while all these solutions talk about “integration” and data movement, they are focused on very different use cases and, in most cases, different users. As you begin any new data integration project, the important first step is to determine whether it is focused on automating a business process or on building out an analytic pipeline that will serve a variety of business cases and insights. Another way to determine the focus is to ask whether you plan to use SQL-based tools on the data once it’s moved. In the automation use case the answer is usually no; it’s just a matter of moving the data from one business system to another. For the analytic use case the answer is yes: the data ends up in a data warehouse, database, or BI/ML platform and is queried and visualized in some way.
Ultimately, any data integration project comes down to which business goal it supports. If the goal is automating business processes between applications, then you need a data automation solution; the most popular of these are SaaS based. If your goal is building out a data warehouse or data lake containing data from a variety of sources for business analytics or machine learning, then you need analytic integration.
Each approach requires different technologies and pricing models to deliver a consistent, cost-effective result.