Why you are confused about data integration

By Chris Dima
May 24, 2021
The term "data integration" is used frequently these days by many companies, and for many different reasons. Since data is the lifeblood of virtually every modern business, and combining data from many sources is almost always required, the term is ubiquitous in the data and analytics market.

However, depending on the use case and the desired end result, it can mean very different things.

Automation vs. Analytics

A very popular use case for "integrating" data is automating business processes between applications within a company. For example, you may use HubSpot to manage your sales process and customer quotes, but when you sell something, that data needs to flow into your accounting system so you can process the order (quote-to-order automation). Use cases like this abound, and they are very important for streamlining a business. In this sense you are "integrating" data from your CRM into your ERP or accounting system.

However, this is not the same as the analytics use case. For analytics, you might need to "integrate" data from multiple sources, both internal and external, and then load it into a data warehouse, database, or BI platform to perform further analysis and gain business insights. In both cases the term "integration" is used liberally, but the use cases and end results are completely different, and a solution focused on one won't work well, if at all, for the other.

Application Integration (Automation)

Application automation is all about creating efficiency across your business stack: Sales, Operations, and Finance, all connected. Use cases typically involve moving a small number of individual fields (<50) from one application to another (or several). In the majority of cases, the update is driven by some sort of business trigger (a quote moves to a sale); this is the "automation" part of the description. What typically does not happen in this process is large-scale movement of large amounts of data or significant transformation of that data. Examples of companies providing data "automation" are tray.io, funnel.io, celigo.com, and workato.com. These solutions are generally cloud-based and focus on real-time integration, meaning they update the target systems quickly, and potentially very frequently, whenever new data becomes available.
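To make the pattern concrete, here is a minimal sketch of trigger-based automation: a CRM fires a "quote accepted" event, and a handful of fields are mapped into an ERP order payload. All field names and the event shape are hypothetical illustrations, not any particular vendor's API.

```python
# Hypothetical mapping from CRM quote fields to ERP order fields.
# Note how few fields move: this is typical of application automation.
CRM_TO_ERP_FIELD_MAP = {
    "quote_id": "external_ref",
    "company_name": "customer_name",
    "total_amount": "order_total",
    "currency": "currency_code",
}

def handle_quote_event(event: dict) -> dict:
    """Translate a CRM 'quote accepted' event into an ERP order record."""
    order = {erp_key: event[crm_key]
             for crm_key, erp_key in CRM_TO_ERP_FIELD_MAP.items()}
    order["status"] = "pending"  # new orders start unprocessed in the ERP
    return order

# Example trigger payload: a quote just moved to "sale" in the CRM.
event = {"quote_id": "Q-1042", "company_name": "Acme Co",
         "total_amount": 1999.00, "currency": "USD"}
print(handle_quote_event(event))
```

In a real deployment the event would arrive via a webhook and the resulting order would be posted to the accounting system's API; the essential work, though, is exactly this small field-by-field mapping fired by a business trigger.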

Analytics Integration

In this use case it's common to load entire data sets from many sources, both internal and external, including SaaS applications (APIs), databases, IoT devices, streaming queues (e.g., Kafka), and more. These data sets can be extremely large, containing thousands of fields and millions of rows, and they often require significant transformation before being loaded into an analytic destination such as a data warehouse or BI/ML platform. The data is loaded as SQL-ready tables so any analyst can slice and dice it to create reports, insights, and predictions. By contrast with automation, this data might only be updated once per day, or perhaps once per hour, rather than in "real time" as new data arrives.
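The shape of an analytics load can be sketched in a few lines: take a full extract, apply a transformation, and land the result as a SQL-ready table. Here a tiny in-memory sample and SQLite stand in for what would really be millions of API rows and a cloud warehouse; the table and column names are illustrative only.

```python
import sqlite3

# A small stand-in for a full extract pulled from a SaaS API or database.
raw_extract = [
    {"id": 1, "amount_cents": 125000, "region": "us-east"},
    {"id": 2, "amount_cents": 89900,  "region": "eu-west"},
]

def transform(row: dict) -> tuple:
    # Convert cents to dollars so the table is directly reportable.
    return (row["id"], row["amount_cents"] / 100.0, row["region"])

conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
conn.execute("CREATE TABLE sales (id INTEGER, amount_usd REAL, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [transform(r) for r in raw_extract])

# The loaded table is now SQL-ready: any analyst can slice and dice it.
total = conn.execute("SELECT SUM(amount_usd) FROM sales").fetchone()[0]
print(total)  # 2149.0
```

The key contrast with the automation sketch is scope: the whole data set moves, it is reshaped on the way in, and the destination is a queryable table rather than another application's record.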

Which Do You Need?

In theory, solutions built for either the automation or the analytics "integration" use case could be used for both, but in practice the technologies, approaches, architectures, and even costs won't support interchangeability. Trying to load 10 terabytes of data with an automation solution will be slow and very expensive, and it probably won't work at all, since capabilities like CDC (delta loads), large-scale schema transformation, and transfer speed will be unacceptable. The reverse is also true: using a solution built for large-scale analytic pipelines to process frequent, small-data business automations will be inefficient.

So you can start to see that while all these solutions talk about "integration" and data movement, they are focused on very different use cases and, in most cases, different users. As you begin any new data integration project, the important first step is to determine whether it is focused on automating a business process or building an analytic pipeline that will serve a variety of business cases and insights. Another way to determine the focus is to ask whether you plan to use SQL-based tools on the data once it's moved. In the automation use case the answer is usually no; it's just a matter of moving data from one business system to another. For the analytics use case the answer is always yes: the data ends up in a data warehouse, database, or BI/ML platform and is queried and visualized in some way.

Summary

Ultimately, any data integration comes down to the business goal it supports. If the answer is automating business processes between applications, then you need a data automation solution; the most popular options are SaaS-based. If your goal is building a data warehouse or data lake containing data from a variety of sources for business analytics or machine learning, then you need analytics integration.

Each approach requires different technologies and pricing models to achieve a consistent, cost-effective solution.
