Why You Are Confused About Data Integration
By Jeff Carr
May 24, 2021

The term "data integration" is used frequently these days, by many companies and for many different reasons. Since data is the lifeblood of virtually every modern business, and integrating data from many sources is almost always required, the term is ubiquitous in the data and analytics market.

However, depending on the use case and the desired end results, it can mean very different things.

Automation vs. Analytics

A very popular use case for "integrating" data is automating business processes between the applications a company runs. For example, you may use HubSpot to manage your sales process and customer quotes, but when you sell something, that data needs to flow into your accounting system so you can process the order (quote-to-order automation). Use cases like this abound, and they are very important for streamlining a business. In this regard you are "integrating" data from your CRM into your ERP or accounting system.

However, this is not the same as the analytics use case. For analytics you might need to "integrate" data from multiple sources, both internal and external, and then load it into a data warehouse, database, or BI platform to perform further analysis and gain business insights. In both cases the term "integration" is used liberally, but the use cases and end results are completely different, and solutions focused on one won't work well, if at all, for the other.

Application Integration (Automation)

Application automation is all about creating efficiency across your business stack: Sales, Operations, and Finance, all connected. Use cases typically involve moving a small number of individual fields (<50) from one application to another (or several). In the majority of cases the data is updated based on some sort of business trigger (a quote moves to a sale); this is the "automation" part of the description. What typically does not happen in this process is large-scale movement of data or significant transformation of that data. Examples of companies providing data "automation" are tray.io, funnel.io, celigo.com, and workato.com. These solutions are generally cloud based and focus on real-time integration, meaning they update the target systems quickly, and potentially very frequently, whenever new data is available.

Analytics Integration

In this use case it's common to load entire data sets from many sources, both internal and external, including SaaS applications (APIs), databases, IoT devices, streaming queues (Kafka), and more. These data sets can be extremely large, containing thousands of fields and millions of rows, and they often require significant transformation before loading into an analytic destination like a data warehouse or BI/ML platform. The data is loaded as SQL-ready tables so any analyst can slice and dice it to create reports, insights, and predictions. By contrast with automation, this data might only be updated once per day, or perhaps once per hour, rather than in real time as new data arrives.
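As a tiny sketch of the "SQL-ready tables" idea, assuming an in-memory SQLite database as a stand-in for the warehouse and made-up column names (a real pipeline would land far larger data sets):

```python
# Land a full data set as a SQL-ready table, then query it with plain SQL.
# sqlite3 stands in for the warehouse; the rows and columns are invented.
import sqlite3

rows = [  # imagine millions of rows pulled from an API, database, or queue
    ("2021-05-01", "EMEA", 1200.0),
    ("2021-05-01", "AMER", 3400.0),
    ("2021-05-02", "EMEA", 900.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day TEXT, region TEXT, revenue REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Once the table exists, any analyst can slice and dice it with SQL.
for day, total in con.execute(
        "SELECT day, SUM(revenue) FROM sales GROUP BY day ORDER BY day"):
    print(day, total)
# prints:
# 2021-05-01 4600.0
# 2021-05-02 900.0
```

The key difference from the automation pattern is that the whole data set lands as a queryable table, and refreshes happen on a schedule rather than per business event.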

Which Do You Need?

In theory, solutions built for either the automation or the analytics "integration" use case can be used for both, but in practice the technologies, approaches, architectures, and even costs won't support interchangeability. Trying to load 10 terabytes of data with an automation solution will be slow and very expensive, and probably won't work at all, since things like CDC (delta loads), large-scale schema transformation, and transfer speed will be unacceptable. The reverse is also true: using a solution built for large-scale analytic pipelines to process frequent, small-data business automations will be inefficient.

So you can start to see that while all these solutions talk about "integration" and data movement, they are focused on very different use cases, and in most cases different users. As you begin any new data integration project, the important first step is to determine whether it is focused on automating a business process or building out an analytic pipeline that will serve a variety of business questions and insights. Another way to determine the focus is to ask whether you plan to use SQL-based tools on the data once it's moved. In the automation use case the answer is usually no; it's just a matter of moving data from one business system to another. In the analytics use case the answer is always yes: the data ends up in a data warehouse, database, or BI/ML platform and is queried and visualized in some way.

Summary

Ultimately, any data integration comes down to one question: what business goal does it support? If the answer is automating business processes between applications, then you need a data automation solution; the most popular options here are SaaS based. If your goal is building out a data warehouse or data lake containing data from a variety of sources for business analytics or machine learning, then you need analytics integration.

Each approach requires different technologies and pricing models to deliver a consistent, cost-effective solution.
