Why You Are Confused About Data Integration
By Jeff Carr
May 24, 2021

The term “data integration” is used frequently these days, by many companies and for many different reasons. Since data is the lifeblood of virtually every modern business, and combining data from many sources is almost always required, the term is ubiquitous in the data and analytics market.

However, depending on the use case and the desired end results, it can mean very different things.

Automation vs. Analytics

A very popular use case for “integrating” data is automating business processes between applications within a company. For example, you may use HubSpot to manage your sales process and customer quotes, but when you sell something you need that data to flow into your accounting system so you can process the order (quote-to-order automation). Use cases like this abound, and they are very important for streamlining a business. In this regard you are “integrating” data from your CRM into your ERP or accounting system.

However, this is not the same as the analytics use case. For analytics you might need to “integrate” data from multiple sources, both internal and external, and then load it into a data warehouse, database, or BI platform to perform further analysis and gain business insights. In both cases the term “integration” is used liberally, but the use cases and end results are completely different, and solutions focused on one won’t work well, if at all, for the other.

Application Integration (Automation)

Application automation is all about creating efficiency across your business stack: Sales, Operations, Finance, all connected. Use cases typically involve moving a small number of individual fields (fewer than 50) from one application to another (or several). In the majority of cases the data is updated based on some sort of business trigger (a quote moves to a sale), which is the “automation” part of the description. What typically does not happen in this process is large-scale movement of data or significant transformation of it. Examples of companies providing data “automation” are tray.io, funnel.io, celigo.com, and workato.com. These solutions are generally cloud-based and focus on real-time integration, meaning they update the target systems quickly, and often very frequently, as new data becomes available.
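To make the shape of this concrete, here is a minimal sketch of a trigger-driven quote-to-order handler. Everything in it is hypothetical: the event payload, field names, and the idea that the CRM delivers a webhook event as a dict are all assumptions for illustration, not any vendor’s real API.

```python
# Hypothetical sketch of quote-to-order automation: a CRM fires an event
# when a quote is won, and a handful of fields are mapped onto an order
# payload for the accounting system. Payload shape is invented.

def handle_quote_won(event: dict) -> dict:
    """Map a small, fixed set of CRM quote fields onto an ERP order."""
    quote = event["quote"]
    return {
        "customer_id": quote["company_id"],
        "amount": quote["total"],
        "currency": quote.get("currency", "USD"),
        "source_quote": quote["id"],
    }

# Example trigger: a single quote moving to "won" produces one small order.
order = handle_quote_won(
    {"quote": {"id": "Q-1001", "company_id": "C-42", "total": 4999.0}}
)
print(order)
```

Note how little data moves per event: a few fields, driven by one business trigger, which is exactly why these tools can stay “real time.”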

Analytics Integration

In this use case it’s common to load entire data sets from many sources, both internal and external, including SaaS applications (APIs), databases, IoT devices, streaming queues (Kafka), and more. These data sets can be extremely large, containing thousands of fields and millions of rows, and they often require significant transformation before loading into an analytic destination like a data warehouse or BI/ML platform. The data is loaded as SQL-ready tables so any analyst can slice and dice it to create reports, insights, and predictions. By contrast with automation, this data might be updated only once per day, or perhaps once per hour, rather than in real time as new data comes in.
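A stripped-down sketch of that batch pattern follows. It uses Python’s built-in sqlite3 as a stand-in for a real warehouse, with invented rows and a toy transformation; the point is only the shape of the workflow: extract a whole data set, transform it, load it as a SQL-ready table.

```python
# Illustrative batch load: a full data set is lightly transformed and
# loaded as one SQL-queryable table. sqlite3 stands in for a warehouse;
# the rows and the "orders" schema are made up for the example.
import sqlite3

def load_batch(rows: list[dict]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT, region TEXT, amount REAL)")
    # Transformation step before load: normalize regions, coerce types.
    cleaned = [(r["id"], r["region"].upper(), float(r["amount"])) for r in rows]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    conn.commit()
    return conn

conn = load_batch([
    {"id": "o1", "region": "emea", "amount": "120.5"},
    {"id": "o2", "region": "amer", "amount": "75"},
])
# Once loaded, any analyst can slice and dice it with plain SQL:
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 195.5
```

In a real pipeline the same pattern runs on millions of rows on a schedule (hourly or daily), which is why throughput and bulk transformation matter far more here than per-event latency.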

Which Do You Need?

In theory, a solution built for one of these “integration” use cases could be used for the other, but in practice the technologies, approaches, architectures, and even costs won’t support interchangeability. Trying to load 10 terabytes of data with an automation solution will be slow, very expensive, and probably won’t work at all, since things like CDC (delta loads), large-scale schema transformation, and transfer speed will be unacceptable. The reverse is also true: using a solution built for large-scale analytic pipelines to process frequent, small-data business automations will be inefficient.

So you can start to see that while all these solutions talk about “integration” and data movement, they are focused on very different use cases, and in most cases different users. As you begin any new data integration project, the important first step is to determine whether it is focused on automating a business process or on building out an analytic pipeline that will serve a variety of business cases and insights. Another way to determine the focus is to ask whether you plan to use SQL-based tools on the data once it’s moved. In the automation use case the answer is usually no; it’s just a matter of moving data from one business system to another. In the analytic use case the answer is always yes; the data ends up in a data warehouse, database, or BI/ML platform and is queried and visualized in some way.


Ultimately, any data integration comes down to the business goal it supports. If that goal is automating business processes between applications, you need a data automation solution; the most popular options here are SaaS-based. If your goal is building out a data warehouse or data lake containing data from a variety of sources for business analytics or machine learning, you need analytic integration.

Each approach requires different technologies and pricing models to achieve a consistent, cost-effective solution.
