The Final Frontier? The Missing Link In ETL/ELT
By Mike Corbisiero
January 23, 2019

The ETL (extract, transform, load) industry has been around for decades. Its primary purpose is to move data from source locations to data warehouses so analytics and data science teams can access the data in one place and perform analysis across a range of critical data sources.

More recently, the rise of low cost cloud object storage like AWS S3, Azure Blob storage and others has morphed this process into ELT (Extract, load, transform). This process pushes the data transformations further down the pipeline, which somewhat streamlines the problems and also lowers overall costs.

ETL/ELT tools have flourished in the last decade as the volume and variety of data sources enterprises need to handle has exploded, but there’s one obvious gap in the solution space complex JSON data.

Coincidentally, complex JSON data is also one of the most popular and rapidly growing kinds of data we see in the market. Virtually ALL ETL/ELT vendors claim to support JSON, but the reality is far different. Unlike traditional relational or tabular data, JSON is not a one-size-fits-all data model it can range from very simple to unbelievably complex depending on the whims of the developer building the applications.

When an ETL/ELT vendor says they support JSON, what they mean is that they support VERY simple flat JSON. As soon as the data gets too complex, they falter, reverting back to their old familiar approach: start writing code. Some vendors don’t even try to avoid code they actually build a coding engine in their platform to handle complex JSON.

The harsh reality is that complex JSON is the last ETL/ELT problem to be solved without expecting users to write code which, by definition, means that the only access to this data is through a data integration engineer. Business analysts, most data scientists, and other non-engineers are locked out.

Ask around. We’ve spoken to many vendors in the ETL/ELT space, and all of them said that JSON is no problem. But as soon as we handed them a couple of slightly complex JSON data sets (multiple levels of nesting, variable types), they quickly retreated to the familiar refrain. For this we would need to write Python (or Scala or some other code). Tools like AWS Glue attempt to solve the problem by auto-generating Python code, but that just means your engineers have to de-bug a bunch of non-functional code. Don’t believe us? Give it a try.

Solving The Missing Link

JSON data is some of the most common data created today. Virtually all SaaS applications, Mobile applications, IoT, and many others use JSON as their default data model. JSON data is unlike traditional relational data in many ways, including non-fixed variable schema, variable data types, and the ability to have “nested” data structures. This presents a major hurdle when companies need to access this data for analytic purposes.

Analytic tools expect the data to be in a fixed tabular form essentially a spreadsheet of rows and columns. For this to be the case, the data needs to be transformed from the JSON model to the tabular model.

As previously noted, this transformation can take place as ETL or the updated version, ELT. Either version can be very complex. Traditional ETL/ELT solutions like Informatica, Talend, Alooma, or FiveTran cannot handle complex JSON. They all claim they can handle JSON data, but what they really mean is they can do some very simple things, beyond which they require engineers to write complicated Python or Scala code to solve the rest.

Most companies don’t even bother trying to use ETL/ELT software and simply use expensive data integration engineers write custom code to transform the data. This approach is slow, complicated, and completely inaccessible to anyone who isn?t an engineer.

Enter Precog Precog. Our solution lets ANY user visually prepare virtual analytic ready tables directly on the JSON, regardless of complexity. There’s no coding and no waiting on data integration engineers. Users can curate custom data sets in minutes and then iterate over them at any time as their data needs change. These tables can be streamed into most popular data warehouses, including Redshift, Snowflake and Google Big Query, or to any other location you choose.

Benefits Of Using Precog Vs. Writing Custom Code

  • Zero-coding solution
  • Any user, not just engineers, can make analytics-ready tables out of complex JSON data
  • Works with any data (JSON, CSV, XML) regardless of complexity
  • More agile and cost-effective than custom coding
  • High-performance streaming engine
  • Analytics data is always up to date, no more batch ETL processes

Who Can Benefit

Almost any company, large or small, can have JSON data generated by SaaS and mobile apps or from web APIs, but companies most likely dealing with these kinds of issues include:

  • SaaS companies (large or small)
  • Mobile application companies
  • IoT companies
  • Healthcare
  • Industrial including utilities and automotive
  • Financial services

Who Can Use Precog?

  • Data integration engineers? Precog makes their job much easier, requires less coding, and allows them to respond to user needs faster
  • Data architects? faster responses to user requests and a much simpler interface
  • Business analysts? Precog gives them the ability to truly self-serve without submitting requests through data integration engineers
  • Data scientists? Offers real self-service against complex JSON data, something no other utility can truly offer

Implementing Precog Is Fast And Easy

Currently, Precog Precog is provided as a docker image, and is also available in the AWS marketplace. Since it is a docker, it can run in any client infrastructure, be it on-prem, public or private cloud. Since Precog is not a SaaS solution, we don’t hold any user data we also don’t carry compliance or regulatory burdens like GDPR. Users can install the solution within their existing infrastructure and use it as they need securely.

Precog supports JSON data stored in AWS S3, Azure Blob Storage, Wasabi, and MongoDB all very popular places to store JSON data. However, we can add a new connector to virtually ANY new data source easily within a few weeks using our advanced Lightweight Connector Technology (LWC).

Solving The Final Frontier Of ETL/ELT

Precog Precog is the world’s first solution designed from the ground up to transform any JSON data using a simple and powerful data browser approach. Our advanced mathematical foundation allows us to understand and shape semi-structured data in ways existing tools cannot. Our solution fills the gap in existing ETL/ELT and Data prep tools the market has been demanding.

You can learn more about Precog here or see it in action in our demo video.

NEWS & BLOG

Ready to Start?

FROM OUR CUSTOMERS

Localize

We chose to use Precog because they were the only company willing to handle our complex data connections. Precog was extremely helpful getting us set up and running smoothly. Since then it has been one of those tools that just works solidly and reliably which is one less thing our team nee... Read More

Derek Binkley - Engineering Manager
Cured

Precog is an important partner for Cured and a critical member of our data stack. The Precog platform has delivered data connectors to necessary data sources other vendors could not or would not, and in a very short timeframe. The product is intuitive, efficient, cost-effective, and doesn&... Read More

Ashmer Aslam - CEO Cured
Walnut St. Labs

Precog lets us prototype analytics projects quickly — building marketing dashboards based on data from a variety of sources — without needing a data engineer or developer — we create new data sources in a few hours to sources like Brightlocal, a popular local SEO SaaS solution, and h... Read More

Chris Dima - CEO
Alteryx

We welcome Precog to the Alteryx technology partner ecosystem as a partner extending the capabilities of our platform, further simplifying analytics for our customers.

Hakan Soderbom - Director of Technology Alliances
SouthEnd

We recognized a need in our customer base to perform advanced analytics on SAP data sets — we performed an extensive evaluation of Precog and chose it as a strategic solution for our go to market needs based on its performance and given their strong strategic relationship with SAP.

Alfredo Poncio - CEO
SouthEnd
SendaRide

Precog is the vital tool in our ability to pull data from a variety of business sources quickly and cleanly. Our internal MongoDB backend, as well as other cloud services like Hubspot, were a constant challenge to the business teams desire for reporting data prior to using Precog. With the... Read More

Josh Wilsie - VP