You’ve probably been hearing about “big data” for a while now. The term has been around for years, and in the meantime, data has only gotten bigger. A whole industry has sprung up around it, from collection to storage to analytics, and data integration engineers are part of that puzzle. But what if your company doesn’t have one?
Big Data Is Getting Bigger
Companies have been tracking data for a long time. What they sell and when, customer demographics, busiest times of day and year, website traffic and sources, social media engagements, and so on are all data that can be turned into analytics-ready tables and used to draw conclusions and make predictions about your users.
But recently, the world of consumer data has gotten a lot more complicated. SaaS companies have exploded in popularity and brought with them huge amounts of complicated usage data. Dropbox, for example, isn’t just tracking how many times users log in. It’s tracking what kinds of files they upload, when they do it, how long it takes, which ISPs they use, how many folders they use, and dozens of other data points. Something as simple as saving a Word document can generate a huge amount of data.
Mobile app usage has also taken off, bringing with it its own set of complicated data. App makers want to know who’s using their app, where and when, what they’re using it for, how long they’re in the app, which features they use the most, and so on. Add in a social aspect, like with multiplayer gaming, and there’s a whole other set of data about how users interact to accompany it.
Finally, the Internet of Things has officially arrived. The concept has been around for almost 20 years, but it took a while to be realized. The number of IoT devices connected to the internet (not phones and computers, but smart appliances, voice assistants, wi-fi enabled plugs, lights, switches, cameras, and the like) has increased more than tenfold in the last ten years. Some projections say that there will be 50 billion connected devices by 2020.
And every single one of them is generating data on a second-to-second basis. A smart bulb is registering every time it’s turned on and off, which color the user selects, whether they used their phone, smart watch, or voice assistant to do it, which room it’s assigned to, and more.
The Complex Data Problem
The Role Of A Data Integration Engineer
This is where a data integration engineer comes in. There are lots of tools to perform traditional ETL (extract, transform, and load) or ELT tasks, but they all fall short when it comes to complex JSON data.
Usually, an engineer is required to bridge that gap. The engineer will manually write code, often in Python, to deal with the complex and constantly shifting nature of the JSON data. This is a slow, tedious process. It’s also costly: engineers are highly trained in a very technical skill, and most small companies can’t afford to hire them full-time.
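To give a sense of what that hand-written code tends to look like, here is a minimal sketch in Python. The payload and its field names (`device`, `events`, `color`, and so on) are hypothetical, made up for illustration, and the approach shown (dotted column names, one row per event) is just one common way to flatten nested JSON, not any particular vendor’s method.

```python
import json

# Hypothetical smart-bulb payload; every field name here is an assumption
# made for illustration, not a real device schema.
RAW_EVENTS = """
[
  {"device": {"id": "bulb-1", "room": "kitchen"},
   "events": [
     {"action": "on", "source": "phone", "color": {"name": "warm white"}},
     {"action": "off", "source": "voice"}
   ]},
  {"device": {"id": "bulb-2"},
   "events": [{"action": "on", "source": "watch"}]}
]
"""


def flatten(record, prefix=""):
    """Collapse nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_rows(devices):
    """Emit one flat row per event, repeating the device fields on each row."""
    rows = []
    for device in devices:
        device_fields = flatten({"device": device.get("device", {})})
        for event in device.get("events", []):
            rows.append({**device_fields, **flatten(event, prefix="event.")})
    # Fields vary from record to record, so pad every row to the same
    # set of columns before handing it to an analytics tool.
    columns = sorted({key for row in rows for key in row})
    return [{col: row.get(col) for col in columns} for row in rows]


rows = to_rows(json.loads(RAW_EVENTS))
for row in rows:
    print(row)
```

Even this small sketch bakes in assumptions about the shape of the data: it knows that `events` is the list to expand and that everything else is a nested object. If the payload gains a second nested list or renames a field, someone has to go back and change the code.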
Even big companies that can afford a whole floor full of data integration engineers run into another problem: custom code isn’t flexible. If marketers or CIOs make a request for a certain set of analytics-ready data over a certain period of time, it can take weeks or months to extract that data from the raw JSON. If they have follow-up requests or want to see a different time span, the clock starts again. So what’s a small business supposed to do?
Option 1: Contract Out The Work
There are a lot of companies that can be hired on a contract basis to parse data, and a lot of them claim to be able to handle JSON data. When pressed, though, they all admit that they’ll need to write custom code to handle the sort of complex JSON data that’s common in today’s world. You can hire these companies to parse your data for much cheaper than hiring your own engineer, but you’ll still run into the same issues of time and inflexibility.
Option 2: Precog
For decades, all data parsing tools have been built on a type of mathematics called relational algebra, but RA falls short when it comes to the complex nested data points of JSON. Precog created a novel and powerful underlying algebra called multi-dimensional relational algebra, which allows us to convert any data, regardless of complexity, into an analytics-ready tabular format in seconds.
More importantly, Precog’s interface is no more complicated than a filesystem. Anyone, from a data engineer to a C-suite exec to a marketer or analyst, can extract the data they need and run the analytics they need, complete with follow-up queries. Precog is the only tool in the world built on this framework. In short, the days of waiting on a data integration engineer to deliver curated data are over.