JSON to Insights: Analysing Non-tabular Healthcare Data
By Jeff Carr
December 9, 2019

Introduction

Just under a year ago Keshav Murthy, Vice President at CouchBase R&D, published this article detailing how to derive insights from a JSON dataset from CDC.gov.

11 months is a long time in the data world and a lot has changed between then and now. Snowflake has grown tremendously and Precog was released, setting a new standard for scalable self-service JSON, MongoDB, data lake and Web API connectivity.

This article follows in Keshav’s footsteps, showing how to derive insights from JSON healthcare data using Snowflake and Precog.

Snowflake is an incredibly powerful and cost effective data warehouse. However, working with non-tabular data using Snowflake alone is an involved task that is time consuming, prone to error and not suited for self service.

Precog empowers even non-technical users to easily browse and curate tables from tabular and non-tabular data. They can then load these tables directly into software like Power BI and Tableau, databases such as Postgres and warehouses such as Snowflake.

Precog is a universal self-service connector for non-tabular data such as MongoDB, JSON data in S3, data lakes, Azure Blob Storage, SaaS applications and web APIs such as Airship, Benchling, WooCommerce and HubSpot. Precog reduces development, integration, testing and compute times whilst improving accuracy. It empowers analysts with self-service access to tabular and non-tabular data to fulfil their own data requests.

Data

JSON data comes in many forms. Some of these are essentially tables and others are non-tabular data. Keshav Murthy’s article uses the Causes of death dataset which is small and tabular. It is also available from CDC.gov as CSV.

In lieu of this, dataset we’ll use the similarly grim adverse events related to drugs indicated for osteoarthritis dataset from the FDA. This data is non-tabular and more representative of data from web apis, internet of things devices, data lakes and applications. It is paginated and over a gigabyte in size. This dataset is made up of adverse reaction reports which exhibit differences in structure. Each report includes a list of medicines and a list of the adverse reactions.

Important disclaimer

This article uses data from openFDA which issues the following disclaimer.

Do not rely on openFDA to make decisions regarding medical care. Always speak to your health provider about the risks and benefits of FDA-regulated products.

As such you should not rely on this document to make decisions regarding medical care. Always speak to your health provider about the risks and benefits of products. Neither myself nor Precog are health providers or researchers.

Methodology

The first step to getting insight from non-tabular data, such as this dataset, is to tabulate it. This allows us to make charts, train and predict using machine learning models and visualise the data using software such as Tableau, Looker, ThoughtSpot and Power BI.

In this article, we’ll explore three approaches to tabulating this JSON healthcare data; using Precog alone, using Precog with Snowflake and using Snowflake alone. These approaches can be applied to any non-tabular data such as apps backed by MongoDB, JSON data in S3, Azure Blob Storage, SaaS applications and web APIs such as Airship, Benchling and HubSpot.

Once the data is tabulated it can be easily:

  • Analysed with applications such as Tableau, Power BI, Looker and ThoughtSpot
  • Used to train and predict with machine learning models using software such as TensorFlow and DataRobot
  • Manipulated using standard SQL and loaded into databases or warehouses such as Postgres, Snowflake and Redshift

How to follow along

Precog Desktop and Snowflake are available as a free trials. PowerBI is available free as well.

If you get Precog Desktop and Snowflake you can follow along with the “Tabulating non-tabular data with Precog”, “Tabulating non-tabular data without Precog” and “Analysing the tabulated data with SQL” sections.

If you get Precog Desktop and Power BI you can follow along with the “Tabulating non-tabular data with Precog” and “Analysing the tabulated data with Power BI” sections.

If you prefer a server or private cloud based solution, Docker, AMI, CloudFormation, JAR and Google Cloud based trials of Precog are also available. Please contact [email protected].

Next page: Tabulating non-tabular data with Precog

NEWS & BLOG

Ready to Start?

FROM OUR CUSTOMERS

GiddyUp

Precog delivers on the dream of simple data architecture that is roaring across the world. Precog solves all these problems, keeping your warehouse up to date with all the data you need and making the ELT dream a reality.

Venkatarama Cherukupalli
Walnut St. Labs

Precog lets us prototype analytics projects quickly — building marketing dashboards based on data from a variety of sources — without needing a data engineer or developer — we create new data sources in a few hours to sources like Brightlocal, a popular local SEO SaaS solution, and h... Read More

Chris Dima - CEO
Alteryx

We welcome Precog to the Alteryx technology partner ecosystem as a partner extending the capabilities of our platform, further simplifying analytics for our customers.

Hakan Soderbom - Director of Technology Alliances
Data.World

Enterprises struggle to understand and trust the data sources powering their business analyses,” said Jon Loyens, co-founder and chief product officer at data.world. “Adding ways to integrate sources to our catalog introduces more flexibility to our users, increasing their efficiency a... Read More

Jon Loyens - Co-Founder and CPO
SouthEnd

We recognized a need in our customer base to perform advanced analytics on SAP data sets — we performed an extensive evaluation of Precog and chose it as a strategic solution for our go to market needs based on its performance and given their strong strategic relationship with SAP.

Alfredo Poncio - CEO
SouthEnd
Everflow

Precog changed the game for us — instead of grueling data integration work, Precog offers a ‘connect and go’ experience — this allows us to reallocate resources to our product and our customers.

Sam Darawish - CEO
SendaRide

Precog is the vital tool in our ability to pull data from a variety of business sources quickly and cleanly. Our internal MongoDB backend, as well as other cloud services like Hubspot, were a constant challenge to the business teams desire for reporting data prior to using Precog. With the... Read More

Josh Wilsie - VP