Data Science On JSON Using Precog and RStudio or Jupyter Notebooks
By Mike Corbisiero
March 18, 2019

Obstacles in doing/learning Data Science

Whether you’re a professional data scientist or studying to become a data scientist, you’ll likely need to work with a JSON dataset. JSON isn’t easy to work with. It’s not tabular, and you can’t just push it to a SQL database — at least not without a “bit” of work.

Tabularizing JSON data usually requires writing python scripts. But given multiple JSON datasets there’s a high chance that each one is very different, so your python code is likely not re-usable.
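To make that concrete, here is a minimal sketch of the kind of one-off flattening script this usually means. The record shape (a game with nested home and away objects) is invented for illustration and is not the layout of any dataset discussed in this post.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars (and arrays) are kept as-is; arrays would need
            # their own dataset-specific handling
            flat[full_key] = value
    return flat


# Hypothetical sample data, made up for this example
raw = """[
  {"game_id": 1, "home": {"team": "A", "score": 101}, "away": {"team": "B", "score": 99}},
  {"game_id": 2, "home": {"team": "C", "score": 88}, "away": {"team": "A", "score": 93}}
]"""

rows = [flatten(record) for record in json.loads(raw)]
columns = sorted({key for row in rows for key in row})
print(columns)
```

Even this small helper bakes in decisions (dotted column names, arrays left untouched) that rarely carry over unchanged to the next dataset.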

As a professional data scientist, tabularizing your JSON is time you have to factor into your report delivery. You may have the option to use your company’s resources to deliver a tabular form of your complicated JSON. Either way, somewhere along your workflow, someone is spending time and money writing throwaway python code to tabularize JSON.

If you are learning data science then you’ve likely bumped into blogs such as this one. The article spends approximately 80% of its content showing how to explore JSON data using Linux commands and python code. It isn’t until the last few paragraphs that the article gets to answering questions and actually doing something with the data.

What if you didn’t have to spend your time writing one-off python scripts to tabularize your JSON data? You might deliver insight more quickly, or more frequently, or maybe your company would save money on ETL resources.

Imagine if data science students didn’t have to spend time converting JSON data into a table. Instead, they could spend time learning to ask and answer questions about their data.

Imagine if students of Berkeley’s Data 8 course spent zero time tabularizing complicated JSON datasets. Rather, they could spend time applying their skills in a class project involving complicated JSON datasets not previously considered due to the overhead of tabularizing.

In this short blog I’ll show you how to quickly tabularize a 700MB NBA JSON dataset I found online, nbagames.json, using Precog.

I’m an engineer at Precog and I’m very proud of the engineering feat my colleagues have created. I’m here to show you how you can benefit from our creation, Precog.

Access your JSON data with ease

Precog can read your JSON data from multiple sources. In this tutorial we’ll be reading a JSON file from an S3 bucket. Download nbagames.json and upload it to an S3 bucket. A quick Google search will turn up plenty of instructions on how to do this.

Precog can readily read your data from an API, local file, Azure, etc. Check out [our instructions](https://precog.com/user-guides/) for these scenarios.

Tabularize your JSON data using Precog

In a 2-minute video, I’m going to show you how to create a table from the NBA JSON data I mentioned above. First, I will connect to my S3 bucket containing the nbagames.json data. Then, I will point Precog to the nbagames.json file and create some columns.

http://cl.ly/6bf0cdcbfc9c

And that’s it! I have just created a table from the 700MB JSON file without needing to explore my data using Linux commands and especially without having to write any python.

Importing the Precog table

Using RStudio

The short video below demonstrates how quickly one can import the table created from the NBA JSON data in Precog using RStudio. Like many of the other tools in this realm, RStudio can read data from a URL, display the columns we selected, and produce a simple bar plot.

https://cl.ly/60ce8421f782

Using Jupyter Notebook

Of course, if you like doing data science using python you certainly can! My point is that with Precog there’s no need to write python to tabularize your JSON. Below, I’ve included an example of how to get started with a Jupyter Notebook, pandas, and python.
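A minimal sketch of such a starting cell might look like the following. It assumes the Precog table can be read as CSV; the `load_table` helper, the column names, and the sample rows are placeholders made up for illustration, not actual Precog output.

```python
import io

import pandas as pd


def load_table(source):
    """Read a CSV table into a DataFrame.

    `source` may be a URL, a local path, or a file-like object,
    all of which pandas.read_csv accepts.
    """
    return pd.read_csv(source)


# Stand-in data so the cell runs offline. In a real notebook you would
# pass the URL of your table instead (assumed here to serve CSV).
sample = io.StringIO("team,points\nA,101\nB,99\nA,93\n")

df = load_table(sample)

# With the data already tabular, analysis can start right away
totals = df.groupby("team")["points"].sum()
print(totals)
```

In a notebook, `totals.plot.bar()` would then give a simple bar plot, provided matplotlib is installed.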

Conclusion

Yes! It’s actually this easy to get started analyzing JSON data. Just connect Precog to your data source and select your columns, then import into your favorite data science tool.

Precog has other amazing uses. Would you like to stream data into AWS Redshift, tabularize data stuck in MongoDB, or front your data API with Precog? Let us know. We are happy to help.

To get started with Precog, get in touch with our sales team for more information.

Precog is also available on [AWS Marketplace](https://aws.amazon.com/marketplace/pp/B07N4B9N7Z).

Get started analyzing your complex JSON and skip the python scripts with Precog!
