JSON to Insights: Analysing the Tabulated Data with SQL
By Chris Dima
December 9, 2019

Analysing the tabulated data with SQL

Sometimes we would prefer to use normal SQL instead of software like Power BI. Now that the data has been tabulated, we can analyse it using straightforward SQL.

Using Precog we can push the tabulated data into SQL databases and warehouses such as Snowflake and Postgres.

The FDA states that

When a report lists multiple drugs and multiple reactions, there is no way to conclude from the data therein that a given drug is responsible for a given reaction.

To reflect this, let’s add a column which denotes whether each reaction is conclusive or not. We will also add a column showing drug characterization in english, rather than as a number.

create table drug_reactions as 
select * from adverse_drug_reactions
left join characterization on drug_characterization_code = characterization_code
right join (
  select 
    report_id as conclusive_report_id,
    count(distinct drug) = 1 or count(distinct reaction) = 1 as conclusive
  from adverse_drug_reactions 
  group by report_id 
) on report_id = conclusive_report_id;

Here we count the number of reports that do and don’t meet this standard.

select count(distinct report_id), conclusive 
from drug_reactions
group by conclusive;
COUNT(DISTINCT REPORT_ID) CONCLUSIVE
12760 False
10457 True

As shown here, the majority of reports are not conclusive. If reporting was changed such that reactions were listed per drug, then all new reports would be. Of course, such a change is quite a large endeavour.

The FDA provides access to the following chart but the usefulness of it is limited. This is due to it including non-conclusive reports, drugs which weren’t indicated for osteoarthritis and drugs which weren’t suspected to have caused the reported adverse reactions.

OpenFDA Chart of drug classes in adverse event reports that contain drugs indicated for osteoarthritis

Instead we’ll do our own analysis with the following query.

select drug_class, count(distinct report_id) as frequency 
from drug_reactions
where drug_indication = 'OSTEOARTHRITIS' and characterization = 'Suspect' and conclusive
group by drug_class
order by frequency desc
limit 10;
DRUG_CLASS FREQUENCY
NULL 9136
Nonsteroidal Anti-inflammatory Drug [EPC] 68
Platelet Aggregation Inhibitor [EPC] 1

Our results are much simpler. It is clear that the dataset does not include the pharmaceutical class of the majority of the suspected drugs indicated for osteoarthritis. Those drugs whose pharmaceutical classes are included are primarily NSAIDs.

Given the high number of missing pharmaceutical classes, we will need to drill down into the drugs themselves.

select 
  regexp_replace(upper(drug), '\\W+', '') as sanitized_drug,
  count(distinct report_id) as frequency 
from drug_reactions
where drug_indication = 'OSTEOARTHRITIS' and characterization = 'Suspect' and conclusive
group by sanitized_drug
order by frequency desc 
limit 10;
SANITIZED_DRUG FREQUENCY
CELEBREX 1236
VIOXX 973
ENBREL 916
PROLIA 626
IBUPROFEN 471
CELECOXIB 346
BEXTRA 333
NAPROXEN 266
VOLTAREN 254
HUMIRA 179

The position of drugs in these tables does not mean they are more or less dangerous than each other, as there are many factors that affect reporting of adverse events. Please refer to the important disclaimer.

This table shows that, as expected, the majority of reported adverse reactions involving drugs indicated for osteoarthritis are NSAIDs. NSAIDs are often prescribed for osteoarthritis. Also included in this list are a TNF inhibitor, a monoclonal antibody and a gabapentinoid.

Other drugs which are often prescribed for osteoarthritis include acetaminophen and opioid analgesics, however, these don’t feature prominently in this table. This may be the result of reporting practices.

Vioxx is an NSAID which was approved by the FDA in 1999 and voluntarily removed from the US market in 2004 due to concerns about risks of heart attack and stroke. Despite only being on the market for a short period, it is prominently featured on this list. Bextra (also known as Meloxicam) is an NSAID which was approved by the FDA in 2001 and removed from the US market by the FDA in 2005. The other drugs in this list are approved at the time of writing.

Celebrex (celecoxib), Voltaren and Ibuprofen are NSAIDs. Celebrex is reported as the suspected cause of adverse reactions more than twice as often than Ibuprofen. Some organisations, such as Best Practice Advocacy Centre New Zealand, state that “A significant benefit of celecoxib is that it is associated with less risk of gastrointestinal bleeding compared to non-selective NSAIDs”. Let us see if the BPACNZ statement is reflected in this dataset.

select 
  regexp_replace(upper(drug), '\\W+') as sanitized_drug, 
  count(distinct report_id) as gi_bleed_frequency
from drug_reactions
where 
  drug_indication = 'OSTEOARTHRITIS' 
  and characterization = 'Suspect' 
  and conclusive
  and upper(reaction) like '%GASTROINTESTINAL HAEMORRHAGE%'
group by sanitized_drug
order by gi_bleed_frequency desc 
limit 10;
SANITIZED_DRUG GI_BLEED_FREQUENCY
IBUPROFEN 28
CELEBREX 27
VIOXX 24
NAPROXEN 20
MOBIC 15
MELOXICAM 8
DICLOFENAC 7
NAPROXENSODIUM 5
CELECOXIB 4
ALEVE 4

The frequency of reported gastrointentinal bleeding as a suspected reaction to Celebrex and Ibuprofen are similar. This is not proportional to the overall number of reports for each drug, so let us inspect the top reported reactions for Celebrex and Ibuprofen.

Celebrex:

select 
  upper(reaction) as sanitized_reaction,
  count(distinct report_id) as frequency
from drug_reactions
where 
  drug_indication = 'OSTEOARTHRITIS'
  and characterization = 'Suspect'
  and conclusive
  and (
    regexp_replace(upper(drug), '\\W+') = 'CELECOXIB' 
    or regexp_replace(upper(drug), '\\W+') = 'CELEBREX'
  )
group by sanitized_reaction
order by frequency desc 
limit 20;
SANITIZED_REACTION FREQUENCY
MYOCARDIAL INFARCTION 172
DRUG INEFFECTIVE 161
CEREBROVASCULAR ACCIDENT 89
PAIN 71
DYSPEPSIA 58
ABDOMINAL DISCOMFORT 44
DEATH 43
ATRIAL FIBRILLATION 36
NAUSEA 36
ABDOMINAL PAIN 34
CONSTIPATION 33
DEPRESSION 29
FLATULENCE 29
RASH 27
GASTROINTESTINAL HAEMORRHAGE 27

Ibuprofen:

select 
  upper(reaction) as sanitized_reaction,
  count(distinct report_id) as frequency
from drug_reactions
where 
  drug_indication = 'OSTEOARTHRITIS' 
  and characterization = 'Suspect'
  and conclusive
  and regexp_replace(upper(drug), '\\W+') = 'IBUPROFEN'
group by sanitized_reaction
order by frequency desc 
limit 5;
SANITIZED_REACTION FREQUENCY
ACUTE MYOCARDIAL INFARCTION 19
MYOCARDIAL INFARCTION 18
GASTROINTESTINAL HAEMORRHAGE 17
DEATH 15
GASTRIC ULCER 14

Gastrointestinal bleeding is the 15th most reported adverse event where Celebrex is suspected and the 3rd most reported adverse event where Ibuprofen is suspected. This seems to reflect BPACNZ’s statement, however, there are many possible causes for these results. These results should not be used to make decisions about medical care.

Let’s summarize our insights into adverse reaction reporting of drugs indicated for osteoarthritis.

  • Out of 23217 reports only 10457 could be used to conclude that a given drug is suspected as responsible for a given reaction.
  • Out of the 10457 resulting reports only 246 contained drug classification information.
  • There are more than twice as many adverse reaction reports where Celebrex is suspected as responsible than there are where Ibuprofen is suspected as responsible.
  • Gastrointestinal bleeding is the 15th most reported adverse event where Celebrex is suspected and the 3rd most reported adverse event where Ibuprofen is suspected.
  • Despite being on the market for a short time, Vioxx and Bextra are listed in the top 10 most reported drugs.

Research into the causes of these insights could lead to preferable patient outcomes and more valuable adverse event reporting.

In just a few minutes, we have connected to and tabulated the data with Precog, then used straightforward SQL to gain a number of insights into the data.

Previous page: Joining datasets with Precog

Next page: More details about Precog

NEWS & BLOG

Ready to Start?

FROM OUR CUSTOMERS

Localize

We chose to use Precog because they were the only company willing to handle our complex data connections. Precog was extremely helpful getting us set up and running smoothly. Since then it has been one of those tools that just works solidly and reliably which is one less thing our team nee... Read More

Derek Binkley - Engineering Manager
Cured

Precog is an important partner for Cured and a critical member of our data stack. The Precog platform has delivered data connectors to necessary data sources other vendors could not or would not, and in a very short timeframe. The product is intuitive, efficient, cost-effective, and doesn&... Read More

Ashmer Aslam - CEO Cured
Walnut St. Labs

Precog lets us prototype analytics projects quickly — building marketing dashboards based on data from a variety of sources — without needing a data engineer or developer — we create new data sources in a few hours to sources like Brightlocal, a popular local SEO SaaS solution, and h... Read More

Chris Dima - CEO
Alteryx

We welcome Precog to the Alteryx technology partner ecosystem as a partner extending the capabilities of our platform, further simplifying analytics for our customers.

Hakan Soderbom - Director of Technology Alliances
SouthEnd

We recognized a need in our customer base to perform advanced analytics on SAP data sets — we performed an extensive evaluation of Precog and chose it as a strategic solution for our go to market needs based on its performance and given their strong strategic relationship with SAP.

Alfredo Poncio - CEO
SouthEnd
SendaRide

Precog is the vital tool in our ability to pull data from a variety of business sources quickly and cleanly. Our internal MongoDB backend, as well as other cloud services like Hubspot, were a constant challenge to the business teams desire for reporting data prior to using Precog. With the... Read More

Josh Wilsie - VP