JSON to Insights: Analysing the Tabulated Data with SQL
By Jeff Carr
December 9, 2019

Analysing the tabulated data with SQL

Sometimes we would prefer to use normal SQL instead of software like Power BI. Now that the data has been tabulated, we can analyse it using straightforward SQL.

Using Precog we can push the tabulated data into SQL databases and warehouses such as Snowflake and Postgres.

The FDA states that

When a report lists multiple drugs and multiple reactions, there is no way to conclude from the data therein that a given drug is responsible for a given reaction.

To reflect this, let’s add a column which denotes whether each reaction is conclusive or not. We will also add a column showing drug characterization in english, rather than as a number.

create table drug_reactions as 
select * from adverse_drug_reactions
left join characterization on drug_characterization_code = characterization_code
right join (
  select 
    report_id as conclusive_report_id,
    count(distinct drug) = 1 or count(distinct reaction) = 1 as conclusive
  from adverse_drug_reactions 
  group by report_id 
) on report_id = conclusive_report_id;

Here we count the number of reports that do and don’t meet this standard.

select count(distinct report_id), conclusive 
from drug_reactions
group by conclusive;
COUNT(DISTINCT REPORT_ID)CONCLUSIVE
12760False
10457True

As shown here, the majority of reports are not conclusive. If reporting was changed such that reactions were listed per drug, then all new reports would be. Of course, such a change is quite a large endeavour.

The FDA provides access to the following chart but the usefulness of it is limited. This is due to it including non-conclusive reports, drugs which weren’t indicated for osteoarthritis and drugs which weren’t suspected to have caused the reported adverse reactions.

OpenFDA Chart of drug classes in adverse event reports that contain drugs indicated for osteoarthritis

Instead we’ll do our own analysis with the following query.

select drug_class, count(distinct report_id) as frequency 
from drug_reactions
where drug_indication = 'OSTEOARTHRITIS' and characterization = 'Suspect' and conclusive
group by drug_class
order by frequency desc
limit 10;
DRUG_CLASSFREQUENCY
NULL9136
Nonsteroidal Anti-inflammatory Drug [EPC]68
Platelet Aggregation Inhibitor [EPC]1

Our results are much simpler. It is clear that the dataset does not include the pharmaceutical class of the majority of the suspected drugs indicated for osteoarthritis. Those drugs whose pharmaceutical classes are included are primarily NSAIDs.

Given the high number of missing pharmaceutical classes, we will need to drill down into the drugs themselves.

select 
  regexp_replace(upper(drug), '\\W+', '') as sanitized_drug,
  count(distinct report_id) as frequency 
from drug_reactions
where drug_indication = 'OSTEOARTHRITIS' and characterization = 'Suspect' and conclusive
group by sanitized_drug
order by frequency desc 
limit 10;
SANITIZED_DRUGFREQUENCY
CELEBREX1236
VIOXX973
ENBREL916
PROLIA626
IBUPROFEN471
CELECOXIB346
BEXTRA333
NAPROXEN266
VOLTAREN254
HUMIRA179

The position of drugs in these tables does not mean they are more or less dangerous than each other, as there are many factors that affect reporting of adverse events. Please refer to the important disclaimer.

This table shows that, as expected, the majority of reported adverse reactions involving drugs indicated for osteoarthritis are NSAIDs. NSAIDs are often prescribed for osteoarthritis. Also included in this list are a TNF inhibitor, a monoclonal antibody and a gabapentinoid.

Other drugs which are often prescribed for osteoarthritis include acetaminophen and opioid analgesics, however, these don’t feature prominently in this table. This may be the result of reporting practices.

Vioxx is an NSAID which was approved by the FDA in 1999 and voluntarily removed from the US market in 2004 due to concerns about risks of heart attack and stroke. Despite only being on the market for a short period, it is prominently featured on this list. Bextra (also known as Meloxicam) is an NSAID which was approved by the FDA in 2001 and removed from the US market by the FDA in 2005. The other drugs in this list are approved at the time of writing.

Celebrex (celecoxib), Voltaren and Ibuprofen are NSAIDs. Celebrex is reported as the suspected cause of adverse reactions more than twice as often than Ibuprofen. Some organisations, such as Best Practice Advocacy Centre New Zealand, state that “A significant benefit of celecoxib is that it is associated with less risk of gastrointestinal bleeding compared to non-selective NSAIDs”. Let us see if the BPACNZ statement is reflected in this dataset.

select 
  regexp_replace(upper(drug), '\\W+') as sanitized_drug, 
  count(distinct report_id) as gi_bleed_frequency
from drug_reactions
where 
  drug_indication = 'OSTEOARTHRITIS' 
  and characterization = 'Suspect' 
  and conclusive
  and upper(reaction) like '%GASTROINTESTINAL HAEMORRHAGE%'
group by sanitized_drug
order by gi_bleed_frequency desc 
limit 10;
SANITIZED_DRUGGI_BLEED_FREQUENCY
IBUPROFEN28
CELEBREX27
VIOXX24
NAPROXEN20
MOBIC15
MELOXICAM8
DICLOFENAC7
NAPROXENSODIUM5
CELECOXIB4
ALEVE4

The frequency of reported gastrointentinal bleeding as a suspected reaction to Celebrex and Ibuprofen are similar. This is not proportional to the overall number of reports for each drug, so let us inspect the top reported reactions for Celebrex and Ibuprofen.

Celebrex:

select 
  upper(reaction) as sanitized_reaction,
  count(distinct report_id) as frequency
from drug_reactions
where 
  drug_indication = 'OSTEOARTHRITIS'
  and characterization = 'Suspect'
  and conclusive
  and (
    regexp_replace(upper(drug), '\\W+') = 'CELECOXIB' 
    or regexp_replace(upper(drug), '\\W+') = 'CELEBREX'
  )
group by sanitized_reaction
order by frequency desc 
limit 20;
SANITIZED_REACTIONFREQUENCY
MYOCARDIAL INFARCTION172
DRUG INEFFECTIVE161
CEREBROVASCULAR ACCIDENT89
PAIN71
DYSPEPSIA58
ABDOMINAL DISCOMFORT44
DEATH43
ATRIAL FIBRILLATION36
NAUSEA36
ABDOMINAL PAIN34
CONSTIPATION33
DEPRESSION29
FLATULENCE29
RASH27
GASTROINTESTINAL HAEMORRHAGE27

Ibuprofen:

select 
  upper(reaction) as sanitized_reaction,
  count(distinct report_id) as frequency
from drug_reactions
where 
  drug_indication = 'OSTEOARTHRITIS' 
  and characterization = 'Suspect'
  and conclusive
  and regexp_replace(upper(drug), '\\W+') = 'IBUPROFEN'
group by sanitized_reaction
order by frequency desc 
limit 5;
SANITIZED_REACTIONFREQUENCY
ACUTE MYOCARDIAL INFARCTION19
MYOCARDIAL INFARCTION18
GASTROINTESTINAL HAEMORRHAGE17
DEATH15
GASTRIC ULCER14

Gastrointestinal bleeding is the 15th most reported adverse event where Celebrex is suspected and the 3rd most reported adverse event where Ibuprofen is suspected. This seems to reflect BPACNZ’s statement, however, there are many possible causes for these results. These results should not be used to make decisions about medical care.

Let’s summarize our insights into adverse reaction reporting of drugs indicated for osteoarthritis.

  • Out of 23217 reports only 10457 could be used to conclude that a given drug is suspected as responsible for a given reaction.
  • Out of the 10457 resulting reports only 246 contained drug classification information.
  • There are more than twice as many adverse reaction reports where Celebrex is suspected as responsible than there are where Ibuprofen is suspected as responsible.
  • Gastrointestinal bleeding is the 15th most reported adverse event where Celebrex is suspected and the 3rd most reported adverse event where Ibuprofen is suspected.
  • Despite being on the market for a short time, Vioxx and Bextra are listed in the top 10 most reported drugs.

Research into the causes of these insights could lead to preferable patient outcomes and more valuable adverse event reporting.

In just a few minutes, we have connected to and tabulated the data with Precog, then used straightforward SQL to gain a number of insights into the data.

Previous page: Joining datasets with Precog

Next page: More details about Precog

NEWS & BLOG

New Rules for Modern Data

New Rules for Modern Data

The new world of data has new rules that might seem scary at first, but it doesn’t have to be that way. The trick is to use a tool — like Precog — explicitly designed for the world where data is messy, experts are expensive, and agility is vital.

read more

Analytics-ready data from any source in 5 minutes.