rdrugtrajectory: An R Package for the Analysis of Drug Prescriptions in Electronic Health Care Records


JSS Journal of Statistical Software
MMMMMM YYYY, Volume VV, Issue II. Reddoi: 10.18637/jss.v000.i00

rdrugtrajectory: An R Package for the Analysis of
Drug Prescriptions in Electronic Health Care

Records

Anthony Nash
University of Oxford

Tingyee E. Chang
University of Oxford

Benjamin Wan
Kings College London

M. Zameel Cader
University of Oxford

Abstract

Primary care electronic health care records are rich with patient and clinical infor-
mation. Studying electronic health care records has resulted in marked improvements to
national health care processes and patient-care decision making, and is a powerful supple-
mentary source of data for drug discovery effort. We present the R package rdrugtrajec-
tory, designed to yield demographic and patient-level characteristics of drug prescriptions
in the UK Clinical Practice Research Datalink dataset. The package operates over Clin-
ical Practice Research Datalink Gold clinical, referral and therapy datasets and includes
features such as first drug prescriptions analysis, cohort-wide prescription information, cu-
mulative drug prescription events, the longitudinal trajectory of drug prescriptions, and a
survival analysis timeline builder to identify risks related to drug prescription switching.
The rdrugtrajectory package has been made freely available via the GitHub repository.

Keywords: EHR, electronic health care records, CPRD, Clinical Practice Research Datalink,
prescriptions, R, therapeutics, drug discovery, clinical epidemiology.

1. Introduction

The UK Clinical Practice Research Datalink (CPRD) service offers high quality longitudinal
data on 50 million patients with up to 20 years of follow-up for 25% of those patients. The
service provides drug treatment patterns, feasibility studies and health care resource use stud-
ies. Patient electronic health care records (EHR) are stored as coded and anonymised data
and sourced from over 1,800 primary care practices across England. CPRD holds informa-
tion on consultation events, medical diagnoses, symptoms, prescriptions, vaccination history,
laboratory tests, and referrals. CPRD can provide routine linkage to other health-related
patient datasets, for example: Small area level data, such as patient and/or practice postcode

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

http://dx.doi.org/10.18637/jss.v000.i00
https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


2 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

linked deprivation measures; data from NHS digital which includes hospital episode statistic,
outpatient and accident and emergency data; and cancer data from Public Health England.
Evidence from EHRs is making an impact on primary care decision-making and best prac-
tice Oyinlola et al. (2016). With nationwide longitudinal datasets more readily available, the
evaluation of treatments over long timescales can contribute to clinical decision-making Hepp
et al. (2017). For example, adverse events caused by prescription medication can be studied
using retrospective data in situations where randomized clinical trials may prove impracti-
cal Ghosh et al. (2019); Bally et al. (2017).
This publication serves as an introduction to the rdrugtrajectory R package and whilst this
publication is by no means a complete tutorial, we will expand on some of the main pack-
age features, such as, how to: Isolate patients by first drug prescriptions at given clinical
events; calculate time-invariant prescriptions; construct survival analysis timelines (compati-
ble with Cox proportional hazard regression and Kaplan Meier curves), and; visualise patient
prescription switching. For a comprehensive list of functions please visit the Github reposi-
tory https://github.com/acnash/rdrugtrajectory. Almost all features can be controlled
by covariates or stratified by some variable, for example, by gender, age, medical codes or
treatment product codes.
The example code, figures and data structures presented here mimic a small fraction of our
own research. In the interest of patient confidentiality, the clinical data used in the analysis
have been fabricated. We present a brief tour of some of the functions available, starting with
a discussion on the CPRD data structure and how records must be formatted. A glossary of
terms has been provided (Table 1) to assist the reader.

2. rdrugtrajectory package and data structures

2.1. rdrugtrajectory availability and installation

rdrugtrajectory is free to download from the Github repository https://github.com/acnash/
rdrugtrajectory and holds an MIT license. Fabricated CPRD clinical and CPRD prescrip-
tion records in addition to age, gender and index of multiple deprivation scores are included
for test and tutorial purposes. Before installing the package, the following R dependencies
are required: plyr, dplyr, foreach, doParallel, data.table, parallel, splus2R, rlist,
reda, ggplot2, ggalluvial, stats, utils and useful. The latest rdrugtrajectory binary is
install using:

install.packages("path/to/tar/file", source = TRUE, repos=NULL)

rdrugtrajectory was developed and tested on R version 4.0.1. Please consult the Github page
for release notes, the latest version and up to date installation instructions.

2.2. CPRD product descirption

Several rdrugtrajectory functions use the CPRD product.txt file for assigning a text descrip-
tion to a prescription prodcode. The product.txt (and medical.txt for medcode description)
is available in the CPRD Data Dictionary Windows software. It is important that the file

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://github.com/acnash/rdrugtrajectory
https://github.com/acnash/rdrugtrajectory
https://github.com/acnash/rdrugtrajectory
https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 3

Term Description
rdrugtrajectory An R packaged designed for the management of CPRD prescription data.
clinical The ClinicalNNN.txt dataset presented in a rdrugtrajectory dataframe.
referral The ReferralNNN.txt dataset presented in a rdrugtrajectory dataframe.
therapy The TherapyNNN.txt dataset presented in a rdrugtrajectory dataframe.
AdditionalNNN.txt The CPRD dataset of additional clinical information, for example, patient

smoking status and alcohol comsumption. Data can be retrieved using
CPRDLookups.R.

modecode A CPRD identifier that denotes medical conditions, diagnosis and com-
plaints made by a patient. medcodes are recorded in the ClinicalNNN.txt
and ReferralNNN.txt files.

prodcode A CPRD identifier that denotes treatment products, including drugs,
foods, and medical apparatus. prodcodes are recorded in the Thera-
pyNNN.txt files.

patid A unique CPRD patient identifier. Used to link datasets.
event Any procode or medcode in a patient’s EHR.
eventdate The date of an event recorded by a general practitioner. Present in all

three datasets and corresponding rdrugtrajectory dataframe.
IMD Index of Multiple Deprivation score - a UK Government socioeconomic

measurement based on postcode of the clinic or a patient’s registered ad-
dress.

Prescription A general time for any prodcode prescribed for treatment.
medical history Indicates a combination of one or more sets of CPRD data, for example,

the collection of all clinical and therapy EHR for patients with a medcode
for migraine.

product.txt A plain text file that contains all prodcodes with a description and comes
bundled with the CPRD Data Dictionary. The file is used to link a prodcode
with a description.

Table 1: Table of frequently used terms.

remains in plain text, with columns tab-delimited. The files can be simplified by removing all
non-essential products. Finally, all the eleven columns that make up the product.txt file must
be available, with the first column containing all prodcodes and the fourth column containing
the product description. A simplified product.txt file, presented below, can be downloaded
from the Github page.

> library(rdrugtrajectory)
> productDF <- read.csv("../RDrugTrajectory_Data/product.txt",
+ sep="\t",
+ header=FALSE)
> head(productDF)

V1 V2 V3 V4 V5
1 5 60153020 14958680 Atenolol 50mg tablets Atenolol
2 24 60152020 5354283 Atenolol 100mg tablets Atenolol
3 26 67920020 6869099 Atenolol 25mg tablets Atenolol

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


4 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

4 49 58950020 4920857 Amitriptyline 25mg tablets Amitriptyline hydrochloride
5 65 68572020 4771731 Lisinopril 10mg tablets Lisinopril
6 78 68571020 4006669 Lisinopril 5mg tablets Lisinopril

V6 V7 V8 V9
1 50mg Tablet Oral 2040000
2 100mg Tablet Oral 2040000
3 25mg Tablet Oral 2040000
4 25mg Tablet Oral 04030100/04070300/04070402
5 10mg Tablet Oral 2050501
6 5mg Tablet Oral 2050501

V10
1 Beta-adrenoceptor Blocking Drugs
2 Beta-adrenoceptor Blocking Drugs
3 Beta-adrenoceptor Blocking Drugs
4 Tricyclic And Related Antidepressant Drugs/Neuropathic Pain/Prophylaxis Of Migraine
5 Angiotensin-converting Enzyme Inhibitors
6 Angiotensin-converting Enzyme Inhibitors

V11 V12
1 Feb-09 3059002
2 Feb-09 3059001
3 Feb-09 5070002
4 Feb-09 2776002
5 Feb-09 5250003
6 Feb-09 5250002

2.3. rdrugtrajectory package structure

rdrugtrajectory contains three R files: (1) all functions related to data curating and search-
ing reside within PRDDrugTrajectory.R; (2) analysis tools and timeline construction reside
within CPRDDrugTrajectoryStats.R; and, (3) all utilities including input/output operations
reside within CPRDDrugTrajectoryUtils.R. The packages contains several fabricated CPRD
datasets: testClinicalDF, testTherapyDF, ageGenderDF, imdDF, and drugListDF. A de-
scription of each, along with information on data types and structures are given below.

2.4. The CPRD EHR data structure

The structure of CPRD Gold data may depend on whether the CPRD license holder per-
forms intermediate data management steps before releasing data to the user. However, typ-
ically, CPRD Gold data follows the CPRD Gold specification https://cprdcw.cprd.com/
_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf. Currently, rdrugtrajectory sup-
ports EHR data from the flat files ClinicalNNN.txt, ReferralNNN.txt, and TherapyNNN.txt.
The Additional Clinical Details files (AdditionalNNN.txt) are currently supported using our re-
leased R script CPRDLookups.R https://github.com/acnash/CPRD_Additional_Clinical ?.
Patients are assigned a unique numerical patid value. The operations performed by rdrugtra-
jectory requires the patid to identify patients and subset patient groups. We recommend that
patid, medcode, prodcode are kept as character data throughout any preliminary data curating

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://cprdcw.cprd.com/_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf
https://cprdcw.cprd.com/_docs/CPRD_GOLD_Full_Data_Specification_v2.0.pdf
https://github.com/acnash/CPRD_Additional_Clinical
https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 5

steps. Medical events are recorded as codes and stored in the ClinicalNNN.txt and Refer-
ralNNN.txt under the column header medcode. Prescription events, such as drug prescriptions
are also recorded as codes and stored in the TherapyNNN.txt file under the column header
prodcode and the sequences of repeat prescriptions are under the issueseq column header.
Dates associated medical and prescription events, recorded by the General Practitioner, are
stored under the column header eventdate.

2.5. Essential data types and data structures

rdrugtrajectory can operate over CPRD Gold EHR clinical, referral and prescription data
provided each dataset format is presented as separate R dataframes or combined into a rdrug-
trajectory medical history dataframe. The construction of clinical, referral and prescription
dataframes require, as a minimum, a patid and eventdate column, and either medcode or prod-
code (for therapy data, issueseq is necessary), and presented in that order. Every record of
medcode or prodcode must be accompanied by an eventdate entry (encoded as a Date class of
the form YYYY-MM-DD). Patients can have duplicate events within the same data set and
between data sets. Medical and prescription codes can be retrieved from the corresponding
medical.txt and product.txt files which come bundled with the CPRD Data Dictionary Win-
dows application. rdrugtrajectory comes packaged with fabricated EHR data in the structure
of:

> library(rdrugtrajectory)
> #fabricated clinical data (referral data follows the same format)
> names(testClinicalDF)

[1] "patid" "eventdate" "medcode" "consid"

> #fabricated prescription data
> names(testTherapyDF)

[1] "patid" "eventdate" "prodcode" "consid" "issueseq"

Users can check if the structure of an EHR dataframe meets the requirements for this package
by calling checkCPRDRecord; additional columns such as consultation identification number
(consid) are not considered. In the following instance, a prescription dataset with the required
columns and the optional consultation identification number is presented.

> library(rdrugtrajectory)
> #check the structure of testTherapy, specify that it is therapy data
> checkCPRDRecord(df=testTherapyDF, dataType="therapy")

[1] "The data.frame is appropriately formatted. Returning TRUE."
[1] TRUE

> #display the rdrugtrajectory EHR therapy dataframe
> str(testTherapyDF, strict.width="wrap")

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


6 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

'data.frame': 91647 obs. of 5 variables:
$ patid : int 3515 3515 3515 3515 3515 3515 3515 3653 3653 3653 ...
$ eventdate: Date, format: "2005-02-24" "2006-01-26" ...
$ prodcode : int 83 83 83 707 707 707 707 297 297 297 ...
$ consid : int 540850 540865 540892 541108 541114 541118 541133 571336 571345

571357 ...
$ issueseq : int 0 0 0 0 0 0 0 0 1 2 ...

Users can combine with the rdrugtrajectory EHR dataframes any number of patient and EHR
data to act as covariates and stratifying variables, typically this can be done using the R cbind
operation. For example, BMI and smoking status, both of which can be retrieved from the
AdditionalNNN.txt dataset files using CPRDLookups.R, can be linked by searching for and
binding with the record patid values. The rdrugtrajectory package contains several utility
functions to retrieve CPRD data, including, patient year of birth, gender (male or female) and
either patient-level or clinical-level index of multiple deprivation score (IMD). The patient age
can be determined by adding 1800 to the value in yob column in the Patient CPRD EHR
dataset and then subtracting that value (birth year) from the year of the CPRD database
release. This data requires preliminary treatment before presenting to the rdrugtrajectory
package. Patient age, gender and IMD score must be presented in a dataframe with the
linked patient column patid, along with the columns age, gender, and score. Providing the
patid column is preserved, patient characteristics can be presented in separate dataframe, for
example:

> library(rdrugtrajectory)
> #patient age and gender as one dataframe
> str(ageGenderDF, strict.width="wrap")

'data.frame': 3838 obs. of 3 variables:
$ patid : int 1 2 3 4 5 6 7 8 9 10 ...
$ yob : num 45 35 33 42 63 57 34 51 51 22 ...
$ gender: int 2 2 1 2 2 1 2 2 2 1 ...

> #clinic-level IMD score as one datafrmae
> str(imdDF, strict.width="wrap")

'data.frame': 2126 obs. of 3 variables:
$ patid : int 6 11 16 34 42 44 54 60 63 79 ...
$ pracid: int 184 31 66 344 66 47 18 90 379 317 ...
$ score : int 1 3 1 4 1 2 1 5 1 2 ...

The patid patient identifier is fundamental in every operation performed by rdrugtrajectory.
The examples presented here and those in the reference manual rely on searching and subset-
ting EHR data using a list or vector of patient identifier. The function getUniquePatidList
will retrieve an R List of patient identification numbers from any dataframe with a patid
column.
The aforementioned rdrugtrajectory EHR dataframes, clinical, referral and therapy, can be
combined into a single dataframe. We refer to this dataset instance as the patient’s medical

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 7

history and can be constructed using constructMedicalHistory. This dataframe expects
events to be in chronological order, and will introduce a new column, code and codetype to
denote each of the combined events. The code (medcode and/or prodcode) can be distinguished
by a codetype value of c (clinical events), r (referral events), and t (prescription events). Events
are returned in chronological order using the eventdate data. The following code demonstrates
how to retrieve a list of patient identifier from a prescription dataframe and from a medical
history dataframe, followed by how to subset using base R operations and, finally, the medical
history dataframe structure.

> library(rdrugtrajectory)
> #Retrieve patids from therapy data.
> idList <- getUniquePatidList(testClinicalDF)
> medHistoryDF <- constructMedicalHistory(testClinicalDF, NULL, testTherapyDF)

[1] "Using clinical data."
[1] "Using therapy data."
[1] "Building with clinical and therapy data."

> #Retrieve patid from medical history.
> medHistoryIDList <- getUniquePatidList(medHistoryDF)
> numOfPatients <- length(medHistoryIDList)
> #Subset using the first 100 patients.
> smallMedHistoryDF <- subset(medHistoryDF,
+ medHistoryDF$patid %in% medHistoryIDList[1:100])
> #Separate out the first 100 patient with a clinical record.
> smallClinicalOnlyDF <- subset(smallMedHistoryDF,
+ smallMedHistoryDF$codetype == "c")
> #Separate out the first 100 patient with a therapy record.
> smallTherapyOnlyDF <- subset(smallMedHistoryDF,
+ smallMedHistoryDF$codetype == "t")
> #Subset only or those patient records beyond 31st Jan 2010.
> laterMedHistoryDF <- subset(medHistoryDF,
+ medHistoryDF$eventdate > as.Date("2010-01-31"))

> #Medical history dataframe structure
> str(medHistoryDF, strict.width="wrap")

'data.frame': 103336 obs. of 4 variables:
$ patid : int 1 1 1 1 1 1 1 2 2 3 ...
$ eventdate: Date, format: "2002-06-07" "2005-07-25" ...
$ code : int 5767 5767 5767 707 707 707 707 5767 769 5767 ...
$ codetype : chr "c" "c" "c" "t" ...

The patid data can also be used to retrieve patient characteristics, for example, the gender
of the patient using getGenderOfPatients:

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


8 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

> library(rdrugtrajectory)
> idList <- getUniquePatidList(testTherapyDF)
> #Only use half of the cohort.
> idList <- idList[1:(length(idList)/2)]
> #Get gender data by specific gender.
> maleCode <- 1
> femaleCode <- 2
> malePatientsDF <- getGenderOfPatients(idList, ageGenderDF, maleCode)
> femalePatientsDF <- getGenderOfPatients(idList, ageGenderDF, femaleCode)
> #Get all gender data
> allPatientsDF <- getGenderOfPatients(getUniquePatidList(testTherapyDF),
+ ageGenderDF)

> #Structure of the patient gender data.
> str(allPatientsDF, strict.width="wrap")

'data.frame': 3838 obs. of 2 variables:
$ patid : int 1 2 3 4 5 6 7 8 9 10 ...
$ gender: int 2 2 1 2 2 1 2 2 2 1 ...

IMD data can be retrieved by combining getUniquePatidList and getIMDOfPatients func-
tions:

> library(rdrugtrajectory)
> idList <- getUniquePatidList(testTherapyDF)
> #Get patients with an IMD score of 1 or 2
> onePatientsDF <- getIMDOfPatients(idList, imdDF, 1)
> twoPatientsDF <- getIMDOfPatients(idList, imdDF, 2)
> #Get all IMD scores for all patients in testTherapyDF
> allPatientsDF <- getIMDOfPatients(getUniquePatidList(testTherapyDF), imdDF)

> #Structure of the patient gender data.
> str(allPatientsDF, strict.width="wrap")

'data.frame': 2123 obs. of 2 variables:
$ patid: int 6 11 16 34 42 44 54 60 63 79 ...
$ score: int 1 3 1 4 1 2 1 5 1 2 ...

The final example of EHR dataframe manipulation presented here demonstrates how to re-
trieve all prescription records for patients prescribed a specific prescription treatment. For
example, such an operation can be used to retrieve all prescription records for any patient
prescribed amitriptyline. In addition, it is also possible to return only prescription records
matching specific prescription treatments. Importantly, prescription prodcodes can be grouped
into lists and used to collect those patients with at least one record that matches an element
of that list. This approach is useful if the dose is not relevant to the study or the prescription
is dispensed under multiple product names.

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 9

> library(rdrugtrajectory)
> #It is easy to retrieve a list of all unique prodcodes in the cohort.
> prodCodesVector <- unique(testTherapyDF$prodcode)
> reducedProdCodesVector <- prodCodesVector[1:10]
> #All records are maintained for those patients with a matching prodcode.
> therapyOfInterestDF <- getPatientsWithProdCode(testTherapyDF,
+ reducedProdCodesVector)
> #Only those records that match are retained.
> reducedTherapyOfInterestDF <- getPatientsWithProdCode(testTherapyDF,
+ reducedProdCodesVector,
+ removeExcessDrugs=TRUE)

3. EHR drug prescription results and discussion
Having briefly demonstrated some basic operation on retrieving patient records by matching
EHR dataframes against sets of patid values, we move on to showcase several operations
available to the user. We begin by presenting examples of cohort prescription summary
statistics followed by methods of dataset curating and stratifying by patient groups. We
then present examples on how to search for patients prescribed with a first-line treatments,
followed by presenting some of these patient groups as sequences of prescriptions. Finally,
we demonstrate several examples of building time-lines. For futher examples, please see the
Github page and reference manual.

3.1. Cohort summmary statistics

getEventdateSummaryByPatient

rdrugtrajectory can return summary based statistics on patient and cohort level prescription
data with getEventdateSummaryByPatient and getPopulationDrugSummary, respectively.
For example, a single patient (via getUniquePatidList and [] dataframe subsetting) pre-
scription history returns the patient patid, number of prescription events, median number of
days between events, fewest number of days between events, the most number of days between
events (maxTime and longestDuration are the same), and record duration (number of days
between the first and last prescription event on record):

> library(rdrugtrajectory)
> idList <- getUniquePatidList(testTherapyDF)
> resultList <- getEventdateSummaryByPatient(
+ testTherapyDF[testTherapyDF$patid==idList[[1]],])
> str(resultList, strict.width="wrap")

List of 2
$ TimeSeriesList: num [1:6] 336 652 2540 34 42 44
$ SummaryDF :'data.frame': 1 obs. of 7 variables:
..$ patid : int 3515
..$ numberOfEvents : int 7

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


10 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

..$ medianTime : num 190

..$ minTime : num 34

..$ maxTime : num 2540

..$ longestDuration: num 2540

..$ recordDuration : int 3648
- attr(*, "class")= chr "EventdateSummaryObj"

getPopulationDrugSummary
This approach can be extended across the cohort of patients with getPopulationDrugSummary.
The returning PopulationEventdateSummary S3 object is a list of three elements. The first
element is the SummaryDF dataframe derived from calling getEventdateSummaryByPatient
per patient, with the set of statistics retrievable through the accompanied patid. The second
element is the TimeSeriesList, which holds a vector per patient of the number of days between
consecutive prescription events. Vectors can be accessed using the patid element name:

> library(rdrugtrajectory)
> resultList <- getPopulationDrugSummary(df = testTherapyDF,
+ prodCodesVector = NULL)

> str(resultList, strict.width="wrap", list.len = 5)

List of 2
$ SummaryDF :'data.frame': 3838 obs. of 7 variables:
..$ patid : int [1:3838] 3515 3653 3756 3813 435 553 731 891 1781 1991 ...
..$ numberOfEvents : int [1:3838] 7 21 1 1 13 2 15 2 23 79 ...
..$ medianTime : num [1:3838] 190 60 0 0 28.5 ...
..$ minTime : num [1:3838] 34 34 0 0 11 ...
..$ maxTime : num [1:3838] 2540 1623 0 0 322 ...
.. [list output truncated]
$ TimeSeriesList:List of 3838
..$ 3515: num [1:6] 336 652 2540 34 42 44
..$ 3653: num [1:20] 890 222 182 301 539 ...
..$ 3756: num 0
..$ 3813: num 0
..$ 435 : num [1:12] 26 23 24 24 32 322 31 29 11 51 ...
.. [list output truncated]
- attr(*, "class")= chr "PopulationEventdateSummary"

> #Get all patids for patients younger than 40.
> ageIDList <- getUniquePatidList(ageGenderDF[ageGenderDF$yob < 40,])
> timeSeriesList <- resultList[[2]]
> #Get all patids of available data.
> recordPatids <- names(timeSeriesList)
> #Get time data for the intersect of those patids of patients < 40 and the patids
> #of available data.
> subTimeList <- timeSeriesList[intersect(ageIDList, recordPatids)]
> str(subTimeList, strict.width="wrap", list.len = 5)

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 11

List of 640
$ 2 : num 0
$ 3 : num 0
$ 7 : num 25
$ 10 : num 0
$ 15 : num 0
[list output truncated]

3.2. Curating drug prescription records

There is no direct link between a prescription event and a medcode in the CPRD data.
The relationship between the two can be inferred from the event dates of the prescription
and clinical events, in addition, to information provided by the consultation ID and the
prescription issue number.

matchDrugWithDisease
rdrugtrajectory provides several methods for curating prescription datasets with the aim of es-
tablishing a relationship between prescription and clinical events. The matchDrugWithDisease
function returns a subset of all prescription events with an established relationship between
therapy and clinical event. To what degree these patients are included in the search is con-
trolled with a function argument. There are three scenarios: all patients with a record of a
specific prescription event and specific clinical event, at any point; all patients with a record
of a specific prescription event on the same date as a specific clinical event; and, all patients
with a record of a specific prescription event on the same date as a specific clinical event and
clear from additional clinical events on that day. One would expect fewer patients as the
stringency of the search criteria is increased:

> library(rdrugtrajectory)
> prodcodes <- unique(testTherapyDF$prodcode)
> amitriptylineCodes <- prodcodes[1:5]
> propranololCodes <- prodcodes[6:11]
> medcodeList <- unique(testClinicalDF$medcode)
> headacheCodes <- medcodeList[1:10]
> amitriptylineResult1 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 1)
> amitriptylineResult2 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 2)
> amitriptylineResult3 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


12 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

+ drugcodeList = amitriptylineCodes,
+ severity = 3)
> propranololResult1 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = propranololCodes,
+ severity = 1)
> propranololResult2 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = propranololCodes,
+ severity = 2)
> propranololResult3 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = propranololCodes,
+ severity = 3)

getGenderOfPatients

The example presented, demonstrates how to identify patients prescribed amitriptyline and
patients prescribed propranolol (there is patient overlap, easily controlled for by subsetting)
whilst controlling for clinical overlap with or without consideration for off topic clinical events.
With the identified patients, we can, for example, stratify by gender:

> library(rdrugtrajectory)
> library(ggplot2)
> ami1Gender <- getGenderOfPatients(amitriptylineResult1, ageGenderDF)
> ami2Gender <- getGenderOfPatients(amitriptylineResult2, ageGenderDF)
> ami3Gender <- getGenderOfPatients(amitriptylineResult3, ageGenderDF)
> prop1Gender <- getGenderOfPatients(propranololResult1, ageGenderDF)
> prop2Gender <- getGenderOfPatients(propranololResult2, ageGenderDF)
> prop3Gender <- getGenderOfPatients(propranololResult3, ageGenderDF)
> amiDF <- data.frame(Freq=c(nrow(ami1Gender[ami1Gender$gender==1, ]),
+ nrow(ami2Gender[ami2Gender$gender==1, ]),
+ nrow(ami3Gender[ami3Gender$gender==1, ]),
+ nrow(ami1Gender[ami1Gender$gender==2, ]),
+ nrow(ami2Gender[ami2Gender$gender==2, ]),
+ nrow(ami3Gender[ami3Gender$gender==2, ])
+ ),
+ Search=c("Prescribed","With headache","No comorbidities",
+ "Prescribed","With headache","No comorbidities"),
+ Drug="Amitriptyline",
+ Gender=c("Male","Male","Male",
+ "Female","Female","Female")
+ )

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 13

> propDF <- data.frame(Freq=c(nrow(prop1Gender[prop1Gender$gender==1, ]),
+ nrow(prop2Gender[prop2Gender$gender==1, ]),
+ nrow(prop3Gender[prop3Gender$gender==1, ]),
+ nrow(prop1Gender[prop1Gender$gender==2, ]),
+ nrow(prop2Gender[prop2Gender$gender==2, ]),
+ nrow(prop3Gender[prop3Gender$gender==2, ])
+ ),
+ Search=c("At any time","With clinical","Clinical & No comorbidities",
+ "At any time","With clinical","Clinical & No comorbidities"),
+ Drug="Propranolol",
+ Gender=c("Male","Male","Male",
+ "Female","Female","Female")
+ )
> drugPrescriptionDF <- rbind(amiDF, propDF)
> ggPrescriptionAmi <- ggplot(drugPrescriptionDF[
+ drugPrescriptionDF$Drug=="Amitriptyline",],
+ aes(x=Search, y=Freq, fill=Gender)) +
+ geom_bar(stat="identity", position=position_dodge()) +
+ theme_bw() + xlab("Search critera (severity)") + ylab("Patient count") +
+ theme(axis.text.x = element_text(angle=45,hjust=1)) +
+ ggtitle("Amitriptyline")
> ggPrescriptionProp <- ggplot(drugPrescriptionDF[
+ drugPrescriptionDF$Drug=="Propranolol",],
+ aes(x=Search, y=Freq, fill=Gender)) +
+ geom_bar(stat="identity", position=position_dodge()) +
+ theme_bw() + xlab("Search critera (severity)") + ylab("Patient count") +
+ theme(axis.text.x = element_text(angle=45,hjust=1)) +
+ ggtitle("Propranolol")
>

Filtering through prescription events can also be controlled by a date range. For example,
if one was calculating the number of patients prescribed amitriptyline per year from 2000 to
2004 and matched to a headache event, one can apply a date range:

> library(rdrugtrajectory)
> library(ggplot2)
> prodcodes <- unique(testTherapyDF$prodcode)
> amitriptylineCodes <- prodcodes[1:5]
> #Clinical event of interest are headaches.
> medcodeList <- unique(testClinicalDF$medcode)
> #Medcodes can be refined further.
> headacheCodes <- medcodeList[1:10]
> #Dataframes defined for binned dates are constructed by providing all the
> #patients to consider and the binned start and stop date.
> date2000DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+ start=as.Date(as.character("2000-01-01")),
+ stop=as.Date(as.character("2000-12-31")))

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


14 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

0

500

1000

1500

2000

No
 c
om

or
bi
di
tie

s

Pr
es

cr
ib
ed

W
ith

 h
ea

da
ch

e

Search critera (severity)

P
a

tie
n

t 
co

u
n

t

Gender

Female

Male

AmitriptylineA

0

250

500

750

1000

At
 a

ny
 ti
m

e

Cl
in
ica

l &
 N

o 
co

m
or

bi
di
tie

s

W
ith

 c
lin

ica
l

Search critera (severity)

P
a

tie
n

t 
co

u
n

t

Gender

Female

Male

PropranololB

Figure 1: The number of patients prescribed (A) amitriptyline or (B) propranolol. The
criteria to match against clinical data is indicated: at any time, with a clinical record, and
with a clinical record clear off topic clinical events.

> date2001DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+ start=as.Date(as.character("2001-01-01")),
+ stop=as.Date(as.character("2001-12-31")))
> date2002DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+ start=as.Date(as.character("2002-01-01")),
+ stop=as.Date(as.character("2002-12-31")))
> date2003DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+ start=as.Date(as.character("2003-01-01")),
+ stop=as.Date(as.character("2003-12-31")))
> date2004DF <- data.frame(patid=unlist(getUniquePatidList(testTherapyDF)),
+ start=as.Date(as.character("2004-01-01")),
+ stop=as.Date(as.character("2004-12-31")))
> #Retrieve prescription frequencies per binned range
> amitResult2000 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 15

+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 1,
+ dateDF = date2000DF)
> amitResult2001 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 1,
+ dateDF = date2001DF)
> amitResult2002 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 1,
+ dateDF = date2002DF)
> amitResult2003 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 1,
+ dateDF = date2003DF)
> amitResult2004 <- matchDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medcodeList = headacheCodes,
+ drugcodeList = amitriptylineCodes,
+ severity = 1,
+ dateDF = date2004DF)
> #The number of patids returned by matchDrugWithDisease is equal to the number
> #of patients with a drug - disease match per year
> dataDF <- data.frame(Year=c("2000","2001","2002","2003","2004"),
+ Count=c(length(amitResult2000),length(amitResult2001),
+ length(amitResult2002),length(amitResult2003),
+ length(amitResult2004)))
> ggPrescriptionYear <- ggplot(dataDF, aes(x=Year, y=Count)) +
+ geom_bar(stat = "identity") + theme_bw()

getPatientsWithFirstDrugWithDisease
Unlike matchDrugWithDisease which retrieves patients with a prescription event matching
clinical criteria at any time within a CPRD EHR record, getPatientsWithFirstDrugWithDisease
identifies patients with a first prescription event that matches a desired clinical event. Please
note, care must be taken when searching for medication with off-label uses. For example,
beta-blockers are frequently prescribed to treat hypertension and arrhythmia, however, the
beta-blocker propranolol is also prescribed to treat migraine. Without in depth analysis into
the patient history, patients propranolol with records for hypertension or arrhythmia in addi-
tion to migraine on a matching eventdate with the first propranolol prescription, could result

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


16 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

0

100

200

300

2000 2001 2002 2003 2004
Year

C
o

u
n

t

Figure 2: The number of patients prescribed amitriptyline from the start of the year 2000 to
the end of 2004, stratified in year intervals.

in a misleading disease-drug association. In cases where a health care professional suggests a
change in the patient’s lifestyle choices, that patient may have several clinical events free from
prescriptions before the first prescription of interest is prescribed. Using basic subsetting one
can calculate the number of clinical events before the patient’s first prescription intervention
(Figure 3 A). Further more, we can stratify patients into subgroups (Figure 3 B):

> library(rdrugtrajectory)
> library(ggplot2)
> #A vector of prescriptions of interest.
> drugList <- unique(testTherapyDF$prodcode)
> sampleDrugs <- drugList[1:8]
> #A vector of clinical events to match prescriptions against.
> medCodes <- unique(testClinicalDF$medcode)
> sampleMedCodes <- medCodes[1:30]
> #Returns the subset of the first prescription event prescribed on the same
> #eventdate as those clinical events of interest

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 17

> firstDF <- getPatientsWithFirstDrugWithDisease(clinicalDF = testClinicalDF,
+ therapyDF = testTherapyDF,
+ medCodesVector = sampleMedCodes,
+ drugCodesVector = sampleDrugs)
> #Ensure the only clinical data are for those with an assume first-drug-disease
> firstClinicalDF <- subset(testClinicalDF,
+ testClinicalDF$patid %in% getUniquePatidList(firstDF))
> #Only keep the diseases of interest
> firstClinicalDF <- subset(firstClinicalDF,
+ firstClinicalDF$medcode %in% sampleMedCodes)
> #Only keep the prescriptions of interest
> firstDF <- subset(firstDF, firstDF$prodcode %in% sampleDrugs)
> idList <- getUniquePatidList(firstClinicalDF)
> beforeResultDF <- data.frame(patid=unlist(idList), Freq=0)
> for(id in idList) {
+ #Retrieve the clinical/therapy data for each patients, one by one.
+ indClinicalDF <- subset(firstClinicalDF, firstClinicalDF$patid == id)
+ indTherapyDF <- subset(firstDF, firstDF$patid == id)
+ #Get the first event date on record; this will match a clinical date.
+ firstEventDate <- indTherapyDF$eventdate[1]
+ clinicalBeforeTherapyDF <- subset(indClinicalDF,
+ indClinicalDF$eventdate < firstEventDate)
+ #Number of clinical complaints before first prescription.
+ nComplaints <- nrow(clinicalBeforeTherapyDF)
+ beforeResultDF[beforeResultDF$patid==id,]$Freq <- nComplaints
+ }
> ggBefore <- ggplot(beforeResultDF, aes(x=Freq)) +
+ geom_histogram(binwidth=1, color="black", fill="white") +
+ ylab("Patients") + xlab("Clinical events before prescription") +
+ theme_bw()
> #Note: not every patient will have a clinical IMD score.
> imdIDsDF <- getIMDOfPatients(idList = idList,
+ imdDF = imdDF)
> #Only work with those with an IMD score.
> imdResultsDF <- subset(beforeResultDF,
+ beforeResultDF$patid %in% getUniquePatidList(imdIDsDF))
> imdResultsDF <- imdResultsDF[order(imdResultsDF$patid),]
> imdIDsDF <- imdIDsDF[order(imdIDsDF$patid),]
> imdResultsDF <- cbind(imdResultsDF, IMD_score=as.factor(imdIDsDF$score))
> ggBeforeIMD <- ggplot(imdResultsDF,
+ aes(x=Freq, fill=IMD_score)) +
+ geom_histogram(binwidth=1) + theme_bw() +
+ ylab("Patients") + xlab("Clinical events before prescription")

getMultiPrescriptionSameDayPatients
The function getMultiPrescriptionSameDayPatients returns all prescription events for

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


18 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

0

100

200

0 10 20
Clinical events before prescription

P
a

tie
n

ts

A

0

50

100

150

0 10 20
Clinical events before prescription

P
a

tie
n

ts

IMD_score

1

2

3

4

5

B

Figure 3: The number of clinical events before the first treatment across the whole cohort
(A), and by IMD score (B).

those patients prescribed more than two prescriptions on the same date. All events of those pa-
tients without a prescription prodcode event can be removed. Combining getMultiplePrescriptionSameDayPatients
with getPatientsWithFirstDrugWithDisease or matchDrugWithDisease is useful for filter-
ing patients for specific prescription patterns. For example, to retrieve all patient prescription
records if specific prescriptions are (a) never recorded together on the same date and (b) are
used as a first line treatment for a given complaint:

> library(rdrugtrajectory)
> prodcodesVector = unique(testTherapyDF$prodcode)[1:8]
> #ensure only patients with specific prescriptions are returned providing a
> #patient is prescribed those drugs on different dates, never on the same date.
> uniqueTherapyDF <- getMultiPrescriptionSameDayPatients(df = testTherapyDF,
+ prodCodesVector = prodcodesVector,
+ removePatientsWithoutDrugs = TRUE)
> #Ensure that the patients (patid) in the therapy and clinical dataframes
> #are the same. Subsetting might not be enough.
> reducedClinicalDF <- subset(testClinicalDF,

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 19

+ testClinicalDF$patid %in% getUniquePatidList(uniqueTherapyDF))
> #Specific medcodes have not been provided. All medcodes in the clinical
> #dataframe are considered. This is possible if one either one is not interested
> #in the nature of the clinical complaint or the clinical dataframe has been
> #adjusted to only include clinical complaints of interest.
> firstDF <- getPatientsWithFirstDrugWithDisease(clinicalDF = reducedClinicalDF,
+ therapyDF = uniqueTherapyDF,
+ drugCodesVector = sampleDrugs)

In the above example, patients with more than one prescription on the same date or without
a prescription at all (from the set of desired prescription prodcodes) were removed from the
cohort. This reduced the number of patients from 3838 patients to 2930. Next, only those
patients with a first line treatment (first prescription event on the same date as a clinical
event) were kept, reducing the sample size to 587 patients.

removePatientsByDuration

Longitudinal EHR cohort studies often requires careful time-related consideration. Currently,
rdrugtrajectory presents two functions that identify prescription records of patients that
match two time constraints. The first, removePatientsByDuration, removes all patients
with prescription events that are no more than n years between consecutive events or removes
patients if the duration between the first and last prescription event on record is less than n
years.

> library(rdrugtrajectory)
> df <- removePatientsByDuration(minObsYr = 5,
+ minBreakYr = 2,
+ therapyDF = testTherapyDF)

getBurnInPatients

The second time-related function, getBurnInPatients identifies all patient prescription records
with at least n days free from prescription events before a specific prescription event. This
is useful if one requires a period of time free from prescription intervention before a given
prescription event:

> library(rdrugtrajectory)
> drugOfInterestVector <- c(83,49,297,1888,940,5)
> patientList <- getBurnInPatients(df = testTherapyDF,
+ startCodesVector = drugOfInterestVector,
+ periodDaysBefore = 172)
> burnInTherapyDF <- subset(testTherapyDF,
+ testTherapyDF$patid %in% patientList)

In the above example, from a cohort of 3838 patients, 426 patients had a period of up to 172
days free from of prescription events before the first prescription prodcode specified via the
startCodesVector argument. The functionality relies on the patient having prescription events
before the burn-in period (required to define whether the patient had a CPRD record early

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


20 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

enough before the burn-in period began). For example, this patient had over three years of
prescription events before the prescription of interest (from 2003-05-29 to 2007-10-17 with
over 172 days free from exposure before the prescription event of interest prodcode 297:

> head(burnInTherapyDF[burnInTherapyDF$patid == 332412,], n=9)

[1] patid eventdate prodcode consid issueseq
<0 rows> (or 0-length row.names)

3.3. First drug prescriptions

getFirstDrugPrescription

A patient’s first prescription event on CPRD record can be identified by supplying getFirstDrugPrescription
with a list of prescription prodcodes. The functions returns FirstDrugObject, an R S3 ob-
ject of type List. Only the first prescription event to match anyone one of the prescription
prodcodes provided is identified. The first element of FirstDrugObject contains a named list
of patid vectors. Each vector contains the patids of all those patients that share the same
first prescription prodcode. The list element is named after the corresponding prescription
prodcode. The second element in FirstDrugOject, like the first, is a list of Date vectors, each
named after the corresponding prescription prodcode. Each Date vector contains the eventdate
of the prescription event for the patient identified by the patid in the identical position of the
preceding List. The third list element contains a table of prescription frequencies for each
first prescription prodcode on record. The prodcode is accompanied by a product description
providing a file of CPRD prescription products has been provided. Below we demonstrate
how to retrieve information on first-line treatment:

> library(rdrugtrajectory)
> library(ggplot2)
> #An adjusted data dictionary file.
> fileLocation <- "product.txt"
> #Without supplying a vector of product files all prodcodes in the therapy
> #dataset are considered.
> resultFDO <- getFirstDrugPrescription(df = testTherapyDF,
+ idList = NULL,
+ prodCodesVector = NULL,
+ descriptionFile = fileLocation)
> patidList <- resultFDO[[1]]
> eventdateList <- resultFDO[[2]]
> drugFrequencyDF <- resultFDO[[3]]
> drugFrequencyDF <- drugFrequencyDF[order(drugFrequencyDF$Frequency,
+ decreasing = TRUE), ]
> ggFreq <- ggplot(data=drugFrequencyDF, aes(x=description, y=Frequency)) +
+ geom_bar(stat="identity") + theme_bw() +
+ theme(axis.text.x = element_text(angle=45, hjust=1)) +
+ xlab("Drug product description")

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 21

> #The structure of the FirstDrugObject.
> str(resultFDO, strict.width="wrap", list.len = 5)

0

500

1000

Am
itr

ip
ty
lin

e 
10

m
g 

ta
bl
et

s

Am
itr

ip
ty
lin

e 
25

m
g 

ta
bl
et

s

Am
itr

ip
ty
lin

e 
50

m
g 

ta
bl
et

s

At
en

ol
ol
 1

00
m

g 
ta

bl
et

s

At
en

ol
ol
 2

5m
g 

ta
bl
et

s

At
en

ol
ol
 5

0m
g 

ta
bl
et

s

Ca
nd

es
ar

ta
n 

2m
g 

ta
bl
et

s

Ca
nd

es
ar

ta
n 

4m
g 

ta
bl
et

s

Li
sin

op
ril

 1
0m

g 
ta

bl
et

s

Li
sin

op
ril

 2
.5

m
g 

ta
bl
et

s

Li
sin

op
ril

 5
m

g 
ta

bl
et

s

Pr
op

ra
no

lo
l 1

0m
g 

ta
bl
et

s

Pr
op

ra
no

lo
l 4

0m
g 

ta
bl
et

s

Pr
op

ra
no

lo
l 8

0m
g 

m
od

ifie
d−

re
le
as

e 
ca

ps
ul
es

Pr
op

ra
no

lo
l 8

0m
g 

ta
bl
et

s

To
pi
ra

m
at

e 
25

m
g 

ta
bl
et

s

Ve
nl
af

ax
in
e 

37
.5

m
g 

ta
bl
et

s

Ve
nl
af

ax
in
e 

75
m

g 
m

od
ifie

d−
re

le
as

e 
ca

ps
ul
es

Ve
nl
af

ax
in
e 

75
m

g 
m

od
ifie

d−
re

le
as

e 
ta

bl
et

s

Drug product description

F
re

q
u

e
n

cy

Figure 4: The frequency of first line treatment prescription.

getAgeGroupByEvents

In the next example we explore stratifying first-line prescription events by patient character-
istics, such as, age, gender, IMD, and number of medcodes (for instance, by comorbidities)
or prodcodes (for instance, to separate those patients by additional prescriptions), or by any
additional clinical event retrieved using CPRDLookups.R ?. rdrugtrajectory provides several
utility functions to stratify patients (see reference manual for further information). The func-
tion getAgeGroupByEvents calculates the number of first-line prescription events by patient
age. By specifying a set of patids and eventdates from the FirstDrugObject, we can calculate
the number of first-line prescriptions by age-group for patients linked with a specified medical
condition:

> library(rdrugtrajectory)
> fileLocation <- "product.txt"

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


22 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

> resultFDO <- getFirstDrugPrescription(df = testTherapyDF,
+ idList = NULL,
+ prodCodesVector = NULL,
+ descriptionFile = fileLocation)
> patidList <- resultFDO[[1]]
> eventdateList <- resultFDO[[2]]
> names(ageGenderDF) <- c("patid","age","gender")
> #The age-groups: [18,25), [25,30), [30,35), ..., [60,60+).
> ageGroupVector <- c(18,25,30,35,40,45,50,55,60)
> #CPRD database release year.
> ageAtYear <- "2017"
> ageGroupList <- getAgeGroupByEvents(idList = as.list(patidList[1:2]),
+ eventdateList = eventdateList[1:2],
+ ageDF = ageGenderDF,
+ ageGroupVector = ageGroupVector,
+ ageAtYear = ageAtYear)

> ageGroupList

[[1]]
18-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60+

1 103 94 106 131 165 182 153 185 240

[[2]]
18-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60+

1 45 39 35 23 43 34 32 18 25

In the above example, the age of each patient (ageDF) was provided using year-of-birth calcu-
lated against the release year of the CPRD Gold database (explained above). By providing the
database release year (in ageAtYear) and the first prescription eventdate (in eventdateList),
the age of each patient is adjusted against the prescription eventdate year. Finally, by using
a list slice on idList and eventdateList, (individual prescriptions can be specified using their
prodcode, for example, eventdateList$‘105‘), first prescription prescriptions frequencies by
age-group are retrievable (Figure 5).

> library(ggplot2)
> ageGroupDrugDF <- data.frame(Age=names(ageGroupList[[1]]),
+ Count=unlist(ageGroupList[[1]]),
+ Drug="Amitriptyline 10mg")
> ggAmitriptyline <- ggplot(ageGroupDrugDF, aes(x=Age, y=Count)) +
+ geom_bar(stat="identity") +
+ theme_bw() + ggtitle("Amitriptyline 10mg") +
+ theme(axis.text.x = element_text(angle=45, hjust=1)) +
+ xlab("Age-group") + ylab("Frequency")

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 23

0

50

100

150

200

250

18
−2

4

25
−2

9

30
−3

4

35
−3

9

40
−4

4

45
−4

9

50
−5

4

55
−5

9
60

+

Age−group

F
re

q
u

e
n

cy

Amitriptyline 10mg

Figure 5: The distribution of Amitriptyline 10mg as a first-line treatment by age-group.

3.4. Prescription sequences

mapDrugTrajectory

Identifying patient prescription trajectories in longitudinal EHRs remains our biggest motiva-
tor behind the development of rdrugtrajectory. Therefore, we developed mapDrugTrajectory
to identify the chronological of patient prescription events. We restrict the calculation to only
look for prescription prodcodes as supplied to groupingList as a named list (named prodcode
vectors). The required number of grouped-prescription events is defined by specifying the
minDepth and the number of those changes to display is controlled by maxDepth maximum
number. By keeping minDepth and maxDepth the same, only patients with a valid number of
prescription changes are displayed (Figure 6 (A) and (C)). Patient records with fewer than
minDepth number of changes to prescription sequences are ignored (Figure 6 (B)). For further
information please refer to the reference manual.
In the code below, mapDrugTrajectory returns patients with at least first five grouped pre-
scriptions. prodcodes that have not been grouped are ignored. Duplication of prodcodes (those
from the same group) do not count as a change in treatment:

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


24 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

Figure 6: The distribution of grouped prodcodes across three patients. (A) Five groups of valid
prescription prodcodes, (B) only three groups, (C) five valid groups, in addition to prodcodes
101 and 1 which are ignored.

> library(ggplot2)
> library(ggalluvial)
> structureList <- list(Amitriptyline = c(83,49,1888),
+ Propranolol = c(707,297,769),
+ Topiramate = c(11237),
+ Venlafaxine = c(470,301,39359),
+ Lisinopril = c(78,65,277),
+ Atenolol = c(5,24,26),
+ Candesartan = c(531)
+ )
> resultList <- mapDrugTrajectory(df = testTherapyDF,
+ minDepth = 5,
+ maxDepth = 5,
+ groupingList = structureList,
+ removeUndefinedCode = TRUE)
> df <- resultList[[3]]
> ggSwitch <- ggplot(df,
+ aes(y = Freq, axis1 = FirstDrug, axis2 = Switch1,
+ axis3 = Switch2, axis4 = Switch3, axis5 = Switch4)) +
+ geom_alluvium(aes(fill = FirstDrug), width = 1/12) +
+ geom_stratum(width = 1/12, fill = "black", color = "grey") +
+ geom_label(stat = "stratum", infer.label = TRUE) +
+ scale_fill_brewer(type = "qual", palette = "Set1") +
+ theme_bw() + theme(legend.position = "none") +
+ scale_x_discrete(limits = c("First Drug", "1st Switch", "2nd Switch",
+ "3rd Switch","4th Switch"),
+ expand = c(.05, .05)) +
+ ggtitle("Migraine Preventative Switching Among Patients")

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 25

Venlafaxine

Propranolol

Lisinopril

Atenolol

Amitriptyline

Candesartan

Venlafaxine

Propranolol

Lisinopril

Atenolol

Amitriptyline

TopiramateCandesartan

Venlafaxine

Propranolol

Lisinopril

Atenolol

Amitriptyline

Topiramate
Candesartan

Venlafaxine

Propranolol

Lisinopril

Atenolol

Amitriptyline

TopiramateCandesartan
Venlafaxine

Propranolol

Lisinopril

Atenolol

Amitriptyline

0

100

200

300

First Drug 1st Switch 2nd Switch 3rd Switch 4th Switch

F
re

q

Migraine Preventative Switching Among Patients

Figure 7: Prescription pattern switching of seven different migraine preventatives. A patient
required a a minimum of five changes in prescriptions (including the initial prescription) and,
equally, the display was set to five changes in prescription.

3.5. Prescription timeline construction

rdrugtrajectory contains several functions that transforms patient data into a format com-
patible with mean cumulative function (MCF) semi-parametric estimates, prescription per-
sistence, prescription incidence, and survival analysis.

generateMCFOneGroup

Prescription events are binned into weekly units to increase the statistical power at each time
point. The user presents a group at a time, for example, all clinical events of male patients
with a first-line prescription of amitriptyline for a migraine. The clinical data has already
been refined using the steps for first-line prescription, as described above. The function
generateMCFOneGroup accepts a dataframe or events, the MCF start date (eventdates are
adjusted so all patient records in the dataset begin at the same time), and the minimum
number of events per patients (by default this is two events). The following example presents
the calculation of first prescription events, the assignment of gender and the calculation of

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


26 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

MCF of prescription (therapy dataframe) burden of amitriptyline and propranolol:

> library(rdrugtrajectory)
> fileLocation <- "product.txt"
> resultList <- getFirstDrugPrescription(df = testTherapyDF,
+ idList = NULL,
+ prodCodesVector = NULL,
+ descriptionFile = fileLocation)
> patidList <- resultList[[1]]
> eventdateList <- resultList[[2]]
> drugFrequencyDF <- resultList[[3]]
> drugFrequencyDF <- drugFrequencyDF[order(drugFrequencyDF$Frequency,
+ decreasing = TRUE), ]
> amitriptylinePatid <- patidList$`83`
> propranololPatid <- patidList$`707`
> maleCode <- 1
> malePatidsDF <- getGenderOfPatients(idList = getUniquePatidList(testTherapyDF),
+ genderDF = ageGenderDF,
+ genderCodeVector = maleCode)
> amitriptylineMalePatids <- subset(amitriptylinePatid,
+ amitriptylinePatid %in% malePatidsDF$patid)
> propranololMalePatids <- subset(propranololPatid,
+ propranololPatid %in% malePatidsDF$patid)
> amiMaleTherapyDF <- subset(testTherapyDF,
+ testTherapyDF$patid %in% amitriptylineMalePatids)
> propMaleTherapyDF <- subset(testTherapyDF,
+ testTherapyDF$patid %in% propranololMalePatids)
> amiMaleMCFDF <- generateMCFOneGroup(therapyDF = amiMaleTherapyDF,
+ startDateCharVector = "2000-01-01",
+ minRecords = 2)
> propMaleMCFDF <- generateMCFOneGroup(therapyDF = propMaleTherapyDF,
+ startDateCharVector = "2000-01-01",
+ minRecords = 2)
> amiMaleMCFDF <- cbind(amiMaleMCFDF, Drug = "Amitriptyline")
> propMaleMCFDF <- cbind(propMaleMCFDF, Drug = "Propranolol")
> drugMCFDF <- rbind(amiMaleMCFDF, propMaleMCFDF)
> resultMCF <- reda::mcf(reda::Recur(week, id, No.) ~ Drug, data = drugMCFDF)
> mcfPlot <- reda::plot(resultMCF, conf.int=TRUE) +
+ ggplot2::xlab("Weeks") + ggplot2::theme_bw() + ggplot2::ggtitle("")

getFirstDrugIncidenceRate
Prescription incidence be calculated with getFirstDrugIncidenceRate. The following code
demonstrates how to use a FirstDrugObject to calculate incidence rates for a set of prodcodes.
The study observation starts from the enrollmentDate and ends at the studyEndDate:

> library(rdrugtrajectory)

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 27

0

100

200

300

0 250 500 750
Weeks

M
C

F
 E

st
im

a
te

s

Drug

Amitriptyline

Propranolol

Figure 8: MCF of drug prescriptions of patients with a first drug prescription for either
amitriptyline or propranolol, stratified by gender. The dotted lines indicate a 95% confidence
interval.

> fileLocation <- "product.txt"
> drugList <- unique(testTherapyDF$prodcode)
> requiredProds <- drugList[1:10]
> firstDrugObject <- getFirstDrugPrescription(df = testTherapyDF,
+ idList = NULL,
+ prodCodesVector = requiredProds,
+ descriptionFile = fileLocation)
> medhistoryDF <- constructMedicalHistory(testClinicalDF, NULL, testTherapyDF)
> patidList <- unlist(firstDrugObject$patidList)
> resultMatrix <- getFirstDrugIncidenceRate(firstDrugObject = firstDrugObject,
+ medHistoryDF = medhistoryDF,
+ enrollmentDate = as.Date("2000-01-01"),
+ studyEndDate = as.Date("2016-12-31"))
> incidenceDF <- as.data.frame(t(resultMatrix), stringsAsFactors = TRUE)

The above example returns an incidence rate of 0.11 per 17 person years over a cohort of

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


28 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

3838 patients. For a detailed description please see Detail for getFirstDrugIncidenceRate
in the reference manual.

getDrugPersistence

Prescription persistence is calculated as the fraction of patients with a prescription for a
specific treatment N-days after the first prescription event. For example, if we wanted to
calculate the fraction of patients with a prescription 365-days after their first prescription,
with a 30-day buffer either side, one specifies a duration of 395-days and a preceding buffer
of 60-days (therefore, capturing the range 335 to 395, 30-days either side of one calender
year):

> library(rdrugtrajectory)
> patientList <- getDrugPersistence(therapyDF = testTherapyDF,
+ idList = NULL,
+ prodcodeList = NULL,
+ duration = 395,
+ buffer = 60,
+ endOfRecordDate = "2017-12-31")

Of 3838 patient therapy records, 954 patients had a prescription 365 (+/- 30) days after
the first prescription event on record, resulting in a crude fraction of only 0.25 patients.
getDrugPersistence only observes events recorded precisely duration days after the first
prescription. The buffer can be used to identify patients who received a prescription shortly
after the end of the duration, but more importantly, to ensure patients actively undergoing
treatment (indicated by a prescription shortly before the desired duration days) are included.
As the buffer is reduced, the fraction of prescription persistence is reduced until the algorithm
attempts to only identify patients with a prescription exactly duration of days after the first
prescription. Future software updates will incorporate repeat prescription data to increase
the accuracy of the calculation.

4. Closing remarks and future work
rdrugtrajectory is an R package which has the potential for exciting applications such as im-
proving clinical decision-making, identifying possible new treatments and analysing outcomes
from existing treatments. We have demonstrated several functions, some of which detail
sorting and matching records whilst others demonstrate fundamental statistical analysis. We
used fabricated clinical and prescription dataframes, along with the age, gender and index
of multiple deprivation score of each patient and presented analyses of cohort-wide prescrip-
tion patterns, first-line treatment distributions, how to stratify by patient characteristics, and
some basic tools to assist longitudinal analysis of prescriptions.
The descriptions presented in this publication are not substitutes for the material in the
reference manual. We recommend the reader consults the R ? help command or reference
manual before running a function. In particular, functions related to the construction of
timelines for survival analysis (time dependent/independent Cox regression, Kaplan Meier
survival curves and mean cumulative function) or a matrix for drug incidence rate requires
fine tuning of several parameters.

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


Journal of Statistical Software 29

0.0

0.1

0.2

0.3

0 25 50 75 100 125
Buffer size (n days before 365)

F
ra

ct
io

n
 o

f 
p

re
sc

ri
p

tio
n

 p
e

rs
is

te
n

ce

Figure 9: The fraction of prescription persistence adjusted by a buffer number of days before
a calender year. As the buffer approaches the value of duration the fraction approaches 1.

The latest release of rdrugtrajectory along with source code and reference manual is available
for download from https://github.com/acnash/rdrugtrajectory. Whilst active members
of the scientific research community we will continue to add new features to rdrugtrajectory
whilst making necessary improvements to existing features.

Acknowledgements
Oxford Science Innovation, NIHR Oxford Biomedical Research Centre and NIHR Oxford
Health Biomedical Research Centre (Informatics and Digital Health theme, grant BRC-1215-
20005). Thanks to Dr Michelle Hardy for assistance with the article.

References

Bally M, Dendukuri N, Rich B, Nadeau L, Helin-Salmivaara A, Garbe E, Brophy JM (2017).
“Risk of Acute Myocardial Infarction with NSAIDs in Real World Use: Bayesian Meta-

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

https://github.com/acnash/rdrugtrajectory
https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/


30 rdrugtrajectory: Analysing Drug Prescriptions in Electronic Health Care Records

Analysis of Individual Patient Data.” British Medical Journal, 357, j1909. doi:10.1136/
bmj.j1909.

Ghosh RE, Crellin E, Beatty S, Donegan K, Myles P, Williams R (2019). “How Clinical
Practice Research Datalink data are used to support pharmacovigilance.” Therapeutic
Advances in Drug Safety, 10, 1–7. doi:10.1177/2042098619854010.

Hepp Z, Dodick DW, Varon SF, Chia J, Matthew N, Gillard P, Hansen RN, Devine EB
(2017). “Persistence and Switching Patterns of Oral Migraine Prophylactic Medications
Among Patients with Chronic Migraine: A Retrospective Claims Analysis.” Cephalalgia,
37(5), 470–485. doi:10.1177/0333102416678382.

Oyinlola JO, Campbell J, Kousoulis AA (2016). “Is Real World Evidence Influencing Practice?
A Systematic Review of CPRD Research in NICE Guidance.” BMC Health Service Research,
16(299), 1–12. doi:10.1186/s12913-016-1562-8.

Affiliation:
Nuffield Department of Clinical Neurosciences
Medical Sciences Division
University of Oxford
Oxford
UK
OX3 9DU
E-mail: anthony.nash@ndcn.ox.ac.uk

Journal of Statistical Software http://www.jstatsoft.org/
published by the Foundation for Open Access Statistics http://www.foastat.org/
MMMMMM YYYY, Volume VV, Issue II Submitted: yyyy-mm-dd
doi:10.18637/jss.v000.i00 Accepted: yyyy-mm-dd

.CC-BY-ND 4.0 International licensemade available under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is 

The copyright holder for this preprintthis version posted January 9, 2021. ; https://doi.org/10.1101/2021.01.08.425952doi: bioRxiv preprint 

http://dx.doi.org/10.1136/bmj.j1909
http://dx.doi.org/10.1136/bmj.j1909
http://dx.doi.org/10.1177/2042098619854010
http://dx.doi.org/10.1177/0333102416678382
http://dx.doi.org/10.1186/s12913-016-1562-8
mailto:anthony.nash@ndcn.ox.ac.uk
http://www.jstatsoft.org/
http://www.foastat.org/
http://dx.doi.org/10.18637/jss.v000.i00
https://doi.org/10.1101/2021.01.08.425952
http://creativecommons.org/licenses/by-nd/4.0/