key: cord-0217221-ovsafiof
authors: Lyu, Tianchu; Liang, Chen; Liu, Jihong; Campbell, Berry; Hung, Peiyin; Shih, Yi-Wen; Ghumman, Nadia; Li, Xiaoming
title: Temporal Events Detector for Pregnancy Care (TED-PC): A Rule-based Algorithm to Infer Gestational Age and Delivery Date from Electronic Health Records of Pregnant Women with and without COVID-19
date: 2022-05-01
journal: nan
DOI: nan
sha: 50e3c5b0f78b0e47d6308369fea377bc1a02c204
doc_id: 217221
cord_uid: ovsafiof

Objective: To develop a rule-based algorithm that detects temporal information of clinical events during pregnancy for women with COVID-19 by inferring gestational weeks and delivery dates from Electronic Health Records (EHR) in the National COVID Cohort Collaborative (N3C). Materials and Methods: The EHR are normalized by the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). EHR phenotyping resulted in 270,897 pregnant women (2018-06-01 to 2021-05-31). We developed a rule-based algorithm and performed a multi-level evaluation to test the content validity and clinical validity of the algorithm, together with an extreme value analysis for individuals with <150 or >300 days of gestation. Results: The algorithm identified 296,194 pregnancies (16,659 with COVID-19 and 174,744 without COVID-19 peri-pandemic) in 270,897 pregnant women. For inferring gestational age, 95% of cases (n=40) have moderate-high accuracy (Cohen's Kappa = 0.62); 100% of cases (n=40) have moderate-high granularity of temporal information (Cohen's Kappa = 1). For inferring delivery dates, the accuracy is 100% (Cohen's Kappa = 1). The accuracy of gestational age detection for extreme lengths of gestation is 93.3% (Cohen's Kappa = 1). Mothers with COVID-19 showed a higher prevalence of obesity (35.1% vs. 29.5%), diabetes (17.8% vs. 17.0%), chronic obstructive pulmonary disease (COPD) (0.2% vs. 0.1%), and acute respiratory distress syndrome (ARDS) (1.8% vs. 0.2%). Discussion: We explored the characteristics of pregnant women by different timing of COVID-19 with our algorithm, the first to infer temporal information from complete antenatal care and detect the timing of SARS-CoV-2 infection for pregnant women using N3C. Conclusion: The algorithm shows excellent validity in inferring gestational age and delivery dates, which supports national EHR cohorts on N3C studying the impact of COVID-19 on pregnancy.

Recent findings suggested Coronavirus Disease 2019 (COVID-19) to be associated with an increased risk for adverse pregnancy outcomes and neonatal complications. 1, 2 However, there has been limited knowledge pertaining to the timing of SARS-CoV-2 infection during the pregnancy (e.g., infection in a specific trimester, gestational age, or labor and delivery) and its association with pregnant women's real-time clinical presentation. 3 Timing of viral infection is important because fetuses are more vulnerable to maternal complications and/or viral infection during certain gestations. 4, 5 Nevertheless, detecting the timing of viral infection poses substantial challenges to the quality of data and clinical information extraction methodology.

Pregnancy care consists of antenatal, labor and delivery, and postpartum care. 6 Because pregnancy care spans up to 21 months, it involves exceedingly rich clinical data. A complete episode of pregnancy care often involves multiple encounters with multiple health providers, clinical sites, and diverse clinical information systems, meaning that a vast number of clinical events are generated at an unprecedented granularity and data quality. For decades, comprehensive clinical data for pregnancy care were not widely available; recent advances in the use of normalized multi-system electronic health records (EHR), such as the National COVID Cohort Collaborative (N3C), 23 now provide growing Real-World Evidence (RWE) to support pregnancy research during the COVID-19 pandemic. [7] [8] [9] [10] One of the unique characteristics of EHR in pregnancy care is the complex temporal relations of clinical events. To better understand the impacts of COVID-19 infections or the COVID-19 pandemic on pregnancy health, it is important to know the length of pregnancy and the timing of pregnancy-related complications or events in relation to the duration of pregnancy (i.e., gestational age). Both mothers and fetuses experience crucial physiological changes and clinical complications during the pregnancy, which generate a substantial number of clinical events in the EHR. Accurate identification of temporal relations of clinical events across the entire episode of pregnancy care is a fundamental step for clinical decision making as well as downstream EHR data mining. Gestational age is also the prerequisite for tracing the timing of SARS-CoV-2 infection and COVID-19 vaccination, health care utilization, and the causal pathway to adverse clinical outcomes. Although EHR data have unique advantages in preserving temporal information of these clinical events, clinical information extraction methods tailored for pregnancy care have been scarce. 12

Challenges in such clinical information extraction are manifold. First, controlled vocabularies (e.g., ICD, SNOMED-CT, LOINC, RxNorm) along with the relational EHR database architecture are designed to preserve some temporal information of clinical events, yet the information suffers from a low level of granularity (e.g., LOINC 95656-5: Gestational age [GA] <30 weeks, LOINC 49085-4: First and Second trimester integrated maternal screen panel), unreliable data entries (e.g., laboratory test results sometimes have delayed time stamps), and incomplete data (e.g., laboratory test results are sometimes missing because many laboratory results are photocopies). Second, approximately 80% of the EHR data consist of unstructured data (i.e., clinical notes), in which a considerable amount of temporal information is in the form of free text that cannot be directly used for quantitative analysis. 13 Third, EHR data for pregnant women are distinct from those of most other medical specialties in that pregnancy care has scheduled routine visits that involve antenatal care, labor and delivery hospitalization, and postpartum care. Incomplete and/or inconsistent data are common because patients are often engaged with different providers, health care systems, and non-pregnancy clinical visits with important data relevant to pregnancy. 12 For example, antenatal care and other medical care during the pregnancy could take place at locations different from labor and delivery hospitals. Missing critical information from one clinical site would require researchers to infer such information using data from other sites or other visits. Health records of individual visits span inpatient and various outpatient visits but may not contain explicit and consistent temporal information (e.g., last menstrual period [LMP], 11 estimated date of delivery [DOD], and GA).

Current informatics methods for extracting and inferring temporal relations of clinical events include rule-based methods, machine learning, natural language processing (NLP), ontology-based methods, and temporal reasoning. [13] [14] [15] [16] [17] [18] [19] [20] [21] Most of these methods utilized unstructured clinical notes in combination with structured EHR data, which is comprehensive for generic temporal information extraction. However, informatics methods focusing on temporal events in pregnant women's EHR are limited. Among studies that extract or infer DOD and GA, LMP and imaging/lab test results are commonly used data; chart review is a commonly used method. 15, 18, 20, 21 Using LMP data requires providers to accurately document LMP in the EHR, yet in the real world, many EHR datasets have many missing LMP values. The use of ultrasound test results or other laboratory test results requires the EHR to comprehensively document both laboratory orders and testing results, yet testing results are often missing in real-world EHR datasets. Additionally, using laboratory data alone for inferring GA may not be accurate due to individual physiological variation among pregnant women. A recent study utilized comprehensive ICD codes of diagnoses and procedures to infer delivery dates. 22 While that study focused on full-term pregnancy with comprehensive medical records during the labor and delivery hospitalization, comprehensive methods are still needed for early-stage pregnancy (e.g., extreme preterm, very preterm) and for those who have only part of the pregnancy care data, or conflicting data, documented in the EHR. In particular, there are no published methods for extracting temporal relations of clinical events for pregnant women with COVID-19, for whom the temporal relations of clinical events suggestive of the exact time of viral infection, the acute phase of COVID-19, and vaccinations are unique yet challenging to study.

To identify temporal relations of clinical events imperative for pregnant women with COVID-19, we developed a rule-based algorithm, namely Temporal Events Detector for Pregnancy Care (TED-PC), that can infer GA and DOD using both structured EHR data and annotated clinical notes. The algorithm is designed to capture any temporal information that can be used for inferring GA and DOD so that the complete temporal relations in a pregnancy episode can be replicated and the timing of SARS-CoV-2 infection (in weeks) can be detected. This design is anticipated to be effective for pregnant women with regular labor and delivery hospitalization, for those without complete hospitalization records, and for those who have pre-term delivery, miscarriage, early-stage pregnancy and termination, or multiple births. This algorithm is specialized for EHR from the National COVID Cohort Collaborative (N3C) because N3C has 1) individual-level data linked among multiple health systems nationwide and 2) normalized procedures, laboratory tests/results, and annotated clinical notes, which enable the reasoning of GA and DOD for patients who commonly have missing and conflicting data. These unique characteristics of N3C are critical for detecting the temporal information for pregnant women with COVID-19. The performance and clinical validity of TED-PC are tested systematically on the N3C platform. Presently, this algorithm is used as a critical clinical information extraction tool to identify comprehensive temporal relations of clinical events from multiple cohorts of pregnant women with COVID-19 on N3C. It provides a high-throughput informatics solution to the urgent need for mining large-scale pregnant women's EHR data for combating COVID-19.

We used the N3C database, a multi-center clinical data repository that contains de-identified EHR data of individuals with COVID-19 blended with controls (i.e., non-COVID-19). 23 N3C currently has EHR and medical claims data from more than 73 healthcare systems and institutes across 50 states. The EHR data are normalized using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). 23,24 In order to find the full clinical course of each pregnancy, the study cohort included women who met the following conditions: (1) had at least one childbirth between 2018-06-01 and 2021-05-31, (2) were aged between 15 and 49 years at the DOD, and (3) had at least one GA-related record during the pregnancy.

Because the N3C database is normalized by OMOP CDM, we utilized the following resources for EHR phenotyping. The ATHENA vocabulary is used for retrieving OMOP CDM concept IDs and phenotyping patients with the GA and childbirth-related records. The Algorithms section details the design and the procedures for using the ATHENA vocabulary.

Algorithms

To retrieve the full spectrum of each gestation in the EHR data among pregnant women, it is crucial to identify the start date (i.e., pregnancy start) and the end date (i.e., childbirth delivery date) of the pregnancy. The start date can be estimated from the GA-related records that indicate the GA (e.g., in weeks, in a range of weeks, or a particular trimester) and the date of the record. The end date can be estimated by identifying the DOD. Because some pregnant women's EHR data have only GA-related records or only childbirth delivery records, we first estimated GA and DOD separately, which resulted in a cohort of pregnant women with estimated GA, denoted as the GA cohort, and a cohort of pregnant women with estimated DOD, denoted as the DOD cohort. We then estimated the start date and end date of the pregnancy by consolidating the temporal information from both cohorts (Figure 1).

Phenotyping. The purpose of this step is to find OMOP CDM concepts that can be used to retrieve GA-related information from EHR data. For phenotyping the GA cohort, we used a keyword search strategy followed by a review of retrieved OMOP CDM concepts. Using the ATHENA vocabulary, a set of keywords were reviewed and determined: "trimester", "gestation", and "pregnan" (regular expression of "pregnancy" or "pregnant"). These keywords were used in conjunction with three filters in the ATHENA database: (1) "DOMAIN", which include "Condition", "Observation", "Procedure", "Measurement", etc.; (2) "CONCEPT", which include "Standard" and "Non-standard"; (3) "VALIDITY", which include "Valid" and "Invalid". The pseudo-query is: ("trimester" OR "gestation" OR "pregnan") AND (((DOMAIN = "Condition") OR (DOMAIN = "Observation") OR (DOMAIN = "Procedure") OR (DOMAIN = "Measurement")) AND (CONCEPT = "Standard") AND (VALIDITY = "Valid"))
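The pseudo-query above can be sketched in Python as a filter over concept records. This is only an illustration under stated assumptions: the `Concept` fields mirror the three ATHENA filters, the sample rows are hypothetical (including the domain assigned to concept 4181468 and the placeholder IDs 1 to 3), and the actual search was run against the ATHENA vocabulary rather than an in-memory list.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Concept:
    concept_id: int
    name: str
    domain: str      # e.g. "Condition", "Observation", "Procedure", "Measurement"
    standard: bool   # True for "Standard", False for "Non-standard"
    valid: bool      # True for "Valid", False for "Invalid"


# "pregnan" matches both "pregnancy" and "pregnant".
KEYWORDS = re.compile(r"trimester|gestation|pregnan", re.IGNORECASE)
DOMAINS = {"Condition", "Observation", "Procedure", "Measurement"}


def matches_query(concept: Concept) -> bool:
    """Keyword hit AND allowed domain AND standard AND valid."""
    return (
        KEYWORDS.search(concept.name) is not None
        and concept.domain in DOMAINS
        and concept.standard
        and concept.valid
    )


def filter_concepts(concepts: List[Concept]) -> List[Concept]:
    return [c for c in concepts if matches_query(c)]


# Hypothetical sample rows; only the ID/name pair 4181468 comes from the text.
SAMPLE = [
    Concept(4181468, "Gestation 9-13 weeks", "Condition", True, True),
    Concept(1, "Third trimester pregnancy", "Condition", True, True),
    Concept(2, "Pregnancy test kit", "Device", True, True),        # wrong domain
    Concept(3, "Gestation period, 38 weeks", "Condition", False, True),  # non-standard
]

selected_ids = [c.concept_id for c in filter_concepts(SAMPLE)]
```

The concepts returned by such a filter are only candidates; the manual review described next still decides which of them carry usable GA information.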

In the review of the returned OMOP CDM concepts, we applied the following three criteria to narrow down the scope step by step to our target concepts: (a) "Whether a record indicates a pregnant patient"; (b) "If yes, whether the record contains GA information of the patient"; (c) "If yes, what is the value of the GA". Finally, we identified 138 OMOP CDM concepts (see Appendix 1). The researcher (TL) who performed the phenotyping was not involved in the phenotyping evaluation.

Rule-based algorithm. We developed a rule-based algorithm to infer GA from EHR data (Figure 2). A critical feature of the algorithm is that we divided all extracted OMOP CDM concepts into four accuracy levels based on their clinical meanings and the granularity of the date: high, moderate-high, moderate-low, and low (Table 1), and prioritized the retrieval of GA-related information based on accuracy levels. Table 2 shows the pseudocode for the algorithm.

Childbirth delivery cohort (DOD cohort)

Phenotyping. For phenotyping the DOD cohort, we started with a list of CDC-recommended ICD, DRG, and CPT codes used for childbirth delivery and then explored the relevant OMOP CDM concepts using the semantic relationships of concepts in the ATHENA vocabulary. First, we used a set of ICD-10, DRG, and CPT codes suggestive of childbirth delivery (see Appendix 2). 25 These codes were used to retrieve corresponding OMOP CDM concepts in the ATHENA vocabulary, and the resulting CDM concepts were then used to identify the childbirth delivery records in the EHR. Second, since these codes may not comprehensively capture all the OMOP CDM concepts indicating childbirth delivery, we explored the semantic relationships of the OMOP CDM concepts retrieved by these codes and supplemented them with the newly identified concepts. 26 The final concept set contained 105 OMOP CDM standard concepts (see Appendix 3). Researchers (TL and YS) performed the phenotyping and were not involved in the phenotyping evaluation.

Rule-based algorithm. Upon manual chart review of the EHR data, we found that OMOP CDM concepts with a domain type of procedure have the highest accuracy with respect to determining the DOD, followed by the domain types of condition and then observation. Thus, we developed a rule-based algorithm to approximate the true DOD by prioritizing the OMOP CDM 'procedure' domain over the 'condition' domain over the 'observation' domain (Figure 3). Table 3 shows the pseudocode for the algorithm.
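The domain prioritization for the DOD can be illustrated with a short Python sketch. It is not the pseudocode of Table 3: the function name `infer_dod`, the record layout (domain, date), and the tie-break of taking the earliest date within the best available domain are all assumptions made for illustration.

```python
from datetime import date
from typing import List, Optional, Tuple

# Lower rank = higher priority, following procedure > condition > observation.
DOMAIN_PRIORITY = {"Procedure": 0, "Condition": 1, "Observation": 2}


def infer_dod(records: List[Tuple[str, date]]) -> Optional[date]:
    """Pick a delivery date from delivery-related records of one gestation.

    records: (domain, record_date) pairs. Records from other domains are ignored.
    Assumption: within the best-ranked domain, the earliest date is used.
    """
    eligible = [
        (DOMAIN_PRIORITY[domain], when)
        for domain, when in records
        if domain in DOMAIN_PRIORITY
    ]
    if not eligible:
        return None
    best_rank = min(rank for rank, _ in eligible)
    return min(when for rank, when in eligible if rank == best_rank)


# Hypothetical records for a single gestation.
example_records = [
    ("Observation", date(2020, 1, 1)),
    ("Condition", date(2020, 1, 3)),
    ("Procedure", date(2020, 1, 5)),
]
example_dod = infer_dod(example_records)
```

In this example the procedure record wins even though it is the latest of the three, which reflects the priority order rather than chronology.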

All data manipulation, phenotyping, and algorithms were implemented using SQL, R, and PySpark on the N3C platform. Source code is available at N3C, project "[RP-2B9622] Assessing and predicting the clinical outcomes of pregnant women with COVID-19 using machine learning approach."

Evaluation

We performed a multi-level evaluation to test the validity of the algorithm as well as inter-rater reliability. To test the content validity of the OMOP CDM concepts resulting from the phenotyping, two researchers (CL and NG) independently reviewed the concept IDs and their semantic meanings and properties in the ATHENA vocabulary and rated the relevance of all concept IDs dichotomously. Inter-rater reliability was measured by Cohen's Kappa. Disagreements were discussed and resolved together with a senior OB/GYN physician (BC).
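For readers unfamiliar with the statistic, unweighted Cohen's Kappa compares observed agreement with the agreement expected by chance. The sketch below is a generic implementation, not the authors' code; the function name and the sample ratings are hypothetical. When both raters assign one identical label to every item, the statistic is formally undefined (0/0); returning 1.0 in that case is a convention assumed here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's Kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Assumed convention: identical single-label ratings count as kappa = 1.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical dichotomous relevance ratings for four concepts.
ratings_a = ["valid", "valid", "invalid", "invalid"]
ratings_b = ["valid", "invalid", "invalid", "invalid"]
kappa = cohen_kappa(ratings_a, ratings_b)  # observed 0.75, expected 0.5
```

The weighted variant used for ordinal ratings (e.g., high/moderate/low) additionally penalizes disagreements by their distance and is not shown here.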

To test the clinical validity of the algorithm for inferring GA, we randomly selected 30 patients from the final cohort, which resulted in 40 distinct gestations, including multiple gestations. Their comprehensive medical records on GA, excluding laboratory data, were pulled from the EHR. We calculated the start date of pregnancy by subtracting the GA in weeks from the event date for each record. Two clinical experts (CL and NG) independently reviewed the retrieved records and rated them based on two metrics: accuracy (high/moderate/low) and granularity (high/moderate/low). Accuracy is concerned with the level that the selected OMOP CDM concepts can accurately indicate the GA. For example, GA-related records are typically documented during antenatal care visits. Multiple GA-related records, even when documented at different dates, can suggest a consistent GA. The algorithm-selected concept is most accurate if it is among these records. When there are other GA-related records suggesting a GA different from the algorithm-selected record, the accuracy would not be high. Granularity refers to the extent that the algorithm-selected concept can indicate a specific gestational week. For example, the "gestation period, 38 weeks" has a high granularity level whereas "third-trimester pregnancy" has a low granularity level.
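The start-date calculation described above is a simple date subtraction. The sketch below assumes a record is reduced to an event date and a GA in weeks; the function name and the example dates are hypothetical.

```python
from datetime import date, timedelta


def pregnancy_start(event_date: date, ga_weeks: float) -> date:
    """Estimate the pregnancy start date by subtracting GA (in weeks) from the record date."""
    return event_date - timedelta(weeks=ga_weeks)


# Two hypothetical records from the same gestation, two weeks apart.
start_from_38w = pregnancy_start(date(2020, 10, 1), 38)   # "Gestation period, 38 weeks"
start_from_36w = pregnancy_start(date(2020, 9, 17), 36)   # "Gestation period, 36 weeks"

# Records documented on different dates are consistent when they imply the same start date.
records_agree = start_from_38w == start_from_36w
```

Comparing the implied start dates in this way is one possible check for the consistency that the accuracy rating relies on.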

To test the clinical validity of the algorithm for inferring DOD, we randomly selected 30 gestations from the final cohort. Their records consisting of procedures, conditions, observations, and measurements within ±14 days of the estimated DOD were pulled from the EHR. Two clinical experts (CL and NG) independently reviewed the charts and labeled whether the DOD was correctly inferred by the algorithm.

Despite the average gestation being around 280 days, this estimation varies among individuals. To represent rare cases such as preterm birth, post-term birth, and early-stage termination, we also performed extreme value analysis, in which two clinical experts (TL and CL) performed chart review for 30 randomly selected samples with <150 or >300 days of gestation.

Using TED-PC, we performed descriptive analyses to explore maternal demographics and underlying conditions for pregnant women with (cases) and without COVID-19 (controls) which are characterized by temporal information of the gestational weeks when SARS-CoV-2 infection was identified.

We identified 2,773 OMOP CDM concepts from the ATHENA vocabulary, of which 2,370 indicated that the patient was pregnant. Among the pregnancy-related concepts, 336 contained GA-related information. We excluded 189 concepts that either indicated an imprecise time range broader than one trimester (13 weeks) (e.g., concept ID 21493940: US for pregnancy in the second or third trimester) or did not have corresponding records in the N3C database (e.g., concept ID 3025286: Gestational age estimated from foot length on US by Mercer 1987 method). In total, 138 concepts contained useful gestational week information with a time range from one week to one trimester. Of the selected 138 concepts, 42 had high accuracy, 9 had moderate-high accuracy, 5 had moderate-low accuracy, and 82 had low accuracy.
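How the four accuracy levels can drive record selection is sketched below. This is an assumption-laden illustration, not the pseudocode of Table 2: the record layout, the function name, and the tie-break (earliest record within the best level) are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

# Lower rank = preferred; the four levels are those defined for GA-related concepts.
ACCURACY_RANK = {"high": 0, "moderate-high": 1, "moderate-low": 2, "low": 3}


def select_ga_record(records: List[Dict]) -> Optional[Dict]:
    """Return the GA record with the best accuracy level.

    Each record is a dict with keys 'level', 'event_date', and 'ga_weeks'.
    Assumption: ties within a level are broken by the earliest event date.
    """
    if not records:
        return None
    return min(records, key=lambda r: (ACCURACY_RANK[r["level"]], r["event_date"]))


# Hypothetical GA records of one gestation.
ga_records = [
    {"level": "low", "event_date": date(2020, 3, 1), "ga_weeks": 7.0},
    {"level": "high", "event_date": date(2020, 9, 17), "ga_weeks": 36.0},
    {"level": "moderate-high", "event_date": date(2020, 4, 1), "ga_weeks": 11.0},
]
chosen = select_ga_record(ga_records)
```

With 82 of the 138 concepts at the low level, a gestation may have only low-accuracy records available, in which case the function falls back to the best of those.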

To evaluate the phenotyping results, the content validity of the selected concepts was rated in a blinded manner by two independent reviewers who did not participate in the phenotyping. Both reviewers rated all concepts as "valid" (100% agreement).

... sample as low accuracy. The other reviewer rated 35 (87.5%) samples as high accuracy, 3 (7.5%) samples as moderate accuracy, and 2 (5.0%) samples as low accuracy. Cohen's Kappa with linear weighting is 0.62 (CI = [0.35, 0.90]). For granularity, both reviewers rated 39 samples as high granularity and one sample as low granularity (100% agreement, unweighted Cohen's Kappa = 1). See Table 4 for the confusion matrix.

For the DOD algorithm, a total of 30 patients' EHR were reviewed independently. Both reviewers rated all 30 samples as accurate (100% agreement, unweighted Cohen's Kappa = 1).

We randomly selected 30 gestations with a gestation length either shorter than 150 days or longer than 300 days and extracted their EHR. After chart review, 28 of the 30 gestations were found to have correctly extracted GA information, an accuracy of 93.3%. Of the two error cases, the first was due to a contradiction between GA records on different dates. For example, there was a record with the concept name "Gestation period, 38 weeks" on date 1, but other records with the same concept on date 2. The second error case was due to a contradiction between GA records on the same date. For example, on the same date, one record had the concept name "Gestation period, 36 weeks" and another had the concept name "Gestation period, 39 weeks". Among the correctly inferred cases, a certain level of inaccuracy remained. Five gestations had only low-accuracy GA records. Among the extreme cases, one had only a single GA record. Inter-rater agreement is 100% (unweighted Cohen's Kappa = 1).

Between 2018-06-01 and 2021-05-31, a total of 296,194 gestations in 270,897 pregnant women were identified from the N3C database. The mean and median ages are 30.31 and 31 years, respectively. There were 245,892 women who had one pregnancy during the study period; 24,713 and 292 women had two and three pregnancies, respectively. The mean gestation length was 274.14 days. The median was 278 days, with a minimum of 140 days and a maximum of 308 days. N3C data retrieval was completed on 02/12/2022.

Using TED-PC, we identified the timing of SARS-CoV-2 infections in gestational weeks. Figure 4 shows the frequency of infections across gestational weeks. More than half of the infections were identified during late pregnancy (between weeks 32 and 41), which might be related to the increased number of antenatal visits in late pregnancy. Table 5 provides the selected demographics and underlying conditions of the cohort captured by TED-PC, stratified by trimesters. There were 104,791 and 191,403 gestations before and during the COVID-19 pandemic, respectively; among the latter, there were 16,659 gestations with COVID-19 and 174,744 without COVID-19 peri-pandemic. The age group 30-34 accounted for the largest proportion across the age groups, followed by the age groups 25-29, 40-44, and 20-24. White women made up the largest share, nearly 50% of the total population, followed by Black and Hispanic/Latino women. For mothers who had ever been infected by SARS-CoV-2 before the DOD (before or during the pregnancy), the age groups 30-34, 20-24, and 25-29 had the largest percentages. In addition, the percentage of White women was smaller (39.7%) than in the pre-pandemic group (57.1%). Hispanic/Latino women made up the largest proportion among SARS-CoV-2-infected mothers (31.5%). Compared with pregnant women without COVID-19, pregnant women with COVID-19 had a higher prevalence of obesity (35.1% vs. 29.5%), diabetes (17.8% vs. 17.0%), chronic obstructive pulmonary disease (COPD) (0.2% vs. 0.1%), acute respiratory distress syndrome (ARDS) (1.8% vs. 0.2%), myocardial infarction (0.2% vs. 0.1%), and HIV/AIDS (0.6% vs. 0.4%). The proportions showed similar trends when stratified by trimester.

Using N3C data, we created the first EHR-based cohort of SARS-CoV-2-infected pregnant women with complete temporal information of clinical events spanning the gestation length, which supports urgently needed COVID-19 research for pregnant women in the US. Our algorithm is among the first that can detect temporal information of pregnancy care including early-stage pregnancy, preterm birth, early termination, and post-term birth. This algorithm shows promise to underpin EHR deep phenotyping of pregnancy care as well as machine learning methods that require precise temporal information of clinical events. As a rapidly developed clinical information extraction tool for combating COVID-19, our algorithm is currently supporting several EHR-based cohort studies on N3C that examine the impact of COVID-19 on pregnant women's real-time clinical inflammatory progression and pregnancy complications.

The accuracy of TED-PC is supported by several layers of logic. First, compared with previous studies that focused on claims data or required manual labor, our study took advantage of the OMOP CDM to normalize EHR data and categorized the normalized concepts into different priority groups. 15, [20] [21] [22] [27] [28] [29] [30] For example, the DOD algorithm prioritized the concepts in the "Procedure" domain over the "Condition" domain over the "Observation" domain, which logically prevented the algorithm from selecting records that were at a higher risk of semantic ambiguity and low granularity. Second, our algorithm categorized the GA-related concepts into different accuracy levels according to the time range they indicate. This step allows TED-PC to prioritize the records with the most accurate information. Third, the 270-day interval in our algorithm enabled us to distinguish different gestations of the same mother within the time frame for the estimation of both GA and DOD. Fourth, the merging and matching process of the GA cohort and the DOD cohort can exclude gestations with untrustworthy or missing values that are due to incomplete EHR data.
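One way to picture the role of the 270-day interval is as a rule for grouping a mother's records into separate gestations. The exact rule used by TED-PC is not spelled out in this text, so the sketch below is an assumption: a record opens a new gestation when it falls at least 270 days after the first record of the current one. The function name and dates are hypothetical.

```python
from datetime import date
from typing import List

MIN_INTERVAL_DAYS = 270  # interval named in the text; how it is applied here is assumed


def split_gestations(record_dates: List[date]) -> List[List[date]]:
    """Group one mother's record dates into gestation episodes."""
    episodes: List[List[date]] = []
    for current in sorted(record_dates):
        # Compare against the first record of the most recent episode.
        if episodes and (current - episodes[-1][0]).days < MIN_INTERVAL_DAYS:
            episodes[-1].append(current)
        else:
            episodes.append([current])
    return episodes


# Hypothetical delivery-related record dates for one mother.
mother_dates = [date(2020, 3, 1), date(2019, 1, 10), date(2019, 1, 12)]
episodes = split_gestations(mother_dates)
```

Here the two January 2019 records fall into one gestation and the March 2020 record starts a second one.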

Detection of temporal information for pregnant women with COVID-19 is made available by using N3C data for its two major features. First, N3C has multi-system EHR data linked at individual level. This unique feature enables our algorithm to impute a huge amount of missing temporal values and to resolve conflicts of temporal values among health records from different hospital systems. Second, N3C includes annotated clinical notes, procedures, and laboratory tests/results that are normalized with OMOP CDM, which allow the algorithms to leverage multi-source contextual information for inferring temporal information at an adequate level of granularity.

Several limitations of this study warrant noting. First, because several GA-related OMOP CDM concepts do not indicate specific gestational weeks (e.g., "Spontaneous onset of labor between 37 and 39 weeks gestation with planned cesarean section"), we inferred the gestational weeks using the median time point of the range, which may impair the performance of the algorithm. This impact is mild for concepts with high or moderate-high accuracy levels since the time range is small, but it could be severe for concepts with the low accuracy level. Second, our EHR data may not be comprehensive. For example, some examinations or laboratory tests do not have time information, but they are often prescribed to mothers during a specific time frame of gestation. Our future work will aim to improve the performance of TED-PC and test its external validity. Based on the error analysis, data incompleteness and inconsistency remain the major sources of error. Well-designed EHR data imputation methods and a hybrid model of rule-based and machine learning algorithms hold promise for addressing these issues. Although our algorithm is designed for N3C data, it could potentially be repurposed for other OMOP CDM normalized EHR.
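The midpoint rule mentioned in the first limitation can be made concrete with a small sketch. The regular expression and function name are hypothetical; the concept names in the examples are those quoted in this paper, and parsing concept names as free text is an assumption about how the range might be obtained.

```python
import re
from typing import Optional

# Matches ranges such as "9-13 weeks" or "between 37 and 39 weeks".
WEEK_RANGE = re.compile(r"(\d+)\s*(?:-|to|and)\s*(\d+)\s*weeks?", re.IGNORECASE)


def ga_midpoint_weeks(concept_name: str) -> Optional[float]:
    """Return the midpoint of a week range in a concept name, or None if no range is present."""
    match = WEEK_RANGE.search(concept_name)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


mid_narrow = ga_midpoint_weeks(
    "Spontaneous onset of labor between 37 and 39 weeks gestation with planned cesarean section"
)
mid_wide = ga_midpoint_weeks("Gestation 9-13 weeks")
no_range = ga_midpoint_weeks("Gestation period, 38 weeks")
```

The worst-case error of the midpoint is half the width of the range, which is why the rule matters little for narrow ranges and much more for concepts spanning up to a trimester.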

We explored and compared the characteristics of pregnant women by different timing of SARS-CoV-2 infection with our newly developed technique: TED-PC, a rule-based algorithm to automatically infer comprehensive temporal information of clinical events from EHR during the pregnancy care. The performance of TED-PC is satisfactory as collectively, accuracy and granularity of temporal information are beyond 90%. TED-PC has been implemented on N3C, supporting multiple national EHR cohorts for desperately needed research on the impact of COVID-19 on pregnancy. TED-PC is designed for N3C data but remains generalizable for OMOP CDM normalized EHR.

This study is sponsored by NIH/NIAID under award 3R01AI127203-05S2. We thank Ms. Ashlee Kim for her support of medical coding.

The analyses described in this publication were conducted with data or tools accessed through the NCATS N3C Data Enclave https://covid.cd2h.org and N3C Attribution & Publication Policy v 1.2-2020-08-25b supported by NCATS U24 TR002306. This research was possible because of the patients whose information is included within the data and the organizations (https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories) and scientists who have contributed to the on-going development of this community resource (https://doi.org/10.1093/jamia/ocaa196).

IRB

The N3C data transfer to NCATS is performed under a Johns Hopkins University Reliance Protocol # IRB00249128 or individual site agreements with NIH. The N3C Data Enclave is managed under the authority of the NIH; information can be found at https://ncats.nih.gov/n3c/resources.

We gratefully acknowledge contributions from the following N3C core teams (leads designated with asterisks): We acknowledge support from many grants; the content is solely the responsibility of the authors and does not necessarily represent the official views of the N3C Program, the NIH or other funders. In addition, access to N3C Data Enclave resources does not imply endorsement of the research project and/or results by NIH or NCATS.

Table 1 (excerpt). Accuracy levels of GA-related OMOP CDM concepts.

Accuracy level: Moderate-high (2-5 weeks)
Criterion: The concept name does not specify the value of GA in weeks but specifies a range of GA in weeks that is larger than 1 week and smaller than 6 weeks (e.g., Gestation 9-13 weeks).
Example concepts: 4181468: Gestation 9-13 weeks; 44791171: 9-13 weeks gestational age; 45757118: Spontaneous onset of labor between 37 and 39 weeks gestation with planned cesarean section.

Accuracy level: Moderate-low (6-10 weeks)
Criterion: The concept name does not specify the value of GA in weeks but specifies a range of GA in weeks that is larger than 5 weeks and smaller than 11 weeks (e.g., Gestation 14-20 weeks).


Clinical manifestations, risk factors, and maternal and perinatal outcomes of coronavirus disease 2019 in pregnancy: living systematic review and meta-analysis

Update: Characteristics of Symptomatic Women of Reproductive Age with Laboratory-Confirmed SARS-CoV-2 Infection by Pregnancy Status -United States

The effect of maternal SARS-CoV-2 infection timing on birth outcomes: a retrospective multicentre cohort study. The Lancet Digital Health

Timing of prenatal maternal exposure to severe life events and adverse pregnancy outcomes: a population study of 2.6 million pregnancies

Risks associated with viral infections during pregnancy. The Journal of Clinical Investigation

Pregnancy care

Definition, structure, content, use and impacts of electronic health records: A review of the research literature

Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research

Preeclampsia and Cardiovascular Disease in a Large UK Pregnancy Cohort of Linked Electronic Health Records

Ontology to identify pregnant women in electronic health records: primary care sentinel network database study

Defining "Term" Pregnancy: Recommendations From the Defining "Term" Pregnancy Workgroup

A data extraction algorithm for assessment of contraceptive counseling and provision

Temporal reasoning over clinical text: the state of the art

Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models

Development of an Algorithm to Identify Pregnancy Episodes in an Integrated Health Care Delivery System

Combining rules and machine learning for extraction of temporal expressions and events from clinical narratives

Time event ontology (TEO): to support semantic representation and reasoning of complex temporal relations of clinical events

Validation of an algorithm to estimate gestational age in electronic health plan databases

We evaluated the performance of the GA algorithm in two dimensions: accuracy and granularity. Among the 30 randomly selected mothers, eight of them had two gestations and one of them had three gestations during the study time frame. The mean gestation length was 270.15 days with a maximum of 299 days and a minimum of 159 days. Among the 40 pregnancies, one reviewer rated