key: cord-0266607-bdhuxduu
authors: Liu, Y.; Gaunt, T. R.
title: Triangulating evidence in health sciences with Annotated Semantic Queries
date: 2022-04-16
journal: nan
DOI: 10.1101/2022.04.12.22273803
sha: de11597418da71bcb17628d1d149b44280b4659d
doc_id: 266607
cord_uid: bdhuxduu

Integrating information from data sources representing different study designs has the potential to strengthen evidence in population health research. However, this concept of evidence "triangulation" presents a number of challenges for systematically identifying and integrating relevant information. We present ASQ (Annotated Semantic Queries), a natural language query interface to the integrated biomedical entities and epidemiological evidence in EpiGraphDB, which enables users to extract "claims" from a piece of unstructured text, and then investigate the evidence that could support or contradict the claims, or offer additional information to the query. This approach has the potential to support the rapid review of pre-prints, grant applications, conference abstracts and articles submitted for peer review. ASQ implements strategies to harmonize biomedical entities in different taxonomies and evidence from different sources, to facilitate evidence triangulation and interpretation. ASQ is openly available at https://asq.epigraphdb.org.

[Figure 1 here]

The ASQ platform is developed as a natural language interface component to the epidemiological evidence integrated in the EpiGraphDB database and ecosystem (Figure 1), with the aim of allowing users to access EpiGraphDB knowledge and triangulate the evidence using a simple scientific claim of interest as a starting point. For example, instead of relying on bespoke topic-specific web queries that are restricted to several entities or meta-entities, or on structural queries to the database, ASQ presents the integrated evidence from EpiGraphDB as introspectable evidence items that "fact-check" a claim such as "glucose can be used to treat diabetes" (Figure 2) or a short piece of text containing multiple such claims.

[Figure 2 here]

Various components of epidemiological evidence from EpiGraphDB are incorporated into the ASQ platform as two evidence groups (Table 1) [...] 2). Typically an evidence item in this group is comprised of a semantic triple in the form of Subject-PREDICATE-Object (e.g. "Obesity CAUSES Asthma") and the multiple [...]

2. An association evidence group which consists of various sources of curated systematic statistical association analysis studies using systematic Mendelian Randomization analyses (the [MR_EVE_MR] 12 relationships in EpiGraphDB data; see Supplementary Table 4 for notation conventions), genetic correlations (the [GEN_COR] 13 relationships), and polygenic risk score associations (the [PRS] 14 relationships), where the analyses are conducted between two human traits for which genome-wide association study (GWAS) data are curated by OpenGWAS 6 (the (Gwas) nodes in EpiGraphDB).
ASQ incorporates the common properties of effect size, standard error, P-Value, as well as source/target GWAS traits from the source analysis data as the common quantitative/qualitative information of the evidence items, and additional detailed source-specific properties are also retrieved for users' own investigation.

[Table 1 here]

On the web interface, the main entry point for a user to interact with the platform is to input short paragraphs of scientific text (e.g. the abstract of a journal article or pre-print). From this input we use SemRep 9 as the query parser to derive query claim triples from the text in the form of Subject-PREDICATE-Object (e.g. "Obesity CAUSES Asthma"). The user is then asked to select a specific triple of interest as the target of the downstream stages of entity harmonization and evidence retrieval. Alternatively, users can either directly input a query claim in the query triple view (https://asq.epigraphdb.org/triple), or start from the MedRxiv systematic analysis summary results (https://asq.epigraphdb.org/medrxiv-analysis).

In the following entity harmonization stage, ASQ harmonizes the biomedical entities from the claim triple with the Experimental Factor Ontology (EFO 5) entities, with the EFO ontology serving as the anchor to connect the query entities and any evidence entities (Section 4.1). By default ASQ attempts to retrieve entities that are semantically highly related (but not exclusively identical) to the query entities to allow for exploratory discovery about further evidence of potential interest. This can be adjusted to more restrictive (specific) or more liberal (sensitive) mapping.

In the evidence retrieval stage, evidence items from the two evidence groups are retrieved based on the biomedical entities harmonized in the previous stage, as well as on the predicate direction group ("directional" and "non-directional") of the claim triple.
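The claim triple and its predicate direction group can be pictured with a small data structure. The following is a minimal sketch rather than ASQ's own code: the class name `ClaimTriple` and its fields are hypothetical, the directional predicates are those listed in Table 3, and the non-directional set is an assumption containing only the example predicate named in the Table 3 caption.

```python
from dataclasses import dataclass

# Directional predicates as listed in Table 3 of this paper.
DIRECTIONAL_PREDICATES = {"CAUSES", "TREATS", "PRODUCES", "AFFECTS"}
# Assumption: only the example non-directional predicate named in the text.
NON_DIRECTIONAL_PREDICATES = {"ASSOCIATED_WITH"}


@dataclass(frozen=True)
class ClaimTriple:
    """Hypothetical container for a Subject-PREDICATE-Object claim."""

    subject: str
    predicate: str
    obj: str

    @property
    def direction_group(self) -> str:
        """Return the predicate direction group used in evidence retrieval."""
        if self.predicate in DIRECTIONAL_PREDICATES:
            return "directional"
        if self.predicate in NON_DIRECTIONAL_PREDICATES:
            return "non-directional"
        raise ValueError(f"Unsupported predicate: {self.predicate}")

    def reversed(self) -> "ClaimTriple":
        """Swap subject and object, as needed when looking for reversal evidence."""
        return ClaimTriple(self.obj, self.predicate, self.subject)


claim = ClaimTriple("Obesity", "CAUSES", "Asthma")
print(claim.direction_group)
print(claim.reversed())
```

Keeping the direction group as a derived property means the same object can drive both the forward lookup and the reversed lookup without duplicating predicate logic.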
Evidence items are then categorised into several evidence types (Section 4.2): (a) supports the query claim ("supporting"), (b) contradicts the query claim with retrieved items indicating evidence in the opposite direction to the claim ("reversal"), (c) fails to meet the required evidence threshold to be supporting or contradictory ("insufficient") or (d) could be of additional information ("additional") to the claim. Retrieved individual evidence items and groups of items are then measured with a score to reflect both the proximity of the involved entity to the query claim as well as the strength of the evidence (Section 4.3). For triple and literature evidence the strength of the evidence item is calculated based on the number of literature sources, whereas for association evidence the strength is calculated based on the standardized effect size. The evidence strength score is then adjusted by a mapping score measuring the semantic similarities between [...] evidence, users can further introspect the context detail from which the semantic triples are derived, and for association evidence, ASQ displays the statistical results on forest plots as quantitative comparisons. In addition to the default interactive session on the web interface, ASQ offers programmatic access via the API (see "Code availability" section) which allows for batch processing and analysis (e.g. Section 2.2).

[Figure 3 here]

2.2 Systematic analysis of MedRxiv submissions

We demonstrate the use of ASQ by systematically analysing the preprint submissions on MedRxiv in the sample period from 2020-01-01 to 2021-12-31 (Figure 4). We will further discuss the technical details and the relevant terminology covered here in Section 4.
Using the MedRxiv/BioRxiv API, we identified 28,846 unique submissions in the period (in the case of multiple versions in a submission we kept only the initial version) and retrieved their abstracts as candidate text documents containing multiple scientific claims to be parsed in SemRep. Out of all the candidate documents, 13,999 documents were successfully parsed by SemRep to contain coherent semantic triples at sentence level, and 6,870 documents were identified to contain suitable predicates for analysis in ASQ. In total we extracted 13,295 document-triples (14,436 claim triples) as the sample dataset.

Each claim triple was processed through ASQ programmatically to map with EFO and evidence entities in the entity harmonization stage using a set of parameters which are equivalent to the default settings used in the web interface (see Supplementary Table 5 for configuration of parameters), with 1,446 document-triples identified to be valid and associated with entities in EpiGraphDB. Amongst these document-triples, we found that "Disease or Syndrome" is the most numerous semantic group (888 query terms, 7,831 EFO entities), followed by "Mental or Behavioral Dysfunction" (125 query terms, 1,708 EFO entities), and "Neoplastic Process" (138 query terms, 1,516 EFO entities) (Supplementary Table 7).
In order to avoid the document-triple dataset for analysis being too large we used the intersection subset where a document-triple must contain at least one type of evidence in both the triple and literature evidence group and the association evidence group (Supplementary Table 6 [...]

In the entity harmonization stage of the systematic analysis, the retrieval of EFO entities is determined by an initial stage where EFO candidates are retrieved by the semantic similarities between the EFO candidates and the query subject/object terms by their encoded text vectors, and a subset is subsequently selected based on proximity of the query term and the candidates in the EFO graph as indicated by the identity scores, which then gets mapped to evidence entities via semantic similarities (see Section 4.1). Figure 5 shows the distribution of score metrics for the entity harmonization process where query terms are mapped to EFO entities, and Supplementary Figure 1 shows the distribution of scores for mappings of evidence entities to the original query terms. Selected entities which have identity scores below the threshold in the automated process would also be semantically closer to the query terms than the rest of the retrieved candidates (with mean semantic similarity scores above 0.9), and therefore at a systematic scale ASQ is able to select a set of corresponding EFO entities that have high association to the query terms of interest as the basis for further retrieving evidence entities related to these query terms. A similar automated approach applies in the interactive session, and users are able to optionally override the automated processing of entity harmonization with manual selection of EFO entities of interest or re-adjust the entity selection afterwards.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted April 16, 2022.

[Figure 5 here]

[...] sufficient strength of effect size in order to qualify into the "supporting" type or the "reversal" type (otherwise they would be of "insufficient" evidence), and therefore the strength scores for "supporting" and "reversal" types are markedly higher than items in the "insufficient" type. In general, the evidence scores for "supporting" and "reversal" association evidence are found to be distributed around the baseline score of 1. In addition, as constituent scores the mapping [...]

Whilst results from the systematic analysis reflect the availability of evidence in ASQ and EpiGraphDB in various areas, they also show the popular research topics and themes reflected from MedRxiv submissions in 2020-2021. Figure 7 shows several clusters of research areas with central terms as measured in Table 2, and example claim triples can be found in Supplementary Table 9.
In addition to research associated with the COVID-19 pandemic ("Coronavirus infections"), the two areas with highest research submissions and retrieved evidence are regarding obesity and associated diseases ("Obesity", "Diabetes", "Diabetes Mellitus, Insulin-Dependent", "Chronic Kidney Diseases", etc.) and mental health ("Depressive disorder", "Parkinson Disease", "Alzheimer's Disease", "Schizophrenia", etc.). Notably when SemRep fails to recognize a more specific term it will fall back to more general terms, and therefore the term [...]


Here we showcase an example demonstrating the use of ASQ for researchers in triangulating evidence regarding epidemiological research questions. From the systematic dataset, ASQ extracted a claim triple "Obesity CAUSES Heart failure" from a preprint abstract regarding a Mendelian randomization analysis investigating causal relationships between body mass index and heart failure risk 15, derived from the context "About 40% of the excess risk of HF due to adiposity is driven by SBP, AF, DM and CHD", where "HF" is recognised as heart failure and "adiposity" as obesity. These results can be found on ASQ (https://asq.epigraphdb.org/triple?subject=Obesity&object=Heart%20failure&predicate=CAUSES&analysis). The query subject "Obesity" and object "Heart failure" were mapped to their corresponding ontology counterparts then to evidence entities, from which ASQ then identified suitable evidence items. At aggregate level for triple and literature evidence there are more supporting evidence items (11) with higher aggregated scores (12.80) compared to reversal evidence items (6) with lower aggregated scores (5.46); similarly, there are 5 supporting association evidence items with an aggregated score of 5.11 and no reversal evidence identified. Users are able to further investigate the literature that is associated with either the claim triple (e.g. 16 and 17) or the reversal claim that heart failure might cause obesity (e.g. 18), viewing the surrounding context from the abstract directly in the ASQ interface, or clicking a link to access the original paper. For association evidence ASQ identified several individual findings from the pairwise Mendelian randomization studies with sufficient statistical significance as supporting evidence.
ASQ also identified a range of findings that are insufficient in statistical significance to qualify as supporting evidence, which are useful both in showing the scope of evidence identification and in determining the cause of a lack of reversal evidence. In this case, the lack of reversal evidence was due to absence of results from the MR-EvE data source (as there were no retrieved insufficient counterparts to reversal evidence items). In addition, ASQ identified several findings from the PRS Atlas data source, and since the identified trait term "Target heart rate achieved" was not directly equivalent to the query object "Heart failure" ASQ would assign low evidence scores to these findings in the context of the original claim. In general ASQ is able to assist [...]

We developed the Annotated Semantic Queries (ASQ) platform as an approach to improve the accessibility of the EpiGraphDB data and ecosystem for users through the implementation of a natural language interface (whilst also enhancing programmatic access). There is an intrinsic problem with integrated data platforms containing rich and complex data: experienced users wish to be presented with flexible access to the data in order to navigate to the elements they want, yet new users can find this complexity overwhelming (even if well documented). From this perspective ASQ provides an accessible natural language query interface for such users to find the evidence relating to a specific claim/question, e.g. "Can obesity cause asthma?", which can either be parsed from a short piece of text containing scientific claims, or directly input as a claim triple of Subject PREDICATE Object. In addition to providing a more accessible interface to EpiGraphDB this approach provides a novel way to systematically evaluate a piece of text (such as a pre-print abstract) to identify whether claims within that text are supported by other data. Heterogeneous knowledge types are harmonized in ASQ into intuitive evidence groups making triangulation of evidence in different groups more accessible, without either the need to navigate to various area-specific topics or the need to formulate complex queries. As we have demonstrated with our systematic analysis, the evidence retrieved by ASQ can be of high value and relevance to a wide range of researchers in epidemiology and health science to assist the triangulation of evidence in their research. This is a generalisable approach that could be applied to a wider array of knowledge graphs and evidence sources to support the development of tools for rapid "semi-automated" (assisted) review of pre-prints.

Recent advances in deep learning modelling have contributed to a significant improvement in natural language processing, and ASQ applies our previous method development 19 in combining sequence classification Transformer models with text vector embeddings for the harmonization of entities in different taxonomies. ASQ is able to combine the functionalities of parsing free text to generate structural claims with the harmonization of heterogeneous entities and evidence to enable claims to be mapped to evidence both from literature and semantic knowledge as well as evidence from systematic association analysis.
As part of the ASQ platform we developed a scoring mechanism to prioritise the retrieved evidence items, accounting for the semantic relevance of entities to the query of interest, as well as the strength of the evidence item per se. This score enables users to rapidly evaluate a wide range of evidence, whilst at the same time being able to assess the value of individual evidence items or evidence groups to the query to enable prioritisation.

On the other hand, it is worth pointing out that users should not rely on metrics (whether they are ranking metrics, P-Values, or discrete categories of "accepting" / "inconclusive" / "rejecting") as sole criteria when assessing evidence or as a substitute for detailed investigation, not just in ASQ but also when interacting with other data platforms. The nature of the hetero- [...] approach should therefore be considered as a support tool that aids evidence identification to assess a claim, but not a comprehensive "fact-checker".

Here we denote a taxonomy as a catalogue of terms in a specific domain, and an ontology as a tree representing the taxonomy terms in a hierarchical order (e.g. parent terms that are more generic versus descendant terms with more specific meanings; Figure 3B). In that sense, EFO is the ontology which ASQ uses to infer relationships between biomedical terms from the taxonomies of UMLS and GWAS traits. This is because EpiGraphDB incorporates semantic terms/triples in UMLS, but not its ontological hierarchies (since SemMedDB primarily curates derived triples with mechanistic predicates such as "CAUSES" but not comprehensive ontologies), and the GWAS traits are phenotypic trait names from genetic studies collated in the OpenGWAS platform. An entity is then defined as a member of a taxonomy, i.e. a biomedical concept can be represented in a taxonomy as one of its predefined members with an identifier and a label (e.g. UMLS term C1305855 "Body mass index") to various degrees of semantic affinity. Conceptually we refer to the process of resolving the mapping of terms from the claim triple with those from EpiGraphDB evidence as entity harmonization, as it harmonizes entities from different taxonomies into a unifying structure in the ontology (i.e. Figure 2). Our objective is to retrieve entities from EpiGraphDB that are semantically similar and ontologically meaningful with respect to the query terms, while ensuring broader relevant terms are retrieved by not restricting to identical token-level resemblance (which can also be achieved in ASQ by setting very high semantic similarity thresholds). To this end ASQ retrieves EFO entities that would sufficiently represent the query terms in the ontology hierarchy, then retrieves evidence entities that are semantically similar to the selected EFO entities.
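The retrieval-by-similarity step described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not ASQ's implementation: cosine similarity over text embeddings is assumed as the similarity measure, the three-dimensional vectors are invented for illustration, and the function names, `threshold` and `top_k` parameters are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_similar(query_vec, candidates, threshold=0.7, top_k=3):
    """Return up to top_k (label, score) pairs with score >= threshold, best first."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in candidates.items()]
    kept = [(label, score) for label, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_k]


# Toy embeddings (invented values) standing in for encoded ontology terms.
efo_vectors = {
    "obesity": [0.9, 0.1, 0.0],
    "body mass index": [0.7, 0.3, 0.1],
    "asthma": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.1, 0.0]  # stands in for the encoded query term "Obesity"

matches = retrieve_similar(query_vector, efo_vectors)
for label, score in matches:
    print(f"{label}: {score:.3f}")
```

Raising `threshold` towards 1 corresponds to the more restrictive (specific) mapping mentioned in the text, while lowering it gives the more liberal (sensitive) mapping.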
In ASQ we measure the proximity between two entities in the semantic space by the se- [...] EFO terms and GWAS-EFO terms to infer the distance (number of steps/nodes) between a query term and an EFO term in the ontology tree. An identity score of 0 suggests the two terms are equivalent in the ontology, whereas a score of 1 suggests that the term of interest can be considered as either a direct parent term or a direct descendant term of the reference ontology term (in practice this can be relaxed to 1.5 as the inference model produces a regression score rather than a classification score) and scores above 2 suggest greater distance between the two terms. In previous research on the performance of entity retrieval by various methods 19 we showed that BLUEBERT-EFO, as a task-specific bespoke model, is able to retrieve candidate terms that are closer to a term of interest in the semantic rankings than naive embeddings from general purpose models (e.g. ScispaCy, BioSentVec 21, etc.). The retrieval of EFO candidates is also augmented with a pre-filtering step to remove ontology candidate terms that are overly generic, to mitigate scenarios where retrieved evidence entities in subsequent steps are less relevant to query terms due to these evidence entities being mapped to generic ontology terms (such as an ontology term "disease"). This is done via the pre-computed information content [...]

[...] as the supporting evidence, whereas for non-directional claims, evidence is identified from MR-EvE, PRS, and GEN_COR with strong statistical evidence.

• Reversal evidence items are those that could sufficiently contradict the claim with identified evidence from the reverse direction, and are therefore only applicable to directional predicates. In other words, for a claim "Obesity CAUSES Asthma", evidence that would support a claim "Asthma CAUSES Obesity" would be considered as a reversal evidence item, because it reverses the direction. For both the triple and literature evidence group and the association evidence group, an evidence item whose source node is a mapped evidence object entity and whose target node is a mapped evidence subject entity is identified as a reversal evidence item. For association evidence the statistical threshold for supporting evidence also applies.

• Insufficient evidence items are identified as candidates for supporting evidence and reversal evidence (when applicable) which fail to meet the desired strength of evidence. This only applies to association evidence, for which P-Value is a quantitative measure. The aim of identifying insufficient evidence is to provide findings on the existence of systematic results, i.e. to determine whether the lack of evidence for a claim of interest is due to the absence of evidence (e.g. not curated by EpiGraphDB), or due to existing results failing to support/contradict a claim with sufficient strength.

• Additional evidence items are identified as evidence that could be of potential interest to users for further investigation, but which may not be sufficiently specific to inform the acceptance or rejection of a claim. For association evidence, when the claim is directional, non-directional evidence from PRS and GEN_COR are candidates for additional evidence.
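The evidence types listed above can be read as a small decision procedure for association evidence. The sketch below is one possible reading and not ASQ's actual logic: the P-value cut-off is a hypothetical placeholder, treating MR-EvE as the only directional source is an assumption, and for non-directional claims either orientation of the entity pair is assumed to count. The class `AssociationItem` and the function `classify` are illustrative names.

```python
from dataclasses import dataclass

P_VALUE_THRESHOLD = 0.01  # hypothetical cut-off; the real threshold is not stated here
NON_DIRECTIONAL_SOURCES = {"PRS", "GEN_COR"}  # MR_EVE_MR is assumed to be directional


@dataclass
class AssociationItem:
    source_entity: str  # source GWAS trait of the association
    target_entity: str  # target GWAS trait of the association
    data_source: str    # "MR_EVE_MR", "PRS" or "GEN_COR"
    p_value: float


def classify(item, subject_entities, object_entities, directional_claim):
    """Return an evidence type for the item, or None if it does not match the claim."""
    forward = item.source_entity in subject_entities and item.target_entity in object_entities
    backward = item.source_entity in object_entities and item.target_entity in subject_entities
    if not (forward or backward):
        return None
    significant = item.p_value < P_VALUE_THRESHOLD
    if not directional_claim:
        # Assumption: orientation is ignored for non-directional claims.
        return "supporting" if significant else "insufficient"
    if item.data_source in NON_DIRECTIONAL_SOURCES:
        # Non-directional findings cannot settle a directional claim.
        return "additional"
    if not significant:
        return "insufficient"
    return "supporting" if forward else "reversal"


subjects, objects = {"Body mass index"}, {"Heart failure"}
items = [
    AssociationItem("Body mass index", "Heart failure", "MR_EVE_MR", 1e-8),
    AssociationItem("Heart failure", "Body mass index", "MR_EVE_MR", 1e-6),
    AssociationItem("Body mass index", "Heart failure", "MR_EVE_MR", 0.2),
    AssociationItem("Body mass index", "Heart failure", "PRS", 1e-5),
]
labels = [classify(item, subjects, objects, directional_claim=True) for item in items]
print(labels)
```

Returning "insufficient" rather than dropping weak results preserves the distinction the text draws between an absence of evidence and existing results that fail to reach the required strength.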
[Table 3 here]

4.3 Score metrics to measure retrieved evidence

We introduce scores for the retrieved evidence in order to facilitate the assessment of individual evidence items and provide a simple way to compare between evidence items and groups. However, as naive assessment metrics they should be used for simple comparisons and should not replace the actual investigation into specific evidence details.

The mapping score P_mapping ([0, 1]; Equation 1) of retrieved evidence measures the overall deviation in terms of semantic similarity (S) between the retrieved evidence entities and the original query claim terms, which is a product of semantic similarity scores of associated entities in the entity harmonization stage. A high score indicates that the retrieved evidence is of high semantic proximity to the query claim of interest, whereas a low score suggests that the semantic relevance of the retrieved entity to the claim is low and therefore the relevance of the evidence to the query should be discounted by the low semantic relevance. If multiple EFO entities (indexed by j) are identified for a query term, but these map to the same evidence entity, the route with the highest score value is chosen as the basis for mapping score calculation. In addition, for triple entities the query terms are added as pseudo-ontology entities as they share the same UMLS taxonomy.

(1)
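As an illustration of the description above, the mapping score can be sketched as a product of similarity scores along the best route through the ontology. This is a hedged reading of the prose, not a reproduction of Equation 1: the assumption here is that each route contributes S(query term, EFO entity j) multiplied by S(EFO entity j, evidence entity), that the highest-scoring route is kept per query term, and that subject and object scores are multiplied together. The function names and the similarity values are invented for illustration.

```python
def term_mapping_score(routes):
    """Best route score for one query term.

    routes: list of (similarity of query term to EFO entity j,
                     similarity of EFO entity j to evidence entity) pairs,
    one pair per EFO entity j that maps to the same evidence entity.
    """
    return max(s_query_efo * s_efo_evidence for s_query_efo, s_efo_evidence in routes)


def mapping_score(subject_routes, object_routes):
    """Assumed combination: product of the subject and object term scores."""
    return term_mapping_score(subject_routes) * term_mapping_score(object_routes)


# Invented similarity values in [0, 1].
subject_routes = [(0.95, 0.90), (0.80, 0.99)]  # two EFO routes to the same evidence entity
object_routes = [(1.0, 0.85)]                  # a single route

score = mapping_score(subject_routes, object_routes)
print(f"mapping score: {score:.4f}")
```

Because every factor lies in [0, 1], the product also stays in [0, 1], and any weak link along the route lowers the overall score, which matches the discounting behaviour described in the text.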


Tables

Table 1

Distribution of the EpiGraphDB knowledge triples which are the source evidence in this study, harmonized into the two evidence categories. Column "Triples" and column "Literature" report respectively the number of literature triples and the number of associated source literature articles in a triple and literature evidence group, and column "Associations" reports the number of statistical associations in an association evidence group. For example, there are 37,423 literature triples in the form of Term 1 AFFECTS Term 2 where Term 1 and Term 2 are from the term types of aapp ("Amino Acid, Peptide, or Protein"), dsyn ("Disease or Syndrome"), gngm ("Gene or Genome") (a UMLS term can have multiple associated types), and there are 57,928 source literature articles from which the 37,423 literature triples are derived. Similarly, there are 8,966,440 statistical associations from the MR-EvE study 12 between GWAS-es in the UKBiobank categories (ukb-a, ukb-b, etc.).

Table 2. Systematic analysis results: top claim terms by retrieved evidence

Top claim terms sorted by the number of cases where the query claim triple is associated with both triple and literature evidence as well as association evidence ("T&L. + Assoc."). For example, there are 41 claim triples involving the term "Disease" as either a subject term or an object term where these claim triples are identified with both supporting evidence in triple and literature evidence ("T&L.") and association evidence ("Assoc.") groups, 74 cases identified with supporting triple evidence, 44 cases identified with supporting association evidence, 77 cases identified with any evidence types ("Any"; see Section 4.2 and Table 3 for all evidence types), and 715 cases from the triples in the initial claim parsing stage dataset ("Init."; Figure 4).

Table 3

Summary of how retrieved evidence items are classified based on the predicate direction group, evidence group, and evidence type. The notation S − P → O means a Subject PREDICATE Object triple where the predicate is directional (e.g. a "CAUSES" predicate versus a non-directional predicate "ASSOCIATED_WITH") and the notation S − P − O means a triple with non-directional predicate.

Reversal | Insufficient | Additional
Directional predicates: CAUSES, TREATS, PRODUCES, AFFECTS
Triple and literature


Figure 1. Overall architecture design of the EpiGraphDB-ASQ platform and its associated components in the EpiGraphDB ecosystem. Left: EpiGraphDB's biomedical entities (in the form of graph nodes) from different taxonomies are encoded into vector representations which allows for fast information retrieval against the query of interest. Epidemiological evidence (in the form of graph edges) is incorporated into ASQ as harmonized evidence groups. Right: Internal processing workflow of the EpiGraphDB-ASQ platform by the three stages: the claim parsing stage, the entity harmonization stage, and the evidence retrieval stage.


Figure 2. A summary network diagram on the retrieved entities and evidence from the ASQ platform regarding a claim "Glucose TREATS Diabetes". The subject and object terms of the query claim are represented as nodes in red, and the predicate as a directed edge. The ontology term (green nodes) "glucose" is identified as the mapped term for the claim subject, and ontology terms "diabetes mellitus", "monogenic diabetes", "Maternal diabetes" are identified as the mapped terms for the claim object in the default setting (which can be adjusted at an interactive session or updated after initial results [...]


Figure 3. Overview of the web interface functionalities of EpiGraphDB-ASQ. A: Summary of harmonized entities and retrieved evidence regarding the query claim. B: Sub-graph representation of a retrieved ontology entity in the EFO graph. C: Summarised literature information and context details for a retrieved semantic triple "Obesity CAUSES Asthma". D: Forest plot on the statistical association evidence regarding the query claim.


Figure 4. Overview diagram on the systematic analysis results and the primary metrics in the various stages discussed in Section 2.2. This figure complements Figure 1 regarding an individual case with the aspect of systematic scale. Further discussions on the parameter configuration as shown in each of the stages are available in Supplementary Table 5.

Distribution of semantic similarity scores, information content scores, and identity scores for retrieved EFO entities in the process of mapping to query UMLS terms, categorised by the semantic type of the UMLS term ("term_type") and by score metric ("score_type") in the retrieval process. Category type: "candidate" for entities retrieved as a candidate, "select" for candidates that are selected in the automated process (Section 4.1), and "not_select" for candidates that are not selected. Left ("all"): Distribution across all semantic types. Middle 1 ("dsyn"): In the "Disease or Syndrome" group. Middle 2 ("mobd"): In the "Mental or Behavioral Dysfunction" group. Right ("neop"): In the "Neoplastic Process" group. This figure reports distributions in the top 3 semantic type groups by entity count (Supplementary Table 7 reports entity counts for all semantic types). Top ("similarity"): By semantic similarity score, which measures the similarity of term embedding vectors. Center ("ic"): By information content score, which measures the concreteness of the term in EFO. Bottom ("identity"): By identity score, which measures the inferred relative distance of the UMLS term. The roles that the score metrics play in the harmonization retrieval process are discussed in detail in Section 4.1. Supplementary Figure 1 reports the distribution of score metrics for retrieved UMLS and trait entities.
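To make the "similarity" metric concrete, the following is a minimal sketch of ranking candidate ontology terms by the similarity of their embedding vectors to a query term. It assumes cosine similarity as the measure and uses made-up three-dimensional vectors; the function name `cosine_similarity`, the toy vectors, and the candidate labels are illustrative assumptions and do not reproduce the ASQ implementation or its selection thresholds.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embedding of a query UMLS term (toy values, not real embeddings).
query_vec = [0.9, 0.1, 0.0]

# Hypothetical embeddings of candidate EFO terms (toy values).
candidates = {
    "diabetes mellitus": [1.0, 0.0, 0.0],
    "monogenic diabetes": [0.7, 0.7, 0.0],
    "asthma": [0.0, 0.0, 1.0],
}

# Score every candidate, then rank from most to least similar.
scores = {term: cosine_similarity(query_vec, vec) for term, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for term in ranked:
    print(f"{term}: {scores[term]:.3f}")
```

In this sketch the ranking alone stands in for candidate retrieval; how the similarity score is combined with the information content and identity scores to select candidates is described in Section 4.1 and is not modelled here.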

Distribution of evidence scores and their constituent scores (entity mapping scores and evidence strength scores) for the "supporting", "reversal", and "insufficient" evidence types (by rows) in the triple and literature evidence group and the association evidence group (by columns). This figure reports aggregated distributions across directional and non-directional predicate groups, and Supplementary Tables 4 and 5 report detailed distributions by evidence groups, evidence types, and predicate groups. Note that the "insufficient" evidence type applies only to the association ("assoc") evidence group and not to the triple and literature ("triple") evidence group.

Network diagrams of clusters representing research interests, as parsed from the medRxiv abstract sample from 2020-01-01 to 2021-12-31, together with their corresponding evidence retrieved from EpiGraphDB-ASQ. Nodes coloured in red correspond to the primary claim terms (Table 2) and edges coloured in red correspond to relationships involving a primary claim term. A: Obesity cluster with primary terms "Obesity", "Diabetes", "Diabetes Mellitus, Non-Insulin-Dependent", "Chronic Kidney Diseases"; B: Mental illness cluster with primary terms "Depressive disorder", "Alzheimer's Disease", "Schizophrenia", "Parkinson Disease"; C: COVID-19 cluster with primary term "Coronavirus infections". The diagrams are generated by retrieving first-degree neighbour nodes for each of the top term nodes, where node size corresponds to term count and edge width corresponds to the aggregated supporting evidence score between nodes. An interactive diagram is available at https://asq.epigraphdb.org/medrxiv-analysis.
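The construction described in this caption can be sketched as follows, under stated assumptions: evidence items are given as (subject, object, supporting score) tuples, scores between the same pair of nodes are aggregated by summation, and term counts are supplied as a plain dictionary. The function name `first_degree_subgraph`, the data, and the choice of summation are hypothetical and serve only to illustrate the idea of a first-degree neighbourhood with sized nodes and weighted edges.

```python
from collections import defaultdict

# Hypothetical evidence records: (subject term, object term, supporting evidence score).
evidence = [
    ("Obesity", "Asthma", 1.5),
    ("Obesity", "Asthma", 0.5),
    ("Obesity", "Diabetes", 2.0),
    ("Asthma", "Eczema", 1.0),  # does not touch a top term, so it is excluded
]

# Hypothetical number of times each term appears in the parsed claims.
term_count = {"Obesity": 10, "Asthma": 4, "Diabetes": 7, "Eczema": 2}


def first_degree_subgraph(top_terms, evidence, term_count):
    """Return (node_size, edge_width) for the first-degree neighbourhood of top_terms.

    An edge is kept only if at least one endpoint is a top term. Edge width is
    the sum of supporting scores for that (subject, object) pair, which is an
    assumed aggregation. Node size is the term count.
    """
    top = set(top_terms)
    edge_width = defaultdict(float)
    for subj, obj, score in evidence:
        if subj in top or obj in top:
            edge_width[(subj, obj)] += score
    # Nodes are the top terms plus every endpoint of a retained edge.
    nodes = top | {node for pair in edge_width for node in pair}
    node_size = {node: term_count.get(node, 0) for node in nodes}
    return node_size, dict(edge_width)


node_size, edge_width = first_degree_subgraph(["Obesity"], evidence, term_count)
print(node_size)
print(edge_width)
```

The resulting dictionaries could be passed to any graph-drawing library to set node radius and edge thickness; the rendering step is omitted here.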

Triangulation in aetiological epidemiology

                                 3  3  3  3  14
Testosterone                     3  5  3  6  21
Diabetes Mellitus                3  4  5  6  25
Triglycerides                    3  4  4  5  16
Heart Diseases                   3  3  3  3  10
Unipolar Depression              3  4  3  4  32
Myocardial Infarction            3  4  5  6  15
Malignant neoplasm of prostate   3  4  3  4  18
Enthesitis-Related Arthritis     3  4  3  4  25
Behavior                         3  3  3  4  13