title: Clinical Evidence Engine: Proof-of-Concept For A Clinical-Domain-Agnostic Decision Support Infrastructure
authors: Hou, Bojian; Zhang, Hao; Ladizhinsky, Gur; Yang, Stephen; Kuleshov, Volodymyr; Wang, Fei; Yang, Qian
date: 2021-10-31

Abstract: Abstruse learning algorithms and complex datasets increasingly characterize modern clinical decision support systems (CDSS). As a result, clinicians cannot easily or rapidly scrutinize a CDSS recommendation when facing a difficult diagnosis or treatment decision in practice. Over-trust and under-trust are frequent. Prior research has explored supporting such assessments by explaining CDSS data inputs and algorithmic mechanisms. This paper explores a different approach: providing precisely relevant, scientific evidence from biomedical literature. We present a proof-of-concept system, Clinical Evidence Engine, to demonstrate the technical and design feasibility of this approach across three domains (cardiovascular diseases, autism, cancer). Leveraging Clinical BioBERT, the system can effectively identify clinical trial reports based on lengthy clinical questions (e.g., "risks of catheter infection among adult patients in intensive care unit who require arterial catheters, if treated with povidone iodine-alcohol"). This capability enables the system to identify clinical trials relevant to diagnostic/treatment hypotheses -- a clinician's or a CDSS's. Further, Clinical Evidence Engine can identify key parts of a clinical trial abstract, including patient population (e.g., adult patients in intensive care unit who require arterial catheters), intervention (povidone iodine-alcohol), and outcome (risks of catheter infection). This capability opens up the possibility of enabling clinicians to 1) rapidly determine the match between a clinical trial and a clinical question, and 2) understand the result and contexts of the trial without extensive reading. We demonstrate this potential by illustrating two example use scenarios of the system. We discuss the idea of designing CDSS explanations not as specific to a CDSS or an algorithm, but as a domain-agnostic decision support infrastructure.

Biomedical literature can provide valuable "decision support" for clinicians. From best practices to clinical trial reports, the literature contains scientifically proven information that can aid numerous diagnostic and therapeutic dilemmas across all clinical domains. With the rise of Evidence-Based Medicine (EBM), clinicians increasingly turn to literature at point-of-care to inform their decisions [38, 41]. Medical students receive training on how to formulate their clinical questions into good search terms, for example, training on the PICO (Population, Intervention, Comparison, and Outcome) clinical knowledge representation framework [30]. Interestingly, literature is rarely in the spotlight of clinical decision support (CDS) research. With the explosive growth in machine learning (ML), CDS systems are increasingly characterized by patient-data-driven inferences and abstruse risk models, each tailored for one clinical decision or one clinical domain. In this context, literature-based systems can be particularly valuable for clinicians today. They can complement other CDS systems and offer scientific evidence that clinicians can easily understand [8, 14, 47].
They can support many vastly different clinical decisions and domains, including those in data-poor or resource-constrained hospitals.

Document-level literature retrieval systems such as PubMed and Google Scholar have proven their worth in clinical practice. However, at points-of-care, clinicians need far more fine-grained tools to identify information that is precisely applicable to their patient case and clinical question at hand [17, 30]. Clinical decision-making is a continuous and iterative process, consisting of a series of micro-decisions [4, 44, 49]. For each micro-decision in practice, clinicians can only afford 2-3 minutes of literature search [12]. To be useful, literature systems need to be able to directly answer point-of-care clinical questions such as "What are the risks of catheter infection if an adult patient in intensive care unit who requires arterial catheters is treated with povidone iodine-alcohol rather than chlorhexidine-alcohol?" [30] Existing systems (e.g., PubMed, Trialstreamer [36]) struggle with processing such complex queries. Searching the query above on PubMed simply returns no results, not to mention eliciting key information from the literature.

This work aims to more precisely address point-of-care clinical questions with biomedical literature. We present Clinical Evidence Engine, a proof-of-concept system that demonstrates the technical feasibility of achieving this goal. On clinical trial reports of three domains (cardiovascular diseases, autism, cancer), Clinical Evidence Engine can:

(Capability 1) Identify relevant clinical trial reports based on complex, long, clinical-question-like search queries (up to 512 words), such as "Would the addition of radiotherapy on top of androgen-deprivation therapy lead to higher risk of bowel toxicity of an adult male patient with locally advanced prostate cancer?" The literature retrieval model of Clinical Evidence Engine achieves an accuracy of 99.44% when evaluated on synthetic queries based on an established expert-annotated literature dataset [35].

(Capability 2) Identify critical information in clinical trial report abstracts that can serve as clinical evidence, i.e., PICO (Population, Intervention, Comparison, and Outcome) information [30]. The PICO classification model of Clinical Evidence Engine achieves an F1 score of 0.74.

Both models outperform existing state-of-the-art models. Beyond these technical capabilities, this work surfaces design opportunities and open research questions around designing clinical decision supports as a domain-agnostic, intelligent information infrastructure, rather than as decision- or domain-specific applications. We concretize these opportunities and questions through two example use scenarios of Clinical Evidence Engine: aiding clinicians in scrutinizing (i) their self-derived decision hypothesis and (ii) an abstruse patient-data-based risk model.

This paper makes two contributions. First, it demonstrates novel bioNLP capabilities of harnessing biomedical literature as point-of-care decision support. Key to this technical advance is the integrative use of a clinical knowledge representation framework (i.e., PICO [30]) and large pretrained language models (i.e., Clinical BioBERT [2]). Second, Clinical Evidence Engine offers an initial design exemplar of a domain-agnostic, intelligent information infrastructure. It offers an alternative perspective to the traditional idea of clinical decision supports as decision- or domain-specific applications. It can serve as a valuable point of reference for the research discourse on future AI-infused healthcare.
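To make the PICO representation concrete, the snippet below decomposes the catheter question from above into its four PICO fields. This is a minimal illustration in Python; the dictionary structure is ours, not part of the system.

```python
# PICO decomposition of the example point-of-care question above.
# Population, Intervention, Comparison, Outcome -- the four elements
# clinicians are trained to identify when turning a messy clinical
# situation into an effective literature search query.
pico_query = {
    "population":   "adult patients in intensive care unit who require arterial catheters",
    "intervention": "povidone iodine-alcohol",
    "comparison":   "chlorhexidine-alcohol",
    "outcome":      "risks of catheter infection",
}
```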
Clinicians routinely consult biomedical literature for decision-making. When facing diagnostic and prognostic dilemmas, clinicians search the literature -- most often on PubMed -- for valuable clinical evidence such as clinical trial reports, best practices, expert-annotated case studies, and biological explanations [17, 21, 27]. Such evidence complements clinician judgment and patient values, forming the cornerstones of Evidence-Based Medicine [19]. At points of care, clinicians search literature for highly specific information to address the clinical question at hand. Such searches are often very challenging given the overwhelming amount of literature available [17]. To address this challenge, clinicians and medical students receive mandatory training on how to translate messy clinical situations into effective literature search queries, for example, by using the PICO framework [18, 30].

Besides specificity, standards of rigor also drive clinicians' literature search. Healthcare communities differentiate the levels of evidence in biomedical literature [7]. They consider population-level evidence (e.g., randomized controlled trials and systematic reviews) more rigorous and trustworthy than isolated observations (e.g., peer-reviewed case studies) [38, 41]. The importance of rigor is also evident in clinicians' choice of literature search tools. In research, clinicians built expert-curated-and-maintained literature databases [1, 15]. In practice, more clinicians use PubMed (a keyword-matching-based search engine) than Google Scholar. Despite the latter's higher retrieval accuracy, it is considered less rigorous because its rankings consider article popularity [39, 40].

Empirical research is sparse on how clinicians use literature in practice. This is in sharp contrast with nonmedical domains, where sense-making and HCI research have extensively studied people's organic information search/foraging and decision-making, and created tools to support sense-making, such as novel visualization techniques and crowdsourced knowledge representation structures [9, 10, 37].

Clinical decision support (CDS) systems assist point-of-care decision-making. They promise to improve patient outcomes [31], reduce medical error [20], and reduce healthcare disparities [3]. In recent years, with the increasing digitization of Electronic Health Records (EHR) and the explosive growth in machine learning (ML), clinician-facing CDS research is increasingly characterized by complex algorithms and patient-data-driven inferences [22, 32]. Since the focus of this work is on biomedical literature, we only briefly review this body of work, particularly the challenges it has reported. This is not meant as criticism; instead, these challenges motivated us to use biomedical literature to complement non-literature-based CDS systems. More comprehensive reviews of CDS work and its achievements can be found elsewhere [14, 32, 45, 48].

• Interpretability and accessibility to clinicians. Abstruse, data-driven algorithms increasingly characterize modern CDS. Clinicians often find these systems too time-consuming to understand and their explanations overly complicated [5, 13, 47]. Over-trusting or under-trusting CDS recommendations is frequent, leading to preventable diagnostic or treatment errors [6, 16, 46, 47].

• Sense of rigor. Clinicians did not always trust diagnostic/treatment predictions, even when they fully understood how the ML correctly generated the prediction [46, 47].
This is because the standards of rigor for clinical evidence are often at odds with the basis on which the rigor of ML is premised. For example, inference-based diagnostic predictions do not qualify as "empirical evidence" under the levels-of-evidence pyramid [7]. Empirical research reported that, when judging the trustworthiness of a CDS, clinicians asked whether it had been published in top-tier clinical journals [47].

• Generalizability challenges. Researchers most often tailored data-driven CDS models and designed explanations for particular clinical decisions, data types, and/or algorithms [8, 46]. Given the multitude of clinical decisions involved in caring for each patient, it seems that, at point-of-care, future clinicians will need to make sense of multiple CDS predictions in quick succession (e.g., blood-test-based diagnostic support, computer-vision-based X-ray reading, tabular disease-trajectory prediction, etc.). They are also responsible for scrutinizing each prediction and accounting for its potential biases. This will be extremely difficult.

Within such a milieu, the inherent characteristics of literature -- scientifically proven, domain-agnostic, accessible -- can be particularly valuable for clinicians today [8, 14, 47].

Algorithmically retrieving and processing biomedical literature are challenging tasks at the frontier of NLP research. In comparison to other documents, biomedical literature includes complex terminologies, concepts, and relationships that even scientists may not fully understand [33]. Popular literature mining systems, such as PubMed, utilize keyword matching techniques. They can struggle with many of the search queries that clinicians need at point-of-care [21].

Large pretrained language models promise substantial advances on these fronts. For example, BERT [26] can perform literature mining tasks such as document retrieval and key information extraction. Clinical BioBERT [25] can capture biomedical and clinical knowledge even more effectively, as it has been fine-tuned on biomedical and clinical documents. Literature datasets created for enhancing Evidence-Based Medicine (EBM) can also catalyze novel literature mining and information retrieval capabilities. One such dataset is the EBM-NLP dataset [34], a corpus of 4991 clinical trial report abstracts annotated with PICO elements. The abstracts are extracted from PubMed and focus on cardiovascular diseases, cancer, and autism. Medical experts and crowd-workers collaboratively annotated the PICO elements. These PICO annotations can serve as the ground truth for many EBM-related NLP research efforts.

Researchers have started to leverage these large language models and datasets in creating novel biomedical literature mining systems: for example, a COVID-19 literature mining system that can surface emergent research directions on the topic [23], and a question answering system on scholarly COVID-19 literature [42]. The system most closely related to point-of-care use is Trialstreamer [28], which uses both ML and rule-based methods to extract PICO information from reports of human-subject randomized controlled trials (RCTs). While advancing on classification performance, Trialstreamer has limited success in retrieving literature with long or highly specific queries. For example, Trialstreamer can only define a patient population based on a single clinical condition (e.g., all patients with diabetes). This limits the system from identifying RCTs that match more meaningfully with patient cases, e.g., according to their medical histories, comorbidities, and demographics.
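As a concrete illustration of these building blocks, the sketch below loads the publicly released Clinical BioBERT checkpoint [2] through the Hugging Face transformers library and embeds a clinical sentence. The example sentence is ours; this is a minimal sketch, not the paper's training code.

```python
# Minimal sketch: load a publicly available Clinical BioBERT checkpoint
# and embed one clinical sentence. Requires: pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

sentence = ("Combined androgen-deprivation therapy plus radiotherapy "
            "in locally advanced prostate cancer")
inputs = tokenizer(sentence, return_tensors="pt",
                   truncation=True, max_length=512)  # BERT's 512-token limit
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] vector is a common single-vector summary of the whole input.
cls_embedding = outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
```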
We set out to create a literature-based system that can support clinicians' point-of-care decision-making. Drawing on prior work, we had two goals: (1) to process long, complex clinical questions as literature search queries, e.g., "What are the risks of catheter infection if an adult patient in intensive care unit who requires arterial catheters is treated with povidone iodine-alcohol rather than chlorhexidine-alcohol?"; and (2) to identify not only relevant literature documents but also the clinically relevant information within. These orientations require state-of-the-art bioNLP capabilities. They also differ from the convention of CDS designs, in which each system is tailored for particular clinical decisions or domains.

Towards these goals, we designed a system architecture that integrates both large pretrained language models (i.e., Clinical BioBERT [2]) and a clinical knowledge representation framework (i.e., the PICO framework and annotations [2, 30]). The former offers the capability to process long, complex biomedical texts (up to 512 words) across domains; the latter allows us to identify clinically relevant information in the literature. Figure 1 illustrates the system architecture design.

Fig. 1. Clinical Evidence Engine system architecture. Key to its design is the integrative use of large pretrained language models (i.e., Clinical BioBERT [2]) and a clinical knowledge representation framework (i.e., PICO [30], highlighted in green).

Clinical Evidence Engine's backend includes two modules. One is the Document Retrieval Module, which retrieves relevant biomedical literature articles according to long, complex clinical questions as search queries. To the best of our knowledge, no prior work has created such retrieval models. The other is the Information Extraction Module, which identifies and extracts the PICO elements within an article's abstract. Prior research reported challenges in balancing such models' precision and recall, causing relatively low F1 scores [34]. We aim to address this challenge.

Given our focus on biomedical literature, we chose to train our models using the EBM-NLP dataset [34], a clinical trial report dataset with expert annotations of PICO elements for each report. It covers three clinical domains: cardiovascular disease, autism, and cancer. As a result, our system also focuses on trial reports in these domains.

We set out to train a document retrieval model by fine-tuning Clinical BioBERT on concatenated (query, abstract) pairs:

    p(relevant | query, abstract) = h([CLS] query [SEP] abstract [SEP]),

where h is the BERT model, and [CLS] and [SEP] are BERT's special input tokens. At test time, we used this model to predict the probability that a query is associated with each abstract in the dataset. The model then returns a ranking of literature documents based on the probability scores. Fig. 2 illustrates this process in detail.

To train and evaluate this model, we generated synthetic queries from the abstracts' PICO annotations [18, 30]. Clinical questions in practice do not always include all PICO elements; for example, some diagnostic questions do not have a comparator [18]. In this light, we generated synthetic queries that included all PICO elements as well as queries that only included a subset. This data generation process (Fig. 3) produced 4 positive instances and 4 negative instances for each abstract. Next, we randomly selected 4000 abstracts for training and used the remaining 991 for testing.
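The paper's exact query-generation procedure lives in Fig. 3, which is not reproduced here. The sketch below illustrates one plausible reading of the description above: positive queries are composed from an abstract's own PICO annotations (the full set or a subset), and negative queries from the PICO annotations of a different, randomly chosen abstract, yielding 4 positive and 4 negative instances per abstract. The query templates and function names are ours.

```python
# Hypothetical sketch of synthetic (query, abstract, label) generation.
# Assumption: positives come from the abstract's own PICO annotations,
# negatives from another abstract's; 4 of each per abstract, mixing a
# full-PICO query with subset queries (template wording is ours).
import random

def make_queries(pico):
    """Build one full-PICO query and three subset queries."""
    p, i, c, o = pico["P"], pico["I"], pico["C"], pico["O"]
    return [
        f"What is the effect on {o} if {p} are treated with {i} rather than {c}?",
        f"What is the effect on {o} if {p} are treated with {i}?",  # no comparator
        f"How does {i} compare with {c} for {p}?",                  # no outcome
        f"What are the outcomes of {i} for {p}?",                   # minimal subset
    ]

def make_instances(records):
    """records: list of {'abstract': str, 'pico': {'P','I','C','O'}} items."""
    instances = []
    for rec in records:
        for q in make_queries(rec["pico"]):
            instances.append((q, rec["abstract"], 1))  # positive pair
        other = random.choice([r for r in records if r is not rec])
        for q in make_queries(other["pico"]):
            instances.append((q, rec["abstract"], 0))  # negative pair
    return instances
```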
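And a minimal sketch of the retrieval model itself, assuming a standard Hugging Face sequence-pair classification setup: the tokenizer concatenates query and abstract as [CLS] query [SEP] abstract [SEP], and the fine-tuned model's softmax output serves as the relevance probability used for ranking. Optimizer settings and the training loop are not specified in the text and are omitted here.

```python
# Sketch: relevance scoring with a sequence-pair classifier, assuming the
# classification head has been fine-tuned on the synthetic pairs above.
# The tokenizer inserts [CLS]/[SEP] automatically when given a text pair.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

def relevance(query, abstract):
    """P(relevant) for one (query, abstract) pair, truncated to 512 tokens."""
    enc = tokenizer(query, abstract, return_tensors="pt",
                    truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

def rank(query, abstracts):
    """Return abstracts sorted by predicted relevance, highest first."""
    scored = [(relevance(query, a), a) for a in abstracts]
    return sorted(scored, key=lambda s: s[0], reverse=True)
```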
We trained the retrieval model on the 4000 × 8 = 32000 training instances and evaluated it on the 991 × 8 = 7928 testing instances, running the experiment 5 times with different splits of the training and testing data. This evaluation revealed an F1 score of 0.9945 for positive document relevance and 0.9944 for negative document relevance. It also shows that, quantitatively, our model outperforms the best results of the keyword matching approach, a common strategy used by popular literature retrieval systems. Table 2 details the performance comparison between this work and the best results from keyword matching approaches, in terms of accuracy, F1 score, and confusion matrix.

One limitation of this evaluation is that it focused solely on whether the system can identify all relevant documents; confusion matrix, accuracy, and F1 score are sufficient to measure this ability. However, it did not assess the model's ranking ability (e.g., by calculating NDCG or AUC values). Unfortunately, no ground-truth dataset exists that could enable such an evaluation.

The Information Extraction Module aims to identify and summarize the salient elements of clinical evidence from literature abstracts. Based on the PICO framework, we consider Population (P), Intervention/Comparator (I/C), and Outcome (O) as salient information [18, 30]. We used the Clinical BioBERT tokenizer to tokenize the words into tokens expressed as numerical vectors. Using (token, label) pairs as training data, we trained a linear four-class classifier on the EBM-NLP dataset; the classifier predicts whether each token in an abstract describes P, I/C, or O. After obtaining the classification results, the system groups adjacent words with the same annotation and removes duplicates to generate the final PICO phrases. Fig. 4 summarizes the Information Extraction Module workflow.

We evaluated this model against the three state-of-the-art PICO classifiers in [34]: logistic regression (LogReg), LSTM-CRF [24], and LSTM-CRF-BERT, which uses BERT as the embedding tool for LSTM-CRF. For a fair comparison, we tested all four models on the 200 withheld testing abstracts from EBM-NLP [34]. Our model outperforms the other methods in F1 score, achieving balanced precision and recall.
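A minimal sketch of this module, assuming a standard token-classification head over Clinical BioBERT; the label names and the grouping heuristic below are our reading of the description above.

```python
# Sketch: four-class PICO token classification with phrase grouping.
# In the paper the classifier is trained on EBM-NLP (token, label) pairs;
# the freshly initialized head shown here would first need that training.
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

MODEL = "emilyalsentzer/Bio_ClinicalBERT"
LABELS = ["NONE", "P", "I/C", "O"]  # the four classes per the PICO framework
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))

def extract_pico(abstract):
    """Tag each token, then merge adjacent tokens that share a PICO label."""
    enc = tokenizer(abstract, return_tensors="pt",
                    truncation=True, max_length=512)
    with torch.no_grad():
        pred = model(**enc).logits.argmax(dim=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    phrases = {"P": [], "I/C": [], "O": []}
    current, current_label = [], "NONE"
    for tok, lab_id in zip(tokens, pred.tolist()):
        label = LABELS[lab_id]
        if label == current_label and label != "NONE":
            current.append(tok)  # extend the running phrase
        else:
            if current_label != "NONE":  # flush the finished phrase
                phrases[current_label].append(
                    tokenizer.convert_tokens_to_string(current))
            current, current_label = [tok], label
    if current_label != "NONE":
        phrases[current_label].append(tokenizer.convert_tokens_to_string(current))
    # De-duplicate while preserving order, as described above.
    return {k: list(dict.fromkeys(v)) for k, v in phrases.items()}
```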
Scenario: The clinician team just diagnosed a male adult patient with prostate cancer. At this point, the cancer cells have only been developing locally. Based on their clinical acumen and standard practice, the clinicians have decided to deploy androgen-deprivation therapy (ADT). However, they are unsure whether, for this particular patient, the addition of radiation therapy (RT) would further improve the chance of survival. This is a critical decision, as radiation has substantial side effects and should not be used casually. Facing this therapeutic dilemma, the clinicians start searching for relevant trials on Clinical Evidence Engine, using a nuanced description of the scenario as the search query.

Clinician's search query: Would the addition of "radiotherapy" on top of "androgen-deprivation therapy" help improve the survival of an adult male patient with "locally advanced prostate cancer"?

Technical capabilities. Based on this query, Clinical Evidence Engine can identify a list of relevant randomized controlled trial reports. The highest-ranking result is the report "Final report of the intergroup randomized study of combined androgen-deprivation therapy plus radiotherapy versus androgen-deprivation therapy alone in locally advanced prostate cancer" [29]. It precisely matches the clinical question in terms of the clinicians' population, interventions, and outcome of interest. Further, the PICO classifier of Clinical Evidence Engine extracts the following information, which concisely answers the clinical question raised:

• Population: "patients with locally advanced prostate cancer", and more specifically, "Patients with T3-4, N0/Nx, M0 prostate cancer or T1-2 disease with either prostate-specific antigen (PSA) of more than 40 g/L or PSA of 20 to 40 g/L plus Gleason score of 8 to 10";
• Intervention and Comparator: "Combined Androgen-Deprivation Therapy Plus Radiotherapy Versus Androgen-Deprivation Therapy Alone"; more specifically, the comparator is "lifelong ADT alone";
• Outcome: "overall survival", "deaths from prostate cancer", and "frequency of adverse events related to bowel toxicity".

Design opportunities. How can literature-based CDS systems best support clinicians' literature sense-making and clinical decision-making with the retrieved PICO elements? Extensive prior HCI work has studied how to support information foraging, sense-making, and decision-making [9, 10]. Clinical Evidence Engine's novel technical abilities illuminate a clear design space for expanding this research into biomedical domains. For example, prior research has shown that scaffolding search processes and results can reduce users' cognitive effort, enabling them to build up a deeper understanding of the decision being made [9]. PICO information extraction capabilities enable such scaffolding, for example, by allowing clinicians to compare populations and outcomes across studies of comparable interventions (Figure 5, bottom right). Prior research has also built tools that allow users to create a collection of composable and reusable "lenses" that reflect their different latent interests [10]. Such tools were effective in improving users' depth of information understanding. Future literature-based CDS tools can explore enabling clinicians to create dynamic and reusable lenses. In this particular search scenario, clinicians may find a patient-population-match lens valuable, as they can use it to identify the trials that most closely align with the patient at hand in terms of cancer type, severity, and spread. Later in the search process, clinicians can shift to a temporal lens, rapidly and effectively examining the temporal progression of ADT-plus-radiotherapy treatment effects (Figure 5, top right). Finally, visualizing the key clinical evidence extracted from each trial could further support rapid comprehension.

Scenario: A nurse practitioner is assessing a young girl for possible Autism Spectrum Disorder (ASD).

Other decision support systems at play. In this context, the nurse practitioner looks up her decision support systems. One autism detection CDS she uses [43] analyzes a 3-minute video of the girl interacting with her family. It predicts that the girl is 54% likely to have ASD. This system was trained on large video databases of children with and without ASD and has a 92% accuracy. However, the system deploys eight complex ML models that collectively make a diagnostic prediction. The nurse practitioner struggles to decide whether to trust this prediction. Facing this diagnostic dilemma, she opens Clinical Evidence Engine and gives it permission to use this patient's medical history and the featurized video data that the autism detection CDS has produced, from which the system composes a search query (one plausible assembly is sketched below).
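The paper does not specify how this query is assembled. One plausible minimal sketch, assuming the query simply concatenates patient demographics with the CDS's most heavily weighted video features; all field and feature names below are hypothetical, not from the paper.

```python
# Hypothetical sketch: assemble a literature query from patient context
# plus the top-weighted features of an upstream CDS prediction.
def build_query(patient, cds_features, top_k=4):
    """patient: dict of demographics; cds_features: {feature_name: weight}."""
    top = sorted(cds_features, key=cds_features.get, reverse=True)[:top_k]
    terms = [patient["sex"], patient["age_group"]] + [f'"{t}"' for t in top]
    return ", ".join(terms)

query = build_query(
    {"sex": "Female", "age_group": "child"},
    {"disruptive behaviors": 0.91,
     "secure parental attachment": 0.84,
     "Autism Spectrum Disorder": 0.80,
     "social behavior after separation from parents": 0.77},
)
# -> 'Female, child, "disruptive behaviors", "secure parental attachment", ...'
```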
Auto-generated search query: Female child, "disruptive behaviors", "secure parental attachment", "Autism Spectrum Disorder", "social behavior after separation from parents."

Technical capabilities. Based on this query, Clinical Evidence Engine retrieves relevant trial reports; from a top-ranked report, its PICO classifier extracts:

• Population: "children aged 12 and 24 months diagnosed with ASD", "children with Autism Spectrum Disorder (ASD)".
• Intervention: "Focused Playtime Intervention (FPI)", a type of "parent-mediated intervention".
• Comparator: n/a.
• Outcome: "parental perceptions of child attachment", "attachment related outcomes", "attachment-related behaviors", "similarities to those of typically developing children".

Design opportunities. Abstruse algorithms and complex patient-data-driven inferences increasingly characterize modern CDS systems. Clinicians have frequently reported challenges in understanding the trustworthiness of such systems and their predictions, especially under the time constraints of busy clinical work [5, 13, 47]. The multi-learning-algorithm, video-based autism detection system offers merely one example. Via Clinical Evidence Engine, this work proposes biomedical literature as an alternative approach to providing "explainability" for these complex CDS systems, aiding clinicians in scrutinizing the correctness of each prediction. As shown in the example scenario, evidence from biomedical literature can be far easier for clinicians to understand, and more intuitively convincing, than explanations of algorithmic inner workings. This work opens up a clear design space around supporting otherwise-abstruse CDS predictions with evidence from the clinical literature, for example, helping clinicians scrutinize ML predictions by surfacing clinical-trial-proven causal relations between ML features and the predicted diagnoses (Figure 6).

Fig. 6. Example interface designs of Clinical Evidence Engine as it aids clinician decision-making alongside other diagnostic or prognostic CDS predictions. The literature contains clinical evidence that can validate or invalidate many CDS predictions, therefore helping clinicians scrutinize them. This functionality can be particularly valuable for complex systems such as deep-learning-based video analysis, because their predictions and predictive mechanisms are difficult to explain to clinicians.

Open research opportunity: Designing a better blend of heterogeneous clinical decision supports. A key premise of this work is to explore what an AI-infused clinical decision-making process might look like in future healthcare. While research has created numerous CDS systems, most focused on one system, one clinical decision, and one clinical domain. However, clinical decision-making is a continuous and iterative process; it consists of a series of micro-decisions. These micro-decisions are often cross-modal and cross-disciplinary, therefore involving distinct CDS systems and potential risks. We encourage future research to investigate how heterogeneous AI decision supports (e.g., literature-based and EHR-based) can best collaborate with clinician teams, forming an effective multi-AI, multi-clinician team. This example scenario offers a small first step towards this ambitious goal.

Abstruse learning algorithms and complex datasets increasingly characterize modern clinical decision support systems (CDSS). As a result, clinicians cannot easily or rapidly scrutinize a CDSS recommendation when facing a difficult diagnosis or treatment decision in practice. Over-trusting or under-trusting CDSS recommendations is frequent, leading to preventable diagnostic or treatment errors.
Prior research has explored supporting such assessments by explaining CDSS data inputs and algorithmic mechanisms. This paper explores a different approach: providing precisely relevant, scientific evidence from biomedical literature. We presented a proof-of-concept system, Clinical Evidence Engine, to demonstrate the technical and design feasibility of this approach across three domains (cardiovascular diseases, autism, cancer). It can effectively identify clinical trial reports based on lengthy clinical questions (e.g., "risks of catheter infection among adult patients in intensive care unit who require arterial catheters, if treated with povidone iodine-alcohol"). This capability enables the system to identify clinical trials relevant to diagnostic/treatment hypotheses -- a clinician's or a CDSS's. Further, Clinical Evidence Engine can identify key parts of a clinical trial abstract, including patient population (e.g., adult patients in intensive care unit who require arterial catheters), intervention (povidone iodine-alcohol), and outcome (risks of catheter infection). Through two example use scenarios of the system, we have demonstrated the many design opportunities and open research questions that this capability opens up.

At a higher level, this work proposes a future where intelligent literature tools serve as a decision support infrastructure, supporting many clinical decisions across domains. Such an information infrastructure should be valuable both independently (as illustrated in use scenario 1) and when supporting other intelligent systems, particularly for practitioners in rural or low-resource hospitals where data-intensive CDS is less available (scenario 2). A decision-support infrastructure -- because it operates at PubMed scale -- can have an outsized impact on clinical practice and on improving the quality of patient care.

REFERENCES
Journal Club for iPhone
Publicly Available Clinical BERT Embeddings
Developing public health clinical decision support systems (CDSS) for the outpatient community in New York City: our experience
How clinical decisions are made
Why Expert Systems Fail
Identification of Gleason pattern 5 on prostatic needle core biopsy: frequency of underdiagnosis and relation to morphology
The Levels of Evidence and their role in Evidence-Based Medicine
"Hello AI": Uncovering the Onboarding Needs of Medical Practitioners for Human-AI Collaborative Decision-Making
Mesh: Scaffolding Comparison Tables for Online Decision Making
SearchLens: composing and capturing complex user interests for exploratory search
Using Information Scent to Model User Information Needs and Actions and the Web
Clinical questions raised by clinicians at the point of care: a systematic review
Barriers and facilitators to clinical decision support systems adoption: A systematic review
A systematic review of the implementation of patient decision support interventions into routine clinical practice
UpToDate: a comprehensive clinical database
Frequency and determinants of disagreement and error in Gleason scores: A population-based study of prostate cancer
Users' guides to the medical literature: a manual for evidence-based clinical practice
PART II Processes of Developing EBP and Questions in Various Clinical Settings. Evidence-Based Practice for
Evidence-Based Answers to Clinical Questions for Busy Clinicians
Literature review on clinical decision support system reducing medical error
PubMed, Web of Science, or Google Scholar? A behind-the-scenes guide for life scientists
Medication-related clinical decision support in computerized provider order entry systems: a review
Neural architectures for named entity recognition
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Continuous control with deep reinforcement learning
PubMed and beyond: a survey of web tools for searching biomedical literature
Trialstreamer: A living, automatically updated database of clinical trial reports
Final report of the intergroup randomized study of combined androgen-deprivation therapy plus radiotherapy versus androgen-deprivation therapy alone in locally advanced prostate cancer
What is the Best Evidence Medical Education? Research and Development in Medical Education
Seyed Jafar Ehsanzadeh, and Shayan Poursharif. 2021. The effects of clinical decision support system for prescribing medication on patient outcomes and physician practice performance: a systematic review and meta-analysis
Chronic Disease Progression Prediction: Leveraging Case-Based Reasoning and Big Data Analytics
Clinical natural language processing in languages other than English: opportunities and challenges
A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature
Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time
Trialstreamer: mapping and browsing medical evidence in real-time
The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis
Evidence-based medicine: revisiting the pyramid of priorities
Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical searches
Comparing test searches in PubMed and Google Scholar
What clinical information do doctors need?
CAiRE-COVID: A Question Answering and Query-focused Multi-Document Summarization System for COVID-19 Scholarly Information Management
Mobile detection of autism through machine learning on home video: A development and prospective validation study
Enhancing clinical decision making: development of a contiguous definition and conceptual framework
Commentary: Prognostic models: clinically useful or quickly forgotten?
CheXplain: Enabling Physicians to Explore and Understand Data-Driven, AI-Enabled Medical Imaging Analysis
Unremarkable AI: Fitting Intelligent Decision Support into Critical, Clinical Decision-Making Processes
Review of Medical Decision Support Tools: Emerging Opportunity for Interaction Design
Investigating the Heart Pump Implant Decision Process: Opportunities for Decision Support Tools to Help