title: Affective Medical Estimation and Decision Making via Visualized Learning and Deep Learning
authors: Eslami, Mohammad; Tabarestani, Solale; Adeli, Ehsan; Elwyn, Glyn; Elze, Tobias; Wang, Mengyu; Zebardast, Nazlee; Navab, Nassir; Adjouadi, Malek
date: 2022-05-09

Sophisticated machine learning (ML) techniques have yielded promising results, especially in medical applications, where they have been investigated for different tasks to enhance the decision-making process. Since visualization is such an effective tool for human comprehension, memorization, and judgment, we present a first-of-its-kind estimation approach we refer to as Visualized Learning for Machine Learning (VL4ML) that not only can serve to assist physicians and clinicians in making reasoned medical decisions, but also allows the uncertainty that could cast doubt on a given classification or prediction to be appreciated visually. As a proof of concept, and to demonstrate the generalized nature of this visualized estimation approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction. A survey analysis with more than 100 individuals is also conducted to assess users' feedback on this visualized estimation method. The experiments and the survey demonstrate the practical merits of VL4ML, which include: (1) appreciating clinical/medical estimations visually; (2) getting closer to patients' preferences; (3) improving doctor-patient communication; and (4) visualizing the uncertainty introduced through the black box effect of the deployed ML algorithm. All the source codes are shared via a GitHub repository.

When processing and analyzing medical data, machine learning (ML) is proving to be an effective tool capable of producing clinical-grade results comparable to those reached by healthcare providers in their decision-making process [1]-[4]. By drawing on worldwide collective knowledge and different modalities of data, AI could approach the capabilities of a super-human physician. This is by no means a man-versus-machine contest, but rather an assistive arrangement in which human intellect collaborates with machines [3], and human doctors provide dimensions that artificial intelligence is not yet capable of rendering [5]. In many sectors and applications, including medical settings, assistive AI is preferred over autonomous AI. Integrating the physician or clinician in the decision-making process on cases that are run through automated ML algorithms is key to raising the confidence level in ascertaining a given classification or prediction outcome, and to augmenting our understanding of disease state and its progression in time [3], [6]. Moreover, patients and their relatives, who have a lot at stake, are eager to understand their health status and be confident in the future steps or plan of action they will have to consider moving forward. The need for this type of communication between doctors and patients has made shared decision-making (SDM) highly recommended by current healthcare guidelines [7]. The SDM concept requires patients and clinicians to work together to better understand the patient's condition and to discuss and plan a treatment in the most effective way.
Therefore, it is better that such AI-based algorithms produce outcomes and predictions that are visually appealing and can be perceived intuitively, overcoming the complexity of tabulated data or graphs and bringing added insight into the opacity of the black box effect inherent to machine learning [8]-[11]. In accordance with the adage "a picture is worth a thousand words" [12], this study investigates the prospect of producing novel visual displays of medical/clinical estimations made through machine learning (ML) that can be communicated to patients and medical professionals in a concise and straightforward way. As a means to gauge the confidence level, another aim of this method is to create possibilities for showing the uncertainty of the ML algorithm per case (i.e., subject or patient data) that has been processed. The main idea is to encode the information visually using colors, intensities, patterns, and shapes rather than numbers, probabilities, and complex diagnosis labels or graphs with many overlapping regions. The proposed method, VL4ML, stands for Visualized Learning for Machine Learning and is, to the best of the authors' knowledge, the first of its kind. Figures 1A and 1B demonstrate the concept of the proposed VL4ML estimation approach, in which the machine produces its estimations in a visualized way. Figure 1C summarizes the ecosystem of the proposed approach and its many application domains in support of its generalized construct. There are several aspects that guide the motivation behind this visualization platform, among them:

Faster Brain Analysis: Because the human brain did not evolve with the ability to read and write naturally, we rely on a variety of different visual cues to help us comprehend and analyze information better. There is clear evidence that our brain understands and reacts faster to information or data that is visually relayed [13], [14]. Furthermore, it has been found that analyzing a single image takes the brain as little as 13 milliseconds, which is much faster than analyzing any other type of information [15]. Since minimizing the response time to a risk-prone situation is critical, a method called FiToViz was proposed as a visualization approach to track and identify abnormalities when a person is in a risk-prone situation. Inspired by a traffic-light model of a one-class classifier, its authors convert the output of the one-class ML classifier to a visualized display [16].

Enhanced Memorization and Recall Ability: Superior memorization and recall of pictures or images over words is empirically well supported and known as the Picture Superiority Effect, although the underlying mechanisms are still a matter of discussion [17]. The National Academy of Sciences reports that, statistically, a person's ability to remember and recall pictures is significantly high [18]: over a period of several days, a human can remember and recall more than 2,000 pictures with at least 90% accuracy in recognition tests, even with short presentation times during the learning phase [18].

Augmented Decision-Making: Visualization can facilitate and even augment decision-making. The human brain responds to and processes visual data more effectively than any other data type. The brain receives 90% of its information visually, and it is typically much faster at processing images than text [19], meaning that visual data speeds up informed decision-making.
In addition, studies show that visualization of data, graphical information, and results leads to increased confidence and more correct decisions [20], [21]. Comprehensive overviews of how visualization affects decisions are provided in [22], [23], showing widespread use and social impact in many domains including cognitive psychology, information visualization, and medical decision making.

Patient Preference Supports Visualization: A diagnosis or prognosis that is easier to understand and interpret leads patients to prefer visualized data and results over complex tabulated data. In light of this, patient-centered point-of-care systems and shared decision-making strategies strive to provide patients with meaningful visual representations of information [24], [25]. Enhanced communication between doctor and patient can also be established with the help of visualization. Moreover, the ease of understanding, memorizing, and recalling visually relayed information makes it more effective, entertaining, and engaging for patients to collaborate and take part in the decision-making process, especially for the pediatric population and their family members [26], [27].

Visualizing Uncertainty Injected through Machine Learning: In decision-making, the ability to observe the nature or degree of uncertainty injected by the design of the machine learning algorithm and the mechanics of its implementation is also essential for determining the extent to which the results can be relied upon. In addition, while the data and outcomes may be discrete and often correct, machine learning techniques are based explicitly on statistical analysis of a population of data, which means that uncertainty, out-of-distribution samples, and risk of bias always exist for each new case or patient [28]-[30]. This concern is always present when using automated algorithms in the era of digital health while remaining unaware of the so-called black box effect of ML algorithms [28]. Machine-assisted medical decision-making calls for the quantification of uncertainty for each patient's dataset that is processed [30]. Learning with a reject option, also known as rejection learning, has been studied for a long time. When there is a lot of ambiguity about a patient, ML should be able to state "I don't know" and abstain from making a prediction. In this way, to lessen the ambiguity and make a more informed diagnosis, extra human expertise might be sought (i.e., learning to defer) or additional data can be gathered [30]-[34]. Therefore, to present algorithm outcomes adequately and in an assistive way to physicians and patients, ML methods should be able to accommodate the uncertainty introduced by the data or the algorithm for each patient/case. It has been demonstrated that visualizing uncertainty has an impact on decision-making [30], [35]-[37]. The proposed approach is able to visualize a form of uncertainty and may be used to determine criteria for the quantification of subjective or objective uncertainty. This work examines the creation of a pipeline that displays clinical estimations visually, embedding all the aforementioned advantages in a most practical visualized form:
• In a first-of-its-kind design approach, deep learning and convolutional neural networks are deployed to produce clinical estimations visually.
• As a proof of concept, and to demonstrate the generalized nature of this visualization approach, five different case studies are examined for different types of tasks including classification, regression, and longitudinal prediction.
• A survey analysis is conducted to obtain users' feedback on the merits and benefits of this visualization approach.

Fig. 1. A) Conceptual demonstration of the proposed VL4ML approach. B) Comparison of the outputs of the VL4ML approach and standard formats, with an example of the target outcomes for patients whose disease status converts from Mild to Severe at the 4th time point. C) The ecosystem of the proposed approach.

Thus, this work represents an important step towards an end-to-end pipeline for visualization through machine learning, with the ability to deliberate on the obtained results through a visually enhanced decision-making process. The main idea is to produce an image or a visual display wherein colors, intensities, and patterns express the resulting diagnosis, prognosis, and prediction outcomes. The first step is therefore the design of the visual display itself; if the ML model works properly, the ML-derived visual outcome will be exactly the same as the target image, which serves as the ground truth (examples are shown in Figure 1B). Additionally, we propose reserving a black area with zero values as the region of uncertainty (RU). Adding such an area helps researchers assess whether, and by how much, the machine learning algorithm affects this region, which would indicate that the algorithm itself adds some degree of uncertainty to the output. This criterion holds both when developing/training the ML method and during the testing phase with new input samples. Out-of-distribution input samples can affect this area as well as the rest of the designed target image. Next, we need to develop a deep learning network to generate the results in a compelling visualized form as a means to overcome the burden of assessing tabulated numerical values, probabilities, or categorical labels. The design of the network is flexible, but we need to embed some layers to produce an output image as well as layers for feature extraction and fusion. In general, in most cases, the network needs to include forward convolutional layers (FCL) and transposed convolutional layers (TCL) [38]. The output image should be a tensor of size NxNx3, where NxN defines the image size and 3 accounts for the RGB channels. If the input data are not in the form of 2D or 3D tensors, reshaping the input into 2D/3D tensor form [39] becomes necessary. The network architecture as proposed is flexible, and any feature extraction and fusion technique could be utilized in order to fit the network to different applications or to improve the system's accuracy and performance for the task at hand. As an example, Figure 2A shows the network developed for experiment 1, the survival prediction in the ICU. The loss criterion between the target image and the network's output is the mean absolute error.
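To make this construct concrete, the following is a minimal PyTorch sketch of one possible VL4ML-style network for tabular input: a fully connected feature-extraction stage followed by transposed convolutions (TCL) that decode a 23x23x3 output image, trained with a mean-absolute-error (L1) loss. The layer sizes, channel counts, and names are illustrative assumptions, not the exact architecture of Figure 2A.

```python
import torch
import torch.nn as nn

class VL4MLNet(nn.Module):
    """Sketch of a VL4ML-style network: tabular features in, RGB target image out."""
    def __init__(self, n_features=183):
        super().__init__()
        # Feature extraction/fusion on the tabular input (fully connected stage).
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 64 * 5 * 5), nn.ReLU(),
        )
        # Transposed-convolution decoder (TCL stage): 5x5 -> 11x11 -> 23x23.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=3, padding=1), nn.Sigmoid(),  # RGB output in [0, 1]
        )

    def forward(self, x):
        h = self.encoder(x).view(-1, 64, 5, 5)   # reshape 1D features into a small spatial map
        return self.decoder(h)                   # shape: (batch, 3, 23, 23)

model = VL4MLNet()
criterion = nn.L1Loss()                          # mean absolute error against the target image
x = torch.randn(8, 183)                          # dummy batch of EHR feature vectors
target = torch.rand(8, 3, 23, 23)                # dummy color-coded target images
loss = criterion(model(x), target)
loss.backward()
```

At inference time, the output tensor is simply rendered as an RGB image and compared, by eye, against the designed target.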
Five different experiments were conducted to demonstrate the adaptability of the proposed VL4ML approach to different domains of application. Empirical evaluations are conducted to evaluate the performance of the proposed method on different tasks, including classification, regression, and cross-sectional and longitudinal studies, and to assess the visual displays generated to express the classification and prediction results from which diagnosis and prognosis can be made for different medical applications. Furthermore, the input modality as well as the target patterns, intensities, colors, and shapes vary with each experiment in order to reflect the flexibility of this visualization scheme. The loss criterion between the target image and the network's output is the mean absolute error, and all the experiments are conducted with a 5-fold or 10-fold cross-validation scheme depending on the size of the available dataset. Training is stopped early by monitoring the test loss with a patience of 30 epochs (a minimal sketch of this training scheme is given after the case-study list below). The case studies addressed here are as follows:
1) Classification of patient survival in the Intensive Care Unit (ICU): tabular data from electronic health records (EHR) are fed to a network to produce a visualized survival status using distinct colors in a 23x23 image. This case study is a binary classification application.
2) Assessing the Diabetic Retinopathy grade: retina fundus images are analyzed by a network to estimate the disease grade and image quality by visualizing intensity and colors in a 23x23 image. This experiment involves multiclass classification as well as a multi-task application.
3) Estimating the patient's length of stay in a hospital: EHR data are fed to a network to visually estimate the length of stay, in which the number of colored columns in a 45x45 image represents the length of stay. This is a regression case study.
4) Assessing the severity of COVID-19: chest X-ray data are used and the results are expressed through colored circles, green for the healthy state and red for the disease state, where the severity of the disease state is proportional to the size of the red circle. This experiment is both a classification and a regression problem.
5) Diagnosis and prognosis of Alzheimer's Disease (AD): multimodal tabular data are fed into a network to produce color-coded AD prediction labels in a 23x23 image. This experiment is a multiclass classification and prediction in an ADNI (Alzheimer's Disease Neuroimaging Initiative) longitudinal study.
Only publicly available datasets are used for the experiments, and since the characteristics of each experiment are different, the details are given in the corresponding subsections of the results section.
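For illustration, the following is a minimal sketch of the training scheme described above: K-fold cross-validation with an L1 loss and early stopping triggered after 30 epochs without improvement of the held-out fold's loss. The optimizer, learning rate, epoch budget, and full-batch updates are assumptions made for brevity, not details taken from the paper.

```python
import copy
import torch
from sklearn.model_selection import KFold

def train_cv(build_model, X, Y, k=5, max_epochs=500, patience=30):
    """K-fold cross-validation with early stopping on the held-out fold's L1 loss."""
    crit = torch.nn.L1Loss()
    for fold, (tr, te) in enumerate(KFold(n_splits=k, shuffle=True).split(X)):
        tr, te = torch.as_tensor(tr), torch.as_tensor(te)
        model = build_model()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimizer and lr are assumptions
        best_loss, best_state, wait = float("inf"), None, 0
        for epoch in range(max_epochs):
            model.train()
            opt.zero_grad()
            loss = crit(model(X[tr]), Y[tr])                   # full-batch update for brevity
            loss.backward()
            opt.step()
            model.eval()
            with torch.no_grad():
                held_out = crit(model(X[te]), Y[te]).item()
            if held_out < best_loss:                           # keep the best model seen so far
                best_loss, best_state, wait = held_out, copy.deepcopy(model.state_dict()), 0
            else:
                wait += 1
                if wait >= patience:                           # stop after 30 epochs without improvement
                    break
        model.load_state_dict(best_state)
        print(f"fold {fold}: best held-out L1 loss = {best_loss:.4f}")
```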
Although the proposed visualization method is practical, with a design construct that is adaptable to several medical applications as the results below will demonstrate, there are some limitations to this study. The first is the lack of a formal evaluation scenario. At this stage, we used three expert raters who had to agree on the correspondence between the ML outcome and the target image, but outright misclassifications/mispredictions and ambiguous cases would require expert review of the baseline diagnosis and of the input measures used when making that diagnosis. Although we illustrate the loss curves of the training and testing phases of each experiment and confirm the convergence and the average error between the target and the ML visual outcome (Figure 7), the main goal of each experiment is the classification or regression of a clinical phenotype that is presented visually. Since the rater/decision maker is the observer in our pipeline, the decision is subjective and could differ with the type of observer (severe versus more open and accepting of subtle changes) and with the observer's expertise in the type of disease being gauged. Thus, we need better evaluation criteria for each task type, which could include measuring the effect on the region of uncertainty. Another limitation is the number of participants, which at this time includes only the 103 persons worldwide who responded to the survey. While the literature strongly suggests the superiority (or user-friendliness) offered through visualization, as our survey confirms, the outcome of the proposed approach is subject to individual perception and requires a more comprehensive and careful survey analysis, as well as consideration of the target estimation/application that is visualized and analyzed.

In the first experiment, colors are used to discriminate among classes. Using data from the first 24 hours of intensive care, this experiment seeks to predict patient survival. The data used in this example are from the WiDS Datathon 2020 challenge [40], [41], which focuses on patient health through data from MIT's GOSSIS (Global Open Source Severity of Illness Score) initiative [42]. This dataset contains tabulated data from electronic health records (EHR) of patients. Survival and death are expressed as 0 and 1, and all the other variables except identification information are used as input features, creating a feature vector of 183 values. For the target visualization, the hospital death information is used as the label. An image of size 23x23x3 (RGB format) is designed as the target image, in which a white square symbolizes a label of 0 (survival) and an orange square symbolizes a label of 1 (fatal). The surrounding pixels are reserved as the RU area with zero intensity (black); a sketch of this encoding is given below. Figure 7A demonstrates the training/testing loss curves and Figure 2B shows the results for a few patients in the test set. The reported cases (a) to (f) have estimations close to their targets. Case (c) is rather interesting: the symptoms and the information fed as input to the machine algorithm indicate that the subject had a small probability of dying, and the patient indeed survived. For case (e), the patient had a high probability of dying and did not survive. Cases (g) and (h) are two cases in which the machine's estimation is not correct. Case (g) was estimated to have a high probability of dying but survived, which is rather a false alarm, and the last case (h) had a small estimated probability of dying but unfortunately died.
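The following is a small sketch of how a target image of this kind could be constructed. The exact width of the inner square and of the RU border is not stated in the text, so the margin used here is an assumption.

```python
import numpy as np

def icu_target_image(died: bool, size=23, margin=6):
    """Build a 23x23x3 target: colored inner square, black RU border (values in [0, 1])."""
    img = np.zeros((size, size, 3), dtype=np.float32)        # black = reserved uncertainty (RU) area
    color = (1.0, 0.5, 0.0) if died else (1.0, 1.0, 1.0)      # orange = fatal, white = survival
    img[margin:size - margin, margin:size - margin] = color   # inner square; margin width is assumed
    return img
```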
The second experiment is a multi-class classification problem for assessing the severity level of Diabetic Retinopathy (DR) by exploiting a large set of retina images taken with fundus photography under a variety of imaging conditions. The investigated dataset is from the APTOS 2019 Blindness Detection challenge [43]. The images were gathered from multiple clinics using a variety of cameras over an extended period of time. The associated dataset contains 3,662 retinal images, and a clinician has rated each image for the severity of diabetic retinopathy on a scale of 0 to 4: 0 - No DR, 1 - Mild, 2 - Moderate, 3 - Severe, 4 - Proliferative DR. All the fundus images are reshaped to 256x256 resolution while preserving the colors. The desired output is an image of size 23x23x3, and the layout of the output image is an inner square with a narrow peripheral strip surrounding it. The rest of the pixels are reserved for the RU area in black. Three different kinds of outputs are considered for this experiment.
• Type II: five different intensity levels are considered for the peripheral strip while the inner square is color-coded. As shown in Figure 3B, the color of the inner square is bright green (RGB(0,1,0)), dark green (RGB(0.5,1,0)), yellow (RGB(1,1,0)), orange (RGB(1,0.5,0)), and red (RGB(1,0,0)) for disease levels 0 to 4.
• Type III (multi-task): this type is an extension of Type II; in this multitask scenario, the quality of the input fundus image is also assessed. For this reason, the design includes a rectangle whose intensity represents the quality of the fundus image. As shown in Figure 3C, a bright white rectangle denotes a high-quality image and a dark gray rectangle a low-quality one.
For this experiment, 800 retina fundus images of 400 subjects from the DeepDRiD challenge dataset are used [44]. Figure 3 shows the results for some illustrative cases of the test set, and the loss curves of the training/testing phases are shown in Figure 7B-C. Cases (a) through (e) for Types I and II show successful predictions in comparison to their target ground truth. On the other hand, the network could not estimate the severity of the disease accurately for cases (f), (g), and (h). It can also be inferred that color coding makes interpreting the results faster and easier. Another interesting observation in this application is the peripheral strip: using color adds complexity to the network, making it harder to determine whether the exterior colors will match the inner colors. As in case (h), where the model could not produce an orange color in the inner square, it also has a hard time keeping the peripheral strip in gray scale. Furthermore, Figure 3C shows illustrative examples of the multitask scenario, in which a rectangle represents the quality of the fundus image. The experimental results show that the multitask scenario is also applicable to VL4ML.

Length of stay (LOS) is measured in days between the time of hospital admission and the time of discharge. In hospitals, prolonged stays can be expensive for patients as well as for the health system, so identifying patients at high risk of a prolonged LOS from the onset of their stay is important. This experiment is also a regression problem and, in contrast to experiment 2, it is not restricted to 5 levels. The goal of this experiment is to visually predict the length of stay for each patient at the time of admission, where the LOS value can be any number between 0 and 40 days. The MIT MIMIC-III dataset [45] is used for this experiment due to its public availability on the PhysioNet platform [46], [47]. There are 52 features selected as inputs, and a 45x45x3 target image is constructed in which each column corresponds to one day as an indicator of time (i.e., 45 days). If the LOS value is M, all columns from the left up to the M-th column are colored cyan (RGB(0,1,1)). The 10 rows at the top and bottom are colored black to reflect the RU area; a sketch of this encoding is given below. Figure 4A shows the results for a few cases of the test set. Cases (a) through (f) show the same length of stay as their ground truth, shown on their left side, while cases (g) and (h) appear to have errors in their estimations. The training/testing loss curves are shown in Figure 7D.
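A sketch of the LOS encoding is given below; the color of the columns beyond the predicted stay is not specified in the text, so black is assumed. The color-coded DR targets described above follow the same construction pattern using the listed RGB values.

```python
import numpy as np

def los_target_image(los_days: int, size=45, ru_rows=10):
    """45x45x3 target for length of stay: one cyan column per day, black RU rows at top and bottom."""
    img = np.zeros((size, size, 3), dtype=np.float32)   # black everywhere; the RU rows stay black
    days = int(np.clip(los_days, 0, size))
    # Columns 0..days-1 turn cyan RGB(0,1,1) inside the central band; remaining columns are assumed black.
    img[ru_rows:size - ru_rows, :days] = (0.0, 1.0, 1.0)
    return img
```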
The aim of this experiment is a binary classification of COVID-19 disease from chest X-ray images along with the regression task of estimating the severity of the disease. The extent of lung involvement by geographic ground-glass opacity is selected as the severity score. For the COVID-19 cases, data are obtained from the COVID-19 image data collection [48], [49], and normal cases are collected from the Chest X-Ray Images (Pneumonia) dataset [50], [51]. There are 94 labeled COVID-19 samples with different severity levels, and the same number of normal control samples is selected. The target estimation images are designed such that normal cases are illustrated by green circles, while COVID cases are shown by red circles whose radius reflects the disease severity (sketched below). Figure 4B illustrates the results for some cases of the test set and the loss curves are shown in Figure 7E. As can be observed, the healthy cases (a) and (b) are correctly identified as normal cases using a green filled circle, and the unhealthy cases (c) to (f) are recognized as subjects affected by different severity stages of COVID-19. Cases (g) and (h) are two examples where the network could not make a correct decision and also had difficulty tracing a circular pattern for the resultant shape. There are two important observations here. The first is related to the healthy cases: while the network has two degrees of freedom for presenting the shape, namely color and radius, for healthy cases it is restricted to color only, meaning that if the network alters the size of a green circle, this should be interpreted as an error. The second important point is the degree of sharpness of the perimeter of the circle. Cases (e) and (d) are examples of this scenario, where the blurriness of the resultant circle could reflect the uncertainty introduced by the system.
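The circle encoding could be constructed as in the sketch below; the image size, the radius range, and a severity score normalized to [0, 1] are assumptions, since these details are not stated in the text.

```python
import numpy as np

def covid_target_image(is_covid: bool, severity=0.0, size=45, r_min=4, r_max=15):
    """Filled-circle target: green for normal, red with radius proportional to severity for COVID."""
    img = np.zeros((size, size, 3), dtype=np.float32)                   # black background / RU area
    center = (size - 1) / 2.0
    radius = r_min + severity * (r_max - r_min) if is_covid else r_min  # fixed radius for healthy cases
    yy, xx = np.ogrid[:size, :size]
    mask = (yy - center) ** 2 + (xx - center) ** 2 <= radius ** 2
    img[mask] = (1.0, 0.0, 0.0) if is_covid else (0.0, 1.0, 0.0)        # red = COVID, green = normal
    return img
```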
In the last experiment, we investigate multiclass classification and prediction of disease progression in Alzheimer's disease in a longitudinal study. The ML model is to automatically generate a visual display with different colored stripes to express the different states of the disease. The disease trajectory over the four time points is demonstrated by a sequence of 4 stripes, in which each stripe denotes the disease state by its corresponding color. The stripes are followed by a black stripe to denote the RU area. Figure 5A shows four examples of desired target images for four different subjects with four different longitudinal trajectories. The clinical data used in this experiment are obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://ida.loni.usc.edu). Only subjects that have a baseline scan and showed up for follow-up visits 6, 12, and 24 months later have been considered in this longitudinal study. A total of 1043 subjects met the follow-up condition and were categorized into 3 classes, AD, MCI, and CN, in the baseline and referral sessions. Multimodal tabular data containing features extracted from MRI and PET sequences, as well as demographic information and cognitive measurements, were used as input features, as in our prior works [52], [53]. The training/testing loss curves are plotted in Figure 7F. Figure 5B demonstrates some of the results achieved for the subjects in the test set, including a visual representation of the disease trajectory predicted by the network. Furthermore, the method seems to provide a more realistic outcome for patients, in particular for longitudinal analysis. For example, while all of the cases (g), (h), (i), and (j) are for patients with a steady AD status, the estimated results are different and interesting. Case (g) is diagnosed and labeled correctly, while cases (h) and (i) show different disease progressions: the ML model shows that the subject in (h) was MCI before converting to AD at T1, and the patient in (i) was also MCI and converted to AD around T2. Case (j) is a misdiagnosed case with moderate certainty. These are complex illustrative cases that are shown here to emphasize the challenges faced when dealing with multimodal classification and regression.

An online survey was created with the Qualtrics platform to collect users' feedback on the practicality of this visualized estimation approach. The objective is to assess user preference in the ability to interpret and understand the visual display. It also assesses how easy it is to remember the visual display and how easy or difficult it is to make a decision, especially when the outcome is ambiguous and may be infused with some degree of uncertainty. A few initial samples were given to participants to familiarize them with the visualization format and its color coding. Ten questions related to three applications (survival analysis, disease progression estimation, and multitask disease severity estimation) were posed in this survey. We shared the survey on Facebook Ads and LinkedIn, and 103 individuals responded. Figure 6A shows the distribution of collected votes along with a few sample questions and the results of the survey. Three exemplary questions and their corresponding results are also shown in Figure 6. The full survey and results are available in the supplementary materials. Among those who responded to the survey, 51.5% were female and 47.47% were male. A large majority of participants (81.55%) were between 25 and 44 years of age, and 15.5% of the participants identified themselves as healthcare professionals or as working in the medical industry. In the survival analysis application, 95.15% declared that the visualized format is easy or somewhat easy to understand and remember. Among the respondents, only 18.63% preferred the numerical format while 83.47% preferred the visual format. With regard to the application of determining the progression of a disease, 83.49% of the voters agreed that the visual representation is easy to learn and to interpret, with 79.55% stating that they would prefer to receive the results in visual format. With an overwhelmingly positive vote of 82.35%, visualization was favored for the ease of memorizing/remembering the outcome. Moreover, 73.79% voted that the visualized format speeds up the decision-making process. As for the level of uncertainty, 81.65% stated that different levels of uncertainty are visible in the visualized format. In the multitask case study, again the majority of the participants (87.38%) voted that it is easy to interpret the meaning of what was displayed to them. On the question of interpretability, only 12.62% thought that this format was hard to interpret. Finally, on the uncertainty level, only 9.71% stated that they were not able to discern any level of uncertainty.

In this study, a visualized learning method named VL4ML is proposed to depict a given medical outcome as a means to assess disease state and disease progression through classification or regression analysis.
The VL4ML method is based on deep convolutional neural networks, and five different experiments have been conducted to demonstrate the feasibility of the proposed method and its generalized design construct, which is capable of dealing with different experiments involving multimodal classification, regression analysis, and prediction tasks. Various visualization criteria, including color, intensity, shape, size, and pattern encoding, have also been considered. According to the survey results, the visualized form is strongly preferred by people. Figure 6B shows an exemplary experiment over two cases diagnosed with progressive AD, where the outcomes of our visualized approach are compared with the numerical results and probability values from conventional methods. Comparing the top image with the numerical values on the bottom demonstrates that the outcomes of VL4ML convey different levels of information, sensation, and memorization impact that can be exploited when making a decision about the visual outcome being evaluated. Furthermore, according to the survey results, over 80 percent of the respondents could detect some level of uncertainty in the outcomes. VL4ML embeds within it a display of the uncertainty level involved in making a decision, especially when the visual display provided through machine learning differs from the target image. Therefore, the potential of this novel approach for delivering a medical diagnosis or prediction through a visual format is promising. Its simple color-coded format is seen through the survey as highly effective and informative when it comes to understanding the meaningfulness of the display, and as an alternative approach for medical experts to communicate to their patients the results that were obtained, leading to a shared decision-making process between doctor and patient. While this study shows the capabilities and advantages of the proposed approach, in addition to the mentioned limitations, more interesting questions could be asked. For example: what would be the best shape of the RU area and its location with respect to the task type, clinical application, and machine learning aspects? What is the best image size and color/shape/pattern to be used for each clinical application with respect to human impression and decision-making, and also with respect to gauging the accuracy of the machine? What would be the best network architecture? More importantly, how can the region of uncertainty be quantified in both subjective and objective ways to unlock the mystery behind the black-box effect of machine learning? Is uncertainty quantification independent of the designed target images and their chosen color/shape/pattern? Is the checkerboard artifact [54] in the output images helpful in quantifying the uncertainty, or, conversely, should we try to compensate for this effect through an inverse transformation? Many such questions remain. Designing contextual target images that define disease state or disease trajectory within this visualization context is flexible and is left to the imagination of other research groups that would like to replicate such work or create more sophisticated displays in relation to the disease or medical application considered (e.g., image size, color codes, shapes, etc.).
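As one concrete illustration of the uncertainty-quantification question raised above, a simple objective proxy could be the mean absolute intensity that the network places inside the reserved RU region, which is all-zero in every target image. This is offered only as an assumed example, not a metric defined in this work.

```python
import numpy as np

def ru_uncertainty_score(output_img: np.ndarray, ru_mask: np.ndarray) -> float:
    """Mean absolute intensity the model places in the reserved-uncertainty (RU) region.

    output_img: HxWx3 network output with values in [0, 1].
    ru_mask: HxW boolean mask of the RU pixels (all-zero in the target image).
    A score of 0.0 means the RU area was left untouched; larger values indicate that the
    model leaked signal into the reserved region, one possible objective sign of uncertainty.
    """
    return float(np.abs(output_img[ru_mask]).mean())
```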
As an example of such contextual designs, Figure 8A shows the AD experiment for prognosis under different lifestyles (e.g., active, rural, ...) [55] and treatment plans (e.g., drug 1, drug 2, ...) [56]. Figure 8B shows an example of this potential work in which the vertical axis is also quantitatively meaningful. However, the availability of network models, their accuracy, and users' preferences should be considered. In this paper, VL4ML, a first-of-its-kind visualized estimation approach, is offered to showcase how medical estimations in different domains of application can be assessed visually as a means to gauge disease state and disease progression through classification or regression analysis. The practicality of the suggested approach is investigated by creating networks based on deep convolutional neural networks. In support of the flexibility afforded by this new design construct that integrates visualization into machine learning, five different experiments were explored to illustrate the ability of this generalized construct to deal with a variety of experiments in different domains of application. Color, intensity, form, size, and pattern encoding have all been examined as potential visualization criteria. According to the results of the survey study, individuals prefer the visualized form and find it to be an easier and faster assistive decision-making approach, as well as more memorable and easier to recall. In addition, the majority of participants believe that the amount of uncertainty infused by the ML algorithm is perceivable.

All of the authors discussed the findings, offered their perspectives, shared their ideas and comments, and critically reviewed the manuscript. M.E. and S.T. are responsible for all aspects of the work.
References
Artificial intelligence in medicine. Routledge.
International evaluation of an AI system for breast cancer screening.
High-performance medicine: the convergence of human and artificial intelligence.
Big data and machine learning in health care.
How is the doctor feeling? ICU provider sentiment is associated with diagnostic imaging utilization.
Interactive machine learning for health informatics: when do we need the human-in-the-loop?
When guidelines recommend shared decision-making.
Do no harm: a roadmap for responsible machine learning for health care.
Age-dependent health data visualizations: a research agenda.
Human factors in information visualization and decision support systems.
Challenges to the reproducibility of machine learning models in health care.
Information visualization: perception for design.
Survey of surveys (SoS) - mapping the landscape of survey papers in information visualization.
A survey of information visualization books.
Detecting meaning in RSVP at 13 ms per picture. Attention, Perception, & Psychophysics.
FiToViz: a visualisation approach for real-time risk situation awareness.
The picture superiority effect in associative memory: a developmental study.
Neural correlates of the episodic encoding of pictures and words.
Human information processing: an introduction to psychology.
The effects of visualization and interactivity on calibration in financial decision-making.
A graph is worth a thousand words: how overconfidence and graphical disclosure of numerical information influence financial analysts' accuracy on decision making.
Decision making with visualizations: a cognitive framework across disciplines.
Data-driven healthcare: challenges and opportunities for interactive visualization.
Data-driven healthcare: challenges and opportunities for interactive visualization.
Patient preferences for visualization of longitudinal patient-reported outcomes data.
Readability and visuals in medical research information forms for children and adolescents.
How children learn the meanings of words.
Artificial intelligence, bias and clinical safety.
Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift.
The need for uncertainty quantification in machine-assisted medical decision making.
Consistent estimators for learning to defer to an expert.
Predict responsibly: improving fairness and accuracy by learning to defer.
Second opinion needed: communicating uncertainty in medical machine learning.
A snapshot of the frontiers of fairness in machine learning.
Effects of visualizing uncertainty on decision-making in a target identification scenario.
Visualizing uncertainty about the future. Science.
End-user development for interactive data analytics: uncertainty, correlation and user confidence.
Recent advances in convolutional neural networks.
2020 Global Women in Data Science (WiDS) conference, datathon challenge.
Karen Matthys, and Leo Anthony Celi. WiDS (Women in Data Science) Datathon 2020: ICU mortality prediction.
The Global Open Source Severity of Illness Score (GOSSIS).
APTOS 2019 Blindness Detection.
MIMIC-III, a freely accessible critical care database.
Components of a new research resource for complex physiologic signals. Circulation.
COVID-19 image data collection: prospective predictions are the future.
COVID chest X-ray dataset.
Identifying medical diagnoses and treatable diseases by image-based deep learning.
Chest X-Ray Images (Pneumonia) challenge.
A distributed multitask multimodal approach for the prediction of Alzheimer's disease in a longitudinal study.
A tensorized multitask deep learning network for progression prediction of Alzheimer's disease.
Deconvolution and checkerboard artifacts.
Cohort profile update: the Doetinchem Cohort Study 1987-2017: lifestyle, health and chronic diseases in a life course and ageing perspective.
Atopic dermatitis in diverse racial and ethnic groups - variations in epidemiology, genetics, clinical presentation and treatment.

The authors declare no significant competing financial, professional, or personal interests that might have influenced the performance or presentation of the work described in this manuscript. All the data used in the experiments are from research-purpose, publicly available datasets, which are cited throughout the manuscript. The source code of the experiments is provided at https://github.com/mohaEs/VL4ML.