key: cord-0802679-cn6v1aiv
authors: Kalliokoski, Tuomo
title: Requirement analysis for an artificial intelligence model for the diagnosis of the COVID-19 from chest X-ray data
date: 2021-10-24
journal: 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
DOI: 10.1109/bibm52615.2021.9669525
sha: b6791981d96041629a0f8ffeb6ae3176b99cd9a9
doc_id: 802679
cord_uid: cn6v1aiv

Multiple papers have been published about different AI models for COVID-19 diagnosis with promising results. Unfortunately, according to the reviews, many of these papers do not reach the level of sophistication needed for a clinically usable model. In this paper I go through multiple review papers, guidelines, and other relevant material in order to generate more comprehensive requirements for future papers proposing an AI-based diagnosis of COVID-19 from chest X-ray (CXR) data. The main findings are that a clinically usable AI needs extremely good documentation, a comprehensive statistical analysis of possible biases and performance, and an explainability module.

Ever since the World Health Organization classified COVID-19 as a Public Health Emergency of International Concern (PHEIC) [1], which is more commonly called a pandemic [2], the AI field has produced a multitude of papers on diagnosing COVID-19 from various data. Reviews which focus on the clinical suitability of the presented models, such as references [3], [4], [5], [6], [7], and [8], have been critical of various aspects of the reviewed papers. Thus there is a need for a proper requirements analysis for the AI-based diagnosis of COVID-19 from CXR data so that these shortcomings can be remedied. This requirements analysis includes general ethical considerations, general AI model building considerations, and clinical considerations for both AI in general and radiological diagnosis.
For these considerations I will use relevant publications as sources of the requirements. Many of the sources are reviews of previous work, in which I will concentrate on the criticism of the analyzed publications. In the next section I go through the selected sources in order to find information about the requirements. This is followed by a section in which I formulate those requirements in a more concrete way. The fourth section proposes solutions for the given requirements. The last section contains the conclusions.

The ethics of AI and data is a field of study in its own right [9], as is medical ethics. I will refrain from a full discussion of these matters and focus on the issues from the point of view of general computer science ethics. There is plenty of literature available about the ethics of AI in healthcare; see for example references [10], [11], [12], [13], [14], [15], and [16]. In software development practically every design decision has to be justifiable after ethical analysis [17]. The simplest question is how resource intensive we can be: how much do we value output accuracy versus the resource usage needed to reach that accuracy? Each technological project should also undergo a proper impact assessment [18]. This analysis is based on the framework provided in reference [18], which gives a list of questions for consideration. I will go through most of them in order, but skip those which are trivially irrelevant to this work.

Questions about respect for autonomy are mostly trivially irrelevant, except for the curtailing of personal freedom of movement. A person diagnosed with COVID-19 will most likely be quarantined. This is justified by the public health risk which infected persons pose to others.
In the area of dignity the questions are mostly trivially irrelevant, as the goal is to provide a quicker and less invasive way of diagnosing COVID-19 in patients with pneumonia. The patients would in any case be subjected to chest X-rays, and the diagnosis tool would be used to analyze those images.

The next section in reference [18] is about informed consent. For data collection one should use datasets only from respectable institutions which have followed the required ethical practices in their work. When the AI model is in use, the medical practitioners can only give their informed consent if the quality of the AI system has been evaluated properly and the system has an explainability function. Medical practitioners require this explainability to make informed decisions based on the output of the system [19]. Another issue here is the collection of data during the use of the system.

The non-maleficence section of reference [18] starts with safety-related questions. As this would be used as a clinical tool, the safety aspect is critical. Errors in the diagnosis can cause great harm. A false positive diagnosis will cause psychological harm to the patient, and a false negative will delay the treatment and put other people at risk as well. The algorithm needs thorough testing before it can be used in a clinical setting. The second part of this section in reference [18] covers social solidarity, inclusion, and exclusion. This is mainly about inclusion in the information society, and the only point relevant to this discussion is that the system should be available for offline use in areas where internet connectivity is poor.

The beneficence section of reference [18] has multiple questions which are relevant to this work. The goal is to benefit individuals and society through a faster and less invasive way to diagnose COVID-19.
With X-ray image analysis the diagnosis can be done in 30 minutes [20], while the nasal swab and RT-PCR take at minimum multiple hours to complete [21]. This should be a great benefit for all humans.

The next section in reference [18] is universal service. The tool should be available in all medical stations equipped with X-ray machinery and a computer. Here one should also take into account the global shortage of computer parts, which limits the ability to get newer and more powerful computing equipment [22], [23], [24], [25]. The diagnosis software should thus be usable on any modern computer. Accessibility also needs some discussion. Fully usable software needs an extremely simple user interface so that it is easy to use with minimal training. If the goal is only to produce a diagnostic model, it has to have a simple and well-documented API.

Value sensitive design has some relevance here. The explainable nature of the AI will empower the medical personnel. With a "regular" AI they get only a value stating that the patient has COVID-19 with some probability; with an eXplainable AI they also get information on why the AI has come to this conclusion. This gives them much more information, which they can use for their benefit. For sustainability there are some issues with possible changes in the standards. The system should be built using the current standards and a modular design so that it can be easily updated.

The justice section in reference [18] discusses distributive justice for all individuals and groups. This is a highly relevant matter for any diagnosis tool, as it is widely known that many diagnosis methods have problems when the patient is not a Caucasian male. See for example references [26] and [27].
To overcome this problem the dataset needs to cover all of humanity well, not just a single demographic group. The less desirable alternative is to state the issues and limitations of the diagnosis model clearly in the publication. Equality and fairness (social justice) as defined in reference [18] are not as relevant as the previous part. The main point is the availability of the service for all, not just for a segment of the population based on their privileges. This is solved in the same way as the universal service issues. Another point raised is the risk of the diagnosis being used to the detriment of the patient.

The next part in reference [18] covers privacy and data protection. This is mainly relevant to the training data, which I have discussed earlier. These issues are addressed when selecting the data source and by not collecting data during the use of the system.

Almost every AI project can be seen as a data science project in which we are tasked to find new knowledge from the data. For this there are well-established workflows such as KDD [28] and CRISP-DM [29]; reference [30] provides a review of these and others. Both KDD and CRISP-DM start with understanding the domain. One has to have enough domain knowledge to know what is needed and which things are relevant. This includes knowing what has to be reported. The next step is data understanding and preparation. One has to know the data well enough to identify possible biases, confounding factors which could lead to shortcut learning [31], imbalance [32], and other possible issues in data quality and suitability for the task. Data selection is one of the most critical tasks. This is followed by modeling, which requires selecting modeling tools suitable for the data and the goal.
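As a small illustration of the data understanding step described above, the class balance of a labeled dataset can be checked before any modeling starts. The following is a minimal sketch and not part of the cited workflows; the function name `class_balance` and the label values are hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, per-class shares, and the imbalance ratio.

    The imbalance ratio is the size of the largest class divided by the
    size of the smallest class (1.0 means perfectly balanced).
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: n / total for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return counts, shares, ratio


# Hypothetical labels, not taken from any real dataset.
example_labels = ["covid"] * 20 + ["other_pneumonia"] * 60 + ["normal"] * 120
counts, shares, ratio = class_balance(example_labels)
print(counts)
print(shares)
print(f"imbalance ratio: {ratio:.1f}")
```

The same kind of count can be repeated per demographic group or per data source, which is one simple way to start looking for the biases mentioned above.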
The statistical analysis of the model is then performed, which includes a proper review of the model's properties related to the goal. Here we need to remember that any data which has been used in model building cannot be used in the evaluation [33], [34], including data which is not independent of the data used in the building. There are multiple ways of estimating the generalization performance, and one should choose the most appropriate one [35]. The last step is deployment into production. The usual structure of the project is also iterative: based on the discoveries in each step, one goes back to a suitable earlier step and acts upon the changed situation. For example, if confounding factors are found, one goes back to data preparation to remove them.

For AI tasked with image analysis, the CNN [36], [37] is the default choice of architecture, but there are other options such as the Capsule Neural Network (CapsNet) [38], the Graph Neural Network (GNN) [39], and their combination CapsGNN [40]. One interesting approach is Multiple Instance Learning [41], [42]. All of these require a proper understanding of the data for a proper implementation of the architecture. The decision between the options depends on task-specific details and data availability. A bias in the data can influence how the model does the classification; an extreme example is every member of one group having a notable but irrelevant feature which no one in the other group has. There are options for handling data imbalance [43], [44] and scarcity. Before using any of the standard image augmentation techniques [45] one must have enough domain knowledge to refrain from generating incorrect data; a simple example is rotating a 6 into a 9 in number recognition. An example of this is found in figure 1.

C. The analysis of the reviews of the previous COVID-19 diagnosis models

Many papers have been published that review the work done on diagnosing COVID-19 from CXR and CT images.
The reviews presented in this section focus on the clinical applicability of the reviewed models and are thus extremely critical, as the field has extremely strict regulation and tight tolerances. I have used these reviews for gathering requirements.

Reference [6] was among the first ones. It studied 14 publications on AI models for COVID-19 detection published by the end of March 2020. It found issues with testing for biases, with the datasets used, and with the algorithms used, including the use of out-of-the-box models and the lack of explainability.

The use of different classification methods (binary, multi-class, multi-label, and hierarchical) in COVID-19 diagnosis was researched in reference [7]. The authors found 11 studies relevant to their interest by May 5, 2020. They found that these studies lacked consistency in evaluating the quality of their model predictions.

Publications between March 2020 and May 2020 were also reviewed in reference [8]. The authors found 34 publications, many of which had issues with dataset selection and possibly used multiple copies of the same images. Other reported issues with the data were class imbalances and private datasets. They also criticized the lack of uniformity in quality evaluation, including bias evaluation. Explainability was also an issue which they brought up.

Reference [4] studied papers proposing ML-based diagnosis of COVID-19 with data from CXR or CT scans, covering the period from 1 January 2020 to 3 October 2020. The authors found 320 papers for their quality review, of which 258 failed in the first section of their analysis. Insufficient documentation of the model selection (132 failures), of the image preprocessing methods (125), and of the details of the training approach (105) were the three most common failures. One critical failure was not disclosing the dataset used in the analysis.
For the papers which passed this screening, the reasons for failing clinical suitability included the lack of proper validation, of robustness or sensitivity analysis, of the demographics of the people in the data, and of statistical testing of the results, as well as reporting issues regarding generalization. The paper criticized the lack of attention given to the features of the datasets used, for example using the dataset [46] as the control set although it consists of pediatric patients aged between one and five while the COVID-19 patients were adults. Another issue raised was the downscaling of the images due to the use of ready off-the-shelf models. This and the lack of demographic data are also related to the use of JPEG and PNG images instead of DICOM [47], which carries metadata on the image acquisition parameters and other important information. Code availability and other replicability issues were also mentioned, as was the call for interpretability.

Another paper dedicated to creating a proper basis for responsible deep learning for diagnosing COVID-19 from medical images is reference [3]. This paper is centered around the use of Explainable Artificial Intelligence to find possible errors in the model, but is not limited to it. The authors analyzed 25 models, which were collected by August 14, 2020. They found mistakes in data acquisition, model development, and explainability. As an example, the paper mentioned that one should not use image augmentations which produce "impossible" images. For future models it provides a checklist for creating a responsible deep learning model for this task.

Reference [5] presents a systematic review of 169 studies. It provides a critical appraisal of the prediction models for the diagnosis and prognosis of COVID-19 in the selected studies, which also included implementations other than image analysis of CXR or CT.
The authors found that all studies had either a high or an unknown risk of bias in the results. This was mainly caused by non-representative selection of control patients, overfitting, issues with result validation, and unclear reporting.

Many other papers have raised the issue of the lack of generalization of the AI diagnosis due to confounding factors in the data [48], [49], [50], [51], [52], [53]. This leads to a diagnosis based on something other than the actually relevant features of the lung area, for example the model learning which dataset an image comes from based on other information visible in the image. Another recognized source of the lack of generalization was dataset biases [54], [55], [56], [53]. Possible sources of these biases include patient demographics, imaging procedures (for example the direction from which the X-ray image was taken), and procedures performed before taking the X-ray (for example an intubation tube visible in the CXR image). Reference [57] calls for uncertainty evaluation for predictions. This is related to the lack of statistical analysis of the prediction quality mentioned in both references [3] and [4].

There are papers written on the use of AI for diagnosis or for other purposes in medical science. In this section I discuss the issues which they have raised. Medical papers on AI use outside of COVID-19 are also relevant, as they provide information on what is needed from an AI product used in medicine. Reference [58] called for good training data and performance validation, and was critical of the "black-box" nature of some AI models. Papers such as [59], [60], [61], [62], and [63] pointed out issues with the data and the "black box". Reference [63] even called "black-box" AI unacceptable in the medical domain. The lack of documentation for reproducibility was also brought up by reference [62]. Other papers calling for XAI include references [64], [65], [66], [67], [68], and [69].
Reference [70] points out issues with the amount of data available for training and with imbalance in the data, and it also points out the lack of confidence intervals in the predictions.

The Checklist for Artificial Intelligence in Medical Imaging (CLAIM) is available in reference [71]. There is also a radiomics quality score (RQS) [72] which can be used. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) [73] gives reporting guidelines for diagnosis tools. PROBAST (Prediction model Risk Of Bias ASsessment Tool) [74] is commonly used for estimating the risk of bias. The bias is usually related to the dataset not being representative due to data scarcity, population shift, prevalence shift, or selection bias [75], [76]. Dataset shifts are also discussed in reference [77]. Reference [78] also noted confounding factors in the medical image analysis of hip fractures.

Reference [79] describes the application of the ITU/WHO FG-AI4H (Focus Group on Artificial Intelligence for Health) assessment guidelines [80] to a machine learning tool. The application includes a questionnaire about the project, bias and fairness analysis, interpretability (explainability), robustness evaluation, and reporting guidelines.

In this section I turn the thoughts presented in the ethical considerations section into more concrete requirements.
1) The data collection has to be done with informed consent, or the data has to come from a collection by a respectable source.
2) The quality of the diagnosis has to be known to the medical practitioners.
3) Medical practitioners need an explanation of why the software made the diagnosis.
4) The product needs to be usable in a low-infrastructure area. This means that the software should be usable without an internet connection and on a low-end computer.
5) Faster or better than the competing technologies (see for example references [20], [21], and [81]). The diagnosis quality should be better than that of rapid antigen testing, and the result should be available in less than 10 minutes with a standard medium- or low-power computer.
6) Good and simple user interface. It can be limited to a simple and well-documented API.
7) Use eXplainable AI to empower the users.
8) Use medical imaging standards (DICOM [47]).
9) The model should work equally well with all segments of the human population. If this is not possible, it should be clearly stated.
10) Do not collect data during the operation without consent.

Here we basically have only two main requirements:
1) Learn the domain issues.
2) Study the data properly to find any possible issues.
The others are:
3) Select only proper data.
4) Fix any issues with the data.
5) Select suitable modeling tools for the data.
6) Perform a proper analysis of the model created. Repeat if the result is not good enough.

In order to generalize the results (or at least know the limits of the generalization) we need to:
1) Remove the confounding factors.
2) Handle the biases in the datasets.
3) Handle the data scarcity.
4) Handle the shifts between the datasets and the reality.
For the reliability review, which is needed for every medical application, we need to:
5) Document the choices made properly.
6) Do the result validation.
7) Do the robustness and sensitivity analysis.
8) Select the image size for the analysis based on scientific reasons, not on convenience.
9) Use the metadata available.
10) Have the code and other material available for replication.
11) Explain the reasons for the given diagnosis.

Here we have the following:
1) Use checklist(s).
2) Explainability.
3) Bias analysis.
4) Proper data issue handling.
5) Documentation.
6) Performance validation.

Many of these requirements overlap, and thus the list of requirements can be simplified to the following.
• Learn the basics of the medical imaging domain.
• Use checklist(s).
• Use the DICOM data from a respectable source.
• Study the available data properly.
• Handle data issues: selection, biases, confounding factors, scarcity, and shifts.
• Select a suitable model with explainability.
• Document every decision, and the reasons for them, regarding the data, the model architecture, and the (meta)parameters of the model.

IV. PRACTICAL SOLUTIONS TO THESE REQUIREMENTS

The requirements and solutions listed here are intended for preliminary studies aimed at finding suitable models. For the proper handling of the issues, take a look at the references given here and previously in this work, and at the legal requirements for clinical applications. My recommendations are also found in a condensed form in table I.

A. Learn the basics of the medical imaging domain.

The best solution here is to include a domain expert in the group; the absolute minimum is a proper review of the previous work done in the field.

B. Use checklist(s).

There are multiple checklists, e.g. references [3], [71], [74], [80], [72], [73].

C. Use full sized DICOM data from a respectable source

This has its own solution written directly into the requirement. As DICOM is an industry standard, it is well documented and ready-made libraries exist for its use. A respectable source would be a proper institute, not a private collection.

D. Study the available data properly.

Do a proper analysis of the raw data. This has to include a study of demographics, biases, duplicates, outliers, pixel intensities, etc.

E. Handle data issues: Selection, biases, confounding factors, scarcity, and shifts.

In data selection one needs to be careful and has to remove confounding factors and incorrect data. The area outside of the lungs in particular holds many confounding artifacts [3], while a secondary source of confounding is the pixel intensity [52].
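One way to act on the observation about the area outside of the lungs is to mask that area out before the image is given to the model. The sketch below assumes that a binary lung mask is already available from a separately reviewed segmentation model; the function name `apply_lung_mask`, the list-of-lists image representation, and the pixel values are hypothetical and only illustrate the idea. It does not address the pixel intensity confounding.

```python
def apply_lung_mask(image, mask, fill_value=0):
    """Keep pixels where the mask is truthy and replace the rest.

    image and mask are equally sized 2D lists (rows of pixel values).
    Pixels outside the mask are set to fill_value so that markers,
    text labels, or devices outside the lung area cannot be used
    as shortcuts by the classifier.
    """
    same_rows = len(image) == len(mask)
    if not same_rows or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if keep else fill_value for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Hypothetical 3x4 image: the bright corner pixel (255) stands in for a
# burned-in marker outside the lungs.
image = [
    [255, 10, 12, 0],
    [0, 40, 42, 0],
    [0, 38, 41, 0],
]
# Hypothetical lung mask: 1 inside the lung area, 0 outside.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
masked = apply_lung_mask(image, mask)
print(masked)
```

The quality of the result depends entirely on the quality of the mask, so the segmentation step needs the same scrutiny as the classifier itself.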
There are plenty of ready-made lung segmentation models available, but they should be reviewed properly before selection. Incorrect data, such as duplicates and failed images, should not be included. Bias can sometimes be handled with a proper selection of the data used, or by generating new data via augmentation [82], [83] or via generative adversarial nets (GAN) [84]. The last resort is to record the existing biases properly and continue with them. When generating new data one needs to be extremely careful not to use incorrect techniques [3]. For data scarcity one possibility is to generate new data as in the case of bias; another solution is transfer learning [85], [86], [87], [88]. One should remember, however, that transfer learning is not always useful [89], [90]. Dataset shifts are discussed in detail in references [75] and [77].

F. Select a suitable model with explainability.

The CNN is still the default choice, but if other relevant factors point toward choosing something else, do not discard the alternatives. The available out-of-the-box models are not always the best choice [89], [83]. There are multiple explanation tools for image classification AI. Reference [3] gives an overview of what has been used previously and some pointers on the issues related to them. The suitability of these explanation tools for architectures other than the CNN is also an issue.

G. Document every decision, and the reasons for them, regarding the data, the model architecture, and the (meta)parameters of the model.

According to reference [7] we need the following statistical information.
Binary categorization: Accuracy, precision, recall (sensitivity), F score, specificity, AUC.
Multi-class classification: Average accuracy, error rate, precision_µ, recall_µ, F score_µ (micro-averaged), precision_M, recall_M, F score_M (macro-averaged).
Multi-label classification: Exact match ratio, labeling F score, retrieval F score, Hamming loss.
Hierarchical classification: Precision↓, recall↓, F score↓, precision↑, recall↑, and F score↑, which are defined in reference [7].

Compare the model to the current state of the art of other techniques such as RT-PCR. Remember to use up-to-date information in this comparison.

For replication, use GitHub or some other similar service to store the code and other material.

There has been a great effort and a lot of enthusiasm for providing an AI solution for the clinical diagnosis of COVID-19 using CXR data. While promising results have been published, unfortunately most of the publications lack the rigor needed in the medical field. This issue is shown by multiple reviews [3], [4], [5], [6], [7], and [8], and our field needs to pay proper respect to the actual requirements for such tools. There are two major limitations in this work: the author's lack of proper medical expertise, and the reliance on previous review papers, which are always behind the state-of-the-art implementations published after the articles for the reviews were selected. This work is a start in this direction and a pointer towards more thorough work done by domain experts. This work can also be used as a basis for other analyses of similar issues. Personally, this is the starting point for developing a new AI model for the diagnosis of COVID-19 from chest X-ray images.

Table I (condensed recommendations):
Checklists: Lists are found in references [3], [71], [72], [73], [74].
Data gathering: Use the DICOM data from a respectable source.
Data preview: Do a proper preview of the data properties to find the possible issues. Handle the issues with care; see at least references [3], [75], and [77].
Select model: Find a proper model suitable for the task and equip it with an explainability module.
Documentation: Document and explain all choices made regarding the data, the architecture, and the (meta)parameters.
Follow the reference [7].
Model performance: Compare to other technologies such as RT-PCR.
Replication: Store everything needed for it in GitHub or in another similar service.

REFERENCES

[1] World Health Organization
[2] World Health Organization
[3] Checklist for responsible deep learning modeling of medical images based on COVID-19 detection studies
[4] Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans
[5] Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal
[6] Curbing the AI-induced enthusiasm in diagnosing COVID-19 on chest X-rays: the present and the near-future
[7] Systematic review of artificial intelligence techniques in the detection and classification of COVID-19 medical images in terms of evaluation and benchmarking: Taxonomy analysis, challenges, future solutions and methodological aspects
[8] Deep learning approaches for detecting COVID-19 from chest X-ray images: A survey
[9] A high-level overview of AI ethics
[10] Identifying ethical considerations for machine learning healthcare applications
[11] Ethics of artificial intelligence in radiology: Summary of the joint European and North American multisociety statement
[12] Artificial intelligence as a medical device in radiology: ethical and regulatory issues in Europe and the United States
[13] Machine learning in the EU health care context: exploring the ethical, legal and social issues
[14] Artificial intelligence in hospitals: providing a status quo of ethical considerations in academia to guide future research
[15] The debate on the ethics of AI in health care: a reconstruction and critical review
[16] The ethics of AI in health care: A mapping review
[17] Is there an ethics of algorithms
[18] A framework for the ethical impact assessment of information technology
[19] Opening the black box of AI-medicine
[20] How to optimize the radiology protocol during the global COVID-19 epidemic: Keypoints from Sichuan provincial people's hospital
[21] How is the COVID-19 virus detected using real time RT-PCR?
[22] Chip shortage will hit IT-hardware buyers for months to years: Tech executives and analysts say the current processor-chip shortage and disruption of supply chains thanks to COVID-19 could have a long-term impact on price and availability
[23] The global chip shortage is here for some time: Loading, please wait
[24] You're not imaging things, there is a serious chip shortage: CPUs, GPUs, and memory are all in tight supply due to manufacturing issues and high demand
[25] Chips in a crisis
[26] Malignant melanoma in African-Americans: A population-based clinical outcomes study involving 1106 African-American patients from the surveillance, epidemiology, and end result (SEER) database
[27] Sex-Based Differences in Lung Physiology
[28] From Data Mining to Knowledge Discovery: An Overview. USA: American Association for Artificial Intelligence
[29] CRISP-DM 1.0 step-by-step data mining guide
[30] A survey of data mining and knowledge discovery process models and methodologies
[31] Shortcut learning in deep neural networks
[32] Learning from imbalanced data
[33] Validity, reliability, and baloney
[34] A research test of the Rorschach test
[35] On splitting training and validation set: A comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning
[36] Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position
[37] Gradient-based learning applied to document recognition
[38] Dynamic routing between capsules
[39] A new model for learning in graph domains
[40] Capsule graph neural network
[41] Integrated segmentation and recognition of hand-printed numerals
[42] Solving the multiple instance problem with axis-parallel rectangles
[43] Classification of imbalanced data: A review
[44] Learning not to learn: Training deep neural networks with biased data
[45] Research on data augmentation for image classification based on convolution neural networks
[46] Identifying medical diagnoses and treatable diseases by image-based deep learning
[47] NEMA PS3 / ISO 12052, Digital Imaging and Communications in Medicine (DICOM) Standard, National Electrical Manufacturers Association Std
[48] A critic evaluation of methods for COVID-19 automatic detection from X-ray images
[49] Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study
[50] AI for radiographic COVID-19 detection selects shortcuts over signal
[51] Discovery of a generalization gap of convolutional neural networks on COVID-19 X-rays classification
[52] Can we trust deep learning based diagnosis? the impact of domain shift in chest radiograph classification
[53] Unveiling COVID-19 from CHEST X-ray with deep learning: A hurdles race with small data
[54] An adversarial approach for the robust classification of pneumonia from chest radiographs
[55] A review on lung boundary detection in chest X-rays
[56] Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset
[57] Objective evaluation of deep uncertainty predictions for COVID-19 detection
[58] Machine learning applications in cancer prognosis and prediction
[59] Current applications and future impact of machine learning in radiology
[60] Machine learning in medicine
[61] Deep learning in medical imaging: General overview
[62] How far have we come? artificial intelligence for chest radiograph interpretation
[63] Deep learning in generating radiology reports: A survey
[64] Explainable artificial intelligence for neuroscience: Behavioral neurostimulation
[65] The practical implementation of artificial intelligence technologies in medicine
[66] What do we need to build explainable AI systems for the medical domain
[67] Causability and explainability of artificial intelligence in medicine
[68] A survey on deep learning in medical image analysis
[69] Artificial intelligence in healthcare
[70] Going deep in medical image analysis: Concepts, methods, challenges, and future directions
[71] Checklist for artificial intelligence in medical imaging (CLAIM): A guide for authors and reviewers
[72] Radiomics: the bridge between medical imaging and personalized medicine
[73] Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement
[74] PROBAST: A tool to assess the risk of bias and applicability of prediction model studies
[75] Causality matters in medical imaging
[76] Multiple skin lesions diagnostics via integrated deep convolutional networks for segmentation and classification
[77] From development to deployment: dataset shift, causality, and shift-stable models in health AI
[78] Deep learning predicts hip fracture using confounding patient and healthcare variables
[79] ML4H auditing: From paper to practice
[80] The ITU/WHO focus group on artificial intelligence for health
[81] Performance of a SARS-CoV-2 antigen rapid immunoassay in patients admitted to the emergency department
[82] A survey on image data augmentation for deep learning
[83] An efficient deep learning approach to pneumonia classification in healthcare
[84] Generative adversarial nets
[85] A survey of transfer learning
[86] A survey on deep transfer learning
[87] DeTrac: Transfer learning of class decomposed medical images in convolutional neural networks
[88] Transfer learning in medical image segmentation: New insights from analysis of the dynamics of model parameters and learned representations
[89] Transfusion: Understanding transfer learning for medical imaging
[90] Cats or CAT scans: Transfer learning from natural or medical image source data sets?