key: cord-291612-j7xz1qaz authors: Albahri, O.S.; Zaidan, A.A.; Albahri, A.S.; Zaidan, B.B.; Abdulkareem, K.H.; Al-qaysi, Z.T.; Alamoodi, A.H.; Aleesa, A.M.; Chyad, M.A.; Alesa, R.M.; Kim, L.C.; Lakulu, M.M.; Ibrahim, A.B.; Rashid, N.A. title: Systematic Review of Artificial Intelligence Techniques in the Detection and Classification of COVID-19 Medical Images in Terms of Evaluation and Benchmarking: Taxonomy Analysis, Challenges, Future Solutions and Methodological Aspects date: 2020-07-01 journal: J Infect Public Health DOI: 10.1016/j.jiph.2020.06.028 sha: doc_id: 291612 cord_uid: j7xz1qaz This study presents a systematic review of artificial intelligence (AI) techniques used in the detection and classification of coronavirus disease 2019 (COVID-19) medical images in terms of evaluation and benchmarking. Five reliable databases, namely, IEEE Xplore, Web of Science, PubMed, ScienceDirect and Scopus were used to obtain relevant studies of the given topic. Several filtering and scanning stages were performed according to the inclusion/exclusion criteria to screen the 36 studies obtained; however, only 11 studies met the criteria. Taxonomy was performed, and the 11 studies were classified on the basis of two categories, namely, review and research studies. Then, a deep analysis and critical review were performed to highlight the challenges and critical gaps outlined in the academic literature of the given subject. Results showed that no relevant study evaluated and benchmarked AI techniques utilised in classification tasks (i.e. binary, multi-class, multi-labelled and hierarchical classifications) of COVID-19 medical images. In case evaluation and benchmarking will be conducted, three future challenges will be encountered, namely, multiple evaluation criteria within each classification task, trade-off amongst criteria and importance of these criteria. 
According to the discussed future challenges, the process of evaluation and benchmarking AI techniques used in the classification of COVID-19 medical images is considered a multi-complex attribute problem. Thus, adopting multi-criteria decision analysis (MCDA) is an essential and effective approach to tackle the problem complexity. Moreover, this study proposes a detailed methodology for the evaluation and benchmarking of AI techniques used in all classification tasks of COVID-19 medical images as future directions; such methodology is presented on the basis of three sequential phases. Firstly, the identification procedure for the construction of four decision matrices, namely, binary, multi-class, multi-labelled and hierarchical, is presented on the basis of the intersection of evaluation criteria of each classification task and AI classification techniques. Secondly, the development of the MCDA approach for benchmarking AI classification techniques is provided on the basis of the integrated analytic hierarchy process and VlseKriterijumska Optimizacija I Kompromisno Resenje methods. Lastly, objective and subjective validation procedures are described to validate the proposed benchmarking solutions.
In the last days of 2019, a group of patients infected with a novel coronavirus disease (coronavirus disease 2019, COVID-19) was recognised in Wuhan, China. Since then, COVID-19 has spread around the world. COVID-19 affects people in different ways. Most infected patients develop common symptoms (i.e. fever, fatigue and dry cough) [1], and others may experience additional symptoms (i.e. aches and pains, nasal congestion, runny nose, sore throat and diarrhoea) [2]. COVID-19 exposed weaknesses in the healthcare systems of many countries, and the inability of healthcare systems to manage patients has caused anxiety. One of the important reasons behind the rapid spread of COVID-19 is the lack of specificity in clinical detection methods [3].
Molecular approaches such as quantitative real-time reverse transcription-polymerase chain reaction (rRT-PCR) [4] and other methods such as serologic tests [5] and viral throat swab testing [6] are necessary and widely utilised for the detection of COVID-19. However, studies have shown that chest radiographs (X-rays) [7] and chest computed tomography (CT) scans [8] can assist and reveal anomalies indicative of different lung diseases, including COVID-19. CT scan and X-ray tests could be utilised as a primary detection tool to evaluate the severity of COVID-19, monitor the emergency case of infected patients and predict COVID-19 progression [9] . However, time is often limited in such emergencies and does not allow these experiments to be performed using existing traditional manual diagnosis [10] . These procedures require a specialist doctor and are susceptible to human error during testing or reading and interpreting findings, which are not acceptable in crucial cases. Given the recent spread of COVID-19, hospitals are filled with numerous patients who are either improving from the viral infection or becoming worse (dying) [11] . In this case, CT scan and X-ray tests should be performed with maximum speed and efficiency to save as many lives as possible [9] . The role of intelligent technologies would effectively help in the diagnosis and classification processes [7] . The use of artificial intelligence (AI) has increased in different fields, especially in medical detection [12] . AI has been widely used to gain more accurate detection results and decrease the burden on the healthcare system [13] . It can decrease the decision time associated with the detection process of traditional methods [14] . The development of AI techniques to recognise the risks of epidemic diseases is considered a key factor in the improvement of the prediction, prevention and detection of future global health risks [15] . 
A few researchers have reported numerous types of AI classifiers applied to real COVID-19 datasets with different case studies and targets [9]. Although AI techniques can be beneficial in the diagnosis and classification of COVID-19, selecting the appropriate AI technique that can produce accurate results is challenging [16, 17]. The large diversity amongst available AI techniques makes it difficult to decide which of them to use in the development of COVID-19 diagnosis and classification, particularly when no single AI technique is clearly better than the others. In addition, the majority of these techniques suffer from low accuracy and low computational efficiency [18]. Moreover, evaluation and comparison are difficult because there are multiple evaluation criteria, and the conflict between them increases the challenge [19]. The evaluation and benchmarking procedures of AI techniques are critical in acquiring a technique that can produce the best results [17, 20]. Such a process is essential because its outcome affects both persons suspected of having COVID-19 and medical organisations: an unreliable technique could result in loss of life and in the virus spreading to others. To evaluate and benchmark AI classification techniques that can be used in the detection of COVID-19 medical images, several requirements must be met to guarantee the reliability of these techniques, given that they are associated with patients' lives. However, two main questions arise in this process. Firstly, what are the appropriate criteria that could be used in the evaluation? Secondly, what is the correct benchmarking procedure that could be used to select a suitable AI technique amongst others?
Therefore, the present study aims to (i) shed light on and systematically review the research efforts on emerging and new technologies for COVID-19 medical image detection based on the AI approach; (ii) map related studies into a coherent taxonomy and highlight the AI techniques, datasets, case studies and AI classification types used; (iii) highlight and analyse different aspects such as research gaps and future challenges with respect to evaluation and benchmarking; and (iv) propose a potential pathway solution with a detailed methodology to tackle the identified research gaps and future challenges of evaluation and benchmarking of AI classification techniques used in COVID-19 medical image detection. The remaining sections of this study are presented as follows. Section 2 presents the methods used in systematically reviewing the literature on the topic. Section 3 presents the taxonomy analysis of the final set of included studies. Section 4 presents a critical review and analysis of the identified studies. Section 5 presents the future challenges related to the evaluation and benchmarking of AI classification techniques used in COVID-19 medical image detection. Section 6 presents a proposal of potential future solutions for the identified research gaps. Section 7 provides the methodology of the proposed solutions. Finally, Section 8 presents the conclusion.
relevant keywords, the second group was meant for COVID-19 relevant keywords, whilst the third group was meant for medical images with different relevant keywords to make sure all literature associated with the three groups is included. In this SLR, different criteria were enforced for the selection of related literature. Articles were selected if they were written in English and published between 2010 and May 5, 2020. For the publication types, articles were selected if they were journal, conference or review papers [29] [30] [31].
As far as the topic of interest is concerned, this SLR only selected publications that discuss any form of AI and COVID-19 using medical images. The exact query is presented in Figure 1. The search was conducted in the middle of May 2020 using the advanced search boxes of the five digital databases. The initial search yielded 36 publications after the de-duplication process, which excluded a total of six duplicated records. Next, the titles and abstracts of the publications were scanned. Twenty articles were excluded as they failed to meet the criteria. The remaining 16 articles underwent another round of screening through full-text reading to investigate their relevancy and determine whether they were suitable to be included in the final set. After this process, five articles were excluded, and only 11 articles met all criteria and were deemed suitable for inclusion in this review. Furthermore, the key demographic statistical findings from the articles are presented on the basis of two aspects, namely, database used and countries. These 11 articles identified from the literature are scattered over the databases and the countries. For the databases, four studies were obtained from ScienceDirect, three from IEEE, two from PubMed, and two from Scopus. The only database with no identified articles was Web of Science. As for the countries of the corresponding authors, three studies came from China, three from Turkey, one from South Africa, one from Italy, one from the UK, one from Korea, and one from Egypt.
This section elaborates the final set of articles (11 articles) that have been collected in this systematic review regarding AI techniques used in the detection and classification of COVID-19 medical images. This final set was divided into two clusters, namely, review cluster and research cluster. The taxonomy and classification of the related articles are shown in Figure 3.
The primary aim of reviewing articles on AI techniques used in the detection and classification of COVID-19 medical images is to understand the current thinking in this field and justify the need for future research on related topics that have been overlooked or understudied. This cluster contained only one article. In [9] , the study reviewed the rapid responses in the community of medical imaging (empowered by AI) towards COVID-19. The authors emphasised that AI-empowered image acquisition can significantly help automate the scanning procedure and reshape the workflow with minimal contact to patients, providing the best protection to the imaging technicians. They focused on the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis and follow-up, using the integration of AI with X-ray and CT images. The second cluster focused on research studies and contained 10 articles, which consist of four sub-clusters: binary, multi-class, integrated multi-class and binary, and integrated hierarchical and multi-class. The flat classification refers to binary classification problems with only two different classes. One article involved this sub-cluster. The study of [32] demonstrated the ability of deep learning method in the diagnosis of COVID-19 on the basis of medical images acquired by CT. Regarding the class labels that have been used in identifying the existence of the infection, this study relied on false-negative (FN) results, which jeopardise the epidemic from being prevented and controlled and affect decisions for health monitoring or discharging. The dataset utilised was made out of the information of 10 patients. Out of the 10 negative cases, two were positively identified for COVID-19 by utilising the rRT-PCR test. The previous clearly indicated and yielded almost 20% FN rate for rRT-PCR. There are numerous issues and challenges linked to multi-class classification. 
However, there is one output for a sample. This sub-cluster includes four different publication works identified. The first work [33] involved the development of a scoring tool aimed at COVID-19 severity. Such tool was proven to be important in assisting healthcare workers in identifying and determining which patients suspected or confirmed for COVID-19 are in high need for respiratory interventions. [35] proposed a patch-based technique with convolutional neural network. The technique makes use of a small number of training parameters for the diagnosis of COVID-19. The work was inspired by a statistical analysis for potential imaging biomarkers of chest X-rays. Their results indicated that pre-processing for normalisation of data helped in the processing of cross-database and significantly improved the accuracy of segmentation (Jaccard similarity coefficients from 0.932 to 0.943, p < 0.001). According to the results, pre-processing was a significant aspect in ensuring the performance of the segmentation in the cross-database. In [36] , CoVID-19 was identified with the use of MobileNetV2 and SqueezeNet, a deep learning technique, in addition to feature sets obtained by the techniques. They were processed using the social mimic optimisation method. Fuzzy colour technique was used to restructure data classes as a pre-processing measure, and the structured images were stacked with the original images. Thereafter, efficient features were grouped and classified with the use of support vector machines (SVMs) with an overall classification rate of 99.27%. This sub-cluster contains three articles that focused on integrated multi-class and binary classification problems. [37] emphasised the deployment of AI to support the work of a radiologist. They indicated that the application of AI in COVID-19 infection will allow monitoring the course of the disease. [38] indicated the importance of AI in maintaining the spread of COVID-19. 
[7] presented a model for detecting COVID-19 by using 125 X-ray images in accurately diagnosing binary classification (COVID vs. no findings), in addition to multi-class classification (COVID vs. no findings vs. pneumonia). The accuracy of the model was 98.08% for binary classes and 87.02% for multi-class cases. Another classification problem type is hierarchical classification where the learning output is identified over special class taxonomy. [39] defined hierarchical classification as follows: 'the input is to be classified into one, and only one, each class which are be divided into subclasses or grouped into superclasses. The hierarchy is defined and cannot be changed during classification. Hierarchical classification can be transformed into flat classification.' This sub-cluster contains two articles that focused on integrated hierarchical and multi-class classification problems. [40] identified COVID-19 pneumonia from various healthy lung types and developed a classification approach, which takes into consideration multi-class and hierarchical perspectives. In addition, resampling algorithms were used for re-balancing the distribution of the classes. The approach acquired a macro-average F1 score of 0.65 with the use of multi-class method and F1 score of 0.89 for the identification of COVID-19 in hierarchical classification scenario. [41] developed a model with hybrid capability for detecting COVID-19 with the use of improved marine predators algorithm (IMPA) and ranking-based diversity reduction strategy to acquire particle numbers that are not capable of finding suitable solution within a consecutive number of iterations. Nine chest X-ray images were utilised for the validation of IMPA performance. The threshold levels were between 10 and 100 and compared with five algorithms: (1) whale optimisation algorithm, (2) salp swarm algorithm, (3) sine cosine algorithm, (4) equilibrium optimiser and (5) Harris hawks algorithm. 
Results showed that the hybrid model proposed based on the experiment outperforms all other algorithms for a range of metrics. Furthermore, on all threshold levels, the performance was convergent in Structural Similarity Index and Universal Quality Index metrics. The academic literature in the research cluster was further discussed from different points of view on the basis of three perspectives, namely, dataset, AI technique and case study used. The dataset that has been used or proposed to be used in the development and evaluation of the COVID-19 diagnosis system was categorised as primary or secondary. The primary dataset is the dataset that was collected during the research and approved by the ethical approval committee. Conversely, the secondary dataset was acquired online (public dataset) and published by researchers to help other researchers test their AI methods and techniques. Moreover, regarding the AI techniques, this study identified the AI algorithms into traditional machine learning algorithms, such as SVM and decision tree, or deep learning algorithms such as convolutional and deep neural networks. Table 1 presents a summary of the studies described in this cluster, focusing on their most important characteristics for COVID-19 diagnoses such as the type of datasets used and summarising AI techniques utilised to solve the case study problems for detecting the COVID-19 medical images. AI approaches for the detection of COVID-19 are considered one of the latest and most trending topics due to the growing pandemic. It is difficult to represent the true state-of-the-art for this purpose considering that new works are emerging every day. Nevertheless, we concluded that majority of the literature aimed to investigate hybrid AI techniques by combining deep learning and traditional machine learning, which is contributed by different types of datasets. In addition, the standard image diagnosis tests for pneumonia are chest X-ray and CT scan. 
X-ray is more useful amongst these studies because it is cheaper, faster and more widespread than CT. The primary aims of the studies are to identify pneumonia caused by COVID-19 from other types using either X-ray or CT scan. Given that pneumonia can be structured as a hierarchy, a classification scheme considering the multi-class and hierarchical perspectives requires attention and leads to the best COVID-19 recognition rate. The reason behind this is that there is a hierarchy between the pathogens that cause pneumonia. However, only one study [40] considered hierarchical classification approach in the literature. On the basis of previous literature, classification tasks for COVID-19 were different in terms of aspects related to the accuracy of results, in spite of the differences of the overall performance. Previous literature was solely focused on accuracy enhancement, time reduction or even overall performance improvements for the classification. Furthermore, differences exist in previous literature with respect to classification techniques, phases and classification procedures. On the one hand, the developed COVID-19 classification techniques in the analysed studies provide three COVID-19 classification tasks (i.e. binary classification, multi-class classification and hierarchical classification). On the other hand, [39] indicated that all relevant label distribution in a classification problem changes, which explains why four classification types can be performed in the AI techniques, namely, binary, multi-class, multi-labelled and hierarchical classifications. Multi-labelled classification is described in [39] as follows: 'the input is to be classified into several of non-overlapping classes. When the learning task is document topic classification, multi-labelling is often referred as multi-topic classification. In the multi-labelled classification problem, categories are isolated and their relations are not considered important.' 
However, no study has provided multi-labelled classification for the detection of COVID-19 medical images. This is considered the first research gap identified in the literature reviewed. Furthermore, the growing number of classification techniques developed for COVID-19 is considered a major problem for health organisations and other treatment centres. The reason is that medical organisations aiming to adopt classification techniques for the detection of COVID-19 will face the challenge of selecting the best and most appropriate technique, one that provides accurate and rapid detection of COVID-19 medical images. Apart from the disparity in COVID-19 classification techniques in terms of their overall performance, all results confirm the difficulty of choosing a better technique amongst the others. In the analysed studies, no technique or proposed solution has been confirmed to be superior to the rest. Moreover, although multi-labelled classification AI techniques for the detection of COVID-19 have not been developed, they might be developed in the near future. In that case, another important question will arise: 'which classification technique is appropriate for such purpose?' According to the final set of included articles that met the search query used, no study has provided a comprehensive evaluation and benchmarking solution for AI classification techniques (i.e. binary, multi-class, multi-labelled and hierarchical classifications) used in the detection of COVID-19 medical images. This is considered the second research gap identified in the literature reviewed. [17] recommended that an evaluation and benchmarking solution for multi-labelled and/or hierarchical classification techniques could be beneficial and essential to determine which AI technique is appropriate amongst others.
To explain the detailed solution for the identified gaps, two problems should be discussed: 'what are the evaluation criteria used in each classification type (i.e. binary, multi-class, multi-labelled and hierarchical classifications), and what are the calculation processes of these criteria?' Each of these classification types has its own evaluation criteria. The calculation procedure for each evaluation criterion differs from one classification type to another [39], [17]. Thus, the evaluation and benchmarking procedure will be different within each classification type (the evaluation criteria and calculation procedures are specified in detail in the methodology section). This study attempts to fill the gap in the evaluation and benchmarking of different classification types that will be used in the detection of COVID-19. The proposed solution shall assist the administrations of health organisations to evaluate and benchmark COVID-19 AI classification techniques. It can also ensure that the selected classification techniques meet all necessary requirements. To provide such a solution, three specific challenges need to be addressed in the process of evaluation and benchmarking classification techniques, which are described in the next section.
In this section, three future challenges that will be encountered in the processes of evaluation and benchmarking AI classification techniques used in the detection of COVID-19 medical images are discussed in the following subsections. As stated in the previous section, four categories of classification tasks are identified. Each category is different in terms of criteria type, where the calculation procedure is different for each evaluation criterion.
Furthermore, the number of criteria is different within each classification category, for example, six evaluation criteria for binary classification, eight criteria for multi-class classification, four criteria for multi-labelled classification and six criteria for hierarchical classification [39] . In general, most evaluation processes for COVID-19 classification techniques need to consider more than one criterion. For example, the reliability of classification techniques can be measured on the basis of a confusion matrix that contains four parameters: true positive (TP), false positive (FP), true negative (TN) and FN. In other words, the rate of correct and incorrect classified samples is compared between actual class and the predicted class. Thus, this status will affect the results if only one or a full set of parameters is considered in the evaluation process. However, in this regard, there are no suggested solutions to handle these particular issues in terms of evaluation and benchmarking of COVID-19 AI classification techniques. Furthermore, the recommended solution must consider the issue that the evaluation of COVID-19 classification techniques is based on multiple evaluation criteria and consider the difference amongst classification tasks in terms of type of criteria. The issue of trade-off is defined as a situation when a reliability or aspect of something decreases whilst the reliability or aspect of another increases. According to the nature of the evaluation criteria used in AI techniques, different types of trade-off utilised by researchers for different criteria were performed, which in turn were confusing for decision-makers. In addition, in the scope of this study, the different use ratio in different criteria demonstrated effect that explains the conflict on other criteria utilised by researchers. 
Thus, the conflict amongst the evaluation criteria poses important challenges to creating a COVID-19 classification approach. Fundamentally, these challenges stem from conflicts, especially the one between the criteria and the data. Thus, it is crucial to realise the advantages and disadvantages of a particular choice whilst making a decision. The term trade-off is frequently used in the context of evaluation, where the process of selection acts as a decision-maker [45] [46] [47]. In the benchmarking and evaluation of AI classification techniques used for COVID-19, the trade-off, also known as the conflicting criteria problem, centres on three criteria: reliability, time complexity of the COVID-19 classification procedure and error rate within the dataset. These criteria are considered the main requirements for evaluating COVID-19 classification techniques. Reliability should be high, the time complexity of producing the output should be low and, simultaneously, the apparent error rate from training on the dataset should be low. Conflicting data are observed within the matrix-of-parameters section, which contains TP, FP, TN and FN: TP and TN rise as FP and FN are minimised [48, 49]. This phenomenon shows an apparent conflict amongst the probability criteria. These parameters have a considerable effect on some of the remaining criteria values because some of the criteria rely on the values of these four parameters. Therefore, the process of evaluation and benchmarking must take into consideration such requirements. As a result, a new and flexible evaluation approach that handles all conflicting criteria and data problems is needed. However, in this regard, there are no suggested solutions to handle these particular issues.
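As an illustration of the criteria conflict described above, the following minimal Python sketch derives a few commonly used binary-classification criteria from the four confusion-matrix parameters and compares two decision thresholds. The labels, scores and thresholds are made-up toy values, and the function names `confusion_counts` and `binary_criteria` are hypothetical; the criteria are standard textbook definitions, not the specific set used in any reviewed study.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN and FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_criteria(tp, fp, tn, fn):
    """Derive several evaluation criteria from the four parameters."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Toy data (assumed): 1 = COVID-19 positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.2, 0.1]

low = binary_criteria(*confusion_counts(y_true, scores, 0.35))
high = binary_criteria(*confusion_counts(y_true, scores, 0.75))

for name in ("accuracy", "precision", "recall", "specificity", "f1"):
    print(f"{name:12s} low={low[name]:.3f} high={high[name]:.3f}")
```

With these toy values both thresholds give the same accuracy, yet the lower threshold misses no positive case at the cost of false alarms, whereas the higher threshold raises no false alarm but misses half of the positive cases. A single criterion therefore hides exactly the conflict that the benchmarking procedure has to resolve.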
Another challenge that might be encountered concerns the importance of each criterion throughout the evaluation and benchmarking phases, despite their conflict. In addition, this conflict between the criteria poses a significant challenge during the evaluation stage [50]. A suitable procedure needs to be developed for objectives of this kind, one that can raise the significance of a certain evaluation criterion whilst lowering that of others [51]. Two key points must be considered. The first is to achieve a sufficient understanding of the behaviour of the COVID-19 classification technique whilst assigning a certain significance to the design. The second is the evaluation approach itself, bearing in mind the issue of trade-off. However, a conflict might exist between the opinions of the evaluator and the objective of the developer, which affects the final evaluation of the approach [52]. From a technical point of view, evaluating and benchmarking a COVID-19 classification technique requires considering multiple criteria simultaneously and then assigning a suitable weight to each evaluation criterion. After the scores of all approaches are compared, the approaches with the most balanced rates should be assigned the highest priority level, whereas those with the least balanced rates should be assigned the lowest priority level. In addition, because multiple criteria have to be considered, evaluating COVID-19 classification techniques is a difficult and challenging task, particularly as time and the error rate within the dataset could also be significantly important in COVID-19 classification. Moreover, each decision-maker assigns different weights to these criteria [53].
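One way to make such weight assignment explicit is the analytic hierarchy process (AHP), which this study adopts as part of its proposed methodology. The sketch below is a minimal illustration under stated assumptions: the three criteria (reliability, time complexity, error rate) are taken from the text, the pairwise judgements on Saaty's 1 to 9 scale are invented for the example, and the weights are approximated by column normalisation and row averaging, a common shortcut for the principal eigenvector. The function names are hypothetical.

```python
def ahp_weights(pairwise):
    """Approximate AHP priority weights: normalise each column, then average each row."""
    n = len(pairwise)
    col_sums = [sum(pairwise[i][j] for i in range(n)) for j in range(n)]
    return [
        sum(pairwise[i][j] / col_sums[j] for j in range(n)) / n
        for i in range(n)
    ]


def consistency_ratio(pairwise, weights):
    """Saaty's consistency ratio; values below 0.1 are conventionally accepted."""
    n = len(pairwise)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's tabulated values
    return consistency_index / random_index[n]


criteria = ["reliability", "time_complexity", "error_rate"]

# Hypothetical judgements of one decision-maker: entry [i][j] states how much
# more important criterion i is than criterion j (reciprocal below the diagonal).
pairwise = [
    [1.0, 3.0, 2.0],
    [1 / 3, 1.0, 1 / 2],
    [1 / 2, 2.0, 1.0],
]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)

for name, weight in zip(criteria, weights):
    print(f"{name:16s} {weight:.3f}")
print(f"consistency ratio: {cr:.3f}")
```

A second decision-maker with different judgements would supply a different pairwise matrix and obtain different weights, which is precisely the subjectivity that the benchmarking procedure needs to accommodate.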
For instance, experts in charge of scoring the COVID-19 classification techniques could assign more weight to certain features and less to those that attract less interest. By contrast, experts who intend to use a benchmarking method to address such problems may consider different criteria the most significant.
This section describes the potential future direction of the process of evaluation and benchmarking the COVID-19 classification techniques used in medical image detection. According to the future challenges discussed, such a process faces a multi-complex attribute problem in which all the AI techniques are considered available alternatives from which a suitable technique must be selected. Therefore, adopting explicit and structured techniques for decisions using multiple criteria could boost the decision-making quality. Aside from analysis, assessment and ranking, multi-criteria decision analysis (MCDA) is considered a solution that helps decision-makers organise and solve such problems [54, 55]. MCDA is defined as 'an extension of decision theory that covers any decision with multiple objectives. MCDA is a methodology for assessing alternatives on individual, often conflicting criteria, and combining them into one overall appraisal' [56]. The techniques of decision-making are widely recognised, and amongst them, MCDA is the most significant. It is also considered an important part of operations research that handles problems of decision-making with respect to decision criteria [57]. The technique involves various processes including structuring, planning and solving different decision problems with the use of many criteria [58]. MCDA is increasingly being used as it can improve decision quality [59]. It does so by making the decision process more reasonable, efficient, clear and explicit compared with other traditional processes [60, 61].
The most significant goals of MCDA include helping the data miner choose the most suitable alternatives, ranking the alternatives in decreasing order of efficiency and classifying the applicable alternatives amongst groups of available alternatives [62, 63] . On this basis, the most suitable alternative(s) are ranked [64] . The fundamental terms of MCDA need to be defined, including the decision matrix (DM) and its associated criteria [65, 66] . An evaluation matrix contains n criteria and m alternatives, which need identification [67, 68] . The intersection of alternative i and criterion j is defined as z_ij. Therefore, we have a matrix (z_ij)_(m×n) as follows:

$$
(z_{ij})_{m \times n} =
\begin{array}{c|cccc}
 & X_1 & X_2 & \cdots & X_n \\
\hline
Y_1 & z_{11} & z_{12} & \cdots & z_{1n} \\
Y_2 & z_{21} & z_{22} & \cdots & z_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
Y_m & z_{m1} & z_{m2} & \cdots & z_{mn}
\end{array}
$$

where $Y_1, Y_2, \ldots, Y_m$ are the probable alternatives that decision-makers need to rank (i.e. COVID-19 classification AI techniques), $X_1, X_2, \ldots, X_n$ are the criteria against which the performance of each alternative is evaluated, and $z_{ij}$ is the rating of alternative $Y_i$ with respect to criterion $X_j$.

The decision-making process can be improved by involving decision-makers and stakeholders, which gives the process support and structure [69, 70] . The use of explicit, structured multi-criteria decision methods can improve the quality of decision-making [71, 72] . These techniques can identify which criteria are relevant and provide information for evaluating the current alternatives [73] . In doing so, they improve the transparency, consistency and validity of decisions [74] . MCDA can contribute to fair, transparent and rational priority-setting processes [75] . MCDA has been widely used in many areas for different applications [76] .
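The decision matrix defined above, with alternatives as rows and criteria as columns, can be sketched as a small data structure. This is a minimal illustration only: the class name `DecisionMatrix`, the technique names and all ratings are hypothetical placeholders, not results from any reviewed study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecisionMatrix:
    """m alternatives (Y_1..Y_m) rated against n criteria (X_1..X_n)."""
    alternatives: List[str]        # Y_i, e.g. AI classification techniques
    criteria: List[str]            # X_j, e.g. evaluation criteria
    ratings: List[List[float]]     # ratings[i][j] is z_ij

    def __post_init__(self) -> None:
        # The matrix must be m x n: one row per alternative, one column per criterion.
        if len(self.ratings) != len(self.alternatives):
            raise ValueError("expected one row of ratings per alternative")
        for row in self.ratings:
            if len(row) != len(self.criteria):
                raise ValueError("expected one rating per criterion in every row")

    def rating(self, alternative: str, criterion: str) -> float:
        """Return z_ij for the named alternative and criterion."""
        i = self.alternatives.index(alternative)
        j = self.criteria.index(criterion)
        return self.ratings[i][j]


# Hypothetical example: three techniques rated on three criteria.
dm = DecisionMatrix(
    alternatives=["technique_A", "technique_B", "technique_C"],
    criteria=["accuracy", "precision", "recall"],
    ratings=[
        [0.95, 0.93, 0.96],
        [0.91, 0.94, 0.88],
        [0.89, 0.85, 0.92],
    ],
)
```

Keeping the alternative and criterion labels next to the ratings makes it harder to mix up rows and columns when the matrix is later weighted and ranked.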
MCDA works by means of ranking and finding the suitable solution to select appropriate alternatives in different domains [77] [78] [79] [80] [81] [82] , especially in healthcare domain [83] [84] [85] . Several MCDA methods can be found in the literature, including the analytic hierarchy process (AHP), weighted product method, hierarchical adaptive weighting, best-worst method, multiplicative exponential weighting, weighted sum model, simple additive weighting, analytic network process, VlseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR), technique for reorganisation of opinion order to interval levels and technique for order of preference by similarity to ideal solution (TOPSIS). Each technique uses different representations [86] [87] [88] [89] . The diversity of MCDA techniques raises a challenge in terms of the selection of the most suitable method for a single scenario. Each technique has its own limitations and strengths [90, 91] . Therefore, selecting the most appropriate MCDA technique is highly important. According to our analysis, all the presented methods in the literature were not used for evaluation and benchmarking of COVID-19 medical image classification over AI techniques. These methods are challenged by non-adoption requirement-driven approach, which makes them unsuitable for measurement and scoring in decision-making [76, 89] . However, for cases that involve numerous alternatives and criteria, TOPSIS and VIKOR are applicable. VIKOR and TOPSIS are convenient to use when the given data are quantitative or objective. TOPSIS can create a shortest distance solution towards the ideal solution and also the largest distance away from the negative-ideal solution. Nevertheless, there is no consideration for the relative significance of these distances [92] . On the other hand, VIKOR has functional relationship to discrete-alternative problems. TOPSIS and VIKOR are considered the most practical techniques in solving real-world problems. 
The advantage of TOPSIS and VIKOR is that they can rapidly decide the best alternative. Furthermore, they are suitable techniques for cases where there are many alternatives and criteria situations [92] . Nevertheless, the major drawback of TOPSIS and VIKOR is the lack of provisioning for elicitation of weight and checking for judgment consistency [92] . Thus, TOPSIS and VIKOR need an effective technique to acquire the relative importance of various criteria with respect to the objective, and AHP is able to provide such a technique. However, AHP is utilised for setting objective weights on the preferences of the stakeholder [93] , and it is restricted majorly by the human capacity for information processing. Therefore, 7 ± 2 would be the comparison ceiling [94] . The latest trend in MCDA techniques integrates two or more techniques to compensate for the drawbacks of single techniques. AHP and VIKOR are commonly used MCDA approaches in various studies and especially in the medical domain [58] . To evaluate and benchmark AI classification techniques used in the detection of COVID-19 medical images, the present study recommends to integrate AHP for assigning weights for the evaluation criteria of each classification type subjectively by relying on the judgment of experts, and VIKOR is needed to offer a comprehensive ranking of COVID-19 AI classification techniques. This section describes and explains the evaluation and benchmarking methodology of AI classification techniques used in COVID-19 medical image detection. Figure 4 illustrates all elements of our study in the overall proposed methodology. According to the proposed methodology, three phases have been performed for evaluating and benchmarking the COVID-19 AI classification techniques. 
The first phase is identification, which describes the datasets and required pre-processing and identifies the evaluation criteria used in the evaluation and benchmarking of COVID-19 AI classification techniques, as well as the number and type of techniques. The output of this phase is four DMs, one for each classification type. In the second phase, the integration of MCDA methods is presented. The AHP method is used to weigh the evaluation criteria subjectively, and the VIKOR method is used for benchmarking the COVID-19 AI classification techniques. The third phase covers the objective and subjective validation of the benchmarking results.

In the identification phase, four main stages are conducted. First, the dataset and required pre-processing procedure (presented in Section 7.1.1) are identified. Second, the evaluation criteria within each type of classification (presented in Section 7.1.2) are identified. Third, the number and type of COVID-19 AI classification techniques are described in Section 7.1.3. Fourth, the construction of the four types of DMs based on the identified elements is described in Section 7.1.4.

In this step, three main elements should be defined, namely, the target dataset, the required pre-processing technique for the dataset and the most suitable features for the classification task [95] [96] [97] [98] [99] . Different COVID-19 datasets can be found in the literature. Some are based on X-ray images [34] , whilst others are based on CT scan images [33] . Each dataset has some limitations. For example, the number of training samples is small, the provided images are of low quality, and the images are not of equal size. Thus, pre-processing steps (e.g. using data augmentation [34] techniques to generate more medical image samples for more comprehensive training) are needed to tackle such issues. Furthermore, because COVID-19 can overlap with other pneumonia cases, image segmentation [35] can be used as a pre-processing step to define the region of interest for further analysis of COVID-19 cases.
The features extracted from images have a great impact on classification [100] in terms of improving accuracy and minimising error rate, over-fitting and under-fitting issues [101, 102] . Thus, all of the mentioned scenarios will have a great impact on the results of evaluation and benchmarking for COVID-19 classification techniques. Accordingly, these three steps should be provided to achieve an efficient evaluation and benchmarking process for COVID-19 classification over AI techniques. To train and test COVID-19 classification techniques, the dataset will be separated into two parts. The first part will be used as the training set, whereas the second part will be used as the testing set.

As mentioned before, each classification type has its own evaluation criteria. Accordingly, in this section, the criteria within each classification type, which will be included in the DMs, are identified. As mentioned in the critical review and analysis section, classification tasks are divided into four types, namely, binary, multi-class, multi-labelled and hierarchical. On the basis of each classification task, the evaluation criteria of COVID-19 AI classification techniques are listed in Table 2 .

Table 2 (excerpt). Evaluation criteria and their evaluation focus.
Hamming loss: average per-example per-class total error.
Precision↓: positive agreement on subclass labels with regard to the subclass labels given by a classifier.
Recall↓: positive agreement on subclass labels with regard to the subclass labels given by data.
F score↓: relations between data positive subclass labels and those given by a classifier.
Precision↑: positive agreement on superclass labels with regard to the superclass labels given by a classifier.
Recall↑: positive agreement on superclass labels with regard to the superclass labels given by data.

As shown in Table 2 , the evaluation criteria for COVID-19 classification techniques differ in number and calculation procedure across the types of classification.
For example, binary and hierarchical classifications have six evaluation criteria each, multi-class classification has eight criteria, and multi-labelled classification has four criteria. Furthermore, as shown in Table 2 , the precision criterion in the binary type differs from the precisionµ criterion in the multi-class type because the formulas for the two types are different, and other criteria belong to a single classification type. The usage of criteria depends on the target of classification (binary, multi-class, multi-labelled or hierarchical). Thus, different numbers and types of criteria will be involved in the DM of each classification task.

In this step, the number and type of COVID-19 AI classification techniques to be included in each DM type are identified. In general, different types of COVID-19 classification techniques can be found in the literature. Some studies are based on traditional machine learning classification techniques (e.g. [33] ), whereas the majority of classification tasks are based on deep learning techniques (e.g. [7, 34, 36] ). The classification techniques that belong to a similar type (e.g. traditional machine learning or deep learning techniques) need to be included in the evaluation and benchmarking process. Furthermore, the number of candidate classification techniques should be defined in the evaluation and benchmarking scenario. As mentioned in Section 7.1.1, the dataset is divided into training and testing sets, and each individual instance is supposed to belong to a predefined class [18, 98, 103, 104] . In the testing portion, if the performance of the classification technique looks 'acceptable', then the technique can be used to classify future data for which the class label is unknown. Ultimately, the classification technique that provides an acceptable result can be considered an 'acceptable technique'.
Furthermore, for more reliable classification techniques, the difference between the performance of the technique in the training and validation stages in terms of accuracy and loss function is very important for avoiding over-fitting and under-fitting issues.

The DM is considered the main component of the proposed methodology for the evaluation and benchmarking of AI classification techniques used in COVID-19 medical images. A DM is composed of decision alternatives and identified criteria. In our case, the classification techniques for COVID-19 are the decision alternatives, and the criteria are the evaluation criteria identified for each classification task. As mentioned earlier, the AI domain has four types of classification tasks (binary, multi-class, multi-labelled and hierarchical). Each type has its own evaluation criteria; thus, each type should have a unique DM based on the distinction of the evaluation criteria. In this study, the DMs of COVID-19 medical image classifications will be constructed for four different types, namely, binary DM, multi-class DM, multi-labelled DM and hierarchical DM. The DM data of a specific classification type are generated from the crossover between the COVID-19 AI classification techniques and the evaluation criteria of that classification type as follows.

Binary DM: this DM is constructed on the basis of the intersection between decision alternatives (i.e. set of COVID-19 AI classification techniques) and six evaluation criteria (i.e. accuracy, precision, recall [sensitivity], F score, specificity and area under the curve) as presented in Table 3 .

Multi-class DM: this DM is constructed on the basis of the intersection between decision alternatives (i.e. set of COVID-19 AI classification techniques) and the multi-class evaluation criteria as presented in Table 4 .

Multi-labelled DM: this DM is constructed on the basis of the intersection between decision alternatives (i.e. set of COVID-19 AI classification techniques) and four evaluation criteria (i.e. exact match ratio, labelling F score, retrieval F score and Hamming loss) as presented in Table 5 .

Hierarchical DM: this DM is constructed on the basis of the intersection between decision alternatives (i.e. set of COVID-19 AI classification techniques) and six evaluation criteria (i.e. precision↓, recall↓, F score↓, precision↑, recall↑ and F score↑) as presented in Table 6 .
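As an illustration of how the rows of a binary DM might be filled, the sketch below derives five of the binary criteria from confusion-matrix counts on a test set. It rests on several assumptions: the standard confusion-matrix definitions are used, the F score is taken with equal weight on precision and recall, the area under the curve is omitted because it needs per-image scores rather than counts, and the technique names and counts are hypothetical.

```python
from typing import Dict, List

# Criteria computed here; AUC is deliberately left out (it cannot be derived from counts alone).
CRITERIA: List[str] = ["accuracy", "precision", "recall", "f_score", "specificity"]


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_criteria(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute binary evaluation criteria from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)          # also called sensitivity
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        # Assumption: balanced F score (harmonic mean of precision and recall).
        "f_score": _ratio(2 * precision * recall, precision + recall),
        "specificity": _ratio(tn, tn + fp),
    }


# Hypothetical test-set counts for two candidate techniques.
counts = {
    "technique_A": {"tp": 90, "fp": 10, "tn": 85, "fn": 15},
    "technique_B": {"tp": 80, "fp": 5, "tn": 90, "fn": 25},
}

# Binary DM: one row per technique, one column per criterion in CRITERIA order.
binary_dm = {
    name: [binary_criteria(**c)[criterion] for criterion in CRITERIA]
    for name, c in counts.items()
}
```

All five criteria here are benefit criteria (larger is better), which matters when best and worst values are identified during ranking.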
The data within the four DMs represent the evaluation values of each COVID-19 AI classification technique on the identified evaluation criteria of each classification task. Practically, on the basis of these constructed DMs, three evaluation and benchmarking challenges will be encountered in the future (i.e. multiple criteria, trade-off amongst the criteria and importance of the criteria), as highlighted in Section 5.

The evaluation and benchmarking of AI classification techniques used in COVID-19 medical images is considered a complex MCDA problem. To this end, the development of a decision-making approach is important to address the complexity of the evaluation and benchmarking problem. To develop a methodology for the evaluation and benchmarking of AI classification techniques used in COVID-19 medical image detection, the integration of MCDA methods is presented. This development is based on the AHP method for subjective weighting of the identified evaluation criteria within each constructed DM, as presented in Section 7.2.1, and the VIKOR method for benchmarking and selecting the best alternatives (i.e. COVID-19 AI classification techniques) in the constructed DMs, as presented in Section 7.2.2.

This stage presents the process of subjectively assigning suitable weights to the multiple evaluation criteria within each DM based on the AHP method. The AHP approach involves several steps, which are applicable to any AI classification type of COVID-19 medical image detection. The procedure of AHP includes the following steps [56] .

Step 1: The problem is modelled as a hierarchy to start the AHP approach. The hierarchy contains the decision goal and the criteria that must be designed. Pairwise comparison amongst the criteria in the DM of each classification type is conducted to obtain the weights subjectively. Examples of pairwise comparison for three criteria are illustrated in Figure 5 .
Step 2: The AHP builds the pairwise comparison matrix in Equation (1) to determine a weighting decision:

$$
A = (a_{ij})_{n \times n} =
\begin{pmatrix}
1 & a_{12} & \cdots & a_{1n} \\
a_{21} & 1 & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & 1
\end{pmatrix}
\qquad (1)
$$

where

$$
a_{ii} = 1, \qquad a_{ji} = \frac{1}{a_{ij}}. \qquad (2)
$$

Step 3: This stage involves the design of a pairwise comparison questionnaire for each type of classification and its distribution to the experts. In this step, the number of experts included in the questionnaire should be defined. The target experts are those who have relevant experience with the case study and a sufficient period of experience in the same domain. Their preferences and judgments on the evaluation criteria of each classification type are collected for use in AHP.

Step 4: In this step, each element of matrix A in Equation (1) is normalised to construct the normalised matrix $\bar{A} = (\bar{a}_{ij})$ as follows:

$$
\bar{a}_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}} \qquad (3)
$$

where the entries $a_{ij}$ of A satisfy Equation (2).

Step 5: This step uses the AHP pairwise comparisons in mathematical calculations that convert the judgments into weights for each criterion of each AI classification type. The weights of the decision criteria can be calculated using Equation (4):

$$
w_i = \frac{1}{n} \sum_{j=1}^{n} \bar{a}_{ij}, \qquad \sum_{i=1}^{n} w_i = 1 \qquad (4)
$$

where n is the number of compared evaluation criteria of each COVID-19 AI classification type.

Step 6: In this step, Equation (5) is utilised to check the consistency ratio (CR) of the pairwise comparison matrix as follows:

$$
CR = \frac{CI}{RI}. \qquad (5)
$$

The consistency index (CI) is calculated using Equation (6) as follows:

$$
CI = \frac{\lambda_{\max} - n}{n - 1} \qquad (6)
$$

where $\lambda_{\max}$ is the maximum eigenvalue of the judgement matrix. The random CI (RI) is calculated using Equation (7). A pairwise comparison matrix with a corresponding CR of no more than 10% or 0.1 is acceptable; otherwise, it is discarded.

To start the benchmarking of COVID-19 AI classification techniques, the VIKOR method is utilised considering its suitability for this purpose. In addition, it can provide rapid results and determine which option is the most appropriate. The COVID-19 AI classification techniques can be benchmarked and ranked according to the VIKOR method using the criteria weights obtained from the AHP method.
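The AHP weighting procedure (Steps 2 to 6 of the AHP steps above) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `ahp_weights` is hypothetical, the maximum eigenvalue is estimated from the weight vector rather than computed by an eigen-solver, Saaty's commonly tabulated random index values are swapped in for the RI of Equation (7), and the pairwise judgments are invented for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: Saaty's tabulated random index values stand in for Equation (7).
SAATY_RI: Dict[int, float] = {
    1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
    6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49,
}


def ahp_weights(a: List[List[float]]) -> Tuple[List[float], float]:
    """Return (criteria weights, consistency ratio) for a reciprocal pairwise matrix."""
    n = len(a)
    if n not in SAATY_RI:
        raise ValueError("this sketch supports 1 to 10 criteria")

    # Step 4: normalise every element by its column sum.
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    normalised = [[a[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

    # Step 5: each weight is the mean of the corresponding normalised row.
    weights = [sum(row) / n for row in normalised]

    # Step 6: estimate lambda_max as the mean of (A w)_i / w_i, then CI and CR.
    if n <= 2:
        return weights, 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(a[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return weights, ci / SAATY_RI[n]


# Hypothetical expert judgments for three criteria (accuracy, precision, recall):
# accuracy is judged 3x as important as precision and 5x as important as recall.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights, cr = ahp_weights(pairwise)
accepted = cr <= 0.1  # keep the judgments only if CR is no more than 0.1
```

With several experts, one such matrix would be collected per expert, and only the matrices that pass the CR check would contribute to the final weights.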
The VIKOR approach involves several steps [105, 106] .

Step 1: Identify the best $f_i^*$ and worst $f_i^-$ values of all criteria within each DM, $i = 1, 2, \ldots, n$, where $f_{ij}$ is the value of the ith criterion for the jth alternative. If the ith function represents a benefit criterion (the larger the better):

$$
f_i^* = \max_j f_{ij}, \qquad f_i^- = \min_j f_{ij},
$$

and if it represents a cost criterion (the smaller the better):

$$
f_i^* = \min_j f_{ij}, \qquad f_i^- = \max_j f_{ij}.
$$

Step 2: AHP is used to compute the weight of each criterion. A set of weights $w = (w_1, w_2, w_3, \ldots, w_i, \ldots, w_n)$ from the decision-maker is accommodated in the DM; the sum of this set is equal to 1:

$$
\sum_{i=1}^{n} w_i = 1.
$$

A weighted matrix WM is then generated, whose entry for the jth alternative and the ith criterion is

$$
WM_{ji} = \frac{w_i \left( f_i^* - f_{ij} \right)}{f_i^* - f_i^-}.
$$

Step 3: Compute the $S_j$ and $R_j$ values, $j = 1, 2, 3, \ldots, J$, $i = 1, 2, 3, \ldots, n$, by using the following equations:

$$
S_j = \sum_{i=1}^{n} \frac{w_i \left( f_i^* - f_{ij} \right)}{f_i^* - f_i^-}, \qquad
R_j = \max_i \frac{w_i \left( f_i^* - f_{ij} \right)}{f_i^* - f_i^-}
$$

where $w_i$ is the weight of the ith criterion, expressing its relative importance.

Step 4: Compute the values of $Q_j$, $j = 1, 2, \ldots, J$, using the following relation:

$$
Q_j = v \, \frac{S_j - S^*}{S^- - S^*} + (1 - v) \, \frac{R_j - R^*}{R^- - R^*}
$$

where $S^* = \min_j S_j$, $S^- = \max_j S_j$, $R^* = \min_j R_j$ and $R^- = \max_j R_j$. v is introduced as the weight of the strategy of 'the majority of criteria' (or 'the maximum group utility'); here, v = 0.5.

Step 5: The alternative set (i.e. COVID-19 AI classification techniques) can now be benchmarked. This process is accomplished by sorting the S, R and Q values in ascending order. The lowest value indicates the optimal performance.

Step 6: Propose the alternative $a'$, which is ranked best by the measure Q (minimum), as a compromise solution if the following two conditions are satisfied.

Condition 1. 'Acceptable advantage':

$$
Q(a'') - Q(a') \geq DQ
$$

where $a''$ is the alternative in the second position in the ranking list by Q, $DQ = 1/(J - 1)$ and J is the number of alternatives.

Condition 2. 'Acceptable stability in decision-making': alternative $a'$ should also be the best as ranked by S and/or R. This compromise solution is stable within the decision-making process, which could be 'voting by majority rule' (v > 0.5), 'by consensus' (v ≈ 0.5) or 'with veto' (v < 0.5).
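The VIKOR steps above can be sketched in a few functions. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the function names `vikor` and `compromise_conditions` are hypothetical, a criterion on which all alternatives tie is given a zero contribution to avoid division by zero, and the DM values and weights are invented (in the proposed methodology the weights would come from AHP).

```python
from typing import List, Sequence, Tuple


def vikor(dm: Sequence[Sequence[float]], weights: Sequence[float],
          benefit: Sequence[bool], v: float = 0.5
          ) -> Tuple[List[float], List[float], List[float]]:
    """Return (S, R, Q) for a DM whose rows are alternatives and columns are criteria."""
    n_alt, n_crit = len(dm), len(weights)

    # Step 1: best and worst value of every criterion.
    best, worst = [], []
    for i in range(n_crit):
        column = [dm[j][i] for j in range(n_alt)]
        best.append(max(column) if benefit[i] else min(column))
        worst.append(min(column) if benefit[i] else max(column))

    # Steps 2 and 3: weighted normalised distances, group utility S and regret R.
    s_values, r_values = [], []
    for j in range(n_alt):
        terms = []
        for i in range(n_crit):
            span = best[i] - worst[i]
            # Assumption: a criterion with identical values contributes nothing.
            terms.append(0.0 if span == 0 else weights[i] * (best[i] - dm[j][i]) / span)
        s_values.append(sum(terms))
        r_values.append(max(terms))

    # Step 4: compromise index Q, with v weighting group utility against regret.
    def scale(x: float, low: float, high: float) -> float:
        return 0.0 if high == low else (x - low) / (high - low)

    q_values = [
        v * scale(s_values[j], min(s_values), max(s_values))
        + (1 - v) * scale(r_values[j], min(r_values), max(r_values))
        for j in range(n_alt)
    ]
    return s_values, r_values, q_values


def compromise_conditions(s: List[float], r: List[float], q: List[float]
                          ) -> Tuple[List[int], bool, bool]:
    """Steps 5 and 6: rank by Q (ascending) and check the two conditions."""
    order = sorted(range(len(q)), key=lambda j: q[j])
    first, second = order[0], order[1]
    advantage = q[second] - q[first] >= 1 / (len(q) - 1)        # Condition 1
    stable = s[first] == min(s) or r[first] == min(r)           # Condition 2
    return order, advantage, stable


# Hypothetical DM: three techniques (rows) on accuracy, precision, recall (all benefit).
names = ["technique_A", "technique_B", "technique_C"]
dm = [[0.95, 0.93, 0.96], [0.91, 0.94, 0.88], [0.89, 0.85, 0.92]]
s, r, q = vikor(dm, weights=[0.5, 0.3, 0.2], benefit=[True, True, True])
order, advantage, stable = compromise_conditions(s, r, q)
ranking = [names[j] for j in order]
```

When either condition fails, the usual VIKOR practice is to propose a set of compromise alternatives rather than a single one; that branch is left out of this sketch.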
This phase presents the objective (Section 7.3.1) and subjective (Section 7.3.2) validation of the results of benchmarking COVID-19 AI classification techniques. Further details are explained in the following subsections.

The results of the proposed methodology will be validated by utilising an objective approach similar to that in [107] . To validate the ranking results, the COVID-19 AI classification techniques will be divided into n groups on the basis of the ranking results acquired from the proposed methodology. Every group consists of a number of selected COVID-19 AI classification techniques. The number of techniques within each group varies depending on the scenario. The validation result will not be influenced by the number of groups or by the number of AI classification techniques within each group. To make sure that the benchmarking results of COVID-19 AI classification techniques are valid, this study utilises two statistical measures: mean and standard deviation. The mean ± standard deviation can be calculated for each group of data and is used to ensure that the set of COVID-19 AI classification techniques is subjected to systematic ordering. The mean is the average result. It is calculated by dividing the sum of the observed results by the number of results, using the following equation:

$$
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
$$

where $x_i$ are the observed results in a group and N is their number. Standard deviation is used to determine the amount of dispersion or variation in the set of values and is calculated as follows:

$$
s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} \left( x_i - \bar{x} \right)^2}.
$$

For example, let us consider that we have four groups, each with a number of COVID-19 AI classification techniques. In this scenario, the first group must reach the best value, and that has to be proven when the standard deviation and the mean are measured.
We assumed that the first group acquired the best in both standard deviation and the mean compared with the other three groups. However, for the second group, its results for the mean and standard deviation have to be poorer than those in the first group and better than those in the third and fourth groups or have to be equal to those in the third group. Accordingly, for the systematic ranking results, the first group must prove that it is the best compared with the other groups. This section describes the subjective validation process. The COVID-19 AI classification techniques will be evaluated by specialist experts in AI classification of medical cases. The experts can prove the effectiveness of the benchmarking results of COVID-19 AI classification techniques obtained by our proposed decision-making approach by examining the values of all evaluation criteria used. The COVID-19 pandemic has a tremendous impact on the life of people around the world, and the number of infected patients has considerably increased. COVID-19 quickly gained a foothold, and nations, governments and scholars are attempting to address this worldwide crisis. Different medical tests are used in the detection of COVID-19. Several studies have used X-rays and CT scans to support and reveal anomalies indicative of COVID-19. CT scan and X-ray tests are utilised as initial detection tools to evaluate the severity of COVID-19, monitor the emergency conditions of patients and predict disease progression. The growing developments of AI techniques have led to the challenges of choosing evaluation and benchmarking AI techniques and which technique is suitable for the diagnosis and classification of COVID-19 medical images. Thus, this study presented a systematic review of AI techniques in the detection and classification of COVID-19 medical images in terms of evaluation and benchmarking. 
The results showed that only 11 studies utilised AI techniques in detecting and classifying COVID-19 with different case studies. However, this study proved that the process of evaluating and benchmarking of AI classification techniques (i.e. binary, multi-class, multi-labelled and hierarchical classifications), which could be used in the detection and diagnosis of COVID-19 medical image, is a critical gap of related literature. The challenges of such gap are discussed, and the process of evaluation and benchmarking of COVID-19 AI classification techniques is considered a multi-complex attribute problem. Thus, using MCDA is essential. As a potential future research direction, this study provided a detailed methodology for the evaluation and benchmarking of AI classification techniques used in the detection of COVID-19 medical images. Such methodology is presented on the basis of three sequential phases (i.e. identification, development and validation). Coronavirus (COVID-19) outbreak: what the department of radiology should know Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases Sensitivity of chest CT for COVID-19: comparison to RT-PCR Antibodies in infants born to mothers with COVID-19 pneumonia An analysis of 38 pregnant women with COVID-19, their newborn infants, and maternal-fetal transmission of SARS-CoV-2: maternal coronavirus infections and pregnancy outcomes Automated detection of COVID-19 cases using deep neural networks with X-ray images Coronavirus disease (COVID-19): spectrum of CT findings and temporal progression of the disease Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19 Diagnosing COVID-19 pneumonia from X-ray and CT images using deep learning and transfer learning algorithms 
Coronavirus disease 2019 (COVID-19): situation report Artificial intelligence in medical practice: the question to the answer? Role of biological Data Mining and Machine Learning Techniques in Detecting and Diagnosing the Novel Coronavirus (COVID-19): A Systematic Review Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions Systematic review of an automated multiclass detection and classification system for acute Leukaemia in terms of evaluation and benchmarking, open challenges, issues and methodological aspects Multiclass benchmarking framework for automated acute Leukaemia detection and classification based on BWM and group-VIKOR Multi-agent learning neural network and Bayesian model for real-time IoT skin detectors: a new evaluation and benchmarking methodology A review on smartphone skin cancer diagnosis apps in evaluation and benchmarking: coherent taxonomy, open issues and recommendation pathway solution Comprehensive insights into evaluation and benchmarking of real-time skin detectors: Review, open issues & challenges, and recommended solutions Smart home-based IoT for real-time and secure remote health monitoring of triage and priority system using body sensors: Multi-driven systematic review Blockchain authentication of network applications: Taxonomy, classification, capabilities, open challenges, motivations, recommendations and future directions Systematic review of real-time remote health monitoring system in triage and priority-based sensor technology: Taxonomy, open challenges, motivation and recommendations Real-time remote health monitoring systems using body sensor information and finger vein biometric verification: A multi-layer systematic review Comprehensive insights into the criteria of student performance in various educational domains Real-time medical systems based on human biometric steganography: A systematic review A survey on communication components for IoT-based technologies in 
smart homes Based medical systems for patient's authentication: Towards a new verification secure framework using CIA standard Finger Vein Biometrics: Taxonomy Analysis, Open Challenges, Future Directions, and Recommended Solution for Decentralised Network Architectures Sensor-based mHealth authentication for real-time remote healthcare monitoring system: A multilayer systematic review False-negative results of real-time reverse-transcriptase polymerase chain reaction for severe acute respiratory syndrome coronavirus 2: role of deep-learning-based CT diagnosis and insights from two cases COVID-19 Severity Scoring Tool for low resourced settings COVIDiagnosis-Net: Deep Bayes-SqueezeNet based Diagnostic of the Coronavirus Disease 2019 (COVID-19) from X-Ray Images Deep Learning COVID-19 Features on CXR using Limited Training Data Sets COVID-19 detection using deep learning models to exploit Social Mimic Optimization and structured chest X-ray images using fuzzy color and stacking approaches Cautions about radiologic diagnosis of COVID-19 infection driven by artificial intelligence COVID-19 and artificial intelligence: protecting health-care workers and curbing the spread A systematic analysis of performance measures for classification tasks COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios A hybrid COVID-19 detection model using an improved marine predators algorithm and a ranking-based diversity reduction strategy COVID-19 Severity Scoring Tool for low resourced settings COVIDiagnosis-Net: Deep Bayes-SqueezeNet based Diagnostic of the Coronavirus Disease 2019 (COVID-19) from X-Ray Images Deep learning covid-19 features on cxr using limited training data sets High accuracy android malware detection using ensemble learning MARVIN: Efficient and Comprehensive Mobile App Classification through Static and Dynamic Analysis Randomizing Smartphone Malware Profiles against Statistical Mining Techniques Android anomaly detection 
system using machine learning classification On behavior-based detection of malware on Android platform Machine tool selection using AHP and VIKOR methodologies under fuzzy environment Agricultural performance evaluation by integrating fuzzy AHP and VIKOR methods Decisions with multiple objectives: preferences and value tradeoffs Multicriteria decision making: a case study in the automobile industry Analytic hierarchy process (AHP), weighted scoring method (WSM), and hybrid knowledge based system (HKBS) for software selection: a comparative study Real-time remote-health monitoring systems: a review on patients prioritisation for multiple-chronic diseases, taxonomy analysis, concerns and solution procedure Fault-tolerant mHealth framework in the context of IoT-based real-time wearable health data sensors GIS and multicriteria decision analysis Based multiple heterogeneous wearable sensors: A smart real-time health monitoring structured for hospitals distributor Comprehensive review and analysis of anti-malware apps for smartphones MCDM-If not a Roman Numeral, then what? 