key: cord-0905526-lwbr65et authors: Ismail, Azimah; Ibrahim, Mohd Salami; Ismail, Samhani; Aziz, Azwa Abdul; Yusoff, Harmy Mohamed; Mokhtar, Mokhairi; Juahir, Hafizan title: Development of COVID-19 Health-Risk Assessment and Self-Evaluation (CHaSe): a health screening system for university students and staff during the movement control order (MCO) date: 2022-04-15 journal: Netw Model Anal Health Inform Bioinform DOI: 10.1007/s13721-022-00357-3 sha: 62dcc071fb50b660b89b6d2177d08d8880142a9c doc_id: 905526 cord_uid: lwbr65et COVID-19 has triggered a global health crisis. Death from severe respiratory failure and symptoms, including fever, dry cough, sore throat, anosmia, and gastrointestinal disturbances, has been attributed to the disease. Development of screening and diagnosis methods prove to be challenging due to shared clinical features between COVID-19 and other pathologies, such as Middle Eastern respiratory syndrome, severe acute respiratory syndrome, and common colds. This study aims to develop a comprehensive one-stop online public health screening system based on clinical and epidemiological criteria. The immediate target populations are the university students and staff of University Sultan Zainal Abidin and the civil servants of the Malaysian Ministry of Science, Technology, and Innovation. Forty-nine (49) clinical and epidemiological factors associated with COVID-19 were identified and prioritized based on their prevalence via rigorous review of the literature and vetting sessions. A pilot study of 200 volunteers was conducted to assess the extent of risk mitigation of COVID-19 infection among the university students and civil servants using the prototyped model. Consequently, twelve (12) clinical parameters were identified and validated by the medical experts as essential variables for COVID-19 risk-screening. The updated model was then revalidated via real mass-screening of 5000 resulting in the final adopted CHaSe system. 
Principal component analysis (PCA) was used to confirm the weightage of risk level toward COVID-19 to procure the optimal accuracy, reliability, and efficiency of this system. Through PCA, twelve (12) factor loadings were identified that account for 58.287% of the variation in the clinical symptoms and clinical history variables across the forty-nine (49) parameters of COVID-19. The variables of the clinical and epidemiological aspects identified are C6 [History of joining high-risk gathering (where confirmed cases had been recorded)], CH11 [History of contact with confirmed cases (close contact)], and CH13 [Duration of exposure with confirmed cases (minutes)], with substantial positive factor loadings of 0.7053, 0.706 and 0.5086, respectively. The contribution toward high-risk infection of COVID-19 was firmly attributable to the variables CH14 [Last contact with confirmed cases (days)], CH13 [Duration of exposure with confirmed cases (minutes)], and S1 (Age). The revalidated PCA for 5000 respondents also yielded twelve significant PCs with a cumulative variance of 58.288%. Importantly, the medical experts have revalidated the CHaSe system for accuracy of all clinical aspects (clinical symptoms and clinical history) and epidemiological links to COVID-19 infection. After revalidating the model for 5000 respondents, the PC variance for PC1, PC2, PC3, and PC4 was 27.36%, 11.79%, 10.347%, and 8.785%, respectively, with a cumulative explanation of 58.288% in data variability. The risk levels detected by the CHaSe system provide optimal accuracy, reliability, and efficiency for conducting mass-screening of students and government servants for COVID-19 infection.

vomiting, headache, and altered smell sensation. More severe complications include respiratory failure with invasive airway support and death (WHO & China 2020). The broad range of symptoms and complications appears to contribute to the easy spread of the disease.
Many infected individuals with mild to moderate symptoms unknowingly spread the disease, increasing its transmissibility. At the time of writing, the pandemic is responsible for more than 2.2 million deaths, while the total number of global cases has surged past 102 million (Jhu 2020). COVID-19 has thus had an impact of record scale on the economy, industries, health, and social well-being of societies around the world. Consequently, countries worldwide have implemented mass population testing and extensive contact tracing to track, isolate, and treat infected individuals at the earliest time possible. The critical need for mass testing to identify at-risk and infected individuals early is evidenced by China and South Korea's successful containment strategies. In both countries, mass population diagnostic testing enabled early diagnosis, identification, and isolation of infected individuals. This achievement reduced the number of new cases significantly below what was initially predicted (WHO & China 2020; Balilla 2020). Unfortunately, mass diagnostic testing requires heavy use of resources, limiting this option to countries with access to rich supplies and human resources. Therefore, a more practical alternative, population-based screening with acceptable validity, will complement the existing measures. In response, Universiti Sultan Zainal Abidin (UniSZA) developed the COVID-19 Health-Risk Assessment and Self-Evaluation (CHaSe) System. This system, accessible via internet browser and mobile application, aims to provide an intelligent and comprehensive analysis of at-risk individuals at the earliest stage possible. The development of CHaSe was primarily motivated by the critical need to conduct a thorough daily risk-assessment screening among all staff and students of the university as well as civil servants under the Ministry of Science, Technology, and Innovation (MOSTI) and various other government and non-governmental agencies.
During the CHaSe development, Malaysia implemented a movement control order (MCO) policy that mandated work-from-home for all working and student groups. Upon lifting of the MCO policy, many travelers from areas with a high density of COVID-19 cases may return to workplaces, such as university campuses and the MOSTI headquarters, to resume study and work duties. Consequently, the magnitude of the outbreak risk associated with the influx of returning students and staff mandates more practical, robust, vigilant, and reliable population-based screening measures. CHaSe fills this gap by serving as a platform for intelligent and comprehensive analysis of COVID-19 screening, promoting preventive measures against COVID-19, and recommending action according to individual risk categories. Featuring daily risk-screening assessment, CHaSe extends its role as a comprehensive surveillance tool where individuals with an elevated risk of contracting COVID-19 may declare their status and be contactable by the surveillance team while preserving the ethical medical principles of autonomy and confidentiality. Hence, CHaSe enhances the conventional active case detection and contact tracing practice, typically initiated by health personnel. Through CHaSe, the community can access a valid population-based screening and declare these risk assessments voluntarily to public health personnel. This access significantly increases contact tracing and active case detection outreach to a level that would otherwise have been practically impossible to attain. This study outlines the development of the risk-prediction model employed within the CHaSe system to achieve this goal. The urgent development of the CHaSe system began with brainstorming sessions among UniSZA experts from the Faculty of Medicine, the Faculty of Informatics and Computer, and the office of the Deputy Vice Chancellor for Research and Innovation.
Due to the significant morbidity and mortality associated with COVID-19, the development of the CHaSe system adopted the Six Sigma approach of DMAIC (Define, Measure, Analyze, Improve, and Control) (Juahir et al. 2017) in its design. Six Sigma is a data- and statistics-driven approach for continuous quality improvement via eliminating defects or problems in the manufacturing and development process. Hundreds of businesses and conglomerates worldwide, such as McDonald's, Texas Instruments, and many Japanese companies, have adopted the Six Sigma approach. Six Sigma's emphasis on quality translates to increased productivity and speed with reduced defects at a lower utilization of resources and cost (Juahir et al. 2017; Cox et al. 2016). The systematic approach assures goal attainment and continuous improvement given the relatively rapid development in knowledge and policies associated with this novel pathology. There are five major phases of DMAIC Six Sigma in the development of CHaSe (Juahir et al. 2017; Aris et al. 2010).
a. Define: Led by the Vice Chancellor of UniSZA, this phase concentrated on identifying the gaps to address. The ultimate goal was to determine the solutions to ensure the health and safety of all students, staff, and general members of the public within the locality of UniSZA and MOSTI. The thorough discussion identified the need for a population-based screening among students and staff that is practical, valid, robust, and timely for decision-making. Eleven medical specialists from the Faculty of Medicine, UniSZA, with diverse medical and surgical expertise, participated in a comprehensive literature review to identify established risks associated with COVID-19 infection. We also enlisted an information specialist from the library to perform a robust search strategy with sensitive search strings across all relevant databases. Two independent medical experts screened and selected articles associated with COVID-19.
All papers and reports were then uploaded into Microsoft Teams (Microsoft Inc. 2018) for real-time online collaboration during the next phase.
b. Measure: The measuring phase involved substantiating the prevalence, frequency, and relevance of variables associated with COVID-19 based on evidence from the literature and medical experts' consensus. The extensive COVID-19 literature was divided and delegated among the individual experts. As a critical-to-quality (CTQ) measure, a standardized data extraction tool was employed within Microsoft Teams. The use of cloud-based data extraction enabled secure, synchronized collaboration and co-authoring (Microsoft Inc. 2018), which brought together a total of 49 variables extracted from the literature by individual medical experts into a single master data collection (WHO & China 2020; Jhu 2020; Balilla 2020; Center for Disease Control and Prevention 2021; Ministry of Health Malaysia 2020; Islam et al. 2020; Siddiqi et al. 2021; Kluster Tawar 2020; Noor Hisham Abdulah 2020; McMichael et al. 2020; MOH 2020a, b). Based on all data collected in the database, the reported prevalence of each variable was thoroughly scrutinized, vetted, and revalidated by medical experts for significance and relevance. The revalidated variables were then used for simulation model calculation, where each variable was given a dedicated weightage/scoring to reflect its significance toward risk prediction of COVID-19. This model's outcomes are three distinct risk categories, namely, Low, Medium, and High.
c. Analyze: This phase encapsulated the complex process of data preprocessing, data analyses, data analytics, and data validation to generate a risk-prediction model that meets or exceeds the minimum standard of public health practice. We simulated a pilot study by distributing a questionnaire consisting of the validated variables to 200 healthy volunteers. Weightages were deliberately assigned to selected parameters.
These include close contact with patients of COVID-19, overseas travel within the last 14 days, cough, fever, breathing difficulty, and critical illnesses.
d. Improve: To further improve the modeling and the CHaSe system itself, we conducted the first trial of mass-screening among 5000 university students during the nation-wide movement control order (MCO). Respondents in the high-risk category were contacted by medical experts and advised to visit the nearest local health center for further assessment. The real mass-screening provided empirical evidence to facilitate the modification of processes as CHaSe is continuously monitored and improved. The iterative process of recalculating each medical parameter's weightage/scoring was carried out using principal component analysis (PCA). After the new weightages were recalculated, the improved model statistically categorized respondents into three groups: (a) Low Risk, (b) Moderate Risk, and (c) High Risk. This improvement phase of the DMAIC Six Sigma for the CHaSe system serves to evaluate the modification of processes to obtain a more meaningful data interpretation.
e. Control: As a system-driven methodology, the quality of recommendations from the CHaSe system conforms to the standard operating procedure (SOP) and guidelines of the health ministry.
(a) Low Risk: students and civil servants should record the daily Self-Assessment Module from CHaSe until further notice. They should also practice the recommended personal preventive measures against COVID-19 infection.
(b) Medium Risk: students and civil servants in this category should limit contact with the community and contact UniSZA's contact tracing team for a telephone-based clinical assessment and recommendation.
(c) High Risk: UniSZA's contact tracing team initiates telephone contact at the earliest time practical under the active-case-detection strategy.
Respondents in this category must limit contact with the community and urgently visit the nearest health center with COVID-19 diagnostic capacity for further assessment.

The health screening model within the CHaSe system was developed with the consideration of the following aspects:

Principal component analysis (PCA) is a multivariate technique that uses the variance held in common in the original dataset to reveal the strong factors, or factor loadings, of that dataset via the underlying latent vectors of principal components (PCs) (Ismail et al. 2018; Fahmi 2011). These vectors transform the original dataset into new uncorrelated variables producing the best linear fitting with an orthogonal basis. The new axes lie along the orthogonal lines in the directions of maximum variance (Shrestha and Kazama 2007). Equation (1) defines the derived outcome (Cloutier et al. 2008). In this equation, $z$ denotes the component score and $a$ represents the component loading. Meanwhile, $x$ is the measured value of the variable, $i$ is the component number, $j$ is the sample number, and $m$ is the total number of variables. At the same time, PCA maximizes the variance and defines the association between the PCs, namely the clinical aspects and the epidemiological link to COVID-19 (Epid-Link). Recycled data were important in this study. The index values from the PCA factor scores initially had varying scales (internally inconsistent, including negative values), which made it difficult for the algorithm to support comparisons or assumptions (Helena et al. 2000). Generally, these PCs are unstable and must be subjected to a rotation known as the varimax rotation, generating the varimax factors (VFs) (Kamaruddin et al. 2015). Varimax rotation is important for regulating the overfitting and complexity of the components by enhancing the positive and negative factor loadings within each principal component.
The varimax rotation produces a more transparent quantitative measure that correlates each variable with one PC, or almost one factor only (Juahir et al. 2017; Ismail et al. 2018; Kamaruddin et al. 2015; Dominick et al. 2012; Fazillah et al. 2017). Varimax factors with an eigenvalue greater than one (1) form the critical criterion for further analysis (Juahir et al. 2017). In this study, a VF loading equal to or greater than 0.4 is considered strong. A value ranging from 0.2 to 0.39 is deemed moderate, while 0.10 to 0.199 is regarded as a weak factor loading (Liu et al. 2003). A negative factor loading represents the variable's insignificant loading in either the clinical aspects or the Epid-Link. In this study, PCA was applied to all initial and revalidated variables in the questionnaire distributed among 200 respondents during the measurement phase and 5000 respondents in the improvement phase. Specifically, the PCA re-constructs factors (such as clinical symptoms of fever and overseas travel) from the large dataset into their respective underlying latent variables, the clinical aspects and the Epid-Link. The component scores follow Eq. (1):

$$z_{ij} = a_{i1} x_{1j} + a_{i2} x_{2j} + \dots + a_{im} x_{mj} \qquad (1)$$

Subsequently, the factor scores of PCA were used as a single index for every contributing factor. The weighted score was used to determine the risk level of the respondents toward COVID-19 infection based on a weighted-index equation in which $n$ denotes the number of factors, $F_i$ represents the score of factor $i$, and $w_i$ is the percentage of variance that factor $i$ explains. The index values from the factor scores of PCA consisted of varying scales (internally inconsistent, including negative values), which made comparison and the development of assumptions algorithmically difficult.
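The component score of Eq. (1) and the variance-weighted index described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the index equation itself is not reproduced in the text, so the weighted sum of $w_i F_i$ over the $n$ factors is an assumed form inferred from the symbol definitions, and the loadings, variance percentages, and respondent values are made-up numbers. The function names `component_scores` and `weighted_index` are hypothetical.

```python
# Sketch of Eq. (1) and an assumed variance-weighted risk index.
# All numbers are illustrative placeholders, not CHaSe data.

def component_scores(loadings, x):
    """Eq. (1): z_i = a_i1*x_1 + a_i2*x_2 + ... + a_im*x_m for each component i."""
    return [sum(a * v for a, v in zip(row, x)) for row in loadings]

def weighted_index(scores, variance_pct):
    """Assumed index form: sum over factors of w_i * F_i, where w_i is the
    share of variance (given in percent) explained by factor i."""
    weights = [p / 100.0 for p in variance_pct]
    return sum(w * f for w, f in zip(weights, scores))

# Two hypothetical components over three standardized variables.
loadings = [
    [0.7, 0.7, 0.1],   # component 1
    [0.1, 0.2, 0.8],   # component 2
]
variance_pct = [30.0, 10.0]     # hypothetical % of variance explained
respondent = [1.0, 2.0, -0.5]   # one respondent's standardized answers

scores = component_scores(loadings, respondent)
index = weighted_index(scores, variance_pct)
print(scores, index)
```

Because loadings and answers can be negative, the raw index can fall anywhere on the real line, which is the scale problem the text raises next.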
To achieve consistency, the scales were normalized via the rescaling method; the index values were rescaled to z values (to ensure that the variance of each variable would equal unity) using a rescaling expression in which $a$ is equal to 1, $x_i$ is the actual observation, $A$ and $B$ are the lowest and highest factor scores, and $b$ is the constant value of 100. As the rescaling involved data normalization, univariate clustering was selected to categorize the risks, namely: Low Risk, Moderate Risk, and High Risk. The lowest value of risk indicates that the respondents have a low risk of COVID-19 infection. Likewise, a high value suggests that respondents have a high risk of contracting COVID-19, thus requiring urgent further clinical assessment and management.

Discriminant analysis (DA) is a technique for developing discriminant functions of the independent variables. These functions distinguish the independent variables (predictor variables) among themselves according to the dependent variables (categorical variables). The result is a classification method that produces perfect separation or a very low misclassification rate (Juahir et al. 2017; Ismail et al. 2018; Gazzaz et al. 2012). By applying a linear combination of the predictor variables (Lambrakis et al. 2004), achieved by maximizing the correlation of variance-covariance between classes while minimizing the variance-covariance correlation within the same class, the discriminant factors (DFs) reveal the derived categorical-variables-based classification (Juahir et al. 2017; Ismail et al. 2018; Gazzaz et al. 2012). The DFs for each cluster follow Eq. (4), where $i$ denotes the number of groups ($G$), $k_i$ represents the constant inherent to each group, $n$ is the number of parameters used to classify a set of data into a given group, and $w_j$ is the weight coefficient assigned by DF analysis (DFA) to a given parameter ($p_j$) (Kannel et al. 2007).
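The rescaling and three-way risk categorization described earlier in this section can be sketched as follows. Two assumptions are made loudly here: the rescaling expression is not reproduced in the text, so a min-max mapping onto [a, b] = [1, 100] is assumed from the symbol definitions ($a$, $b$, $A$, $B$, $x_i$), and a plain one-dimensional k-means stands in for the univariate clustering, whose exact algorithm the text does not specify. The names `rescale`, `cluster_1d`, and `RISK_NAMES` are hypothetical, and the raw index values are invented.

```python
# Sketch only: the rescaling formula and the clustering method are assumptions.

def rescale(values, a=1.0, b=100.0):
    """Map raw index values onto [a, b]; A and B are the min and max raw values."""
    A, B = min(values), max(values)
    if A == B:                       # all respondents share one score
        return [a for _ in values]
    return [a + (x - A) * (b - a) / (B - A) for x in values]

def cluster_1d(values, k=3, max_iter=100):
    """1-D k-means (k >= 2) as a stand-in for univariate clustering.
    Returns one label per value; label 0 is the lowest-valued cluster."""
    ordered = sorted(values)
    # Deterministic start: centres at evenly spaced positions in the sorted data.
    centres = [ordered[(len(ordered) - 1) * j // (k - 1)] for j in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest centre (ties go to the lower cluster).
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:       # converged
            break
        centres = updated
    return labels

RISK_NAMES = ["Low Risk", "Moderate Risk", "High Risk"]

# Hypothetical raw index values for nine respondents (not CHaSe data).
raw_index = [-2.0, -1.8, -1.5, 0.1, 0.3, 0.2, 2.5, 2.9, 3.0]
rescaled = rescale(raw_index)
labels = cluster_1d(rescaled)
categories = [RISK_NAMES[lab] for lab in labels]
print(list(zip([round(z, 2) for z in rescaled], categories)))
```

Rescaling before clustering removes the negative values, so the lowest-scoring respondent always maps to 1 and the highest to 100, and the three clusters can be read directly as ordered risk bands.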
Written out, the discriminant function for group $G_i$ is:

$$f(G_i) = k_i + \sum_{j=1}^{n} w_{ij} p_{ij} \qquad (4)$$

The DFs to evaluate the categorical variables of clinical aspects and the Epid-Link were constructed via the DA technique based on two distinct modes, viz. standard and forward stepwise. In the initial phase of the pilot study, the DFs were constructed for 200 respondents with 49 variables. Later, the variables were revalidated and subsequently reduced to 12 variables for the 5000 respondents. Therefore, the data input to discriminant analysis constituted 200 and 5000 samples of respondents. In the DA forward stepwise mode, variables were added step by step, starting from the most significant variables, until no changes were observed (Fazillah et al. 2017).

The first phase of the Six Sigma DMAIC's critical-to-quality (CTQ) approach identified several root causes of outbreaks and potential interventions for COVID-19 infection in Malaysia. The Fishbone Diagram in Fig. 1 below illustrates the dynamic relationship between crucial elements from the known COVID-19 clusters in Malaysia and the underlying mechanism of transmission.

The initial extensive literature review identified 49 parameters associated with COVID-19, as shown in Table 1 below.

5.3.1 Pilot study using forty-nine (49) clinical aspects and epidemiological variables linked to COVID-19 for 200 respondents

5.3.1.1 Determination of weightage scoring using the principal component analysis (PCA)

All the initial forty-nine (49) parameters of COVID-19 were scrutinized before confirming the weightage score for categorizing the index using the PCA. PCs with eigenvalues of more than 1.0 were considered significant. The PCA output revealed twelve significant PCs extracted from the variables with eigenvalues greater than 1 (Table 2). The variance explanations of the PCs were 23.84%, 11.794%, 10.347%, and 8.785% for PC1, PC2, PC3, and PC4, respectively, with a total cumulative explanation of 58.287% in data variability.
Thus, the PCA's twelve factor loadings are accountable for most clinical symptoms and clinical history variations. Therefore, predicting the risk of contracting COVID-19 infection based on clinical symptoms and Epid-Link is scientifically determined by these twelve factors (Niranjanamurthy et al. 2020; Cao et al. 2020) (Fig. 2). Referring to PC1 in Table 2, C6 (History of joining high-risk gathering where confirmed cases had been recorded), CH11 [History of contact with confirmed cases (close contact)], and CH13 [Duration of exposure with confirmed cases (minutes)] display strong positive factor loadings of 0.705, 0.706 and 0.509, respectively. These values suggest they are significant variables for predicting the risk of contracting COVID-19. The variables CS11 [Duration of muscle pain/fatigue (days)] and CS15 (Headache), with factor loadings of 0.444 and 0.463, are also considered significant. PC2 denotes underlying risks associated with age (S1) and an underlying history of hypertension (Com2), at weighted values of 0.8478 and 0.5880, respectively. In the United States of America (US), 8 out of 10 deaths are among adults aged 65 or older (Center for Disease Control and Prevention 2021). Similarly, those aged 60 or above account for the majority of deaths from COVID-19 in Malaysia (Ministry of Health Malaysia 2020). The relatively lower cardiovascular reserve may explain the high burden of disease and complications caused by COVID-19 among this age group. Furthermore, the older population is more likely to have limited mobility and possibly to reside in aged-care centers. These factors explain the increased risk of contracting COVID-19 due to a shared environment and close physical contact. In contrast, hypertension, which also steadily increases with age, is commonly treated with angiotensin-converting enzyme inhibitors (ACEi) and angiotensin-receptor blockers (ARB).
Both ACE-i and ARB are associated with COVID-19 through upregulation of angiotensin-converting enzyme 2 receptor (ACE2) (Islam et al. 2020 ). ACE2 is a crucial factor for the mechanism of COVID-19 infection into the human body (Islam et al. 2020 ). On the other hand, the extent of COVID-19 infection has manifested beyond localized symptoms. The third factor (PC3) of PCA illustrates fever (CS1), difficulty breathing (CS8), and vomiting (CS12) as significant variables with the loading values of 0.6456, 0.7064, and 0.6718, respectively. These three clinical symptoms may represent bodily response toward an infection that has turned systemic. There is emerging evidence that COVID-19 leads to multi-organ dysfunction through endothelial injury, a mechanism conventionally associated with a vascular pathology (Siddiqi et al. 2021 ). Nonetheless, it is possible COVID-19 may only cause localized or predominantly respiratory symptoms. The PC4 summarizes sore throats (CS7) and difficulty breathing (CS8) as strong factors, with loading values of 0.8513 and 0.6824, respectively. Lung tissue damage caused by COVID-19 is evident by the autopsy of a 50-year-old patient from the initial case in Wuhan in 2019, which showed bilateral diffuse alveolar damage with cellular fibromyxoid exudates (WHO & China 2020). Difficulty breathing, however, is noticeably a significant factor for both PC3 and PC4. Clinically, both systemic and localized pathologies can cause breathing difficulty. Thus, it is conceivable that COVID-19, which may initially cause difficulty breathing primarily due to the assault to the respiratory system, may later cause the same clinical symptoms when it has gone systemic. The mixed significant factor loading of difficulty breathing to PC3 and PC4 thus reflects the dynamic and complexity of COVID-19 clinical presentations attributable to this symptom. 
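The loading thresholds stated in the methods (strong at 0.4 or above, moderate from 0.2 to 0.39, weak from 0.10 to 0.199, negative values insignificant) can be applied mechanically to the loadings reported for PC1 to PC4 above. The sketch below does that in Python. The loading values are those quoted in the text; the function name `loading_strength`, the use of half-open intervals to close the small gaps between the stated ranges, and the treatment of positive loadings below 0.10 as not significant are assumptions made for illustration.

```python
# Classify factor loadings with the thresholds quoted in the methods.
# Boundary handling between the stated ranges is an assumption.

def loading_strength(value):
    """Return the strength label for one varimax factor loading."""
    if value >= 0.4:
        return "strong"
    if value >= 0.2:
        return "moderate"
    if value >= 0.1:
        return "weak"
    return "not significant"   # negative or below 0.10 (assumed cut-off)

# Loadings as reported in the text for the pilot study (200 respondents).
reported = {
    "PC1": {"C6": 0.705, "CH11": 0.706, "CH13": 0.509, "CS11": 0.444, "CS15": 0.463},
    "PC2": {"S1": 0.8478, "Com2": 0.5880},
    "PC3": {"CS1": 0.6456, "CS8": 0.7064, "CS12": 0.6718},
    "PC4": {"CS7": 0.8513, "CS8": 0.6824},
}

strengths = {
    pc: {var: loading_strength(val) for var, val in loads.items()}
    for pc, loads in reported.items()
}

for pc, labelled in strengths.items():
    print(pc, labelled)

# CS8 (difficulty breathing) loads on both PC3 and PC4, as discussed above.
shared = sorted(set(reported["PC3"]) & set(reported["PC4"]))
print("Variables loading on both PC3 and PC4:", shared)
```

Every loading quoted in the discussion clears the 0.4 cut-off, which is why each is treated as a significant variable for its component.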
Discriminant Analysis (DA) provides a further statistical estimation of the index variations of clinical aspects and Epid-Link (Table 3). Based on PCA's risk indices, the DA evaluates the most significant discriminant functions of variables from the clinical aspects (clinical symptoms and clinical history) or Epid-Link. The risk indices were treated as dependent variables, and the clinical aspects (clinical symptoms and clinical history) and Epid-Link variables were treated as independent variables. The discriminant functions (DFs) and classification matrices (CMs) obtained from the DA standard mode and the DA forward stepwise mode are depicted in Fig. 3a and b. The standard DA mode constructs DFs that contain all clinical aspects (clinical symptoms and clinical history) and epidemiological link variables of COVID-19. The high-risk indices were discriminated correctly in 92.31% of cases (Table 3) with different significant clinical aspects and epidemiological variables. Table 3 shows that the discriminant analysis assigns the Low Risk and Moderate Risk categories with 100% correctness. Three variables appear to be strong discriminants: last contact with confirmed cases in days (CH14), duration of exposure with confirmed cases in minutes (CH13), and age (S1). In standard mode, with the 200 respondents as input, the classification matrix of DA assigns categories correctly at 97.50% accuracy. In contrast, the DA forward stepwise mode worked on the Low and Moderate Risk matrices with nine (9) discriminant variables, responsible for 98.31% and 95.45% correctness of the CMs. In this study, the clinical aspects and epidemiological variables CS11 [Duration of muscle pain/fatigue (days)], CS9 [Duration of difficulty breathing (days)], and S1 (Age) are attributable to Moderate Risk of COVID-19. Overall, however, the CM produced 88.00% correct assignments in the forward stepwise mode applied to the 200 respondents, with different clinical aspects and epidemiological links to COVID-19.
The High Risk matrix, however, yielded the lowest CM correctness, at 65%. These results are notably dissimilar to those obtained from the DA standard mode. The Wilks' Lambda test for the standard and forward stepwise modes revealed Lambda values of 0.025 (p < 0.0001) and 0.181 (p < 0.0001), respectively. The DA null hypothesis, H0, states that the mean vectors are equal among the three classes. The alternative hypothesis, Ha, states that at least one of the mean vectors is different from the others. As the computed p value is lower than the significance level alpha (0.05), this study rejects the null hypothesis, H0, and accepts the alternative hypothesis, Ha. The risk of rejecting H0 when it is in fact true is therefore lower than 0.05. Thus, the discriminant analysis suggests that the included variables were sufficient to discriminate the three groups and accounted for most of the index scale variations of COVID-19 infections. These variables were the last contact with confirmed cases (days) (CH14), duration of exposure with confirmed cases (minutes) (CH13), age (years) (S1), duration of muscle pain/fatigue (days) (CS11), and duration of difficulty breathing (days) (CS9).

A real mass-screening using the CHaSe system was conducted among 5000 university staff and students following the pilot study analysis phase. The modeling employed in the CHaSe system comprises fifteen (15) questions as outlined in Table 4 below. After these 5000 respondents' actual responses were considered, the system's modeling was revalidated. The association of the clinical and Epid-Link parameters was reduced to eleven (11) variables. To re-confirm the risk level of COVID-19 (Low Risk, Moderate Risk and High Risk), we conducted a thorough iterative process.
These include interactive machine learning algorithm (the weightage scoring), simulation of the modeling, and predictive modeling integration into the CHaSe System with live raw data of 5000 respondents using the matrices index of PCA (Fig. 4) . The medical experts revalidated all the initial eleven parameters of COVID-19 before re-confirming the weightage score for categorizing the index using the PCA based on 5000 respondents. Interestingly, the revalidated PCA for 5000 respondents has also yielded twelve significant PCs with an eigenvalue greater than 1 (Table 3 ; Fig. 3 ) for a total of the explained cumulative variance of 58.288%. Importantly, the CHaSe system's modeling underwent repeated revalidation by medical experts for knowledge accuracy of all the clinical aspects and the Epid-Link (clinical symptoms and clinical history). After repeated revalidation by medical experts, the PCs' variance explanations are 27.36% for PC1, 11.79% for PC2, 10.347% for PC3, 8.785% for PC4, with a cumulative explanation of 58.288% in data variability. Figure 4 indicated twelve factor loadings in the PCA accountable for 58.288% (Fig. 3) of the clinical symptoms and clinical history variations. Therefore, these results demonstrate the initial clinical aspects and Epid-Link variables yielded from PCA of the pilot study were validated by further analyses of the real mass-screening among 5000 respondents (Niranjanamurthy et al. 2020; Cao et al. 2020 ) (Figs. 5 and 6; Tables 5 and 6). After model revalidation, the CHaSe System of the COVID-19 screening was accessible by students and the civil servants via the internet browser or mobile application for self-monitoring (Fig. 7a) . As part of the organizational policy to prevent COVID-19 outbreaks, it was made mandatory for the civil servants and students to undergo daily screening through the CHaSe system throughout the national Movement Control Order (MCO). 
The CHaSe system administration monitors the updated data, and the outbreak details are illustrated as shown in Fig. 7b. The administration of the CHaSe system prompted the mapping of risk trends covering all states of Malaysia. Figure 8 portrays the COVID-19 outbreak data, comprising Risk Trend over Time, Risk Level of Malaysia as a whole, and Risk by States. This mobile application of COVID-19 is the first online monitoring developed in Malaysia based on the integration of computer science, data analytics (algorithms on the weightage score for index category) and medical factors based on actual respondents.

[Table fragment: co-morbidity items]
Co-morbidities: Diabetes
12 Co-morbidities: Hypertension
13 Co-morbidities: Ischemic Heart Disease
14 Co-morbidities: Heart failure
15 Co-morbidities: Respiratory disease (e.g., asthma, COPD, bronchiolitis etc.)
16 Co-morbidities: Stroke/TIA

[Figure caption] Variance explanation of the strong factor loadings of 27.36% for PC1 and 11.79% for PC2 after varimax rotation of 11 parameters for 5000 respondents

The Six Sigma approach adopted in this study serves as a problem-solving tool to address the critical need to conduct validated mass-screening when resources are limited (Juahir et al. 2017). As discussed in the define phase, this study identifies the crucial gap in achieving a valid mass screening to mitigate the risk of cluster outbreak upon the return of staff and students post MCO. This objective is a form of control measure in evaluating whether the CHaSe system meets its purpose. Through the CHaSe system, this study completed a mass-screening of 5000 university students and staff. Robust active case detection and contact tracing ensure that respondents that have a high risk of contracting

[Table note] Model-49V represents the modeling of the initial forty-nine (49) variables based on literature review, whereas Model-11V and Model-9V represent eleven and nine variables based on the actual 5000 respondents.
Both Model-11VR and Model-9VR were also based on 5000 respondents, but front-line medical experts revalidated these models to procure the optimal accuracy, reliability, and efficiency of the CHaSe system.

diagnostic test but may consequently develop them as the disease progresses. Thus, asymptomatic infection is defined as a lack of symptoms from exposure until complete recovery. These challenges motivate us to advocate daily screening with validated models via the CHaSe system for our students and staff to further increase screening sensitivity. Technology-enhanced screening via the CHaSe system therefore does more than make a one-time COVID-19 screening possible: the use of mobile technology also enables regular risk assessment as a surveillance strategy for a large population. The ability to conduct risk assessments and further personalized clinical evaluations through daily follow-up phone contact for high-risk groups serves as a robust and comprehensive screening and surveillance strategy to reduce the risk of cluster outbreaks. Nonetheless, asymptomatic individuals with no known Epid-Link were also reported among those diagnosed with COVID-19 in Malaysia (Kluster Tawar 2020; Noor Hisham Abdulah 2020; MOH 2020a, b). Hence, the CHaSe system does not replace the existing public health measures to control the pandemic. Instead, CHaSe serves to complement and enhance the current strategy for COVID-19 screening and surveillance. On the system side, CHaSe has been developed as two main applications: a web-based system and mobile apps (Android and iOS). The technologies employed include MySQL as the database management system (DBMS) and the Laravel framework (PHP) for the web-based programs. The rules extracted from experiments using various machine learning techniques (PCA for dimensionality reduction, feature selection) as well as classification and regression methods were translated into PHP code.
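To make the idea of translating model-derived rules into application code concrete, the following is a minimal sketch of a weightage-score rule. It is written in Python for illustration only (the source states that the actual rules were implemented in PHP), and every field name, weight, and cut-off below is a hypothetical placeholder, not a value from the CHaSe models.

```python
# Hypothetical integer weights per screening answer (NOT the CHaSe weightage scores).
WEIGHTS = {
    "contact_with_confirmed_case": 4,
    "joined_high_risk_gathering": 3,
    "fever": 2,
    "cough": 1,
}

# Hypothetical cut-offs, checked from highest to lowest.
THRESHOLDS = [(6, "high"), (3, "medium")]


def risk_score(answers):
    """Sum the weights of every item the respondent answered 'yes' to."""
    return sum(WEIGHTS[key] for key, yes in answers.items() if yes and key in WEIGHTS)


def risk_level(score):
    """Map a numeric score to a risk category using the cut-offs above."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "low"


respondent = {
    "contact_with_confirmed_case": True,
    "joined_high_risk_gathering": True,
    "fever": False,
    "cough": False,
}
score = risk_score(respondent)
print(score, risk_level(score))
```

Keeping the weights and cut-offs in plain data structures, separate from the scoring logic, would let experts revise them after revalidation without touching the code path.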
Pilot testing was conducted to verify the results, and the rules generated from the computed models were observed to be consistent with the results obtained from the medical experts.

The COVID-19 outbreak rapidly escalated into a pandemic within months of its emergence. The ease of spread and the scale of the disease's impact demand that medical experts and policy-makers establish a valid screening model that can be applied effectively and efficiently to the mass population. Comprehensive conventional active case detection and contact tracing are effective in containing the outbreak. Nonetheless, these measures require large supplies and resources, which are not always immediately accessible to government and private organizations. In Malaysia, students and staff of higher educational institutions will return to campus at the end of the movement restrictions policy to invigorate the economy and educational activities. Thus, there is an urgent need for educational institutions to conduct mass screening and surveillance as part of the organization's initiative to mitigate the risk of COVID-19 cluster outbreaks. To address this need, we have developed robust risk-prediction modeling within the CHaSe system through a collaboration between medical, data, and computing experts. The DMAIC Six Sigma approach was applied as a systematic, critical-to-quality (CTQ) approach to achieving this goal. In this study, we report the vigorous iterative process by which the risk-prediction modeling was comprehensively developed and then revalidated by front-line medical experts to procure optimal accuracy, reliability, and efficiency. Combined with mobile technology, the results demonstrate the feasibility of mass screening 5000 students and staff within the capacity of our university's resources. Furthermore, the analysis provides the risk trend over time, the risk level of the entire country, and the risk by state.
Thus, the modeling system within CHaSe serves as a valuable approach to complement and enhance the existing public health measures against COVID-19.

Extenuation of saline solutes in shallow aquifer of a small tropical island: a case study of Manukan Island
Assessment of COVID-19 mass testing: the case of South Korea
A trial of lopinavir-ritonavir in adults hospitalized with severe Covid-19
COVID-19-Older adults
Multivariate statistical analysis of geochemical data as indicative of the hydrogeochemical evolution of groundwater in a sedimentary rock aquifer system
Visual Six Sigma: making data analysis lean
Spatial assessment of air quality patterns in Malaysia using multivariate analysis
River water quality modeling using combined principle component analysis (PCA) and multiple linear regressions (MLR): a case study at Klang River
Evaluation of socioeconomic status on drug addicted person
Characterization of spatial patterns in river water quality using chemometric pattern recognition techniques
Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis
RAAS Inhibitors and Risk of Covid-19
Attentional process during listening to quantitative quranic verses (Fatihah Chapter) associated with memory, speech and emotion
Improving oil classification quality from oil spill fingerprint beyond six sigma approach
Spatial characterization and identification sources of pollution using multivariate analysis at Terengganu River Basin
Application of water quality indices and dissolved oxygen as indicators for river water classification and urban impact assessment
The use of multicomponent statistical analysis in hydrogeological environmental research
Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan
Epidemiology of Covid-19 in a long-term care facility in king county
Overview of security and compliance in Microsoft Teams. https://docs.microsoft.com/en-us/microsoftteams/security-compliance-overview
Ministry of Health Malaysia (2020) Update on COVID-19 status and statistics
Kluster Tawar: Kluster aktif terbesar dan strain D614G
Official Facebook of The Ministry of Health Malaysia-Kluster Tawar
Official Facebook of The Ministry of Health Malaysia-Kluster PUI Sivagangga
Coronavirus-COVID-19 before and after solution through web application and app
Alhamdulillah, pada hari ini 12
Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin
COVID-19-a vascular disease
Report of the WHO-China joint mission on coronavirus disease 2019 (COVID-19). World Health Organization