key: cord-0816249-adzqqidy authors: Shafqat, Sarah; Fayyaz, Maryyam; Khattak, Hasan Ali; Bilal, Muhammad; Khan, Shahid; Ishtiaq, Osama; Abbasi, Almas; Shafqat, Farzana; Alnumay, Waleed S.; Chatterjee, Pushpita title: Leveraging Deep Learning for Designing Healthcare Analytics Heuristic for Diagnostics date: 2021-02-02 journal: Neural Process Lett DOI: 10.1007/s11063-021-10425-w sha: 960b23294d56b0f9548c50edcef7df8d0ef7c358 doc_id: 816249 cord_uid: adzqqidy

Healthcare informatics has been widely discussed since the early 21st century. With the evolution of new computing technologies, a huge amount of healthcare data is being produced, opening several research areas. Managing this mass of data while extracting knowledge for decision making is the main concern today. For this task, researchers are exploring big data analytics, deep learning (an advanced form of machine learning based on deep neural networks), predictive analytics and various other algorithms to bring innovation to healthcare. Given these innovations, it is reasonable to say that predicting a disease and anticipating its cure is no longer unrealistic. Dengue Fever (DF) first, and then Covid-19, are recent outbreaks of lethal infectious diseases, and diagnosis at all stages is crucial to decrease the mortality rate. In the case of diabetes, clinicians and experts find both timely diagnosis and the analysis of the chances of developing underlying diseases challenging.
In this paper, Louvain Mani-Hierarchical Fold Learning healthcare analytics, a hybrid deep learning technique, is proposed for medical diagnostics. It is tested and validated using a real-time dataset of 104 instances of patients with dengue fever made available by Holy Family Hospital, Pakistan, and 810 instances found on GitHub for infectious diseases, including prognosis of Covid-19, SARS, ARDS, Pneumocystis, Streptococcus, Chlamydophila, Klebsiella, Legionella, Lipoid, etc. The technique showed a maximum Spearman correlation of 0.952 between two clusters when applied to 240 instances extracted from a comorbidities diagnostic data model derived from 15696 endocrine records of multiple visits of 100 patients, each identified by a unique ID. Accuracy of the induced rules is evaluated by Laplace (Fig. 8) as 0.727, 0.701 and 0.203 for 41, 18 and 24 rules, respectively. The endocrine diagnostic data is made available by Shifa International Hospital, Islamabad, Pakistan. Our results show that in future this algorithm may be tested for diagnostics on healthcare big data.

Healthcare big data in its heterogeneous form is now being analyzed for knowledge discovery and decision making. Advanced machine learning and deep learning (neural network) techniques [1] for analysis are being researched for incorporation over the cloud, moving towards a Smart Health System [2, 3]. A Learning Healthcare System [4] revolves around shifting traditional healthcare processes to expert diagnosis and treatment of various diseases [5]. A medical diagnosis is formed by considering common associated risks and precautions [3]. Overcoming the problems of traditional healthcare through computation [3, 6] is a complex task, especially in diagnosis [4, 7]. Advances are still being made with the help of deep learning neural network techniques [8] under the umbrella of artificial intelligence (AI).
An intelligent diagnostic system would shift the load of routine clinical tasks away from doctors, freeing them to focus on serious patients and complex cases after initial screening and diagnostic feedback from the Smart Health System. Our problem domain for evaluating the proposed healthcare diagnostic analytical technique using deep learning heuristics [8] is the multi-class classification [9] of types of Dengue Fever (DF) and of comorbidities derived from endocrine data. This multi-class classification of diverse diseases, diagnosed under different event settings, makes our problem heterogeneous [10] in nature. As reported by the World Health Organization (WHO) [11], around 50 million dengue infections occur globally each year. Rigorous research in diagnostics [12] has been done over time on disease prediction, treatment, prevention and control. To assist this research, a diagnostic solution is provided for DF covering its three main types: DF, dengue hemorrhagic fever (DHF), and dengue shock syndrome (DSS). In this paper, 17 parameters are considered in forming the diagnosis problem with hyper-parameterization [13], and the types of diagnosis are broadened to 8 classes: DF, DF (D/C), DHF, DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS, and DSS. Some parameters are missing, including NS3, virology, and active/passive infection [14]. For the recent outbreak of coronavirus (Covid-19), diagnosis is challenging [15]. Patients with the usual known symptoms of cough, fever or pneumonia are referred for an RT-PCR test, which is found to be 30-60% accurate [15]. Diabetes Mellitus (DM) is a non-communicable disease and a major health hazard in developing countries, including Pakistan. A predictive methodology for analyzing diabetes patients' data is therefore devised to predict the prevalent types of diabetes and the complications associated with them [16].
A patient diagnosed with diabetes has to keep blood sugar carefully controlled; otherwise long-term diabetes may lead to complications in the form of known chronic conditions (mayoclinic.org), mainly cardiovascular disease, brain stroke, nerve damage (neuropathy), kidney damage (nephropathy), eye damage (retinopathy), foot damage, worsened skin conditions, hearing impairment, Alzheimer's disease, etc. Furthermore, diabetic patients are placed in the high-risk zone for catching Covid-19. Therefore, to predict the diagnosis of these diseases, researchers are inclined towards deep learning heuristics that are trained on previous diagnosis data and predict the diagnosis for similar undiagnosed cases [3]. The authors of this paper evaluate different facets of the proposed diagnostic analytics model, which hybridizes three high-level deep learning algorithms, on phenotypically and symptomatically rich datasets of multiple diseases. Sect. 2 reviews the relevant studies and the problems found. Section 3 describes the formation and modeling of our datasets for the experiments done in the Orange framework. Sect. 4 discusses the heuristics of our algorithmic model and the selection of the three most successful algorithms for their properties, after analyzing previous results achieved in similar scenarios. Section 5 elaborates the results when our model is applied to different dimensions of the endocrine diagnostic dataset and its limited view as big data analytics. Section 6 documents the observations from the experiments. Section 7 summarizes and concludes with an analysis of the model's usefulness in future.

Advances are seen in constrained clustering as it is formulated over algorithms like k-means, spectral clustering and other mixture models.
The extended model [17] overcomes three significant limitations: (i) it handles instance-level constraints of higher difficulty in addition to the standard together/apart constraints, (ii) it resolves cluster-level constraints by balancing cluster size, and (iii) it handles triplet constraints by ordering pairwise constraints through side information. A good feature set based on similarity is the core requirement for analyzing a complex dataset. There are two basic challenges in constrained clustering. First, although constraints improve the performance of some algorithms when averaged over multiple constraint sets, an individual constraint set can produce worse results than using no constraints. Second, constraints are sometimes limited and expert guidance has to be supplied through side information. When constrained clustering is formulated over deep learning principles it gains scalability, and, owing to hyper-parameterization, the negative effect of individual constraints is diminished, since introducing triplet constraints separates negative instances from positive instances.

Data-driven healthcare [18] is a highly sought-after research area for transforming personalized care. The EHR is so far the best representation for data-driven healthcare, although it is noisy and sparse. Extracting good features for phenotypic assessment of patients is thus challenging. A four-layered convolutional neural network is applied to extract phenotypes for making predictions. All phenotypes are extracted from the temporal EHR matrix of each patient and then filtered for the most significant ones, which finally support the predictions. Deep learning is applied to discrete patients for best interpretation in the four-dimensional temporal EHR matrix. Naïve Bayes [19] is also a well-known and widely used machine learning algorithm, but it becomes more efficient when combined with other algorithms.

The studies [20-25] outlined in Fig. 1 follow the latest trend in machine learning and analytics, seen in the rise of deep learning. Here we look at the history of deep learning and the reasons for its wider use and adoption for smart health solutions. In 2015, researchers expressed the view in [26] that deep learning, as an advanced form of machine learning, is taking over medical diagnosis, based on the latest studies on the topic [1, 3-5, 8, 9, 13, 16, 26] (Fig. 1). It uses different combinations of neural network algorithms to abstract multiple levels of representation in medical data, whether speech, images, or text. Researchers found it highly useful for its superior accuracy in interpreting medical images using convolutional neural networks (CNN) with back propagation. The very first architecture for deep learning, given by Ivakhnenko [26] in 1965, is clearly based on the logic of multilayer neural networks. Considerable work on deep learning started in 1986, when Rina Dechter [26] proposed the term 'deep learning' for the given architecture. It is also known that genetic and differential evolutionary algorithms, a branch of machine learning forming a baseline for deep learning, perform better than particle swarm optimization algorithms when high computation is required, as in big data analytics [27]. Results from 13 problems for 33 different algorithms showed that the self-optimized Successful Parent Selecting Linear Population Size Reduction eigenvector-based (SPS-L-SHADE-EIG) algorithm [28] won the most problems, and clearly those with the most function calls (shown in Table 11 of [27]). Other algorithms have lately appeared for evaluation [29-31].

Managing healthcare data for analysis to predict or diagnose patients is challenging, and more so if a clinic uses the traditional method of filing patients' profiles [32].
Storing medical data electronically as Electronic Medical Records (EMR) in an international standard format, such as those used by SNOMED, ICD-10, etc., requires considerable computational and intellectual resources, which makes the whole process expensive, difficult and time consuming. Only when healthcare data is structured on international standards can it be evaluated and interpreted everywhere on similar grounds. Therefore, until now the analysis and visualization of healthcare data has been done in isolation or using datasets made available online. In analytics, mainly data mining and machine learning approaches are used. Segmentation, clustering, rule-based association, and classification lie within the branch of data mining. The approach is called machine learning [33] when one decides whether clustering or classification is done on a labeled or unlabeled dataset, and whether the analysis is done at runtime or on previously stored data. On this basis, our learning data model for analytics is named supervised (learning from labeled offline data), unsupervised (learning from unlabeled data, such as data entered at runtime) or semi-supervised (using both known data and new unlabeled data that is being added to storage). Machine learning further transforms into deep learning when self-learning multilayer neural networks take over optimization [12], forming a data hierarchy for unlabeled complex datasets to represent the best features in the data through exploration [34]. Deep learning is associated with the analysis of complex datasets that have huge volume or excessive features, from which abstraction is done through hyper-parameterization [11]. Hyper-parameters include the number of layers, their hidden units, the activation function, the arrangement of layers in a network, etc. Hyper-parameter tuning on a new dataset is a difficult task. Hyper-parameters that are found efficient in simple networks often fail to perform in complex scenarios.
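As a rough illustration of the hyper-parameters listed above, the following Python sketch enumerates a small search grid. The values and the names `SEARCH_SPACE` and `enumerate_configs` are illustrative assumptions and are not settings used in this study.

```python
from itertools import product

# Hypothetical search space; the values are placeholders for illustration only.
SEARCH_SPACE = {
    "num_layers": [2, 3, 4],
    "hidden_units": [32, 64],
    "activation": ["relu", "tanh"],
}

def enumerate_configs(space):
    """Return every combination of hyper-parameter values as a list of dicts."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

configs = enumerate_configs(SEARCH_SPACE)
print(len(configs))  # 3 * 2 * 2 = 12 candidate configurations
```

The number of candidates grows multiplicatively with each added hyper-parameter, which is one reason exhaustive tuning quickly becomes impractical for complex networks.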
Results change for every dataset; therefore MENNDL was proposed for selecting optimal hyper-parameters on large compute nodes. It used the Convolutional Architecture for Fast Feature Embedding (CAFFE) package for its deep learning model. The work on deep fair clustering with multi-state protected variables [35] also shaped our understanding of the complexity of managing and visualizing clusters of classes with a large feature set, which in our case has 18 features. Forming equal-size clusters [35] is not the problem we are looking at here. For our problem, fair clustering means accurately diagnosing patients and associating them with the right class, and the distribution may vary. Our fairness measure mostly revolves around the distance metrics we choose to bind similar diagnoses in one cluster. We tested our dataset of 15696 endocrine diagnostic records using Z-Score and ANOVA metrics of Ordinary Least Squares (OLS) to find the best features, as in Fig. 2.

We explored a dataset of 104 patients diagnosed with or without DF, in which the most important parameters numbered 7 out of 18, resulting in different forms of diagnosis, from non-critical to fatal cases, categorized into 8 classes. The algorithms used are Louvain Clustering, Manifold Learning, and Hierarchical Clustering, and results were projected through scatter plots and linearly. In parallel, Multidimensional Scaling (MDS) with weights is applied for clarity in visualization, followed by the CN2 Rule Induction classifier for a single target class label (Diagnosed) to establish rules omitting outliers. Finally, labeled clusters are formed on inliers and visualized with probabilities of occurrence. In [17], we find the global size constraint limiting for our problem, as the diagnostic classes may not have the same number of nodes and cluster sizes would vary.
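The ANOVA-based feature ranking mentioned above can be sketched in plain Python as follows. This is only a minimal illustration of the one-way ANOVA F-statistic, not the Orange implementation used in the experiments; the feature names and toy values are hypothetical and it assumes at least two classes.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of a numeric feature grouped by class label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # groups are perfectly separated
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_features(table, labels):
    """Sort feature names by descending F-statistic."""
    scores = {name: anova_f(column, labels) for name, column in table.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): feature_a separates the two classes, feature_b does not.
labels = ["A", "A", "A", "B", "B", "B"]
table = {
    "feature_a": [150, 160, 155, 60, 70, 65],
    "feature_b": [30, 45, 25, 35, 40, 28],
}
print(rank_features(table, labels))  # ['feature_a', 'feature_b']
```

A feature whose class means differ strongly relative to the spread inside each class receives a high F-statistic and is ranked first.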
Exploring a discrete patient's [18] symptoms or temporality is also not our concern, as we focus on each diagnosis made during a patient's visit or admission, filtering key phenotypes. KNN is one of the most commonly known artificial intelligence algorithms; it was therefore tested on the endocrine dataset of 15,696 records. The key features made fuzzy were PatientID, gender, test, result, note, practitioner comment and ICD-10-CM. The accuracy achieved on training was 0.52, and the resulting visualizations are shown in Fig. 2. Naïve Bayes [19] may be a winning algorithm, but used on its own it may fail on a dataset that is complex and heterogeneous in its features. We find approaches like Multi-Dimensional Scaling (MDS) faster for visual representation of a dataset. Therefore, in our case, we explored a combination of algorithms, (i) Louvain clustering, (ii) Manifold Learning, and (iii) Hierarchical clustering, on datasets of different sizes, varying the hyper-parameters for large to limited numbers of features that could be tuned to get maximum accuracy in labeling classes.

Weka is a widely used analytics tool, but there are others for cases where a more precise and clear representation of data is required, using different data modeling and combinations of analytics algorithms from the family of deep learning approaches. Other analytics tools used in [12] besides Weka are RapidMiner and Orange. Our paper demonstrates results from the Deep Analytical Hybrid Model run in the Orange framework, explained in this section.

Louvain clustering [36] is a refinement of Greedy Modularity Optimization (GMO). It optimizes modularity locally, starting by defining multiple communities and then shifting nodes to neighboring clusters so as to maximize modularity [37, 38]. In a comparison with similar supervised learning algorithms, Louvain Clustering was found to be 94% accurate [39].
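The modularity objective that Louvain clustering maximizes can be illustrated with a short Python sketch. It only evaluates modularity for a given partition of an undirected, unweighted graph; it does not implement the node-moving and aggregation phases of the Louvain method, and the toy graph and names are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community id.
    Uses Q = sum over communities of (L_c / m) - (d_c / (2 m)) ** 2, where L_c
    is the number of edges inside community c and d_c its total degree.
    """
    m = len(edges)
    intra = defaultdict(int)      # edges with both ends in the same community
    total_deg = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        total_deg[community[u]] += 1
        total_deg[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (total_deg[c] / (2 * m)) ** 2 for c in total_deg)

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
print(modularity(edges, split) > modularity(edges, merged))  # True
```

Placing each triangle in its own community gives a modularity of 5/14 (about 0.357), while lumping all nodes together gives 0, so a modularity-maximizing method prefers the split.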
The authors of [28, 33] show the strength of the Manifold Learning algorithm, which is recognized for comparison and anomaly detection across multiple machine learning models and is known for its interpretation of subsets of diverse features. Conventional methods focus on the inner structure of a model, for example in deep neural networks, whereas here more complex cases are solved by integrating multiple models and studying their inputs and outputs as an unsupervised approach. Its results are best visualized through a scatter plot and a tabular view customized to show feature discrimination. Its interactive design is the result of efforts contributed by a body of researchers and engineers. It is highly effective in refining and using machine learning models in parallel to articulate individual feature characteristics in subsets, which reduces the chance of the coding errors found in other learning models. Manifold learning is used for both 2D and 3D feature classification, for example in diagnosing diabetic retinopathy (DR) [40] and in predicting IQ from functional MRI (fMRI) [36] on real-time datasets. The machine learning paradigm is greatly assisted by visual analytics (VA), as in VA-Assisted-ML (VIS4ML) [41]. VIS4ML comprises high-level methodologies that use VA capabilities to assist machine learning and may be used to optimize model development.

Hierarchical decomposition of a network of nodes is one significant method to visualize gaps and similarity in data, as representation through clustering is a known conventional approach for deep learning [42]. Hierarchical clustering has long been a measure for qualitative analysis of quantifiable data. Hierarchically Clustered Representation Learning (HCRL) [42] is proposed to keep the hierarchical structure in a deep neural architecture. A three-layered approach [43] was presented for categorizing similar patients in a deep metric learning framework using the ICD-10 coding scheme.
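The bottom-up merging behind hierarchical clustering can be sketched in plain Python. This is a minimal average-linkage illustration under assumed Euclidean distances; it is not HCRL [42] or the Orange widget, and the function name and toy points are hypothetical.

```python
def agglomerative(points, num_clusters):
    """Average-linkage agglomerative clustering; returns lists of point indices."""
    def dist(a, b):
        # Euclidean distance between two points given as tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy points (hypothetical): three near the origin, two near (5, 5).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
print(agglomerative(points, 2))  # [[0, 1, 2], [3, 4]]
```

Recording the order of merges, rather than stopping at a fixed number of clusters, yields the dendrogram that hierarchical clustering tools usually display.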
The researchers here have devised a semi-supervised learning model with a heuristic (exploratory) approach based on three layers embedding high-level deep learning approaches, for visual interpretation of diagnostics of DF categorized as DF, DF (D/C), dengue hemorrhagic fever (DHF), DHF (D/C), DHF (HD), DHF (Leak), DHF/DSS and dengue shock syndrome (DSS). Louvain clustering, manifold learning and hierarchical clustering are applied to model our proposed heuristic, named Louvain Mani-Hierarchical Fold Learning (LMHFL), in the Orange framework. The seven best features are selected and ranked (Fig. 4) using various statistical measures to calculate their influence on the inferences extracted from the provided data. Other known machine learning methods are also applied, such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Support Vector Machine (SVM) with a nonlinear kernel (Table 1), with varying distance metrics. In Tables 1, 2 and 3, several interpretations are drawn from differing models based on altered hyper-parameters (visualizations are displayed in Figs. 5, 6). Cluster formations are visualized and elaborated in detail in Table 3.

Another infectious disease that has hit the world widely is coronavirus (Covid-19), along with some other infectious diseases like SARS. In this paper, the researchers used a text/tabular dataset that symptomatically diagnoses Covid-19 and other infectious diseases based on recommended tests and results. The single target class label (finding) depended on the 11 best features in Fig. 7. Multiple target class labels, such as finding and survival, were considered, but the proposed model lacked a rule induction feature for multiple classes.
Features: offset, age, sex, RT_PCR_positive, intubated, intubation_present, went_icu, in_icu, needed_supplemental_O2, extubated, temperature, pO2_saturation, leukocyte_count, neutrophil_count, lymphocyte_count, view, modality, date, folder, survival (total: 20 features). Meta attributes: patientid, location, clinical_notes, other_notes. Target: finding

Out of several views, the rules shown in Fig. 8 were induced with a maximum accuracy of 0.701 for diagnosis of Covid-19 and other infectious diseases. The parameters for extracting rules were: (i) Rule ordering: ordered, (ii) Covering algorithm: exclusive, (iii) Gamma:

The heuristic model for the LMHFL algorithm was then tried on around 15696 records of endocrine patients with the 7 best features selected using the Orange framework (Fig. 9), and gave results that lacked clarity under different parameter settings (Table 4). These ambiguous visualizations in Fig. 10 were due to the huge volume of data, which took almost 48 hours of processing time. For better results from the LMHFL algorithm on healthcare big data, a high-performance cloud platform is needed. LMHFL gave better visual results for 240 instances (records) extracted using the DM comorbidities data model for diagnosing diabetes and the associated comorbidities linked to each patient profile, with limited features (PatientID, TestID, Test, and Result) for a single target class, Disease (diagnosed). Apart from the probability chart (Fig. 11), LMHFL was evaluated using Spearman correlation, which gave its maximum results, 0.952, 0.942, 0.916 and 0.817, for four combinations of features that were reduced to C0, C1, C2, C3, C4, C5 and C6.

Table 2 Detailed structure of clusters formation with DF class labels for model/s with different parameter settings in multiple iterations (interpret)

After pruning outliers, CN2 Rule Induction was applied to view the combination of rules generated, which numbered 28 for diagnosis of diabetes mellitus and its comorbidities.
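The Spearman correlation used above to evaluate LMHFL can be sketched in plain Python as the Pearson correlation of rank vectors, with tied values sharing their average rank. This is an illustrative sketch, not the Orange implementation used in the experiments; the function names are hypothetical and it assumes neither input is constant.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

Because only ranks are compared, any monotonically increasing relationship gives a correlation of 1 (up to floating-point rounding), even when it is far from linear.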
Several valuable observations were documented from multiple experiments applying the same algorithmic model, LMHFL, with different hyper-parameter settings on multiple diseases (Fig. 12).
1. LMHFL may be applied for diagnosing multiple diseases.
2. Accuracy of diagnosis is validated by its probability of occurrence with the given parameters (as in Figs. 6b, 11, 13 and 14). The best quality of rule induction for the DF, Covid-19 with other infectious diseases, and DM comorbidities datasets is evaluated by Laplace (Fig. 8) as 0.727, 0.701 and 0.203 for 41, 18 and 24 rules, respectively. The Entropy measure showed 0 quality on the given algorithmic parameters (Fig. 14), with 28 rules for endocrine diseases and 173 rules for DF classes.
3. A DM comorbidities data model is extracted with 240 instances showing probability of occurrences in Fig. 13, and the maximum Spearman correlation achieved was 0.952.
4. Fig. 5 classifies each patient diagnosis based on the results of different tests.
5. The diagnosis is seen to depend on specific features (Figs. 5, 6 and 11 show the relation with tests conducted and classification based on results).
6. A more detailed view of the induced rules was observed.
7. The final diagnosis is still left to doctors, after judging the probabilistic scale for disease occurrence on the rules deduced from available features.
8. Visualizations lack clarity on endocrine big data and are not yet patient specific; this is platform dependent, and the algorithmic model may be validated later over a high-performance cloud.

In Sect. 4, previous machine learning algorithms were referred to and tried for better representations but failed. LMHFL is a hybrid deep learning heuristic model for visual analytics that may be applied to imperfect training data, available in huge amounts and normalized into a structured datasheet complying with HL7 FHIR v4, to get accurate inferences.
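The Laplace measure used above to score induced rules can be illustrated with the standard Laplace accuracy estimate from the CN2 literature, (n_c + 1) / (n + k), where n is the number of examples a rule covers, n_c the number of those belonging to the predicted class, and k the number of classes. The sketch below assumes this standard form; the counts in the example are hypothetical and are not taken from the experiments.

```python
def laplace_accuracy(covered_target, covered_total, num_classes):
    """Laplace estimate (n_c + 1) / (n + k) of a rule's accuracy.

    covered_target: covered examples that belong to the predicted class (n_c)
    covered_total:  all examples covered by the rule (n)
    num_classes:    number of target classes (k)
    """
    return (covered_target + 1) / (covered_total + num_classes)

# Hypothetical rule covering 10 examples, 9 of them in the predicted class,
# for a problem with 8 target classes.
print(laplace_accuracy(9, 10, 8))  # 10 / 18, about 0.556
```

The correction pulls rules that cover only a few examples towards the uniform prior 1 / k, so a rule matching a single instance is not scored as perfectly accurate.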
In [44], a small dataset of only 80 diabetic patients was grouped as healthy, Type 2 diabetes without depression, and Type 2 diabetes with depression, to find inferences based on biomarkers. These groups were evaluated on training data with a CI of 95%, and the sample with depression as a comorbid disease is 50%. The dataset [44] was characterized through t-test and p-value. This study [44] is said to have several limitations. We took larger datasets in different variations, with feature-based selection, for our study. To establish the concreteness of our results for DF (104 instances), infectious diseases including COVID-19 (810 instances), DM (15696 instances) and endocrine comorbidities (240 instances selected from 15696 rows), we passed test data through focused clusters that were labeled as single or multiple target classes [9, 45]. Test data classified into other clusters would gain confidence from the most appropriate probabilistic ratio (as in Figs. 6, 11 or 13) and would be finalized through the intervention of a qualified doctor. The flexibility to tune this algorithm to fit any dataset is its strength, but for better quality and timely processing of results a high-performance cloud platform is required, which may be RapidMiner or a much better version for speed and accuracy. In this paper, the proposed heuristic is limited in its ability to label multi-dimensional data: rules were induced only for a single target class, and images were not taken as features for diagnosis. In future, the authors intend to find in-depth associations [46] of the mentioned comorbidities with diabetes mellitus depending on features or biomarkers. The heuristics [46] would be tested and validated on custom rules extracted from experts' opinions in the endocrine knowledge domain.
[1] Big data platforms and techniques
[2] Big data scientific workflows in the cloud: challenges and future prospects: intelligent edge, fog and mist computing
[3] Context aware smarthealth cloud platform for medical diagnostics: using standardized data model for healthcare analytics
[4] Big data and new knowledge in medicine: The thinking, training, and tools needed for a learning health system
[5] Big data analytics enhanced healthcare systems: a review
[6] Smarthealth simulation representing a hybrid architecture over cloud integrated with IoT: a modular approach
[7] Clinical information extraction applications: a literature review
[8] Application of deep learning to biomedical informatics
[9] Interpretable multiclass classification by MDL-based rule lists
[10] Heteromed: heterogeneous information network for medical diagnosis
[11] Disease and economic burdens of dengue
[12] Survey of machine learning algorithms for disease diagnostic
[13] Optimizing deep learning hyperparameters through an evolutionary algorithm
[14] A toolkit for national dengue burden estimation-Google Search. World Health Organization
[15] Large-Scale Screening of COVID-19 from community acquired pneumonia using infection size-aware classification
[16] Predictive methodology for diabetic data analysis in big data
[17] A framework for deep constrained clustering-algorithms and advances
[18] Risk prediction with electronic health records: a deep learning approach
[19] Smart health prediction system using data mining
[20] Cervical cancer identification with synthetic minority oversampling technique and pca analysis using random forest classifier
[21] DRFS: detecting risk factor of stroke disease from social media using machine learning techniques
[22] Deep learning based genome analysis and NGS-RNA LL identification with a novel hybrid model
[23] IoT based health-related topic recognition from emerging online health community (med help) using machine learning technique, mdpi
[24] Diabetic retinopathy diagnostics from retinal images based on deep convolutional networks
[25] Deep neural networks for multimodal imaging and biomedical applications. IGI Glob
[26] Deep learning for healthcare management and diagnosis
[27] Swarm intelligence and evolutionary algorithms: performance versus speed
[28] A self-optimization approach for L-SHADE incorporated with eigenvector-based crossover and successful-parent-selecting framework on CEC 2015 benchmark set
[29] Manifold: a model-agnostic framework for interpretation and diagnosis of machine learning models
[30] Community detection using an enhanced louvain method in complex networks
[31] Big data analytics: computational intelligence techniques and application areas
[32] Methodological challenges and analytic opportunities for modeling and interpreting
[33] Unsupervised machine learning for networking: techniques, applications and research challenges
[34] Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis
[35] Towards fair deep clustering with multi-state protected variables
[36] Feature selection based on community detection in feature correlation networks
[37] Functional brain network changes following use of an allostatic, closed-loop, acoustic stimulation neurotechnology for military-related traumatic stress
[38] A manifold regularized multi-task learning model for IQ prediction from two fMRI paradigms
[39] Supervised classification enables rapid annotation of cell atlases
[40] Artificial intelligence and deep learning in ophthalmology
[41] VIS4ML: an ontology for visual analytics assisted machine learning
[42] Hierarchically clustered representation learning
[43] Fine-grained patient similarity measuring using deep metric learning
[44] Increased serum interleukin-9 and interleukin-1β are associated with depression in type 2 diabetes patients
[45] Investigating multi-layer machine learning algorithms to improve diabetic analytic models
[46] Universality theorems for generative models

Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements: This research is supported by Holy Family Hospital and Shifa International Hospital, Pakistan. In this work Dr. Waleed S Alnumay is supported by Researchers Supporting Project number (RSP-2020/250), King Saud University, Riyadh, Saudi Arabia. The data contributed for diagnosis of Dengue Fever (Fig. 5 and 6), infectious diseases having Covid-19 (Fig. 8) or Diabetes and its comorbidities (as in Fig. 11, 13 and 14) holds a lot of worth to come up with these observations mentioned in Section 5 from experimental study.

Sarah Shafqat 1,2 · Maryyam Fayyaz 3 · Hasan Ali Khattak 4 · Muhammad Bilal 5 · Shahid Khan 6 · Osama Ishtiaq 6 · Almas Abbasi 1 · Farzana Shafqat 2,6 · Waleed S. Alnumay 7 · Pushpita Chatterjee 8