key: cord-0195268-gsc0kwvn authors: Chire, Josimar; Zuniga, Esteban Wilfredo Vilca title: Characterization of Covid-19 Dataset using Complex Networks and Image Processing date: 2020-09-24 journal: nan DOI: nan sha: 5f33334b3920fa8b18e9f5d44cd68591ab245a46 doc_id: 195268 cord_uid: gsc0kwvn

This paper explores the structure of the patterns behind a covid-19 dataset. The dataset contains medical images of positive and negative cases. A sample of 100 images is chosen, 50 per class. A frequency histogram is computed for each image and statistical measurements are taken from it as features; a second set of features is extracted using the Grey Level Co-Occurrence Matrix (GLCM). A Complex Network is built from each feature set, and the adjacency matrices are analyzed to check for the presence of patterns. Initial experiments provide evidence of hidden patterns in the dataset for each class, which become visible in the Complex Network representation.

Covid-19 is a turning point in human history. It is destroying powerful economies and driving emerging countries toward collapse. During its early stage, the virus had a reproduction number of 4.22 in Germany and the Netherlands. Even though developed countries have reduced the impact of the virus, it remains a severe problem in countries with poverty and weak health systems. Consequently, many efforts are focused on finding a vaccine, studying the virus, and building automatic tools to support prognosis of the illness. In this direction, many groups are working with tomography and x-ray images to build models using Artificial Intelligence techniques, e.g. Deep Learning [1] [2] [3]. At the same time, one limitation in the first months was access to images of covid-19 patients. Deep Learning algorithms are based on Artificial Neural Networks with many layers, where each layer or group of layers has a specific function, but one limitation is the need for a large quantity of images.
For this reason, data augmentation is commonly used to produce artificial images, for example by applying rotation or adding noise. One approach to meaningful feature extraction that represents the internal patterns of a dataset is Complex Networks [4] [5] [6] [7]; previous experiments showed the strength of approaches based on graph representation in comparison with classical Machine Learning algorithms. Those results suggest that graphs give a good representation of the internal patterns. Moreover, it is possible to affirm that a Complex Network representation can capture these patterns using a small number of samples in comparison with Deep Learning. In this paper, we propose a new technique based on Complex Networks to identify the virus in x-ray images using high-level algorithms to exploit the structure of the features from these images.

A search was performed using keywords, e.g. covid-19 tomography dataset. Many datasets were found, but these are too big to download and process later; in addition, they come in a variety of image formats, e.g. nii, dicom. One available dataset 1 is chosen because its png format is ready for processing. Fig. 1 presents 16 samples of positive and negative cases, respectively. A dataset of 100 images is selected, with positive and negative classes balanced.

Consequently, it is necessary to find a transformation from images to Complex Networks. A first proposal uses the Frequency Histogram, because it reduces dimensionality and represents the distribution of pixel values. Beforehand, color images are converted to grayscale. Then, a second proposal uses GLCM to obtain neighborhood features based on texture analysis.

Frequency histograms were calculated to obtain a lower-dimensional representation, and statistical features were calculated from them: median, mean, standard deviation, kurtosis and skewness. The histogram considers the three channels of the classical RGB image representation. Fig. 2 shows the frequency histograms of the previous sample of images. This representation reveals a visual difference between positive and negative cases.

The Grey Level Co-Occurrence Matrix (GLCM) algorithm [8] is a second-order statistical method used for texture feature extraction. Features are extracted from this matrix for 4 orientations: 0, 45, 90 and 135 degrees. A sample of the dataset is presented in Tab. I. The transformation from the RGB representation to grayscale is performed with a formula over R, G and B, the red, green and blue channels of the image. Figure 4 presents the results for GLCM features. Column 1 shows positive cases and column 2 the negative ones.

Considering the previous results with Frequency Histogram and GLCM, building Complex Networks from both feature sets using Euclidean distance is possible. Moreover, the representation of the Complex Networks through adjacency matrices shows reticular patterns. These patterns differ: positive cases present a distribution of larger distances between the nodes/elements than negative ones. By contrast, negative samples present only a few links with high distances.

An approach to represent covid-19 tomography images using Complex Networks is feasible. The intensity of the links, represented through adjacency matrices, shows a strong difference between the two classes. Although GLCM is a more elaborate technique for extracting neighborhood patterns from the images, the frequency histogram yields a similar representation. Although the two processes for creating the Complex Networks are different, they behave similarly, as the visualization of the adjacency matrices shows.

In addition, a comparison between Complex Network approaches for High Level Classification will be presented. The authors are considering including a larger number of samples for each class to obtain a higher diversity of images.
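As a rough illustration of the pipeline discussed in this paper (statistical features taken from a frequency histogram, then a network built from Euclidean distances between samples), a minimal Python sketch follows. It is not the authors' code. Several choices are assumptions made only for illustration: the statistics are computed over the histogram bin counts, the images are synthetic random grayscale arrays, and two nodes are linked when their distance falls below the median distance. The function names `histogram_features`, `distance_matrix` and `adjacency_matrix` are hypothetical.

```python
import numpy as np

def histogram_features(image, bins=256):
    """Return [median, mean, std, kurtosis, skewness] of the histogram counts.

    Assumption: statistics are taken over the bin counts of a grayscale image.
    """
    counts, _ = np.histogram(image, bins=bins, range=(0, 256))
    counts = counts.astype(float)
    mean = counts.mean()
    std = counts.std()
    centered = counts - mean
    # Guard against a perfectly flat histogram (std == 0).
    skew = (centered ** 3).mean() / std ** 3 if std > 0 else 0.0
    kurt = (centered ** 4).mean() / std ** 4 - 3.0 if std > 0 else 0.0
    return np.array([np.median(counts), mean, std, kurt, skew])

def distance_matrix(features):
    """Pairwise Euclidean distances between feature rows (n x n)."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def adjacency_matrix(distances, threshold):
    """Link two nodes when their distance is below `threshold` (assumed rule)."""
    adj = (distances < threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj

# Synthetic stand-ins for grayscale images; real data would be loaded from png files.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64)) for _ in range(6)]

features = np.array([histogram_features(img) for img in images])
dist = distance_matrix(features)
adj = adjacency_matrix(dist, threshold=np.median(dist))
```

Plotting `dist` as a heat map would give the kind of adjacency-matrix view described above, with one row and column per image. The threshold is a free parameter here; the paper does not state one.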
The Complex Network representation can be leveraged for High Level Classification tasks, so experiments in that direction will be performed.

The authors want to thank Research4tech, an Artificial Intelligence (AI) community of Latin American researchers that aims to promote AI and build science communities to catapult and strengthen the development of Latin American countries supported by Science and Technology, integrating the academic community, technology groups/communities, government and society.

Covid-19: automatic detection from x-ray images utilizing transfer learning with convolutional neural networks
Automated detection of covid-19 cases using deep neural networks with x-ray images
Covxnet: A multi-dilation convolutional neural network for automatic covid-19 and other pneumonia detection from chest x-ray images with transferable multi-receptive feature optimization
Organizational data classification based on the importance concept of complex networks
Classification using link prediction
New feature for complex network based on ant colony optimization for high level classification
A network-based high-level data classification algorithm using betweenness centrality