Title: A Neurochaos Learning Architecture for Genome Classification
Authors: Harikrishnan, N. B.; Pranay, S. Y.; Nagaraj, Nithin
Date: 2020-10-12

Abstract: There is empirical evidence of non-linearity and chaos at the level of single neurons in biological neural networks. The properties of chaotic neurons inspire us to employ them in artificial learning systems. Here, we propose a Neurochaos Learning (NL) architecture, where the neurons used to extract features from data are 1D chaotic maps. ChaosFEX+SVM, an instance of this NL architecture, is proposed as a hybrid combination of chaos and a classical machine learning algorithm. We formally prove that a single layer of NL with a finite number of 1D chaotic neurons satisfies the Universal Approximation Theorem, with an exact value for the number of chaotic neurons needed to approximate a discrete real-valued function with finite support. This is made possible by the topological transitivity property of chaos and the existence of an uncountably infinite number of dense orbits for the chosen 1D chaotic map. The chaotic neurons in NL are activated by an input stimulus (data) and output a chaotic firing trajectory. From the chaotic firing trajectories of the individual neurons of NL, we extract Firing Time, Firing Rate, Energy and Entropy, which constitute the ChaosFEX features. These ChaosFEX features are then fed to a Support Vector Machine with linear kernel for classification. The effectiveness of the chaotic feature engineering performed by NL (ChaosFEX+SVM) is demonstrated on synthetic and real-world datasets in the low and high training sample regimes. Specifically, we consider the problem of classifying genome sequences of SARS-CoV-2 from other coronaviruses (SARS-CoV-1, MERS-CoV and others). With just one training sample per class, over 1000 random trials of training, we report an average macro F1-score > 0.99 for the classification of SARS-CoV-2 from SARS-CoV-1 genome sequences. Robustness of ChaosFEX features to additive noise is also demonstrated.

Chaos emanates from the variety of behaviours exhibited by simple deterministic non-linear dynamical systems, ranging from periodic to random-like [1]. The discovery of chaos theory has also contributed to research in neuroscience, climate science, cryptography, epidemiology, etc. Chaotic behaviours are well studied in neuroscience, especially at the level of neurons [2], [3]. Neuronal cells exhibit a large range of firing patterns, such as repetitive pulses (periodic), quasi-periodicity, and bursts of action potentials [2]. These firing patterns are driven by external stimuli such as variations in the ionic environment and the effects of neuromodulators. This variability in the firing patterns indicates the presence of non-linearity and chaos at the level of the neuron, axon, etc. Such conclusions were inferred from classical intracellular electrophysiological recordings of action potentials in single neurons, together with macroscopic models [2]. There has been extensive research on mathematical models that capture the behaviour of a biological neuron. One such model is the Hindmarsh and Rose model [4], which captures the oscillatory burst discharges found in real neuronal cells and is capable of exhibiting chaos.
A detailed study supporting the presence of chaos in the brain at various spatiotemporal scales, together with mathematical neuronal models exhibiting chaos, is provided in [5] and [2]. Current state-of-the-art deep learning algorithms do not incorporate the rich properties of chaos at the level of artificial neurons. This research gap has been highlighted in our earlier work [6], [7]. The presence of chaos in biological neural networks at various spatiotemporal scales [3] makes chaotic neurons a potential candidate to compete with the artificial neurons currently used in Artificial Neural Networks (ANNs). In our earlier work, we proposed the ChaosNet architecture [7], in which chaotic 1D Generalized Luröth Series (GLS) neurons extract nonlinear features from the data for classification tasks. Further, in [8], we augmented the nonlinear features extracted from a single layer of GLS neurons with a Support Vector Machine (SVM) classifier trained using a linear kernel (ChaosNet + SVM). The efficacy of ChaosNet + SVM in the low training sample regime is highlighted in [8] for the Iris dataset and synthetically generated data. In this work, we propose an overarching architecture titled 'Neurochaos Learning' (NL) that generalizes our previous research (ChaosNet [7], ChaosNet + SVM [8]). We contrast NL with ANNs and provide mathematical justification for the power of chaos employed in NL (at the level of individual neurons) in approximating a large class of discrete nonlinear functions (real-valued with finite support), by proving a version of the universal approximation theorem. We also highlight the advantages of the chaotic feature engineering implicit in the NL architecture for the classification of coronavirus genome sequences, in both the high and low training sample regimes.

Genomics research addresses the study of the roles of multiple genetic factors and their interaction with the environment [9]. This research was enabled by the Human Genome Project, which had two key motivations: (a) the expectation that global views of the genome could speed up biological research, and (b) the desire to attack problems in an unbiased and comprehensive way [10]. The Human Genome Project was followed by other projects such as ENCODE [11], FANTOM [12] and Roadmap Epigenomics [13]. Because of these projects, there is now an abundance of genomic data, which motivates the use of Machine Learning (ML) and Deep Learning (DL) algorithms in genomics research. Recent research [14], [15] shows the effectiveness of Convolutional Neural Networks (CNNs) in modelling the sequence specificity of protein binding. In [16], a three-layer CNN was used to predict the effects of non-coding variants from genome sequence. Like CNNs, Recurrent Neural Networks (RNNs) are another popular class of DL algorithms widely used in sequence modelling. The authors in [17] highlight the performance of hybrid architectures on a transcription factor binding site (TFBS) classification task, showcasing the CNN-RNN hybrid architecture in comparison with standalone CNNs and RNNs.

DL algorithms are ideally suited when the number of instances in the training dataset is very high, but this need not always be the case. The best example of limited training data is the outbreak of the COVID-19 pandemic. The highly contagious SARS-CoV-2 virus responsible for COVID-19 has a very high spread rate and has reached all countries of the world [18].
Also, the genome sequence of the virus shares a 79% match with that of the SARS-CoV-1 viral genome and a nearly 50% match with that of the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) genome [19]. Some of the common symptoms of COVID-19 are dry cough, shortness of breath (dyspnoea), myalgia, headache and diarrhoea [19]. The outbreak was declared a pandemic in March 2020, and while 37 million cases have been reported globally at the time of writing, there is no accepted vaccine for COVID-19. Early identification of this deadly disease and isolation of patients from the rest of the population would have facilitated effective containment of disease spread. For this, we need novel computational methods that can uniquely identify the signatures of the SARS-CoV-2 virus from limited samples (available during the early stages of the outbreak). In such a scenario, ML algorithms need to classify from few training instances, especially during the initial days and weeks of the outbreak. It is in such situations that we demonstrate the usefulness of NL for classification. This could, in principle, be applied to future outbreaks of novel diseases.

The sections of this paper are arranged as follows: Section 2 explains the proposed NL architecture, Section 3 describes the datasets used in this research, Section 4 presents the experiments conducted on synthetic as well as real-world data, and Section 5 provides the scope for future work and closing remarks.

The chaotic neurons we consider are piecewise linear 1D maps known as Generalized Luröth Series (GLS) [20]. The tent map and the binary map are commonly used GLS maps, and we use the former as the chaotic neuron for the proposed architecture. The (skew-)tent map is mathematically represented as follows:

T(x) = x/b,                for 0 ≤ x < b,
T(x) = (1 − x)/(1 − b),    for b ≤ x < 1,

where x ∈ [0, 1) and 0 < b < 1. Refer to Figure 1.

The presence of neural chaos in the brain, which we describe as 'neurochaos', and the earlier success of ChaosNet on classification tasks inspire us to propose a 'Neurochaos Learning' architecture (NL for short) in this work. The architecture consists of a multi-layer neural network built of chaotic neurons. ChaosNet is a particular instance of the NL architecture used for classification tasks only (it makes use of a simple decision rule based on mean representation vectors) [7]. In this paper, the NL architecture consists of feature extraction from the input layer followed by a linear SVM classifier. The feature extraction step using chaotic neurons is termed ChaosFEX. Further, ChaosFEX features can be extracted from a multilayer chaotic neural network with homogeneous as well as heterogeneous chaotic neurons at different layers, and the extracted features can be freely combined with any of the available classifiers or regression models from the machine learning literature. Thus the proposed Neurochaos Learning architecture allows a great deal of flexibility in combination with traditional ML algorithms. A comparison of the properties of NL and ANNs is provided in Table 1.

As previously defined, ChaosFEX stands for the features extracted from the input layer of NL. In earlier work, we showed that such features, when passed through an SVM classifier with linear kernel, act as an efficient representation of the input data [8]; the chaotic features are able to provide linear separability between the classes. The proposed NL architecture using ChaosFEX is shown in Figure 2. The architecture consists of a single layer of GLS neurons (C_1, C_2, ..., C_n).
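For concreteness, here is a minimal Python sketch of a skew-tent GLS neuron and its firing trajectory. This is our illustration, not the authors' released implementation, and the function names are ours:

```python
import numpy as np

def skew_tent(x, b):
    """One application of the skew-tent map T: [0, 1) -> [0, 1)."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def gls_trajectory(q, b, n_iter):
    """Firing trajectory of a GLS neuron: iterate the skew-tent map
    n_iter times starting from the initial neural activity q."""
    traj = np.empty(n_iter)
    x = q
    for i in range(n_iter):
        traj[i] = x
        x = skew_tent(x, b)
    return traj

# Example with hyperparameter values reported later in the paper
# for the OCCD experiments (q = 0.22, b = 0.96).
print(gls_trajectory(0.22, 0.96, 10))
```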
Features extracted from the n input GLS neurons constitute ChaosFEX, which is followed by an SVM classifier with linear kernel. All GLS neurons in the input layer have an initial neural activity of q units. The skewness of the GLS maps is controlled by the discrimination threshold (b); by varying b, the chaotic neurons can exhibit weak or strong chaos (as determined by the value of the Lyapunov exponent). The stimulus or input data to the proposed architecture is represented as x_1, x_2, ..., x_n in Figure 2. The stimulus initiates the firing of the chaotic neurons. The chaotic firing trajectory of the k-th GLS neuron, represented as A_k(t), halts when the trajectory reaches the ε-neighbourhood I_k = (x_k − ε, x_k + ε) of the stimulus x_k. The time taken (N_k) for A_k(t) to reach the ε-neighbourhood of the stimulus (x_k) is defined as the Firing Time [6]. The chaotic firing is guaranteed to stop because of the topological transitivity property of chaos [1], [7].

(Figure 2 caption: The chaotic neuron C_k starts firing when it encounters the corresponding stimulus x_k. The trajectory of the k-th chaotic neuron C_k is represented as A_k(t) and continues until it reaches the ε-neighbourhood of the stimulus. From the chaotic trajectory A_k(t), we extract the firing time, firing rate, energy of the chaotic trajectory, and entropy of the symbolic sequence of the chaotic trajectory. These extracted features (ChaosFEX) are passed to an SVM classifier with linear kernel.)

(Table 1 excerpt: the nonlinearity in an ANN is provided by the activation function, which is not needed for NL; backpropagation is used in ANNs ('Yes') but not currently in NL ('No'), though NL could employ backpropagation in the future if needed.)

Thus, for a single stimulus, say x_k, the GLS neuron C_k outputs a chaotic trajectory. From this chaotic trajectory we extract the following features (ChaosFEX): 1. Firing Time. 2. Firing Rate [7]. 3. Energy of the chaotic trajectory [8]. 4. Entropy of the symbolic sequence of the chaotic trajectory [8]. These extracted features are passed to an SVM classifier with linear kernel. (Note: in the rest of this paper, ChaosFEX+SVM and ChaosFEX are used interchangeably.)

Neurons in the brain are known to have a non-linear response and have further been found to exhibit chaotic behaviour [5], [2]. However, existing ANN architectures do not exhibit chaos at the level of neurons. In the case of NL, the neurons fire chaotically upon encountering an input sample; the notion of a firing trajectory is absent in traditional ANNs (Table 1).

To evaluate the robustness of ChaosFEX, we conduct two sets of binary classification experiments: (a) we train NL using noisy training data and test on noiseless test data; (b) using the same hyperparameters as in experiment (a), we evaluate the performance of NL on noiseless training and test data. The same two experiments are also run with SVM with RBF kernel for a comparative performance analysis. We use two simulated datasets for these experiments: the concentric circle data (CCD) and the overlapping concentric circle data (OCCD). The governing equations for OCCD are:

x_i = r_i cos(θ) + α·η,    (1)
y_i = r_i sin(θ) + α·η,    (2)

where i ∈ {0, 1} (i = 0 represents Class-0 and i = 1 represents Class-1), r_i is the radius of the circle for class i, θ ∈ [0, 2π), and η is drawn from a normal distribution with µ = 0 and σ = 1. In the case of CCD, the value of α used in equations (1) and (2) is set to 0.01. Figure 3a and Figure 3b show the noiseless (CCD) and noisy (OCCD) data respectively. Table 2 and Table 3 give the train and test data statistics used in the noisy and noiseless experiments.
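As an illustration of equations (1) and (2) as reconstructed above, the following sketch generates CCD/OCCD-style data. The radii, sample counts and noise level are placeholder values of our choosing (the paper only specifies α = 0.01 for CCD):

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_circle(n_points, radius, alpha, label):
    """Sample n_points from a circle of the given radius with additive
    standard-normal noise scaled by alpha, as in eqs. (1) and (2)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n_points)
    x = radius * np.cos(theta) + alpha * rng.standard_normal(n_points)
    y = radius * np.sin(theta) + alpha * rng.standard_normal(n_points)
    return np.column_stack([x, y]), np.full(n_points, label)

# Small alpha (e.g. 0.01) gives well-separated circles (CCD-like);
# a larger alpha makes the two classes overlap (OCCD-like).
X0, y0 = noisy_circle(1000, radius=0.3, alpha=0.1, label=0)  # Class-0
X1, y1 = noisy_circle(1000, radius=0.6, alpha=0.1, label=1)  # Class-1
X = np.vstack([X0, X1])
y = np.concatenate([y0, y1])
```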
Table 4 presents the results of the noise experiments.

Expt-1: Training using OCCD and Testing using CCD using NL. The first step is to find hyperparameters suitable for ChaosFEX to learn the distribution of noiseless data from that of noisy data: the initial neural activity (q), the discrimination threshold (b) and epsilon (ε). We used a small set of CCD as a validation set and found the hyperparameters that give the best performance on this validation data. The hyperparameters q = 0.34, b = 0.499 and ε = 0.18 gave the best macro F1-score of 0.99. The validation set contains 1087 class-0 and 1073 class-1 data instances. We then retrained NL with these hyperparameters using OCCD and tested on unseen CCD, obtaining a macro F1-score of 0.98 (Table 4).

Expt-2: Training using CCD and Testing using CCD using NL. In this scenario, we train NL using CCD and test on data drawn from the same distribution. To evaluate the robustness of the ChaosFEX features, we use the same hyperparameters as in experiment 1 (q = 0.34, b = 0.499, ε = 0.18). For the unseen test data we get a macro F1-score of 0.99. Since the same hyperparameters work in both experiments 1 and 2, the hyperparameters are invariant to noise in the training data, which demonstrates the robustness of ChaosFEX. We could instead have trained NL independently for experiments 1 and 2, in which case we would have obtained different sets of hyperparameters that are optimal in each case. For example, with the hyperparameters (q = 0.22, b = 0.96, ε = 0.018), we get 100% classification accuracy for experiment 2. ChaosFEX features thus provide the flexibility to choose between robust and optimal hyperparameters based on the desired application.

Expt-3: Training using OCCD and Testing using CCD using SVM+RBF. A maximum macro F1-score of 0.665 was obtained for C = 1 and gamma = 0.1. We then retrained SVM with RBF kernel (C = 1, gamma = 0.1) using OCCD and tested on unseen CCD, obtaining a macro F1-score of 0.67 (Table 4).

Expt-4: Training using CCD and Testing using CCD using SVM+RBF. In this case, we train and test SVM with RBF kernel using CCD. For the same hyperparameters as in experiment 3 (C = 1, gamma = 0.1), we get a test macro F1-score of 0.80. (Note: for C = 1 and gamma = 'scale', we get a test accuracy of 100%.) Thus, the degradation from reusing the same hyperparameters across experiments 3 and 4 (F1-scores of 0.67 and 0.80) is much larger than for ChaosFEX (0.98 and 0.99). Clearly, ChaosFEX features are more robust to noise.

Definition: A dynamical system (Σ) is (topologically) transitive if for any two points x, y ∈ Σ and any ε > 0, there exists a z ∈ Σ whose orbit reaches, in a finite number of iterations, the ε-neighbourhood of x and subsequently of y.

Definition: A set Y is a dense subset of X if, for any point x ∈ X, there is a point y ∈ Y arbitrarily close to x [1].

We use the topological transitivity property of chaos and the existence of a dense orbit to prove the Universal Approximation Theorem (UAT) for GLS neurons (see Footnote 1 below). Let f(n) be a discrete-time real-valued function with finite support L. The Neurochaos Learning architecture (NL) consisting of a single layer with L chaotic neurons can approximate f(n) (see Footnote 2 below).
Assuming that we use a chaotic 1D map C_i for the i-th neuron in NL, and given any desired error ε > 0, we have:

∑_{i=1}^{L} |f(i) − C_i^{N_i}(q)| < ε,

where q is the initial neural activity for all the neurons in NL, N_i is the firing time of the i-th chaotic neuron, and C_i is the 1D chaotic map whose trajectory starting from q is a dense orbit.

Proof (by construction). Design an NL with one layer of exactly L chaotic neurons. Let each of the neurons be initialized with q, and let the input to this NL be the L real-valued samples of the function f(n), which act as stimuli for the corresponding L chaotic neurons. Now, for a given ε > 0, we can always construct a neighbourhood I_k of the stimulus, of radius η = ε/L, such that C_k^{N_k}(q) ∈ I_k for the k-th chaotic neuron of NL. This is always possible because of the topological transitivity property of chaos defined in Section 2 and because the chaotic trajectory starting from the initial value q is dense. Topological transitivity guarantees that the chaotic firing reaches the η-neighbourhood I_k of the stimulus in a finite number of iterations (N_k) for the dense orbit starting from q. For any given ε, the following is therefore true:

∑_{i=1}^{L} |f(i) − C_i^{N_i}(q)| < ∑_{i=1}^{L} η = L · (ε/L) = ε.    (7)

Eq. (7) holds since N_i is the firing time for C_i and the orbit is dense. Hence, the set of chaotic neurons {C_i} that constitute the input layer of NL can always approximate the function f(n) with an ε error bound.

This theorem holds for any NL constructed with chaotic neurons that satisfy the topological transitivity property and possess a dense orbit. Further, having a single dense orbit implies an uncountably infinite number of dense orbits. In other words, let q be the initial value corresponding to a dense orbit. Back-iterating q an infinite number of times yields an unbounded binary tree whose set of paths has uncountably infinite cardinality. Any initial value chosen from a node of this unbounded binary tree reaches q after a finite number of iterations; since the trajectory starting from q is a dense orbit (by assumption), the trajectory corresponding to any value at a node of this binary tree is also dense. Furthermore, all of these are distinct dense orbits.

An advantage of the above constructive proof is that we are guaranteed to satisfy UAT for discrete-time functions of finite support (L) with a Neurochaos Learning architecture (NL) comprising only one input layer with exactly L chaotic neurons. Contrast this with ANNs, for which the lower bound on the number of neurons needed for UAT is not known.

Footnote 1: An example of a dense orbit on the binary map is the real number whose binary expansion is '0.0 1 00 01 10 11 000 001 ...' [1]. The skew-tent map can be shown to have a dense orbit via its conjugacy with the binary map.

Footnote 2: For quantifying this approximation, we use the sum of absolute differences as the distance metric; that is, for any two discrete real-valued functions with support L, the distance is the sum of the absolute differences of their values.

ChaosFEX feature extraction maps the data into a high-dimensional space. The input data matrix with m data instances and n points per data instance (m × n) is mapped to m × 4n, since we extract 4 features from each chaotic trajectory. In general, an input data matrix of size m × n is mapped to a chaotic feature space of dimension m × kn, where k is the number of distinct features extracted from the chaotic firing of the neurons in the input layer of NL. In ChaosNet [7], k = 1 because only a single feature (Firing Rate) was extracted.
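A minimal sketch of this m × n → m × 4n mapping follows. It reflects our reading of the feature definitions (in particular, the symbolic sequence is obtained by thresholding the trajectory at b, consistent with GLS symbolic dynamics), assumes inputs normalized to [0, 1), and is not the authors' ChaosFEX code:

```python
import numpy as np

def chaosfex_single(x, q, b, eps, max_iter=10000):
    """Four ChaosFEX features for one stimulus value x in [0, 1),
    from the chaotic firing of a single skew-tent GLS neuron."""
    traj = []
    val = q
    for _ in range(max_iter):          # cap for safety; transitivity
        traj.append(val)               # guarantees a finite halt
        if abs(val - x) < eps:         # reached (x - eps, x + eps)
            break
        val = val / b if val < b else (1.0 - val) / (1.0 - b)
    traj = np.asarray(traj)
    symbols = (traj >= b).astype(int)  # symbolic sequence (our reading)
    firing_time = len(traj)            # N_k: iterations until halt
    firing_rate = symbols.mean()       # fraction of 1-symbols
    energy = np.sum(traj ** 2)         # squared l2-norm of trajectory
    p = firing_rate                    # binary Shannon entropy
    entropy = 0.0 if p in (0.0, 1.0) else -(p * np.log2(p)
                                            + (1 - p) * np.log2(1 - p))
    return firing_time, firing_rate, energy, entropy

def chaosfex(X, q, b, eps):
    """Map an (m, n) data matrix into the (m, 4n) ChaosFEX space."""
    m, n = X.shape
    out = np.empty((m, 4 * n))
    for i in range(m):
        for k in range(n):
            out[i, 4 * k:4 * k + 4] = chaosfex_single(X[i, k], q, b, eps)
    return out
```

On a dataset with n features per instance, calling chaosfex(X, q, b, eps) yields the 4n-dimensional representation that is then passed to the linear SVM.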
Thus, chaos-based feature engineering can be incorporated in either NL (as we have done in this work) or in conventional ANNs.

The aim of neuromorphic computing [21] research is to develop hardware and software architectures inspired by the structural and functional mechanisms of the brain. This involves exploiting the rich properties of biological neural networks, such as robust learning, plasticity [22] and low computational power. NL allows the integration of neuronal models inspired by biological neurons: the neurons in NL can be replaced by the Hindmarsh and Rose neuronal model [4], the adaptive exponential integrate-and-fire model [23], or other neuronal models that simulate the firing patterns of biological neurons. Such a combination of biological neuronal models and classical machine learning techniques contributes directly to neuromorphic computing paradigms by aiding interpretability of the learning process.

Deterministic chaos provides a rich variety of behaviours (periodic, quasi-periodic, eventually periodic and non-periodic orbits or trajectories, multiple co-existing attractors, fractal boundaries, and various types of synchronization), all of which are being exploited in engineering applications. This adaptability of chaotic systems enables applications in several fields, such as chaotic computing [24], lossless compression and memory encoding [25, 26], chaos-based cryptography [27], chaotic neural multiplexing [28], chaotic neural networks [29], and fractal image compression [30]. Despite having a positive Lyapunov exponent, two chaotic systems can still synchronize under specific conditions [31]. The GLS neurons used in NL have proven useful in memory encoding [25], lossless compression [32], cryptography [33], [34] and error control coding [35]. These rich properties of the chaotic neurons employed in NL (possibly in conjunction with traditional ML architectures) make it a strong competitor to ANNs.

This section provides a detailed description of the datasets used to evaluate the efficacy of ChaosFEX. We used synthetically generated as well as real-world datasets for our analysis. The real-world dataset consists of genome sequences of SARS-CoV-2 and other coronaviruses. The synthetic data helps to visualize the nonlinear features extracted by ChaosFEX. We used the synthetically generated OCCD data described in Subsection 2.4.2 of Section 2.4 for evaluating the efficacy of ChaosFEX. The train-test split is provided in Table 5. In the training set, there is a 32.8% overlap of class-0 data instances with class-1 data instances; this overlap makes the classification task challenging.

The classification of SARS-CoV-2 against other coronaviruses is challenging because of its similarity to the rest of the coronavirus family. We used the data provided by the authors of the paper titled "Classification and Specific Primer Design for Accurate Detection of SARS-CoV-2 Using Deep Learning" [36], [37]. The authors extracted data from the 2019 Novel Coronavirus Resource repository (2019nCoVR). All available sequences under the query Nucleotide Completeness = "Complete" AND host = "homo sapiens" were downloaded by [36]. After removing recurring sequences, 533 unique sequences of varying length (1,260 to 31,029 base pairs) were used for the experiments. Details of the data are provided in Table 6. A five-class classification problem is formulated with this dataset; the reasoning behind the grouping of viruses into distinct classes is provided in [36].
The ORF1ab gene constitutes about two-thirds of the coronavirus genome sequence, which makes it a significant region for classifying SARS-CoV-2 against similar viruses. In [36], the authors downloaded the genome sequences corresponding to the following query: gene = "ORF1ab" AND host = "homo sapiens" AND "complete genome". Recurring sequences were removed; 45 data instances corresponding to SARS-CoV-2 were grouped as class-0 and the remaining data instances, belonging to other viruses, as class-1. Table 7 gives the number of data instances per class for this binary classification problem.

For the binary classification of SARS-CoV-2 genomes against SARS-CoV-1 genomes, a total of 4498 and 101 genome sequences, respectively, were obtained from multiple data repositories up to early April 2020. 3930 SARS-CoV-2 sequences were obtained from GISAID, 407 from GenBank, and the remainder from the Genome Warehouse, CNGBdb and NMDC databases through the China National Center for Bioinformation [38]. All SARS-CoV-1 sequences were obtained from GenBank. All sequences were chosen with the filters Nucleotide Completeness = "Complete" AND host = "homo sapiens". Accession IDs for all sequences, as well as acknowledgements, are provided in the GitHub repository (https://github.com/HarikrishnanNB/genome_classification/tree/master/sequence_usage_acknowledgements). This additional data is not used in [36]. The number of data instances per class is provided in Table 8 (Class-0: SARS-CoV-2, Class-1: SARS-CoV-1).

This section deals with the set of experiments evaluated on OCCD and coronavirus genome sequences. We used Python 3 and the LinearSVC [39], Numba [40], NumPy [41] and scikit-learn [42] packages for the implementation of ChaosFEX. We compare the performance of ChaosFEX with classical SVM in the low and high training sample regimes. Making decisions from limited training samples is a challenging problem for ML algorithms. This learning paradigm is referred to as few-shot learning, which aims to develop ML models that generalize from a small set of labeled training data. There has been previous research in few-shot learning [43], [44]. In this paper, we use ChaosFEX to learn from few training samples.

Data preprocessing is the first step in any machine learning task. The following preprocessing is carried out for the genome sequence data:

• Step 1: Conversion of the nucleotide sequence into numeric format. We choose the same numeric conversion as mentioned in [36].

In this section we discuss the hyperparameters arrived at for NL (ChaosFEX+SVM) and SVM. NL (ChaosFEX+SVM) has three hyperparameters: initial neural activity (q), discrimination threshold (b) and epsilon (ε). To find the best hyperparameters, we perform three-fold cross-validation. In the case of OCCD, the hyperparameter tuning is as follows: we fixed the initial neural activity q = 0.22 and the discrimination threshold b = 0.96, and varied ε from 0.01 to 0.2 with a step size of 0.001. In the three-fold cross-validation we get an average macro F1-score of 0.833 for ε = 0.018. Figure 4a shows the results of this hyperparameter tuning. We performed five-fold cross-validation over the following SVM hyperparameters: kernel = ['linear', 'rbf'], gamma = ['scale', 'auto']. The train-validation split (train (%), Val (%)) for each of the five folds was chosen to be approximately (80%, 20%). We used LinearSVC [39] for the implementation of SVM with linear kernel and LIBSVM [45] for the implementation of SVM with RBF kernel.
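Putting these pieces together, here is a hedged sketch of the ε-tuning and classification pipeline described above. It reuses the illustrative chaosfex function from the earlier sketch, and X_train, y_train, X_test, y_test are assumed to be a normalized train-test split (none of these names come from the paper's released code):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Reported OCCD hyperparameters: q = 0.22, b = 0.96; eps tuned over
# the paper's grid (0.01 to 0.2, step 0.001) via three-fold
# cross-validation on macro F1.
q, b = 0.22, 0.96
best_eps, best_score = None, -np.inf
for eps in np.arange(0.01, 0.2, 0.001):
    F = chaosfex(X_train, q, b, eps)   # sketch from the earlier block
    score = cross_val_score(LinearSVC(max_iter=10000), F, y_train,
                            cv=3, scoring="f1_macro").mean()
    if score > best_score:
        best_eps, best_score = eps, score

# Refit on the full training set with the chosen eps, then test.
clf = LinearSVC(max_iter=10000).fit(chaosfex(X_train, q, b, best_eps),
                                    y_train)
print(clf.score(chaosfex(X_test, q, b, best_eps), y_test))
```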
We got a maximum average macro F1-score of 0.837 using SVM with RBF kernel and gamma = 'scale'.

For the coronavirus genome data, we fixed the initial neural activity q at 0.34 and the discrimination threshold b at 0.499. For this q and b, we varied ε from 0.18 to 0.19 with a step size of 0.0001 and performed three-fold cross-validation, taking 66.6% of the data for training and 33.3% for validation in each fold. We got an average macro F1-score of 1 for every ε from 0.18 to 0.1839 (step size 0.0001) in the three-fold cross-validation. Out of these values, we chose ε = 0.183 for further analysis. Figure 4b shows the macro-averaged F1-score of the three-fold cross-validation vs. ε. We performed five-fold cross-validation over the following SVM hyperparameters: kernel = ['linear', 'rbf'], gamma = ['scale', 'auto']. The train-validation split (train (%), Val (%)) for each of the five folds was chosen to be approximately (80%, 20%). We got a maximum average macro F1-score of 1 using the linear kernel; SVM with RBF kernel and gamma = 'scale' also gave a maximum average macro F1-score of 1. We use kernel = 'linear' for further experiments.

We performed two sets of experiments on the OCCD. The first experiment uses the training distribution provided in Table 5 and evaluates the performance of ChaosFEX and SVM with RBF kernel on the test data. The second experiment deals with the low training sample regime. In both experiments we compare the performance of ChaosFEX with SVM (RBF kernel), and the input data passed to ChaosFEX and to SVM with RBF kernel is the same. The accuracy and the macro-averaged precision, recall and F1-score are provided in Table 9; the classification metric definitions are provided in [46]. In this task, NL (ChaosFEX+SVM) slightly improved on the performance obtained by passing the input data directly to SVM with RBF kernel: in the chaotic feature space, SVM with linear kernel gave slightly better performance. This empirically shows the effectiveness of the chaotic feature space.

The chaotic feature space (firing time, firing rate, energy of the chaotic trajectory, entropy of the symbolic sequence of the chaotic trajectory) can be seen as an efficient feature engineering technique. In the extracted feature space of the training data (Figure 5a, Figure 5b, Figure 5c and Figure 5d), the data belonging to class-0 and class-1 fall into different clusters; the data is not spread everywhere in the extracted feature space, and there are multiple clusters of data belonging to class-0 and class-1. There is a direct correlation between firing time (Figure 5a) and energy of the chaotic trajectory (Figure 5c): as firing time increases, the length of the chaotic trajectory increases, and as the length increases, the energy of the chaotic trajectory also increases, because energy is defined as the square of the l2-norm of the chaotic trajectory. Hence firing time and energy of the chaotic trajectory are positively correlated. Similarly, the extracted feature spaces of firing rate (Figure 5b) and entropy of the symbolic sequence of the chaotic trajectory (Figure 5d) show similar patterns. Together, these four features yield a macro-averaged F1-score of 0.84 on the test data.

One-shot or few-shot learning is one of the challenging problems in machine learning: in many tasks, a huge amount of training data is simply not available.
For example, during the initial outbreak of a pandemic, decisions have to be made from a small number of samples. In such situations, learning from limited samples plays a key role in controlling the spread of the disease. We evaluate the efficacy of the proposed method in the low training sample regime. The hyperparameters are the same for the low training sample regime as for the high training sample regime (q = 0.22, b = 0.96, ε = 0.018). In the low training sample regime, we discarded 30% of the data belonging to each of the four quadrants for both class-0 and class-1. This removal is based on the length of the data: the topmost 30% of the data instances (in each quadrant) with the highest lengths were removed. We did this in order to avoid too many overlapping data instances between class-0 and class-1. After removing 30% of the data instances from each quadrant, we get a reduced dataset which is subsequently used for training; the test dataset is unchanged (refer to Table 5).

We performed 200 random trials of training with 4, 20, 36, ..., 724 (step size = 16) samples per class, drawn uniformly from the four quadrants. Figures 6a, 6b, 6c and 6d depict the training data for four cases in the low training sample regime. We then computed the macro F1-score on the test set given in Table 5. Since we performed 200 random trials of training, we have 200 macro F1-scores for each training size, from which we computed the mean. Figure 7 and Figure 8 show the average macro F1-score and the standard deviation of macro F1-scores, respectively, vs. the number of training samples per class. ChaosFEX consistently performs slightly better than SVM (RBF kernel) when the number of training samples per class is 36 or higher.

The data for the classification of SARS-CoV-2 against other viruses are given in Table 7. Because the data is highly imbalanced, we carried out five-fold stratified cross-validation on the binary classification (SARS-CoV-2 vs. Others) data. The train-validation split (train (%), Val (%)) for each of the five folds was chosen to be approximately (80%, 20%). The maximum length of the genome sequences is 31,029. For each fold we calculated precision, recall and F1-score separately, and from these computed the macro-averaged precision (Pr), recall (Re) and F1-score (F1) for that fold [46]. The five-fold cross-validation results using ChaosFEX are provided in Table 10, and Table 11 gives the five-fold cross-validation results for the same data distribution for SVM with linear kernel (without ChaosFEX features). For both NL (ChaosFEX+SVM) and SVM with linear kernel, we get an average macro F1-score of 1.0 in five-fold stratified cross-validation.

For the multiclass classification problem (Table 6), we performed five-fold stratified cross-validation with the same approximately (80%, 20%) train-validation split per fold; the maximum length of the genome sequences is 31,029. In all five folds, ChaosFEX slightly outperformed SVM with linear kernel. ANNs, by contrast, are known to suffer from catastrophic forgetting [47]. In our experiments, using the same hyperparameters, ChaosFEX preserves the performance of previously learned tasks; this is empirical evidence of the robustness of ChaosFEX to the catastrophic forgetting problem.
The robustness of chaos-based machine learning algorithms to the catastrophic forgetting problem has been shown in [48]. The results of the five-fold cross-validation for ChaosFEX and for SVM with linear kernel (without ChaosFEX features) are provided in Table 12 and Table 13. In the low training sample regime we used 1, 2, ..., 6 samples per class, with 200 random trials of training for each size; each independent trial of training is tested on the remaining data, and we computed the average macro F1-score on the test data. Figure 9a and Figure 9b show the average macro F1-score and the standard deviation of macro F1-scores of the test data, respectively, for the 200 independent trials of training with 1, 2, ..., 6 samples per class. In the low training sample regime, ChaosFEX+SVM slightly outperforms SVM with linear kernel for 1, 2, ..., 6 training samples per class. With just one training sample, ChaosFEX gave an average macro F1-score > 0.90, and the performance of ChaosFEX on the multiclass classification problem increases consistently with the number of training samples in the low training sample regime. The 200 random trials of training ensure that this high performance is not due to overfitting.

The dataset for the classification of SARS-CoV-2 vs. SARS-CoV-1 is given in Table 8. The number of genome sequences of SARS-CoV-2 is much higher than the number of genome sequences of SARS-CoV-1. The maximum length of the genome sequences is 30,129. We carried out five-fold stratified cross-validation on this data, with the train-validation split (train (%), Val (%)) for each of the five folds chosen to be approximately (80%, 20%). With the same hyperparameters (q = 0.34, b = 0.499, ε = 0.183), ChaosFEX gave high performance in all five folds. The results of the five-fold validation for ChaosFEX and for SVM with linear kernel (without ChaosFEX features) are provided in Table 14 and Table 13.

In the low training sample regime we used 1, 2, ..., 20 samples per class, with 1000 random trials of training for each size; each independent trial of training is tested on the remaining data. Figure 10a and Figure 10b show the average macro F1-score and the standard deviation of macro F1-scores of the test data, respectively, for the 1000 independent trials of training with 1, 2, ..., 20 samples per class. For SARS-CoV-2 vs. SARS-CoV-1 in the low training sample regime, we observe a maximum average macro F1-score > 0.99 for training with one sample per class. As the number of training samples increases, the average F1-score increases at a decreasing rate, while the standard deviation of the F1-scores shows a decreasing trend. ChaosFEX slightly outperforms SVM with linear kernel in the low training sample regime, except for training with 2, 3, 4 and 5 samples per class. Also, in the five-fold cross-validation on the same data (Table 14), we get an average macro F1-score of 1.0 using ChaosFEX. The low training sample regime highlights that ChaosFEX requires only a single sample each of SARS-CoV-2 and SARS-CoV-1 for classification. From [49], SARS-CoV-2 and SARS-CoV-1 are genetically close to each other even though SARS-CoV-2 is not a genetic descendant of SARS-CoV-1. However, our experiments seem to indicate that the differences between the ChaosFEX features of the genomic sequences of the two viruses are significant enough that, from very few observed sequences (few-shot learning), ChaosFEX is able to generalize to efficient classification of larger sets of sequences from the two viruses.
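The low-training-sample protocol used throughout this section (random draws of a few samples per class, testing on the remainder, and averaging macro F1 over many trials) can be summarized by the following sketch. It is our illustration, not the paper's code; F is assumed to be a precomputed ChaosFEX feature matrix and y the corresponding labels:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

def few_shot_trials(F, y, samples_per_class, n_trials=1000, seed=0):
    """Average and standard deviation of the macro F1-score over
    repeated random draws of a tiny training set, with all remaining
    instances used as the test set."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        train_idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), samples_per_class,
                       replace=False)
            for c in np.unique(y)])
        test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
        clf = LinearSVC(max_iter=10000).fit(F[train_idx], y[train_idx])
        scores.append(f1_score(y[test_idx], clf.predict(F[test_idx]),
                               average="macro"))
    return np.mean(scores), np.std(scores)
```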
The combination of chaos and machine learning opens up the possibility of developing brain-inspired learning algorithms. In this work, we propose a Neurochaos Learning (NL) architecture that explicitly employs chaotic neurons (unlike traditional ANNs, which use simple non-chaotic neurons) and, by combining chaos-based feature extraction with SVM-based classification, we demonstrate the efficacy and robustness of this approach. Our proof of the Universal Approximation Theorem (UAT) is enabled by two properties of chaos: topological transitivity and the existence of a dense orbit. An important benefit of our proof is the explicit construction of an NL with the exact number of neurons needed to approximate a discrete-time real-valued function with finite support to any desired accuracy; to the best of our knowledge, no equivalent result is available for ANNs. Thus the benefit of using the rich features of chaos is evident in our work.

In the experiments, we evaluated the performance of ChaosFEX in both the low and the high training sample regimes on synthetically generated overlapping concentric circle data and on coronavirus genome sequence data. For the classification of SARS-CoV-2 vs. SARS-CoV-1, ChaosFEX gave an average F1-score > 0.99 with just one training sample per class. This shows the robustness of the ChaosFEX features and their ability to generalize from very few training samples. The ChaosFEX features can be combined with any machine learning algorithm. The hyperparameters used in ChaosFEX preserve the performance of earlier learned tasks when more classes are added, which shows the robustness of the proposed method to the catastrophic forgetting problem. Combining ChaosFEX with deep learning and reinforcement learning algorithms is a future line of work.

The code used for the classification of OCCD can be found here: https://github.com/HarikrishnanNB/occd_experiments. The code used for the classification of coronavirus genome sequences is available here: https://github.com/HarikrishnanNB/genome_classification. Accession IDs and acknowledgements for the genome sequences used in the classification of SARS-CoV-2 vs. SARS-CoV-1 are available in the GitHub repository: https://github.com/HarikrishnanNB/genome_classification/tree/master/sequence_usage_acknowledgements.

References
[1] A first course in chaotic dynamical systems: theory and experiment.
[2] Is there chaos in the brain? II. Experimental evidence and related models.
[3] Chaotic itinerancy as a dynamical basis of hermeneutics in brain and mind.
[4] A model of neuronal bursting using three coupled first order differential equations.
[5] Is there chaos in the brain? I. Concepts of nonlinear dynamics and methods of investigation.
[6] A novel chaos theory inspired neuronal architecture.
[7] ChaosNet: A chaos based artificial neural network architecture for classification.
[8] Neurochaos inspired hybrid machine learning architecture for classification.
[9] Deep learning for genomics: A concise overview.
[10] Initial sequencing and analysis of the human genome.
[11] An integrated encyclopedia of DNA elements in the human genome.
[12] Functional annotation of a full-length mouse cDNA collection.
[13] Integrative analysis of 111 reference human epigenomes.
[14] Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning.
[15] Convolutional neural network architectures for predicting DNA-protein binding.
[16] A deep learning framework for modeling structural features of RNA-binding protein targets.
[17] Deep GDashboard: Visualizing and understanding genomic sequences using deep neural networks.
[18] Review on machine and deep learning models for the detection and prediction of coronavirus.
[19] Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges.
[20] Ergodic theory of numbers.
[21] Neuromorphic systems: engineering silicon from neurobiology.
[22] Neural plasticity in the ageing brain.
[23] Adaptive exponential integrate-and-fire model as an effective description of neuronal activity.
[24] Synthetic computation: chaos computing, logical stochastic resonance, and adaptive computing.
[25] A novel compression based neuronal architecture for memory encoding.
[26] Using chaotic artificial neural networks to model memory in the brain.
[27] Chaos-based cryptography: a brief overview.
[28] Neural signal multiplexing via compressed sensing.
[29] Chaotic neural networks.
[30] Fractal image compression: theory and application.
[31] Synchronization in chaotic systems.
[32] Arithmetic coding as a non-linear dynamical system.
[33] Simultaneous arithmetic coding and encryption using chaotic maps.
[34] Novel applications of chaos theory to coding and cryptography.
[35] Using Cantor sets for error detection.
[36] Classification and specific primer design for accurate detection of SARS-CoV-2 using deep learning. bioRxiv.
[37] Specific primer design for accurate detection of SARS-CoV-2 using deep learning.
[38] The 2019 novel coronavirus resource. Yi Chuan (Hereditas).
[39] LIBLINEAR: A library for large linear classification.
[40] Numba: A LLVM-based Python JIT compiler.
[41] Array programming with NumPy.
[42] Scikit-learn: Machine learning in Python.
[43] TADAM: Task dependent adaptive metric for improved few-shot learning.
[44] Few-shot image recognition by predicting parameters from activations.
[45] LIBSVM: A library for support vector machines.
[46] A systematic analysis of performance measures for classification tasks. Information Processing & Management.
[47] Overcoming catastrophic forgetting in neural networks.
[48] Chaotic continual learning. 4th Lifelong Learning Workshop at ICML.
[49] Coronaviridae Study Group of the International Committee on Taxonomy of Viruses et al. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2.

Acknowledgements: Harikrishnan N. B. thanks "The University of Trans-Disciplinary Health Sciences and Technology (TDU)" for permitting this research as part of the PhD programme. The authors gratefully acknowledge the financial support of Tata Trusts.