Topological constraints and robustness in liquid state machines

Hananel Hazan, Larry M. Manevitz
Department of Computer Science, University of Haifa, Mount Carmel, Haifa 31905, Israel

Keywords: Liquid State Machine; Reservoir computing; Small world topology; Robustness; Machine learning

Abstract

The Liquid State Machine (LSM) is a method of computing with temporal neurons, which can be used, amongst other things, for classifying intrinsically temporal data directly, unlike standard artificial neural networks. It has also been put forward as a natural model of certain kinds of brain functions. There are two results in this paper: (1) We show that Liquid State Machines as normally defined cannot serve as a natural model for brain function, because they are very vulnerable to failures in parts of the model. This result is in contrast to work by Maass et al., which showed that these models are robust to noise in the input data. (2) We show that specifying certain kinds of topological constraints (such as the "small world assumption"), which have been claimed to be reasonably plausible biologically, can restore robustness in this sense to LSMs.

1. Introduction

Processing in artificial neurons is typically a-temporal. This is because the underlying basic neuronal model, that of Pitts and McCulloch (1943), is a-temporal by nature. As a result, most applications of artificial neural networks are related in one way or another to static pattern recognition. On the other hand, it has long been recognized in the brain science community that the McCulloch-Pitts paradigm is inadequate. Various models of differing complexity have been promulgated to explain the temporal capabilities (amongst other things) of natural neurons and neuronal networks.

However, during the last decade, computational scientists have begun to pay attention to this issue from the neurocomputation perspective as well, e.g. Fern and Sojakka (n.d.), Jaeger (2001a, 2001b, 2002), Lukosevicius and Jaeger (2009) and Maass, Natschläger, and Markram (2002a, 2002b, 2002d), and the computational capabilities of various models are being investigated.

One such model, the Liquid State Machine (LSM) (see Fig. 1) (Maass et al., 2002a), has had substantial success recently. The Liquid State Machine is a somewhat different paradigm of computation. It assumes that information is stored, not in "attractors" as is usually assumed in recurrent neural networks, but in the activity pattern of all the neurons, which feeds back in a sufficiently recurrent and inter-connected network. This information can then be recognized by any sufficiently strong classifier such as an Adaline (Widrow & Hoff, 1960), Back-Propagation, a support vector machine (SVM) or a Tempotron (Gutig & Sompolinsky, 2006). (The name "liquid state" comes from the idea that the history of, e.g.,
the timings of rocks thrown into a pond of water, is completely contained in the wave structure.) Moreover, the "persistence of the trace" (or, as Maass put it, the "fading memory" (Lukosevicius & Jaeger, 2009)) allows one to recognize at a temporal distance the signal that was sent to the liquid, as well as sequence and timing effects of inputs.

The Liquid State Machine is a recurrent neural network. In its usual format (Lukosevicius & Jaeger, 2009; Maass et al., 2002a), each neuron is a biologically inspired artificial neuron such as a "leaky integrate and fire" (LIF) neuron or an "Izhikevich"-style neuron (Izhikevich, 2003). The connections between neurons define the dynamical process, and the recurrent connections define what we call the "topology" in this paper. The properties of the artificial neurons, together with these recurrences, result in any sequence of input history being transformed into a spatio-temporal activation pattern of the liquid. The nomenclature comes from the fact that one can intuitively look at the network as if it were a "liquid" such as a pond of water: the stimuli are rocks thrown into the water, and the ripples on the pond are the spatio-temporal pattern.

Fig. 1. Liquid State Machine framework.

In the context of the LSM, the "detectors" are classifier systems that receive as input a state (or, in large systems, a sample of the elements of the liquid) and are trained to recognize patterns that evolve from a given class of inputs. Thus a detector could be an SVM, an Adaline (Widrow & Hoff, 1960), a perceptron (Pitts & McCulloch, 1943), a three-level back-propagation neural network, etc.

The term detector is standard in the LSM community and dates back to Maass et al. (Jaeger, 2001a; Lukosevicius & Jaeger, 2009; Maass, 2002; Maass & Markram, 2004; Maass et al., 2002b). The idea is that the "detectors" test whether the information for classification resides in the liquid, and thus they are not required to be biological. In this way, it is theoretically possible for the detectors to recognize any spatio-temporal signal that has been fed into the liquid, and thus the system could be used for, e.g., speech recognition or vision. This is an exciting idea and, e.g., Maass and his colleagues have published a series of papers on it. Amongst other things, they have recently shown that once a detector has been sufficiently trained at any time frame, it is resilient to noise in the input data and thus it can be used successfully for generalization (Bassett & Bullmore, 2006; Fern & Sojakka, n.d.; Maass et al., 2002b).

Furthermore, there is a claim that this abstraction is faithful to the potential capabilities of the natural neurons and thus is explanatory to some extent from the viewpoint of computational brain science. Note that one of the underlying assumptions is that the detector works without memory; that is, the detector should be able to classify based on instantaneous static information, i.e. by sampling the liquid at a specific time.
That this is theoretically possible is the result of looking at the dynamical system of the liquid and noting that it is sufficient to cause the divergence of the two classes in the space of activation.

Note that the detector systems (e.g. a back-propagation neural network, a perceptron or a support vector machine (SVM)) are not required to have any biological plausibility, either in their design or in their training mechanism, since the model does not try to account for the way the information is used in nature. Despite this, since natural neurons exist in a biological and hence noisy environment, for these models to be successful in this domain, they must be robust to various kinds of noise. As mentioned above, Maass et al. (Lukosevicius & Jaeger, 2009; Maass, Legenstein, & Markram, 2002; Maass et al., 2002b; Maass & Markram, 2004) addressed one dimension of this problem by showing that the systems are in fact robust to noise in the input. Thus small random shifts in a temporal input pattern will not affect the LSM's ability to recognize the pattern. From a machine learning perspective, this means that the model is capable of generalization.

However, there is another component to robustness: that of the components of the system itself. In this paper we report on experiments performed with various kinds of "damage" to the LSM and, unfortunately, show that the LSM with any of the above detectors is not resistant, in the sense that small damages to the LSM neurons degrade the trained classifiers dramatically, even to essentially random performance (Hazan & Manevitz, 2010; Manevitz & Hazan, 2010).

Seeking to correct this problem, we experimented with different architectures of the liquid. The essential need of the LSM is that there should be sufficient recurrent connections so that, on the one hand, the network maintains the information in a signal, while on the other hand it separates different signals. The models typically used are random connections, or random connections with a bias towards "nearby" connections. Our experiments with these topologies show that the network is very sensitive to damage because the recurrent nature of the system causes substantial feedback. Taking this as a clue, we tried networks with "hub" or "small world" (Albert & Barabási, 2000; Barabási, 2000; Barabási & Albert, 1999) architectures. This architecture has been claimed (Achard, Salvador, Whitcher, Suckling, & Bullmore, 2006; Bassett & Bullmore, 2006; Varshney, Chen, Paniagua, Hall, & Chklovskii, 2011) to be "biologically feasible".

The intuition was that the hub topology, on the one hand, integrates information from many locations and so is resilient to damage in some of them; and on the other hand, since such hubs follow a power-law distribution, they are rare enough that damage usually does not affect them directly. This intuition was in fact borne out by our experiments.

2. Materials and methods

We simulated the Liquid State Machine with 243 integrate and fire (LIF) neurons in the liquid, following the exact setup of Maass and using the code available from the Maass laboratory software "A neural Circuit SIMulator" (http://www.lsm.tugraz.at/csim/). To test variants of topology we re-implemented the code, available at our website (http://www.cri.haifa.ac.il/neurocomputation). The variants of the topologies implemented are described in the paper below, as are the types of damages. Input to the liquid was given to 30% of the neurons, with the same input at all locations at a given time instance.
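To make this setup concrete, the following is a minimal sketch of the kind of liquid just described: a pool of leaky integrate and fire neurons with sparse random recurrent connections, a fraction of which receive the external spike input. Only the 243 neurons, the roughly 20% connectivity with about 20% inhibitory neurons, and the 30% input fraction are taken from the paper; the remaining constants and names (TAU, THRESHOLD, run_liquid, etc.) are illustrative assumptions, not the values of the original CSIM code.

import numpy as np

rng = np.random.default_rng(0)

N = 243                     # liquid size used in the paper
CONNECTIVITY = 0.20         # average recurrent connectivity (paper)
INPUT_FRACTION = 0.30       # fraction of neurons receiving the external input (paper)

# Assumed LIF constants (illustrative only).
TAU = 30.0                  # membrane time constant (ms)
THRESHOLD = 1.0             # firing threshold
REFRACTORY = 3              # refractory period in time steps

# Random recurrent weights: ~20% of entries nonzero, ~20% of columns inhibitory.
# W[i, j] is the weight from neuron j to neuron i.
mask = rng.random((N, N)) < CONNECTIVITY
signs = np.where(rng.random(N) < 0.20, -1.0, 1.0)
W = mask * rng.random((N, N)) * signs[np.newaxis, :]
np.fill_diagonal(W, 0.0)

input_neurons = rng.choice(N, size=int(INPUT_FRACTION * N), replace=False)

def run_liquid(spike_train, dt=1.0):
    """Drive the liquid with a 0/1 spike train; return the spike pattern at each step."""
    v = np.zeros(N)                   # membrane potentials
    last_spikes = np.zeros(N)         # spikes emitted at the previous step
    refractory = np.zeros(N, dtype=int)
    states = []
    for s in spike_train:             # s is 0 or 1 at this time step
        v = v * np.exp(-dt / TAU)     # leak
        v += W @ last_spikes          # recurrent drive from previous spikes
        v[input_neurons] += s         # same external input to all input neurons
        fired = (v >= THRESHOLD) & (refractory == 0)
        refractory = np.where(fired, REFRACTORY, np.maximum(refractory - 1, 0))
        v[fired] = 0.0                # reset after a spike
        last_spikes = fired.astype(float)
        states.append(last_spikes)
    return np.array(states)           # shape (time, N): the spatio-temporal pattern

A detector then reads one row of the returned state matrix (or, in the memory variant described later, a window of consecutive rows).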
The detectors of the basic networks were back-propagation networks with three levels, with 3 neurons in the hidden level and one output neuron. In most experiments, the input was given by the output of all non-input neurons of the liquid (i.e. 170 inputs to the detector). In some experiments (see the section below) the inputs to the detector were given over 20 time instances, and so the detector had 3400 inputs. The networks were tested with 20 random temporal binary sequences of length 45 chosen with uniform distribution. The experiments were repeated 500 times and statistics reported.

3. Theory/calculations

As discussed in the introduction, there are two sources of potential instability in such a system. First is the issue of small variants in the input. Systems need to balance the need for separation with generalization. That is, on the one hand, one may need to separate inputs with small variations into separate treatment, but on the other hand, small variants may need to be treated as "noise" and generalized over by the trained system. For the LSM as typically presented in the literature, it is understood, e.g. from the work of Lukosevicius and Jaeger (2009) and Maass (2002), that the LSM and its variants do this successfully in the case of spatio-temporal signals.

The second issue concerns the sensitivity of the system to small changes in the system itself, which we choose to call "damages" in this paper. This is very important if, as is the case for the LSM, it is supposed to be explanatory for biological systems. Our experiments therefore are based on simulating the LSM with temporal sequences and calculating how resistant it is to two main kinds of such damages. The damages chosen for investigation were: (1) at each time instance a certain percentage of neurons in the liquid refuse to fire regardless of the internal charge in their state; (2) at each time instance a certain percentage of neurons fire regardless of the internal charge, subject only to the limitation of the refractory period. Since the basic results (see below) showed that the standard variants of the LSM were not robust to these damages at various small levels, we considered topological differences in the connectivity of the LSM.

3.1. First experiments: LSMs are not robust

3.1.1. The experiments

To test the resistance of the standard LSM to noise, we (i) downloaded the code of Maass et al. from his laboratory site (http://www.lsm.tugraz.at/csim/) and implemented two kinds of damage to the liquid, and (ii) re-implemented the LSM code so that we could handle variants. These models use a basic neuron of the "leaky integrate and fire" (LIF) variety and, in Maass' work, the neurons are connected randomly but with some biologically inspired parameters: 20% inhibitory neurons and a connectivity constraint giving a preference to geometrically nearby neurons over more remote ones. (For precise details on these parameters, see the neural Circuit SIMulator and Maass and Markram (2002).) External stimuli to the network were always sent to 30% of the neurons, always chosen to be excitatory neurons. Initially, we experimented with two parameters: (i) the percentage of neurons damaged; (ii) the kinds of damage. The kinds were either transforming a neuron into a "dead" neuron, i.e. one that never fires, or transforming a neuron into a "generator" neuron, i.e. one which fires as often as its refractory period allows, regardless of its input.
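As an illustration of how these two kinds of damage can be injected into a simulation, the following is a minimal sketch, assuming a liquid step like the one sketched in Section 2 that exposes each neuron's firing decision and refractory counter. The function and variable names (choose_damaged, apply_damage) are ours, not those of the CSIM code.

import numpy as np

rng = np.random.default_rng(1)

def choose_damaged(n_neurons, fraction):
    """Pick a random set of neurons to damage (fraction = 0.001, 0.005, 0.01, 0.05 or 0.10)."""
    k = int(round(fraction * n_neurons))
    return rng.choice(n_neurons, size=k, replace=False)

def apply_damage(fired, refractory, dead, generators):
    """Override the liquid's firing decisions at one time step.

    dead       -- indices of neurons that never fire, regardless of charge
    generators -- indices of neurons that fire whenever the refractory period allows
    """
    fired = fired.copy()
    fired[dead] = False
    fired[generators] = refractory[generators] == 0
    return fired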
We did experiments with different kinds of detectors: Adaline (Widrow & Hoff, 1960), Back-Propagation, SVM and Tempotron (Gutig & Sompolinsky, 2006). Classification of new data could then be done at any of the signal points. We ran experiments as follows: we randomly chose twenty temporal inputs, i.e. random sequences of 0s and 1s of length 45, corresponding to spike inputs over a period of time, and trained an LSM composed of 243 integrate and fire neurons in the liquid (Maass & Markram, 2002) to recognize ten of these inputs and reject the other ten. Each choice of architecture was run 500 times, varying the precise connections randomly. We tested the robustness of the recognition ability of the network with the following parameters:

– The neurons in the network were either leaky integrate and fire neurons (Maass, 2002) or Izhikevich (Izhikevich, 2003) style neurons.
– The average connectivity of the networks was maintained at about 20%, chosen randomly in all cases although with different distributions.
– The damages were either "generators", i.e. the neurons issued a spike whenever their refractory period allowed it, or "dead" neurons that could not spike.
– The degree of damage was systematically checked at 0.1%, 0.5%, 1%, 5%, and 10% of randomly chosen neurons.

The results shown in tables throughout the paper are percentages over the (500) repeated tests. One hundred percent indicates that all 20 vectors of a test were fully recognized correctly over all 500 repetitions of the test. Fifty percent indicates that only half the vectors were recognized over the 500 runs. (This corresponds to a chance baseline.) The graphs presented below show the full distribution of all the tests and the results over all the kinds of damages and all varieties of topologies. As expected, they distribute as a Gaussian, but note that the average success rate varies from a baseline of 10 successes (50%) for random guessing (see Fig. 2) to as high as almost 20 (98%) for generalization in certain cases and 88% for some of the damages.

Fig. 2. Results of identification of random vectors on an untrained LSM with uniform random connections. This is a baseline. The result is a Gaussian distribution around 10 vectors.

3.2. Second experiments: modifications of the LSM

3.2.1. Different kinds of basic neurons

In attempts to restore the robustness to damage, we experimented with the possibility that a different kind of basic neuron might result in a more resilient network. Accordingly, we implemented the LSM with various variants of "leaky integrate and fire" neurons, e.g. with a history dependent refractory period (Manevitz & Marom, 2002), and by using the model of neurons due to Izhikevich (2003). The results under these variants were qualitatively the same as with the standard integrate and fire neuron. (The Izhikevich model produces much denser activity in the network and thus the detector was harder to train, but in the end the network was trainable and the results under damage were very similar.) Accordingly, we report only results with the standard integrate and fire neuron as appears, e.g., in Maass' work (Maass, 2002).
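For reference, a minimal sketch of the Izhikevich-style update meant here is given below; it follows the standard two-variable formulation of Izhikevich (2003). The specific parameter values (a, b, c, d) are the commonly quoted regular-spiking ones and are an assumption for illustration, not necessarily the values used in our runs.

def izhikevich_step(v, u, input_current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    """One Euler step of the Izhikevich (2003) neuron model.

    v -- membrane potential (mV); u -- recovery variable.
    Returns the updated (v, u) and whether the neuron fired this step.
    """
    fired = v >= 30.0                       # spike cutoff used in the original model
    if fired:
        v, u = c, u + d                     # reset after a spike
    v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + input_current)
    u = u + dt * a * (b * v - u)
    return v, u, fired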
3.2.2. Allowing detectors to have memory

In trying to consider how to make the model more robust to damage, we investigated the fact that the detector has no memory. Perhaps, if we allowed the detector to follow the development of the network for a substantial amount of time, both in training and running, it would be more robust. To check this, we took the most extreme other case: we assumed that the detector system in fact takes as input a full time course of 20 iterations of the output neurons of the liquid. This means that instead of a neural network with 170 inputs, we had one with 20 times 170 time-course inputs. It seemed reasonable that (i) with so much information, it should be relatively easy to train the detector; (ii) one could hope that damage in the liquid would be local enough that over the time period, the detector could correct for it. In order to test this, we re-implemented the LSM detector to allow for this time entry.

Our detector was trained and tested as follows. There were 170 output units. At a "signal point" each of them was sampled for the next 20 iterations and all of these values were used as a single data point for the detector. Thus the detector had 170 times 20 inputs. We chose separate detector points, typically at intervals of 50. We then used back-propagation on these data points. This means that eventually the detector could recognize the signal at any of the "signal points"; after training there was no particular importance to the choice of separation of the signal points except that there was no overlap between the data points. While we did not control for any connections between the intervals of data points (i.e. 50, and we also checked other time intervals) and possible natural oscillations in the network, we do not believe there were any. As anticipated, there was no significant trouble in training the network to even 100% recognition of the training data.

Fig. 3. Histogram of connection distributions when the output connections were randomly chosen according to a power law. Note that the input histogram is different from the output histogram.

Fig. 4. Maass LSM: (a) normal operation; (b) with 10% dead damage; (c) with 10% noise. One can easily discern the large change in the reaction of the network.

The "detectors" were three-level neural networks, trained by back-propagation. We also did some experiments with the Tempotron (Gutig & Sompolinsky, 2006) and with a simple Adaline detector (Widrow & Hoff, 1960). Training for classification could be performed in the damage-less environment successfully with any of these detectors. Then we exhaustively ran tests on these possibilities. In all of these tests, following Maass (2002), Maass and Markram (2002) and Maass et al. (2002a), we assumed that approximately 20% of the neurons of the liquid were of the inhibitory type. The architecture of the neural network detector was 204 input neurons (which were never taken from the neurons in the LSM which were also used as inputs to the LSM), 100 hidden-level neurons and one neuron for the output. Results running the Maass et al. architecture are presented in Fig. 4 and Table 4 and can be compared with a randomly connected network of 10% average connectivity, see Table 2.

The bottom line (see the results section) was that even with low amounts of damage and under most kinds of connectivity, the networks would fail; i.e. the trained but damaged network's loss of function was very substantial and in many cases it could not perform substantially differently from a random selection.
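The sketch below illustrates the memory-input detector described in this section: at each signal point the liquid outputs are sampled for 20 consecutive iterations and flattened into a single 3400-dimensional data point for the classifier. The scikit-learn MLPClassifier is used only as a stand-in for our three-level back-propagation network; the layer size and training parameters are illustrative assumptions.

import numpy as np
from sklearn.neural_network import MLPClassifier

WINDOW = 20        # iterations sampled at each signal point
INTERVAL = 50      # spacing between signal points (no overlap between windows)

def memory_data_points(states, label):
    """Cut the liquid state matrix (shape: time x 170 output units) into
    flattened 20-step windows, one data point per signal point."""
    xs, ys = [], []
    for start in range(0, len(states) - WINDOW, INTERVAL):
        xs.append(states[start:start + WINDOW].ravel())   # 20 x 170 = 3400 values
        ys.append(label)
    return xs, ys

def train_detector(accept_runs, reject_runs):
    """Assemble windows from liquids driven by 'accept' and 'reject' sequences
    and train a small back-propagation network on them."""
    X, y = [], []
    for states in accept_runs:
        xs, ys = memory_data_points(states, 1)
        X += xs; y += ys
    for states in reject_runs:
        xs, ys = memory_data_points(states, 0)
        X += xs; y += ys
    detector = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000)
    return detector.fit(np.array(X), np.array(y))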
3.3. Third experiments: changing the architecture

Our next approach, and ultimately the successful one, was to experiment with different architectures. The underlying intuition is that the recurrent nature of the liquid results in feedback of information, making the network dynamics too sensitive to changes in the network. Since one can look at "damages" as instantaneous changes in the architecture, it seems reasonable to design architectures that can somehow "filter" out minor changes.

The liquids were varied in their topologies in the following ways:

1. Random connectivity. Each neuron in the network is connected to 20% of the other neurons in a random fashion. (i) In the original Maass topology the connections are chosen with a larger bias for nearby neurons (see Maass, 2002; Maass et al., 2002a; Maass, Natschläger, & Markram, 2002c). This is the literature standard and is what is usually meant by LSM. (ii) We also tested a network without such bias; i.e. the connections are chosen to 20% of the other neurons randomly and uniformly. The results presented below show that these architectures are not robust.

2. Reducing the connectivity to 10% and 5% in the above arrangement. The intuition for this was that with lower connectivity, the feedback should be reduced. The results presented below show that this intuition is faulty and that these networks are even less robust than the above (see Tables 1, 2, 5 and 6).

3. Implementation of "hub" topologies in either input or output connectivity. The intuition here is that the relative rarity of "hubs" makes their damage a very rare event. When they are not damaged, they receive information from many sources and can thus filter out the damage, alleviating the feedback in the input case. In the output hub case, the existence of many hubs should allow the individual neurons to filter out noise. The construction of hubs was done in various fashions:

   a. Hand design of a network with one hub for input. See Appendix A for a full description of this design.

   b. Small world topologies. Since small world topologies follow power-law connectivity, they produce hubs. On the other hand, such topologies are thought to emerge in a "natural" fashion (Albert & Barabási, 2000; Barabási, 2000; Barabási & Albert, 1999; Varshney et al., 2011) and appear in real neuronal systems (Albert & Barabási, 2000; Bassett & Bullmore, 2006), see Fig. 3. Note, however, that in our context there are two directions in which to measure the power law: the input and the output connectivity histograms of the neurons. We checked the following variants:
   i. Input connectivity is power law. That is, we assign a link from a uniformly randomly chosen neuron to a second neuron chosen randomly according to a power law. In this case the input connectivity follows a power law, while the output connectivity follows a Gaussian distribution.

   ii. Output connectivity is power law. That is, we reverse the above. In this case the input connectivity is Gaussian while the output connectivity is power law.

   iii. Replacing "Gaussian" with "uniform" in case (i) above.

   iv. Replacing "Gaussian" with "uniform" in case (ii) above.

   v. We also tried choosing a symmetric network with power-law connectivity (i.e. for both input and output). Note that in this case, the same neurons served as "hubs" both for input and output.

   vi. Finally, we designed an algorithm to allow distinct input and output power-law connectivity. In this case the hubs in the two directions are distinct. Algorithms 1 and 2 below accomplish this task.

Table 1. Five percent uniform random connectivity without memory input to the detector.(a)

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   55%    53%    52%   51%   49%
Noisy neurons    100%   63%    54%    55%   51%   50%
Dead and noisy   100%   55%    52%    52%   50%   50%
Generalization   100%   93%    88%    80%   75%   78%

(a) For all the tables shown in this paper, 50% is the baseline of random classification.

Table 2. Ten percent uniform random connectivity without memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   56%    53%    51%   51%   49%
Noisy neurons    100%   73%    58%    54%   51%   52%
Dead and noisy   100%   59%    54%    52%   52%   51%
Generalization   100%   100%   93%    88%   83%   81%

Algorithm 1. Generate random numbers between a minimum and maximum value with a power-law distribution.

Input: min, max, size, how_many_numbers
array = [min..max]; counterArray = array of zeros; Magnify = 5
for i = 1 to how_many_numbers
    index = random(array.start, array.end)
    end_array = array.end
    candidate = array[index]
    AddCells(array, Magnify)
    for t = 0 to Magnify
        array[end_array + t] = candidate
    end for
    shuffle(array)
    output_Array[i] = candidate
    counterArray[candidate]++
end for
shuffle(counterArray)
Output: output_Array, counterArray

Algorithm 2. Create the connectivity matrix for the liquid network using Algorithm 1.

Input: weight_Matrix
use Algorithm 1 to create (arraylist, counterArray)
counter = 0
for i = 1 to counterArray.length
    for t = 1 to counterArray[i]
        weight_Matrix[i, arraylist[counter]] = true
        counter++
    end for
end for

One problem with the various algorithms for designing power-law connectivity is that under a "fair" sampling, the network might not be connected. This means that such a network actually has a lower, effective connectivity. We decided to eliminate this problem by randomly connecting the disconnected components (either from an input or output perspective) to another neuron chosen randomly but proportionally to the connectivity. (This does not guarantee connectivity of the graph, but makes disconnection unlikely, so that the effective connectivity is not substantially affected.)
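For concreteness, the following is a minimal sketch of the double power-law wiring that Algorithms 1 and 2 describe, written with NumPy rather than the array-duplication scheme above. Drawing in-degrees and out-degrees from independent power-law (Zipf-like) samples and matching them at random is our illustrative rendering under these assumptions, not the exact code used in the experiments.

import numpy as np

rng = np.random.default_rng(2)

def power_law_degrees(n_neurons, n_connections, exponent=2.0):
    """Split n_connections among n_neurons so that the degree histogram follows a power law."""
    weights = rng.zipf(exponent, size=n_neurons).astype(float)
    degrees = np.floor(n_connections * weights / weights.sum()).astype(int)
    degrees[rng.integers(n_neurons)] += n_connections - degrees.sum()  # fix rounding
    return degrees

def double_power_law_matrix(n_neurons=243, connectivity=0.20):
    """Boolean connectivity matrix with power-law in-degrees AND out-degrees,
    with the hubs in the two directions landing on distinct (shuffled) neurons."""
    n_connections = int(connectivity * n_neurons * n_neurons)
    out_deg = power_law_degrees(n_neurons, n_connections)
    in_deg = power_law_degrees(n_neurons, n_connections)
    rng.shuffle(in_deg)                      # decouple input hubs from output hubs
    sources = np.repeat(np.arange(n_neurons), out_deg)
    targets = np.repeat(np.arange(n_neurons), in_deg)
    rng.shuffle(targets)                     # random matching of connection stubs
    W = np.zeros((n_neurons, n_neurons), dtype=bool)
    W[sources, targets] = True               # duplicate stubs collapse to a single edge
    return W

In a real run one would additionally reconnect any neuron left isolated, proportionally to the existing connectivity, as described in the paragraph above.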
Table 4. Twenty percent connectivity under Maass's distribution preferring local connections.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     90%    60%    52%    51%   50%   50%
Noisy neurons    90%    78%    57%    52%   52%   52%
Dead and noisy   90%    54%    52%    53%   50%   50%
Generalization   90%    96%    93%    93%   84%   84%

Table 5. Five percent uniform random connectivity with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   55%    53%    53%   51%   50%
Noisy neurons    100%   63%    54%    54%   53%   51%
Dead and noisy   100%   56%    53%    52%   51%   51%
Generalization   100%   93%    87%    80%   75%   79%

Table 6. Ten percent uniform random connectivity with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   58%    55%    53%   49%   50%
Noisy neurons    100%   74%    59%    57%   54%   50%
Dead and noisy   100%   61%    54%    55%   50%   50%
Generalization   100%   96%    92%    85%   82%   82%

Table 7. Twenty percent uniform random connectivity with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   63%    55%    52%   50%   50%
Noisy neurons    100%   87%    67%    61%   54%   52%
Dead and noisy   100%   68%    57%    52%   50%   49%
Generalization   100%   98%    97%    95%   89%   86%

Table 8. Maass's distribution as in Table 4, but with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   61%    53%    49%   49%   50%
Noisy neurons    100%   79%    60%    55%   51%   49%
Dead and noisy   100%   64%    55%    52%   51%   52%
Generalization   100%   100%   96%    93%   84%   85%

4. Results

4.1. First experiments: LSM is not robust

First, there was not much difference between the detectors, so eventually we restricted ourselves to the back-propagation detector. (Note that none of the liquid units accessed by the detectors were allowed to be input neurons of the liquid.) It turned out that while the detector is able to learn the randomly chosen test classes successfully if there is sufficient average connectivity (e.g. 20%), almost any kind of damage caused the detector to have a very substantial decay in its detecting ability (see Table 3). Note that even with lower connectivity, which has less feedback, the same phenomenon occurs. See Table 1 (5% connectivity) and Table 2 (10% connectivity).

When the network is connected randomly but with a bias for geometric closeness as in Maass' distribution, the network is still very sensitive (although a bit less so). Compare Table 4 to Table 3. After our later experiments, we returned to this point (see concluding remarks, below). In Fig. 4 we illustrate the difference in reaction of the network by a raster (ISI) display. Note that with 10% damage, it is quite evident to the eye that the network diverges dramatically from the noise-free situation. In Tables 1–4 one can see this as well with 5% noise for purely random connectivity. Actually, with low degrees of damage the detectors under even the Maass connectivity (see Table 4) show dramatic decay in recognition, although not to the extremes of random connectivity. These results (see Tables 1–4) were robust and repeatable under many trials and variants. Accordingly, we conclude that the LSM, either as purely defined with random connectivity, or as implemented in Maass et al. (2002a), cannot serve as a biologically relevant model.

4.2. Second experiments: varying the neurons and allowing the detectors to have memory

4.2.1. Variants of neurons (history dependent refractory period and Izhikevich)

The results under these variants were qualitatively the same as with the standard integrate and fire neuron. (The Izhikevich model produces much denser activity in the network and thus the detector was harder to train, but in the end the network was trainable and the results under damage were very similar.) Accordingly, we report only results with the standard integrate and fire neuron as appears, e.g., in Maass' work.

4.2.2. Detectors with memory input

The "detectors" in our experiments were either three-level neural networks trained by back-propagation, the Tempotron (Gutig & Sompolinsky, 2006), or a simple Adaline detector (Widrow & Hoff, 1960). Training for classification could be performed in the damage-less environment successfully with any of these detectors. We exhaustively ran tests on these possibilities, including damage degrees and kinds and detector types. Tables 5–8 show the results with different uniform connectivity in the liquid when there is memory input to the detector.
Table 8 shows similar results (as in Table 4) for the Maass connectivity with memory input to the detector. Histograms of sample results with 5% and 10% damage for the neural network detectors are presented in Figs. 5–13. (Since the results for the other detectors were similar, we did not run as many tests on them.) Note that Figs. 5–13 refer to the various kinds of hub architectures with memory in the detector.

Table 3. Twenty percent uniform random connectivity without memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     99%    60%    53%    51%   51%   50%
Noisy neurons    99%    86%    65%    58%   52%   50%
Dead and noisy   99%    65%    55%    53%   50%   51%
Generalization   99%    100%   97%    94%   87%   84%

Fig. 5. Histograms of correctness results in LSM networks with 20 time interval input, different amounts of "dead" neuron damage, and average connectivity of 20% with a uniform random distribution on the connections.

Fig. 6. Histograms of correctness results in LSM networks with 20 time interval input, different amounts of "noise generator" neuron damage, and average connectivity of 20% with a uniform random distribution on the connections.

Fig. 7. Histograms of correctness results in LSM networks with one hub distribution with different amounts of "noise generator" neuron damage.

Fig. 8. Histograms of correctness results in LSM networks with different amounts of "dead" neuron damage with one hub distribution.

In all of these tests, following Maass, we assumed that approximately 20% of the neurons of the liquid were of the inhibitory type. The architecture of the neural network detector was 204 input neurons (which were never taken from the neurons in the LSM which were also used as inputs to the LSM), 3 hidden-level neurons and one neuron for the output. For 20% connections, the Maass et al. architecture without memory in the detector, as presented in Table 4, can be compared with a uniform randomly connected network of 20% average connectivity without memory in the detector in Table 3, and can be compared as well with the Maass topology with memory in Table 8 and with uniform random connectivity of 20% with memory in Table 7. Note that Table 1 can also be compared to Table 5, and Table 2 can be compared to Table 6. Since this paper is about robustness, Figs. 5 and 6 present the full distribution of the experiments of Table 7 under these conditions with different degrees of damage. Note that with damage over 1%, the histogram deteriorates dramatically.

The bottom line of all these comparisons is that decreasing connectivity and adding memory to the detector slightly increase the robustness with low amounts of damage, but even with low amounts of damage and under all our variants of random connectivity, the networks would fail. That is, the trained but damaged network's loss of function was very substantial and in many cases it could not perform substantially differently from a random classification (see Figs. 5 and 6).

4.3. Third experiments: varying the architecture of the network

4.3.1. Hand chosen one-hub topology

Since the Maass et al. topology and the uniform random distribution topology showed a high level of vulnerability to any small amount of damage, and since adding memory to the detector helped only marginally to recover from damage in the liquid, we started to create different topologies to test the robustness of the liquid with the same parameters as those set by Maass et al. (see Jaeger, 2001a; Lukosevicius & Jaeger, 2009; Maass, 2002; Maass et al., 2002a, 2002c, 2002d; Natschläger, Maass, & Markram, 2002a; Natschläger, Markram, & Maass, 2002b).
One of those topologies is the hub topology that is described in detail in Appendix A. In this case, one can see from Table 9 and Figs. 7 and 8 that the robustness was substantially increased. However, under the construction presented in Appendix A, substantial disconnected components can appear in the liquid. Moreover, in results not presented in this paper, the signal has weaker persistence, i.e. detectors are able to recognize the signals only in a substantially smaller time window.

Table 9. One hub network with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   95%    88%    85%   76%   67%
Noisy neurons    100%   97%    91%    86%   70%   62%
Dead and noisy   100%   96%    89%    86%   75%   68%
Generalization   100%   100%   97%    97%   96%   95%

4.3.2. Small world topologies

The general connectivity in the human brain has been held to have some small world properties (Achard et al., 2006). Algorithm 1 is designed to obtain a hub topology using a more "natural" algorithm, thus creating a topology intended to be robust to damage in the liquid and more "natural" in its construction. One of the properties of small world networks is the power-law distribution. The results of the small world with a power-law distribution (see Table 10, Figs. 9 and 10) were, however, very similar to the Maass topology and to the uniform random topology in terms of robustness to damage in the liquid. On the other hand, they had improved generalization capability (see Table 10).

Table 10. Small world with a power-law distribution with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     100%   55%    51%    51%   50%   51%
Noisy neurons    100%   79%    58%    53%   50%   51%
Dead and noisy   100%   58%    51%    50%   48%   50%
Generalization   100%   100%   97%    93%   90%   89%

Fig. 9. Histograms of correctness results in LSM networks with different amounts of "dead" neuron damage with small world topology obtained with a power-law distribution.

Fig. 10. Histograms of correctness results in LSM networks with different amounts of "noise generator" neuron damage for small world topology obtained with a power-law distribution.

Table 11. Small world with a double power-law distribution with memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     96%    95%    87%    83%   74%   69%
Noisy neurons    96%    99%    93%    88%   72%   64%
Dead and noisy   96%    97%    89%    84%   70%   66%
Generalization   96%    99%    99%    98%   97%   97%

Table 12. Small world with a double power-law distribution without memory input to the detector.

Damage           None   0.1%   0.5%   1%    5%    10%
Dead neurons     62%    83%    67%    61%   56%   53%
Noisy neurons    62%    91%    75%    66%   54%   55%
Dead and noisy   62%    86%    69%    65%   52%   55%
Generalization   62%    100%   96%    95%   93%   91%

Fig. 12. Histograms of correctness results in LSM networks with different amounts of "dead" neuron damage with small world topology obtained with a double power-law distribution.

Fig. 13. Histograms of correctness results in LSM networks with different amounts of "noise generator" neuron damage for small world topology obtained with a double power-law distribution.
Looking closer at the distribution, as can be seen from Fig. 3, Algorithm 1 actually creates a power-law distribution in terms of total connections, but when we separate the connections into input and output connections, we see that while the output has a power-law distribution, the input connections have a roughly uniform random distribution.

4.3.3. Small world topologies with double power-law distribution

Accordingly, using Algorithms 1 and 2 we created a double power-law distribution (using the reverse order for input connections and output connections, as in Fig. 11). The robustness and the generalization ability were much improved. The best results were with a double power law where the distributions are over distinct neurons, and these are the results presented here in Tables 11 and 12 and Figs. 12 and 13.

Fig. 11. Connection distribution of a small world with a double power law.

5. Discussion

In this work, we looked at the robustness of the LSM paradigm and, by experimenting with temporal sequences, showed that the basic structural setup in the literature is not robust to two kinds of damage, even at small levels of damage. We also investigated this for various degrees of connectivity. While lowering the average degree of connectivity resulted in decreased sensitivity in all architectures to some extent, the bottom line is that decreased connectivity is ineffective. In addition, it became evident that lowering the connectivity also decreases the strength the network has in representability and, importantly, in the persistence of the signal. (That is, a low degree of connectivity causes the activity to die down quickly because of the lack of feedback. Thus the network is bounded in time and cannot recognize an "older" input signal.) Thus we see, as is to be expected from the analysis in Jaeger (2001a, 2001b, 2002) and Maass et al. (2002a), that a higher connectivity gives a larger set of "filters" that separate signals, but on the other hand makes the network more sensitive to changes.

Fig. 14. A graphical summary of the results presented in this paper. The "standard" LSM topologies, either uniform or as in Maass's original papers, are not robust; but small world topologies show an improvement, which is most marked in the case of a two-way power-law distribution.

In any case, even with low connectivities, the random topology was not robust; nor was the Maass topology. (While not at random levels of identification, as we have seen, e.g., in Tables 1 and 2, it suffered very substantial decays with even small amounts of damage. In addition, other experiments (not shown here) with connectivities below 15–20% show that the networks do not maintain the trace for very long.) We also investigated some variants in the kinds of neurons. It seems that the LSM (or "reservoir computing" concept) does not change much vis-a-vis robustness to internal noise based on these choices. We did see substantial improvement when supplying a window of time input to the detector rather than an instant of time. However, alone this was not sufficient.

The major effect was changing the topology of connectivity to accommodate the idea of hubs, power law and small world connectivity. Under these topologies, with the best result occurring when we have a power-law histogram of both input and output connectivity of the neurons, with separate neurons as hubs in both directions, the liquids are robust to damage.
6. Conclusions

We have shown experimentally that the basic LSM is not robust to "damages" in its underlying neurons and thus, without elaboration, cannot be seen to be a good fit as a model for biological computation. (We mention (data not shown here) that this result holds even if training is continued while the network is suffering damage.) However, choosing certain power-law topologies for the connectivity can result in more robust maintenance of the pertinent information over time. A graphical summary of the results for robustness under different topologies is given in Fig. 14.

In the papers (Bassett & Bullmore, 2006; Varshney et al., 2011), a distribution was chosen for biological reasons to allow preference for close neurons. This distribution is superior to the totally random one, but is still not sufficiently robust. Choosing a power-law distribution and being careful to make the assignments differently for in and out connectivity proved to be the best. Since this is thought of as a potentially biological arrangement (Barabási & Albert, 1999; Bassett & Bullmore, 2006), LSM-style networks with this additional topological constraint can, as of this date, be considered sufficiently biological. Other distributions may also work.

Acknowledgements

We want to acknowledge the distinction and support provided by the Sociedad Mexicana de Inteligencia Artificial (SMIA) and the 9th Mexican International Conference on Artificial Intelligence (MICAI-2010) in order to enhance, improve, and publish this work. We thank the Caesarea Rothschild Institute for support of this research. The first author thanks Prof. Alek Vainstein for support in the form of a research fellowship. A short version of this work was presented at the MICAI-2010 meeting (Manevitz & Hazan, 2010), whom we thank for inviting us to write this extended version. We also thank the Maass laboratory for the public use of their code.

Appendix A

The one-hub architecture was constructed as follows (a sketch of the construction in code is given after this list):

- Divide all the neurons (240) into groups; the size of each group is randomly chosen between 3 and 6 neurons. Each neuron in a group connects to 2 of its neighbors in the same group.
- Choose 1/4 of the groups to be hubs; the rest of the groups we call the base.
- For 20% connectivity (that is, 11,472 connections), 90% of the connections are from the base groups to the hub groups, 7% are from the hub groups to the base groups, and 3% are connections between the hub groups. To accomplish this:
  - Choose (10,324 times) a random neuron from the base groups and connect it to a randomly chosen neuron from a hub group.
  - Randomly choose (803 times) a neuron from the hub groups and connect it to a randomly chosen neuron from the base groups.
  - Connect (345 times) one of the neurons from the hub groups at random to another neuron from a different hub group (see Fig. A1).

Fig. A1. Hub topology.
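The following is a minimal sketch of this hand-designed one-hub construction, assuming the group sizes and connection counts listed above; the helper names (build_one_hub, groups, hub, base) are ours and purely illustrative, and minor details (e.g. a leftover group smaller than 3) are handled in the simplest way rather than as in the original experiments.

import numpy as np

rng = np.random.default_rng(3)

def build_one_hub(n_neurons=240):
    """Boolean connectivity matrix for the hand-designed one-hub topology (Appendix A)."""
    # Partition the neurons into groups of 3-6 and wire each neuron to 2 neighbors.
    W = np.zeros((n_neurons, n_neurons), dtype=bool)
    groups, start = [], 0
    while start < n_neurons:
        size = int(min(rng.integers(3, 7), n_neurons - start))
        group = list(range(start, start + size))
        groups.append(group)
        for j, neuron in enumerate(group):
            W[neuron, group[(j + 1) % size]] = True
            W[neuron, group[(j + 2) % size]] = True
        start += size
    # A quarter of the groups are hubs; the rest form the base.
    rng.shuffle(groups)
    n_hub = max(len(groups) // 4, 1)
    hub = np.concatenate(groups[:n_hub])
    base = np.concatenate(groups[n_hub:])
    # 90% base->hub, 7% hub->base, 3% hub->hub connections (10,324 / 803 / 345).
    for _ in range(10324):
        W[rng.choice(base), rng.choice(hub)] = True
    for _ in range(803):
        W[rng.choice(hub), rng.choice(base)] = True
    for _ in range(345):
        # (The paper additionally requires the two hub neurons to come from different groups.)
        W[rng.choice(hub), rng.choice(hub)] = True
    return W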
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512. doi:10.1126/science.286.5439.509. Bassett, D. S., & Bullmore, E. (2006). Small-world brain networks. The Neuroscientist, 12(6), 512–523. doi:10.1177/1073858406293182. Fern, C., & Sojakka, S. (n.d.). Pattern recognition in a bucket. Retrieved from . Gutig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike timing-based decisions. Nature Neuroscience, 9(3), 420–428. doi:10.1038/ nn1643. Hazan, H., & Manevitz, L. M. (2010). The liquid state machine is not robust to problems in its components but topological constraints can restore robustness. In IJCCI (ICFC-ICNC) (pp. 258–264). Izhikevich, E. M. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6), 1569–1572. doi:10.1109/TNN.2003.820440. Jaeger, H. (2001a). The ‘‘echo state’’ approach to analysing and training recurrent neural networks (No. GMD Report 148). German National Research Center for Information Technology. Retrieved from . Jaeger, H. (2001b). Short term memory in echo state networks (No. GMD Report 152). German National Research Center for Information Technology. Retrieved from . Jaeger, H. (2002). Adaptive nonlinear system identification with echo state networks. Retrieved from . Lukosevicius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149. doi:10.1016/ j.cosrev.2009.03.005. Maass, W. (2002). Paradigms for computing with spiking neurons. In J. L. van Hemmen, J. D. Cowan, & E. Domany (Eds.). Models of neural networks. Early vision and attention (Vol. 4, pp. 373–402). New York: Springer. Maass, W., Legenstein, R. A., & Markram, H. (2002). A new approach towards vision suggested by biologically realistic neural microcircuit models. In Proceedings of the 2nd workshop on biologically motivated computer vision. Lecture notes in computer science. Springer. Retrieved from papers/lsm-vision-146.pdf. Maass, W., & Markram, H. (2002). Temporal integration in recurrent microcircuits. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (2nd ed.. Cambridge: MIT Press. Maass, W., & Markram, H. (2004). On the computational power of circuits of spiking neurons. Journal of Computer and System Sciences, 69(4), 593–616. doi:10.1016/ j.jcss.2004.04.001. Maass, W., Natschläger, T., & Markram, H. (2002a). Computational models for generic cortical microcircuits. In J. Feng (Ed.), Computational neuroscience: A comprehensive approach. CRC-Press. Retrieved from papers/lsm-feng-chapter- 149.pdf. Maass, W., Natschläger, T., & Markram, H. (2002b). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560. Retrieved from papers/lsm-nc-130.pdf. Maass, W., Natschläger, T., & Markram, H. (2002c). A model for real-time computation in generic neural microcircuits. In Proceedings of NIPS 2002 (Vol. 15, pp. 229–236). Retrieved from papers/lsm-nips-147.pdf Maass, W., Natschläger, T., Markram, H. (2002d). A fresh look at real-time computation in generic recurrent neural circuits, Tech. Report, Institute for Theoretical Computer Science, TU Graz, Graz, Austria. Manevitz, L., & Hazan, H. (2010). Stability and topology in reservoir computing. In G. Sidorov, A. Hernández Aguirre, & C. Reyes García (Eds.), Advances in soft computing. Lecture notes in computer science (Vol. 6438, pp. 245–256). Berlin/ Heidelberg: Springer. 
Retrieved from . Manevitz, L. M., & Marom, S. (2002). Modeling the process of rate selection in neuronal activity. Journal of Theoretical Biology, 216(3), 337–343. Retrieved from . Natschläger, T., Maass, W., & Markram, H. (2002). The ‘‘Liquid Computer’’: A novel strategy for real-time computing on time series. Special Issue on foundations of information processing of TELEMATIK, 8 (1), 39–43. Retrieved from papers/lsm- telematik.pdf. Natschläger, T., Markram, H., & Maass, W. (2002). Computer models and analysis tools for neural microcircuits. In R. Kötter (Ed.), A practical guide to neuroscience databases and associated tools. Boston: Kluwer Academic Publishers. Retrieved from papers/lsm-koetter-chapter-144.pdf. Pitts, W., & McCulloch, W. S. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 52(1–2), 99–115. discussion 73-97. Varshney, L. R., Chen, B. L., Paniagua, E., Hall, D. H., & Chklovskii, D. B. (2011). Structural Properties of the Caenorhabditis elegans Neuronal Network. PLoS Computational Biology, 7(2), e1001066. doi:10.1371/journal.pcbi.1001066. Widrow, B., & Hoff, M. (1960). Adaptive switching circuits. 1960 {IRE} {WESCON} Convention Record, Part 4 (pp. 96–104). {IRE}. Retrieved from . http://dx.doi.org/10.1523/JNEUROSCI.3874-05.2006 http://www.ncbi.nlm.nih.gov/pubmed/11102229 http://www.ncbi.nlm.nih.gov/pubmed/11102229 http://arxiv.org/abs/cond-mat/0011029 http://dx.doi.org/10.1126/science.286.5439.509 http://dx.doi.org/10.1177/1073858406293182 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.3902 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.3902 http://dx.doi.org/10.1038/nn1643 http://dx.doi.org/10.1038/nn1643 http://dx.doi.org/10.1109/TNN.2003.820440 http://www.faculty.iu-bremen.de/hjaeger/pubs/EchoStatesTechRep.pdf http://www.faculty.iu-bremen.de/hjaeger/pubs/EchoStatesTechRep.pdf http://www.faculty.iu-bremen.de/hjaeger/pubs/STMEchoStatesTechRep.pdf http://www.faculty.iu-bremen.de/hjaeger/pubs/esn_NIPS02 http://dx.doi.org/10.1016/j.cosrev.2009.03.005 http://dx.doi.org/10.1016/j.cosrev.2009.03.005 http://dx.doi.org/10.1016/j.jcss.2004.04.001 http://dx.doi.org/10.1016/j.jcss.2004.04.001 http://dx.doi.org/10.1007/978-3-642-16773-7_21 http://dx.doi.org/10.1007/978-3-642-16773-7_21 http://www.ncbi.nlm.nih.gov/pubmed/12183122 http://dx.doi.org/10.1371/journal.pcbi.1001066 http://isl-www.stanford.edu/~widrow/papers/c1960adaptiveswitching.pdf http://isl-www.stanford.edu/~widrow/papers/c1960adaptiveswitching.pdf Topological constraints and robustness in liquid state machines 1 Introduction 2 Materials and methods 3 Theory/calculations 3.1 First experiments: LSMs are not robust 3.1.1 The experiments 3.2 Second experiments: modifications of the LSM 3.2.1 Different kinds of basic neurons 3.2.2 Allowing detectors to have memory 3.3 Third experiments: changing the architecture 4 Results 4.1 First experiments: LSM is not robust 4.2 Second experiments: varying the neurons and allowing the detectors to have memory 4.2.1 Variants of neurons (history dependent refractory period and izhikevich) 4.2.2 Detectors with memory input 4.3 Third experiments: varying the architecture of the network 4.3.1 Hand chosen one-hub topology 4.3.2 Small world topologies 4.3.3 Small world topologies with double power-law distribution 5 Discussion 6 Conclusions Acknowledgements Appendix A References