Information-Preserving Transforms: Two Graph Metrics for Simulated Spiking Neural Networks

Alexander M. Duda* and Stephen E. Levinson
Department of Electrical and Computer Engineering, Beckman Institute, University of Illinois, Urbana, IL 61801, USA

* Corresponding author. E-mail address: amduda@illinois.edu

Procedia Computer Science 20 (2013) 14-21. Complex Adaptive Systems, Publication 3 (Cihan H. Dagli, Editor in Chief), conference organized by Missouri University of Science and Technology, 2013, Baltimore, MD. © 2013 The Authors. Published by Elsevier B.V. Open access under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/). doi: 10.1016/j.procs.2013.09.232

Abstract

We are interested in self-organization and adaptation in intelligent systems that are robustly coupled with the real world. Such systems have a variety of sensory inputs that provide access to the richness, complexity, and noise of real-world signals. Specifically, the systems we design and implement are ab initio simulated spiking neural networks (SSNNs) with cellular resolution and complex network topologies that evolve according to spike-timing dependent plasticity (STDP). We want to understand how external signals (like speech, vision, etc.) are encoded in the dynamics of such SSNNs. In particular, we are interested in identifying and confirming the extent to which various population-level measurements (or transforms) are information-preserving. Such transforms could be used as an unambiguous way of identifying the nature of the input signals when given access only to the SSNN dynamics. Our primary objective in this paper is to empirically examine the extent to which two graph metrics provide an information-preserving transform between the input signals and the output signals. In particular, we focus on the standard deviation of the time-varying distributions for local influence (weighted out-degree) and local impressionability (weighted in-degree), which provide insight into information encoding at the population level in the dynamics of SSNNs. We report the encouraging results of an experiment carried out in the Language Acquisition and Robotics Group.

Keywords: Ab Initio Cellular Models; Complex Networks; Graph Metrics; Information-Preserving; Multi-Scale Modeling; Neurorobotics; Nonlinear Dynamics; Real-World Coupling; Simulated Spiking Neural Networks; Spike-timing Dependent Plasticity; STDP Learning; Topological Adaptation

1. Introduction

In the Language Acquisition and Robotics Group, housed within the Beckman Institute for Advanced Science and Technology, we are designing computational models that enable a humanoid robot (see Fig. 1) to learn natural language as a child does: by interacting with people and the real world. We embrace the perspective that such learning is based on the existence of a robust multi-sensory associative memory that serves as the cognitive architecture for an embodied agent with access to rich, real-world sensory streams [1]. Traditionally, we developed abstract statistical models of associative memory [2,3,18]. Though they exhibited encouraging results, they certainly
did not attain the qualitative or quantitative success of the current best natural example we have: the neocortex. Thus, given the state of the art in large-scale simulation tools and experimental neuroscience, we thought it was the right time to begin investigating, from the beginning, an answer to the question "How is the neocortex so successful at information processing, especially considering that neuron-neuron communication is phenomenally unreliable?" In the past, numerous people have considered this question in various forms (like von Neumann in his work on synthesizing reliable systems from unreliable components [4]), as have contemporary computational neuroscientists. However, we are specifically interested in identifying information-bearing signals (as opposed to regulatory signals) in the statistical sense at the population level [5,6]. We assert that such a signal could be used to construct a reduced-order model with respect to information, and serve as the basis for an associative memory. In this paper we examine two candidate information-bearing signals, which could be used for constructing such a reduced-order model.

Section 2 discusses the details of our SSNN design, providing context and motivation for our decisions, and outlines the equations and parameter values used to achieve the desired behavior; Section 3 describes the experiment; Section 4 provides the results of the experiment; and Section 5 outlines future work in the short term and long term.

Fig. 1. Bert, the iCub humanoid robot: a state-of-the-art platform for research in embodied cognition [7].

2. Simulated Spiking Neural Network Design

Of all the natural and man-made systems that exhibit any level of associativity, the human neocortex is currently the most successful. Thus, examining the way the neocortex accomplishes such a task could provide deep insight into how we, as engineers, could create machines and models with similar capabilities. The neocortex is extremely complex and has structures that carry out various processing functions across a range of spatial and temporal scales. Of course, there is interaction and interdependence between and among the phenomena that occur at these different scales. This raises the question: on which scale should one focus?

Research completed in the past two decades has shown that a spiking neural network (SNN) has an extremely wide range of computational abilities. In particular, it has been shown that, to the extent that a given population carries out a specialized function, that function is just a reflection of the particular types of signals that have been fed to the population [8]. In other words, a population can be viewed as a general-purpose computational unit that can be adapted to carry out a variety of functions, presumably including multi-sensory integration and associativity. Thus, it is reasonable to think that constructing a realistic, canonical, unspecialized SSNN is a wise first step. To create such an SSNN, one needs to identify a variety of important features that should be included. For our purposes in this paper, we have omitted a few important features (that we plan on including in future work; see Section 5).
We now detail the important features that are included in our SSNN.

2.1 Hodgkin-Huxley Neuron Model

The spiking neuron model used is the classic Hodgkin-Huxley (HH) model of a single neuron [9] in the form described by Koch [10]. What makes the HH neuron model most useful is the wide range of nonlinear behaviors it exhibits in response to various inputs. For a detailed summary of these behaviors (tonic spiking, resonator, integrator, etc.) and a comparison of the most widely used neuron models (including the integrate-and-fire, resonate-and-fire, Izhikevich, and FitzHugh-Nagumo), which justify its use as a biophysically meaningful neuron model, see Izhikevich [11]. We assume that the axon carries three primary currents: $I_K$ (voltage-gated persistent K+), $I_{Na}$ (voltage-gated transient Na+), and $I_L$ (Ohmic leak). With $C$ as the membrane capacitance and $I_{app}$ as the applied current, the primary set of space-clamped HH equations in standard form is as follows:

$$C\frac{dV}{dt} = G_{Na}m^3h\,(E_{Na} - V) + G_K n^4\,(E_K - V) + G_L\,(V_{rest} - V) + I_{app} \qquad (1)$$

where each gating variable $g \in \{m, h, n\}$ evolves according to the following equation:

$$\frac{dg}{dt} = \alpha_g(V)\,(1 - g) - \beta_g(V)\,g \qquad (2)$$

and the rate constants (with the membrane potential expressed in [mV] relative to rest) are defined as follows:

$$\alpha_m(V) = \frac{0.1\,(25 - V)}{e^{(25-V)/10} - 1}, \qquad \beta_m(V) = 4\,e^{-V/18} \qquad (3,4)$$

$$\alpha_h(V) = 0.07\,e^{-V/20}, \qquad \beta_h(V) = \frac{1}{e^{(30-V)/10} + 1} \qquad (5,6)$$

$$\alpha_n(V) = \frac{0.01\,(10 - V)}{e^{(10-V)/10} - 1}, \qquad \beta_n(V) = 0.125\,e^{-V/80} \qquad (7,8)$$

To produce standard spiking behaviour for a patch model [13], the parameter values are set as follows: $C = 1\times10^{-14}$ [F], $G_{Na} = 0.12\times10^{-8}$ [S], $E_{Na} = 0.05$ [V], $G_K = 0.036\times10^{-8}$ [S], $E_K = -0.077$ [V], $G_L = 3\times10^{-12}$ [S], $V_{rest} = -0.065$ [V], and $V_{initial} = -72.655\times10^{-3}$ [V].
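To make the dynamics above concrete, the following is a minimal forward-Euler sketch of Eqs. (1)-(8). The function names, the integration scheme, and the unit handling are our own illustrative choices (the default arguments reproduce the parameter values above); this is a sketch, not the NeuroXyce device itself.

```python
import numpy as np

# Rate functions of Eqs. (3)-(8); v is the depolarization from rest in [mV].
def alpha_m(v): return 0.1 * (25.0 - v) / (np.exp((25.0 - v) / 10.0) - 1.0)
def beta_m(v):  return 4.0 * np.exp(-v / 18.0)
def alpha_h(v): return 0.07 * np.exp(-v / 20.0)
def beta_h(v):  return 1.0 / (np.exp((30.0 - v) / 10.0) + 1.0)
def alpha_n(v): return 0.01 * (10.0 - v) / (np.exp((10.0 - v) / 10.0) - 1.0)
def beta_n(v):  return 0.125 * np.exp(-v / 80.0)

def hh_step(V, m, h, n, I_app, dt,
            C=1e-14, G_Na=0.12e-8, E_Na=0.05, G_K=0.036e-8,
            E_K=-0.077, G_L=3e-12, V_rest=-0.065):
    """One forward-Euler step of Eqs. (1)-(2); SI units, dt in [s]."""
    v = (V - V_rest) * 1e3                        # depolarization from rest, [mV]
    dV = (G_Na * m**3 * h * (E_Na - V)            # Eq. (1)
          + G_K * n**4 * (E_K - V)
          + G_L * (V_rest - V) + I_app) / C
    # Eq. (2) for each gating variable; the rates are per-[ms], hence the 1e3 factor.
    m += dt * 1e3 * (alpha_m(v) * (1.0 - m) - beta_m(v) * m)
    h += dt * 1e3 * (alpha_h(v) * (1.0 - h) - beta_h(v) * h)
    n += dt * 1e3 * (alpha_n(v) * (1.0 - n) - beta_n(v) * n)
    return V + dt * dV, m, h, n
```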
2.2 Synapse Model

A NeuroXyce [12] synapse device has been implemented [13]; however, the model could be implemented in a number of simulation environments. With conductance-based synaptic dynamics, plasticity determined by the spike-timing dependent model developed by Clopath and Gerstner, and a reliability parameter (which in our experiments was not activated and will not be discussed in detail here) that modulates the transmission of synaptic current as well as the learning process, the device captures a number of important experimentally observed features that will be useful in state-of-the-art research on the dynamics of large populations of spiking neurons and learning. The details of these features are discussed below. For reference, Fig. 2 shows a presynaptic neuron, A, with membrane voltage A.V and a postsynaptic neuron, B, with membrane voltage B.V. The synapse is represented by a directed edge from A to B; the postsynaptic current is denoted by $I_{post}$.

The model used is based on the NEURON simulator's Exp2Syn mechanism [14,15]. With $w$ representing the weight from the Clopath-Gerstner plasticity scheme outlined below (set to 1 in the case where no learning occurs), $B.V$ the momentary postsynaptic voltage, and $E_{rev}$ the reversal potential (set to $-85\times10^{-3}$ [V]), the postsynaptic current is the following:

$$I_{post} = w\, g_{max}\,(B.V - E_{rev}) \qquad (9)$$

where $g_{max}$, the maximal conductance, is defined as follows:

$$g_{max}(t) = f_{norm}\left(\exp\!\left(-\frac{t}{\tau_{decay}}\right) - \exp\!\left(-\frac{t}{\tau_{rise}}\right)\right) \qquad (10)$$

where $f_{norm}$ is a normalizing factor that ensures the peak is 1, $\tau_{rise}$ is the rise time (set to $2\times10^{-4}$ [s]), and $\tau_{decay}$ is the decay time (set to $1\times10^{-2}$ [s]), making sure that $\tau_{decay} > \tau_{rise}$.

We have adapted the Clopath-Gerstner model [16,17] to be used in a real-time fashion within a NeuroXyce circuit device that interacts with the Hodgkin-Huxley spiking neuron device previously described. This well-known phenomenological model captures a number of the important experimentally observed behaviors of plasticity in synapses. Additionally, it is easily tunable to exhibit a variety of STDP curves. The curve we want to generate is shown in Fig. 2. With the following notation: $S$ (voltage threshold for a spike event), $R$ (voltage value for a resting event), $w$ (weight/strength of synapse), $A.V$ (momentary presynaptic membrane voltage), $V_{L3}$ (a low-pass filtered (LPF) version of $A.V$ with time constant $\tau_3$), $B.V$ (momentary postsynaptic membrane voltage), $V_{L1}$ (an LPF version of $B.V$ with time constant $\tau_1$), $V_{L2}$ (an LPF version of $B.V$ with time constant $\tau_2$), and the Boolean operator on variables $x_1$ and $x_2$ defined such that $(x_1 > x_2) = 1$ if $x_1 > x_2$ and 0 otherwise, the modified Clopath-Gerstner equation that updates the synaptic weight is as follows:

$$\frac{dw}{dt} = -(w > w_{min})\frac{dw_{LTD}}{dt} + (w_{max} > w)\frac{dw_{LTP}}{dt} \qquad (11)$$

where the changes in $w$ due to long-term depression (LTD) and long-term potentiation (LTP) are:

$$\frac{dw_{LTD}}{dt} = A_{LTD}\,(A.V > S)\,(V_{L1} > R)\,(V_{L1} - R) \qquad (12)$$

$$\frac{dw_{LTP}}{dt} = A_{LTP}\,V_{L3}\,(B.V > S)\,(B.V - S)\,(V_{L2} > R)\,(V_{L2} - R) \qquad (13)$$

while the changes in the LPF voltages are:

$$\tau_1\frac{dV_{L1}}{dt} = B.V - V_{L1}, \qquad \tau_2\frac{dV_{L2}}{dt} = B.V - V_{L2}, \qquad \tau_3\frac{dV_{L3}}{dt} = (A.V > S) - V_{L3} \qquad (14,15,16)$$

The parameters are set as follows (note that some parameter values differ from those in the Clopath-Gerstner papers; this was necessary to obtain the desired behaviour shown in Fig. 2, and the tuning was completed via simulation): $S = -45.3\times10^{-3}$ [V], $R = -72.655\times10^{-3}$ [V], $w_{min} = 0$, $w_{max} = 1.6$, $w_{initial} = 1$, $A_{LTD} = 5\times10^{-2}$ [V$^{-1}$], $A_{LTP} = 8.5$ [V$^{-2}$], $\tau_1 = 23\times10^{-3}$ [s], $\tau_2 = 7\times10^{-3}$ [s], and $\tau_3 = 46\times10^{-3}$ [s]. The interested reader is encouraged to see the NeuroXyce netlists to run basic simulations [13].

Fig. 2. The left figure shows a single directed synapse going from the presynaptic neuron, A, with membrane voltage A.V, to the postsynaptic neuron, B, with membrane voltage B.V. The postsynaptic current (the current that will flow from A to B) is denoted by I_post. The right figure shows the change in synaptic weight, Δw, along the vertical axis as a function of the timing between presynaptic and postsynaptic spikes (shown along the horizontal axis in milliseconds). When the postsynaptic neuron spikes before the presynaptic neuron, long-term depression (LTD) will result, with the greatest decrease occurring when the timing is close to zero. When the presynaptic neuron spikes before the postsynaptic neuron, long-term potentiation (LTP) will result, with the greatest increase occurring when the timing is close to zero.
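As an illustration, here is a minimal discrete-time sketch of the synapse device of Eqs. (9)-(16), assuming simple Euler integration; the function and variable names mirror the notation of the text, but the code is our own rendering rather than the NeuroXyce implementation.

```python
import math

# Parameter values from Section 2.2.
TAU_RISE, TAU_DECAY = 2e-4, 1e-2           # [s]
S, R = -45.3e-3, -72.655e-3                # spike threshold / resting value [V]
W_MIN, W_MAX = 0.0, 1.6
A_LTD, A_LTP = 5e-2, 8.5                   # [1/V], [1/V^2]
TAU1, TAU2, TAU3 = 23e-3, 7e-3, 46e-3      # LPF time constants [s]
E_REV = -85e-3                             # reversal potential [V]

# Normalizing factor f_norm so that the peak of Eq. (10) equals 1.
_t_peak = (TAU_DECAY * TAU_RISE / (TAU_DECAY - TAU_RISE)) * math.log(TAU_DECAY / TAU_RISE)
F_NORM = 1.0 / (math.exp(-_t_peak / TAU_DECAY) - math.exp(-_t_peak / TAU_RISE))

def g_max(t):
    """Eq. (10): double-exponential conductance, t = time since the presynaptic spike [s]."""
    return F_NORM * (math.exp(-t / TAU_DECAY) - math.exp(-t / TAU_RISE))

def synapse_step(w, AV, BV, VL1, VL2, VL3, t_since_spike, dt):
    """One Euler step of Eqs. (11)-(16) plus the postsynaptic current of Eq. (9).
    Python booleans act as the 0/1 operator (x1 > x2) of the text."""
    dw_ltd = A_LTD * (AV > S) * (VL1 > R) * (VL1 - R)                   # Eq. (12)
    dw_ltp = A_LTP * VL3 * (BV > S) * (BV - S) * (VL2 > R) * (VL2 - R)  # Eq. (13)
    dw = -(w > W_MIN) * dw_ltd + (W_MAX > w) * dw_ltp                   # Eq. (11)
    VL1 += dt * (BV - VL1) / TAU1                                       # Eq. (14)
    VL2 += dt * (BV - VL2) / TAU2                                       # Eq. (15)
    VL3 += dt * ((AV > S) - VL3) / TAU3                                 # Eq. (16)
    I_post = w * g_max(t_since_spike) * (BV - E_REV)                    # Eq. (9)
    return w + dt * dw, VL1, VL2, VL3, I_post
```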
Fig. 3. Visualization of the initialized SSNN with 100 excitatory neurons (represented by blue vertices) and 1991 directed synapses (represented by directed edges). Each vertex has an interior coloring of blue to represent excitatory neurons; the brightness of blue indicates the local impressionability (weighted in-degree) of a given vertex (brighter means greater impressionability). The radius of each vertex is proportional to the local influence (weighted out-degree) of a given vertex (larger means greater influence). There are ten neurons in the SSNN that receive external input signals; these are denoted by a yellow border around the vertex and form the left-most column. There is a bold line on the originating side of a directed edge.

2.3 Network

The network consists of 100 excitatory HH neuron models, as detailed in Section 2.1, and 1991 directed synapses using the model detailed in Section 2.2, the existence of which was determined probabilistically during initialization. Fig. 3 provides a visualization of the SSNN, as well as a description. The neurons were placed on a grid to simulate a neural sheet; therefore, the distance between neurons dictated the time it would take for a signal to travel along a synapse. A minimum delay scaling factor of 0.3 [ms] was chosen to ensure that persistent activity would most likely result only when the input neurons were driven at a high frequency. This scaling factor was multiplied by a distance value, with noise, so that the distances were more natural. For instance, a given pair of neurons separated by a distance of x on the grid would have a delay defined as (x + uniform(-0.20x, 0.20x)) × (0.3 [ms]).

3. Experiment Description

Each neuron has a number of synapses for which it serves as the presynaptic neuron; each synapse has a strength, w; the sum of all such weights is known as the weighted out-degree of the neuron and can be thought of as a measure of the local influence of the neuron. Likewise, each neuron has a number of synapses for which it serves as the postsynaptic neuron; the sum of all such weights is known as the weighted in-degree of the neuron and can be thought of as a measure of the local impressionability of the neuron. Examining the distribution of these weighted degrees across all neurons in the SSNN can give insight into global behavior. A code sketch of both quantities follows at the end of this section.

In the experiment, ten neurons were selected to receive an externally applied input (denoted by yellow borders in Fig. 3). These neurons were partitioned into two groups: 5 neurons that were considered presynaptic (termed "pre" below) and 5 neurons that were considered postsynaptic (termed "post" below). A pulse train would be applied to the neurons of one group; an identical pulse train (though shifted in time) would be applied to the neurons of the other group. The pulse train used across trials had amplitude 0.1825, width 1 [ms], and period 400 [ms]; only the relative timing between the applied trains was adjusted. The inputs provided were as follows (where "post-pre" means post-before-pre and "pre-post" means pre-before-post): post-pre 80 [ms], post-pre 48 [ms], post-pre 16 [ms], pre-post 16 [ms], pre-post 48 [ms], pre-post 80 [ms] (as indicated in the legend of Fig. 4). A greater number of inputs would be desirable; however, due to the cellular resolution of the SSNN, the computations are very taxing in terms of processing and memory, so only a limited number of inputs were explored in this experiment. For each trial, the pulse train was applied for a period of 6.5 [s], which is long enough for the neurons to exhibit 16 spike pairs.

At each time step of the simulation, the standard deviation of the distribution for the weighted in-degree, σ_IN, and of the distribution for the weighted out-degree, σ_OUT, was computed. Each measurement is a population-level, global metric, that is, a candidate information-bearing signal (a.k.a. information-preserving transform). By running the experiment and examining the results, we could determine the extent to which these transforms are, in actuality, information-preserving.
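For clarity, the following is a small sketch of the delay rule of Section 2.3 and of how the two candidate signals σ_OUT and σ_IN can be computed from a synaptic weight matrix at one time step. The matrix convention (row = presynaptic neuron) and the function names are our own assumptions for illustration.

```python
import numpy as np

def synaptic_delay(x, rng):
    """Delay rule of Section 2.3: grid distance x, jittered by +/-20%, times 0.3 [ms]."""
    return (x + rng.uniform(-0.20 * x, 0.20 * x)) * 0.3e-3   # [s]

def degree_spreads(W):
    """W[i, j] = weight of the directed synapse from neuron i to neuron j (0 if absent).
    Returns (sigma_OUT, sigma_IN): the standard deviations of the weighted out-degree
    (local influence) and weighted in-degree (local impressionability) distributions
    across all neurons."""
    sigma_out = W.sum(axis=1).std()   # row sums: weighted out-degrees
    sigma_in = W.sum(axis=0).std()    # column sums: weighted in-degrees
    return sigma_out, sigma_in

# Example: track the two candidate signals over a recorded weight history.
# sigma_traces = [degree_spreads(W_t) for W_t in weight_history]
```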
4. Results

Fig. 4. Summary of the results of the experiment. For each of the six different input signals, the standard deviation of the distribution for weighted in-degree and for weighted out-degree was computed at each time step. See the body of the text for a detailed description of these results.

For each of the different time series, it is apparent that the standard deviation is time-varying, increases (or does not change) in response to each applied pulse, and shows a clear differentiation in terms of the paths taken and the final values assumed (they appear to be diverging). For σ_OUT, post-pre 16 [ms] shows the largest increase, which corresponds to the timing that would cause the greatest LTD to synaptic strength, followed by pre-post 16 [ms]; this relative ordering is also reasonable given the synaptic curve we defined, which allows for a maximum LTD (from the initial value of w = 1) of -1, while the maximum LTP (from the initial value of w = 1) is 0.6. The others seem to follow a similarly intuitive ordering, with the exception of pre-post 80 [ms], which appears to break the trend slightly by being greater than pre-post 48 [ms]; though, had the simulation run longer, it appears the expected ordering would have emerged. The standard deviation of the distribution for local influence is highly correlated with the input signal.

For σ_IN, as before, the largest increases occurred for the trials whose pulse trains had the tightest relative timing. Also, the ordering of the other time series follows a similar trend, though the spread is narrower. The standard deviation of the distribution for local impressionability is correlated with the input signal, but it appears that distinguishing the inputs is a bit more challenging than in the previous case.

The metrics considered seem to be good candidates for information-bearing signals, though σ_OUT seems to be more effective in that the measured time series diverge, which makes identifying the corresponding input signal easier. The result matches our hypothesis well. Still, it remains to be seen how robust such correlations will remain when more complex input signals (speech, video, etc.) that have noise are provided, as well as when the model itself is scaled up to include multiple SSNNs [5] that exhibit stochastic neuron-neuron communication [13]. However, we are encouraged by the result and its implications for information-processing in SSNNs, as well as for the ability to create reduced-order and functionally equivalent models of SSNNs in the not-too-distant future.
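To illustrate how such a transform could be used to identify an input signal, a hypothetical nearest-trajectory lookup might compare a measured σ_OUT time series against references recorded for known inputs; the function, its names, and the Euclidean distance choice below are illustrative assumptions, not part of the experiment.

```python
import numpy as np

def identify_input(sigma_trace, references):
    """references: dict mapping an input label (e.g. 'pre-post 16 ms') to the
    sigma_OUT time series previously recorded for that input (equal-length arrays).
    Returns the label whose reference trajectory is nearest in Euclidean distance."""
    return min(references,
               key=lambda label: np.linalg.norm(np.asarray(references[label]) - sigma_trace))
```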
5. Future Work

As alluded to earlier in the paper, this experiment is just a small part of a larger project being carried out in the Language Acquisition and Robotics Group. We are designing a multi-sensory associative memory based on the dynamics of SSNNs [5]. In the short term we will be including all of the important identified features in a given SSNN: a more realistic spatial location of neurons, which could be captured by a longer minimum delay; a probability of connection (for initialization) that drops off as a function of distance; the existence of inhibitory neurons; and a reliability parameter that modulates neuron-neuron communication and learning. We hope to explore SSNNs that are tuned to exhibit more persistent activity and consist of many more neurons and synapses; this will require scaling up the model to a more powerful computing cluster. Furthermore, we will be using such canonical SSNNs as computational units within a larger simulated neural substrate; there will be a single SSNN for each sensory stream (vision, speech), as well as an SSNN for sensory integration. With such a simulated neural substrate implemented, we will conduct a number of speech and vision binding experiments with Bert, our iCub humanoid robot, and determine the extent to which metrics like those discussed in this paper prove useful in creating a reduced-order model (with respect to information) that could serve as the basis for an associative memory. In the long term we will make the simulated neural substrate even larger and allow other sensory streams to be included (like signals from motors, haptic sensors, and direct feedback from individuals interacting with Bert, which could provide reinforcement signals by way of modulating the synaptic learning curve, similar to modulating neurotransmitter release). We are confident that such experiments will help to provide an empirical neurorobotic platform that will enable new insight into the problems of intelligence and the design of neocortically-inspired intelligent machines.

Acknowledgements

Supported by the Laboratory Directed Research and Development Program at Sandia National Laboratories under LDRD #12-1058 and LDRD #151345. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. The authors would also like to thank the anonymous reviewers of this paper for their helpful suggestions.

References

1. S. E. Levinson, Mathematical Models for Speech Technology, West Sussex, England: Wiley (2005) 230.
2. M. McClain and S. Levinson, Semantic-based learning of syntax in an autonomous robot, I. J. Hum. Robot., vol. 4, no. 2 (2007) 321.
3. L. Niehaus and S. E. Levinson, Online learning and integ. of complex action and word lexicons for lang. ground., Proc. IEEE ICDL (2012).
4. J. von Neumann, Probabilistic logics and the synthesis of reliable organisms from unreliable components, Godfrey version, 2010.
5. A. M. Duda and S. E. Levinson, Nonlinear dynamical multi-scale model of associative memory, Proc. IEEE ICMLA (2010) 867.
6. A. M. Duda and S. E. Levinson, Complex networks of spiking neurons: Collective behavior characterization, Proc. ICCS (2011) 1627.
7. G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori, The iCub humanoid robot: an open platform for research in embodied cognition, Proc. of 8th Workshop on Performance Metrics for Intelligent Systems, ACM (2008) 50.
8. J. M. Schwartz and S. Begley, The Mind & The Brain: Neuroplasticity and the Power of Mental Force, New York, NY, USA: HarperCollins (2002).
9. A. L. Hodgkin and A. Huxley, A quantitative description of membrane current and its application to conduction and excitation in nerve, J. Physiol., vol. 117 (1952) 500.
10. C. Koch, Biophysics of Computation: Information Processing in Single Neurons, New York, N.Y.: Oxford University Press (1999) 142.
11. E. M. Izhikevich, Which model to use for cortical spiking neurons? IEEE Trans. on Neur. Nets, vol. 15, no. 5 (2004) 1063.
12. C. Warrender, J. Aimone, C. Teeter, and R. Schiek, NeuroXyce: a highly parallelized simulator for biologically realistic neural networks, Neuroinformatics Meeting (2012).
13. A. M. Duda, NeuroXyce Synapse Device with STDP and Stochastic Transmission Reliability, Sandia National Laboratories (2012).
14. NEURON Exp2Syn documentation: …euron/neuron/mech.html#Exp2Syn.
15. NEURON Exp2Syn source: …1e76f9aa2/src/nrnoc/exp2syn.mod.
16. C. Clopath, L. Busing, E. Vasilaki, and W.
Gerstner, Connectivity reflects coding: a model of voltage-based STDP with homeostasis, Nat. Neurosci., vol. 13, no. 3 (2010) 344.
17. C. Clopath and W. Gerstner, Voltage and spike timing interact in STDP - a unified model, Front. Syn. Neurosci., vol. 2, no. 00025 (2010).
18. K. M. Squire and S. E. Levinson, HMM-Based Concept Learning for a Mobile Robot, IEEE Trans. on Ev. Comp., vol. 11, no. 2 (2007) 199.