Neural network-expert system models of relative-depth perception

Behavior Research Methods, Instruments, & Computers
1995, 27 (2), 173-177

STEPHEN J. GOTTS and FREDERICK J. BREMNER
Trinity University, San Antonio, Texas

Degree of binocular, horizontal disparity was used by two hybrid neural network/expert system computer models to make relative-depth judgments for pairs of stimulus points. These judgments were then correlated with the actual depth relationships of the points. Results from Simulation 1 showed that horizontal disparity could be computed from the shift in the cortical hypercolumns activated by a particular stimulus, and that, in general, multiple disparities could be compared to make accurate judgments about relative depth. However, these results also indicated that stimuli toward the periphery of the visual field were inaccurately perceived as being more distant. Simulation 2 corrected for this inaccuracy by appropriately weighting each stimulus point's disparity value as a function of its horizontal position in the visual field.

Our ability to perceive and judge depth stems largely from the horizontally differing perspectives of the two eyes; each eye provides a slightly shifted view of the world. Many theorists (e.g., Barlow, Blakemore, & Pettigrew, 1967; Hubel & Wiesel, 1962; Marr, 1982) have been interested in what the process of seeing with two eyes, known as stereopsis, entails and exactly how it allows us to perceive depth.

STEREOPSIS

Two distinct explanations of binocular depth perception had emerged in psychophysical circles as early as 1850. Whereas David Brewster advocated a convergence theory, Charles Wheatstone supported a disparity theory (Gulick & Lawson, 1976). Brewster argued that the successive convergence of the optic axes on two points at different distances is the process that we use to perceive depth.
Wheatstone, on the other hand, believed that the difference between the images formed on the retinae of the two eyes results in the perception of depth. Although these two arguments seem to make disparity and convergence mutually exclusive, it is likely that both function in concert to provide depth cues (Foley, 1980). The optic axes of convergence must cross at some point in the visual field, forming a depth reference area known as the horopter (Ogle, 1964). Relative depths of visual stimuli positioned off the horopter are probably judged by comparing horizontal disparities.

Two simulations were designed using a hybrid neural network-expert system approach in order to show how convergence and horizontal disparity might work together to provide cues that are helpful in making relative-depth judgments. In both simulations, the optic angles of convergence were held theoretically fixed. Stimuli were then chosen from the set of points located off the horopter, making it possible to assess the accuracy of judgments based solely on horizontal disparity.

Correspondence should be addressed to S. J. Gotts, 2555 NE Loop 410 #1213, San Antonio, Texas 78217 (e-mail: sgotts@trinity.edu).

SIMULATION 1

While it is clear that single cells in the visual cortex are disparity selective (Barlow et al., 1967), it is still unclear exactly how the brain uses this information at later stages of computation. Several theorists (DeAngelis, Ohzawa, & Freeman, 1991; Guillemot et al., 1993; Nomura, 1994) have proposed models of how the cues provided by these binocular neurons might be used. The hypothesis underlying our neural network-expert system model is that the cortex, in order to make relative-depth judgments, performs the equivalent of a difference calculation on the horizontal shift in activated cortical hypercolumns (Hubel & Wiesel, 1962) produced by stereopsis.
Although this difference-calculation strategy is an obvious departure from the established view, it provides a computationally simple mechanism inspired by the basic topographic arrangement of the hypercolumns.

Method

Apparatus
An 8-Mb Compudyne (IBM-compatible) 486DX personal computer operating at 33 MHz was used to implement the simulation. The back-propagation neural network was written in Microsoft C/C++ Version 7.0 (Microsoft Corporation, 1991). Training shapes were generated from a diagram of the left and right visual hemifields found in Kelly (1985).

Input Layer
The input patterns to the network represented the activations of retinal ganglion cells caused by stimulus points in the visual field. Julesz's (1971) work with random-dot stereograms established the legitimacy of using single-point stimuli in studies of depth perception. Only points from the binocular portion of the right visual hemifield were used to train the network, as this was sufficient to demonstrate the feasibility of the proposed mechanism. The input layer to the network consisted of a vector 80 units long: the first 40 units represented the ganglion cells of the nasal hemiretina of the right eye, and the second 40 units represented those of the temporal hemiretina of the left eye. Each unit of the vector corresponded to a sector of one eye's visual field and symbolized one retinal X-type, on-center ganglion cell. X retinal ganglion cells seem to be involved in perceiving detail, whereas the other two main types, Y and W, seem to be more involved in detecting motion in the visual field and in providing information necessary to move the eyes (Sterling, 1990). Because no motion is involved in the current task, the X pathway is clearly the most germane of the three.

Copyright 1995 Psychonomic Society, Inc.

Hidden Layer
The hidden layer of the neural network consisted of a vector 125 units long.
The closest neurological correlate to this layer of neurons is the lateral geniculate nucleus (LGN), which relays information from the retina to the cortex while maintaining separate contralateral and ipsilateral pathways (Kelly, 1985). It is important to mention, however, that this analogy is somewhat loosely constructed. The connections with the hidden layer are not constrained to preserve these lateral relationships: each hidden neuron is connected to all the neurons in the input and output layers.

Output Layer
The output layer of the neural network consisted of a 2 × 20 matrix representing hypercolumns found in Area V1 of the visual cortex (Kandel, 1985). The two rows of the matrix represented the contralateral and ipsilateral dimensions of each hypercolumn, while the 20 columns denoted the 20 hypercolumns.

Procedure
Network training. The back-propagation neural network was trained until it correctly mapped the entire set of 144 input points to their corresponding outputs. The network was considered to have converged as soon as the error for each output unit was less than .10. Because .8 was the lowest possible value for an active classification and .2 the highest possible value for an inactive classification, there was no danger of confusing active and inactive units, so the .10 error level was judged sufficiently low.

Network testing. After the training phase, a set of 20 running-fact pairs was presented to the model. Each of the 20 pairs consisted of two running facts representing two stimulus points at different depths. Each of the first 10 running-fact pairs involved points with dissimilar horizontal positions, while each of the last 10 involved points with similar horizontal positions.

Expert system.
For each of the stimulus points, the hypercolumn discrepancy was calculated by the formula

discrepancy = (column number of hypercolumn activated in ipsilateral row) - (column number of hypercolumn activated in contralateral row).

The difference values were compared using a series of inequality formulae, and judgments were made on the basis of these formulae. If the first point's disparity was greater than that of the second point, the first point was judged closer (a judgment value of -1). If the second point's disparity was greater than that of the first, the second point was judged closer (a judgment value of +1). Points eliciting the same disparity value were judged to be at equal distance (a judgment value of 0). The system's relative-depth judgments were written to a file and then correlated with the correct depth relationships. Figure 1 shows a logical flowchart for the hybrid model and clarifies the relationship between the neural-network and expert-system components.

[Figure 1. Flowchart of the hybrid model: input layer (n = 80), hidden layer (n = 125), and output layer. The neural-network component interacts with the expert-system component by supplying it with the hypercolumnar activation information it needs to make its judgments.]

[Table 1. Judged and Correct Depth Relationship Values for Simulations 1 and 2: Model 1 and Model 2 judgments and the correct relationship for Running-Fact Nos. 1-20. Note: -1 indicates that the first point is closer; +1 indicates that the second point is closer; 0 indicates that the points have the same depth.]

Results and Discussion
The neural network was able to converge and correctly map all input patterns to their corresponding output patterns with error less than .10. The total training time was 12 min; 36 passes through the training facts were necessary. The Pearson correlation between the relative-depth judgments and the correct depth relationships was significant [r(18) = .48, p < .05].
The significant correlation establishes that the proposed mechanism of relative-depth perception is indeed feasible. The judgment values and actual depth relationships are provided in Table 1. While the model perfectly discriminated relative depth for points from the last 10 running-fact pairs (which involved comparing points from similar horizontal positions), it had trouble judging points from the first 10 pairs (which involved comparing points from dissimilar horizontal positions). Note, in particular, the lack of correspondence for Pairs 2, 5, 6, 8, and 10. The small inaccuracy of the model was due to a slight shift, toward the observer, of the stimulus points producing a given hypercolumn disparity as the visual hemifield was traversed from left to right (refer to Figure 2). The shift is most pronounced at the far right, or periphery, of the hemifield. When a close point near the right border of the binocular hemifield was compared with a more distant point near the left border, the expert-system component sometimes mistakenly judged the far-left point as the closer of the two. However, when presented with points having similar horizontal positions, the system performed flawlessly. The shift is clearly present in any system that consists of two overlapping polar-coordinate systems, each having a unique origin. The human brain obviously employs such a system. However, whether the human brain compensates for this shift is unclear.

[Figure 2. A diagram of the line along which the model in Simulation 1 perceives points as having the same depth (the line along which discrepancy values are the same), shown against the line along which all points have equal depth, across the temporal hemiretina. Note the shift of the line toward the observer at the right edge of the visual hemifield. The correction made by the model in Simulation 2 allows for more accurate relative-depth judgments.]

SIMULATION 2

An inaccuracy such as the one described in Simulation 1 is neither desirable nor necessary. The shift phenomenon is the direct result of employing overlapping polar-coordinate systems, and it can be corrected easily with a simple weighting strategy. The hybrid model in this simulation combines the strategy employed in the first simulation with a function that corrects each stimulus point's disparity value on the basis of its horizontal position in the visual field.

Method

Network Design
To determine whether compensating for the shift improved the system's relative-depth judgments, the design, training facts, and running facts from Simulation 1 were used.

Apparatus
The machine used in the current simulation was the same as that used in Simulation 1. The C code was also the same as that written for Simulation 1, with the exception of the difference calculation; the exact calculation is described under Procedure.

Procedure
As in Simulation 1, the neural network went through a period of training to learn the relationship between stimulus points in the right visual hemifield and activation of hypercolumns in the left visual cortex. As soon as the network was able to map each of the 144 training points correctly to its corresponding hypercolumn activations with error less than .10 on each output, it was presented with the set of 20 running-fact pairs, for which the teacher values were not provided. For this simulation, the discrepancy formula from Simulation 1 (disparity = column number of activated ipsilateral row - column number of activated contralateral row) was modified. The column numbers (from the output matrix) of the activated hypercolumns for each point were left unaltered; however, the ipsilateral column number was weighted according to the point's horizontal position (the contralateral column number was arbitrarily selected to index horizontal position; the ipsilateral one could have been used with equal facility). If a point was located on the right side of the hemifield, its ipsilateral column number was weighted more than if it had been located on the left side. This larger weighting produced a larger disparity value, compensating for the visual shift. In Simulation 1, when a point from the left side of the visual hemifield was presented with an equidistant point from the right side, the point on the right was judged to be more distant because its disparity value was too small. Increasing its value with the weighting strategy moved it closer to the visual-shift line for that particular disparity, and the depth judgment was corrected. Refer to Figure 2 for a graphic explanation. The formula for computing any stimulus point's disparity was

disparity = ipsilateral column × (1 + contralateral column × shift) - contralateral column,

where shift = .004611. It is important to note here that this formula, in itself, is trivial; it simply enabled us to establish the feasibility of our particular disparity computation. The shift problem addressed by the new formula is inherent in employing stereopsis as a mechanism for relative-depth perception, not just a problem of our particular computation. Any system that accurately judges relative depth must somehow solve the shift problem.

Once computed, the new disparity values were compared using the inequality formulae from Simulation 1 to judge relative depth for each pair of stimulus points.
The system tested whether the first point's disparity was greater than that of the second point. If so, the first point was judged closer; otherwise, the second point was judged closer. Because of the precision of the weighting, there was no real chance that the system would judge two points as equidistant; it would do so only if they were the same point. The system's relative-depth judgments were written to a file and then correlated with the correct depth relationships.

Results and Discussion
The neural network implemented in Simulation 2 was able to converge and correctly map all input patterns to their corresponding output patterns with error less than .10. The total training time for this version of the network was 15 min; 40 passes through the training facts were necessary. These values differed slightly from those of Simulation 1 because the pseudorandom number algorithm, which used the computer's clock time as its seed, generated a different set of values each time the program was executed. Different random numbers were therefore assigned as initial hidden weight values in this simulation, resulting in a slightly different training time.

The Pearson correlation between the system's relative-depth judgments and the correct depth relationships was highly significant [r(18) = 1.00, p < .01]. The system was 100% accurate in making relative-depth judgments (an apparent improvement on the accuracy of Simulation 1). The data from the current simulation (see Table 1) establish a successful model of the perception of relative depth. Furthermore, the data establish that our hypothesis concerning how disparity is computed is entirely feasible.
GENERAL DISCUSSION

Use of a hybrid neural network/expert system approach in this particular problem is valuable because it allows us to model both the visual-tract physiology and the higher cognitive processing of depth cues within a single model. Although this benefit is not obvious for single-point stimuli, the value of the neural-network component will be fully realized when more complex stimuli are involved. Future iterations of the model will be able to render disparity values easily, whereas a more rule-based approach would be extremely tedious to implement and less physiologically valid. The rule-based expert-system component, however, makes quick and explicit decisions on the basis of the information fed to it by the neural network. These decisions are made in a way that more closely mimics the higher cognitive processes of the human brain. In this particular setting, the hybridization of the neural-network and expert-system approaches yields a useful model demonstrating that a hypercolumnar-disparity computation is indeed feasible.

REFERENCES

BARLOW, H. B., BLAKEMORE, C., & PETTIGREW, J. D. (1967). The neural mechanism of binocular depth discrimination. Journal of Physiology, 193, 327-342.
DEANGELIS, G. C., OHZAWA, I., & FREEMAN, R. D. (1991). Depth is encoded in the visual cortex by a specialized receptive field structure. Nature, 352, 156-159.
FOLEY, J. M. (1980). Binocular distance perception. Psychological Review, 87, 411-434.
GUILLEMOT, J.-P., PARADIS, M.-C., SAMSON, A., PTITO, M., RICHER, L., & LEPORE, F. (1993). Binocular interaction and disparity coding in area 19 of visual cortex in normal and split-chiasm cats. Experimental Brain Research, 94, 405-417.
GULICK, W. L., & LAWSON, R. B. (1976). Human stereopsis: A psychophysical analysis. New York: Oxford University Press.
HUBEL, D. H., & WIESEL, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160, 106-154.
JULESZ, B. (1971). Foundations of cyclopean perception. Chicago: University of Chicago Press.
KANDEL, E. R. (1985). Processing of form and movement in the visual system. In E. R. Kandel & J. H. Schwartz (Eds.), Principles of neural science (2nd ed., pp. 366-383). New York: Elsevier.
KELLY, J. P. (1985). Anatomy of the central visual pathways. In E. R. Kandel & J. H. Schwartz (Eds.), Principles of neural science (2nd ed., pp. 356-365). New York: Elsevier.
MARR, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: W. H. Freeman.
MICROSOFT CORPORATION (1991). Microsoft C/C++ (Version 7.0) [Computer program]. Redmond, WA: Author.
NOMURA, M. (1994). A model for neural representation of binocular disparity in striate cortex: Distributed representation and veto mechanism. Biological Cybernetics, 69, 165-171.
OGLE, K. N. (1964). Researches in binocular vision. New York: Hafner.
STERLING, P. (1990). Retina. In G. M. Shepherd (Ed.), The synaptic organization of the brain (3rd ed., pp. 170-213). New York: Oxford University Press.

(Manuscript received November 18, 1994; revision accepted for publication January 20, 1995.)