Novel yield model for integrated circuits with clustered defects

Lee-Ing Tong, Li-Chang Chao *

Department of Industrial Engineering and Management, National Chiao Tung University, 1001 Dah-Hsei Road, Hsin-Chu 300, Taiwan, ROC

Expert Systems with Applications 34 (2008) 2334–2341. doi:10.1016/j.eswa.2007.03.013. Available online at www.sciencedirect.com

Abstract

As wafer sizes increase, defects cluster more strongly. Clustered defects cause the conventional Poisson yield model to underestimate actual wafer yield, as defects are no longer uniformly distributed over a wafer. Although some yield models, such as negative binomial or compound Poisson models, consider the effects of defect clustering on yield prediction, these models have some drawbacks. This study presents a novel yield model that employs a General Regression Neural Network (GRNN) to predict wafer yield for integrated circuits (IC) with clustered defects. The proposed method utilizes five relevant variables as input for the GRNN yield model. A simulated case is applied to demonstrate the effectiveness of the proposed model.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Clustered defects; General regression neural network; IC; Pattern; Yield model

1. Introduction

Wafer yield is an important index of success in integrated circuit (IC) manufacturing. Wafer yield is defined as the probability that a chip on a wafer has no defect. Defects are physical anomalies which result in circuit faults; dirt particles are the primary source of defects in IC manufacturing (Ferris-Prabhu, 1992). Numerous mathematical models have been developed for predicting wafer yield in the last 40 years (Cunningham, 1990; Stapper, 1991; Stapper & Rosner, 1995; Tyagi & Bayoumi, 1992).
Most of these models treat wafer yield as a function of chip size, mean number of defects per chip and the average number of defects per unit area; the Poisson model, compound Poisson models and negative binomial model are examples of such models (Cunningham, 1990). The Poisson model is the simplest model to use; however, to successfully predict wafer yield, defects must occur independently and with constant probability in any small area on a wafer (Albin & Friedman, 1991). If these assumptions hold, defects are uniformly scattered over a wafer. However, Stapper (1985) reported that defects are typically clustered rather than dispersed randomly over a wafer, and this clustering becomes more evident as wafer size increases. Clustered defects usually violate the independence assumption of the Poisson model. The Poisson model, therefore, underestimates actual yield when defects cluster.

Under this scenario, numerous yield models obtain more accurate yield predictions than the Poisson model. Compound Poisson yield models are complicated and only evaluate the relationship between chip size and yield (Cunningham, 1990). The cluster parameter $\alpha$ of the negative binomial model can be very scattered and even negative when the model is applied to predict yield (Cunningham, 1990). Consequently, these mathematical yield models have particular problems in predicting wafer yield. Dupret and Kielbasa (2004) use partial least squares (PLS) regression to model the yield from measurements obtained during production. However, applying PLS regression requires advanced statistical knowledge.

Neural networks can handle problems such as recognizing complicated patterns and fitting nonlinear functions.

* Corresponding author. Tel.: +886 35 731 896; fax: +886 35 733 873. E-mail address: lichang.iem91g@nctu.edu.tw (L.-C. Chao).
The Back-Propagation Neural Network (BPNN), known for its general pattern-mapping capability, can be applied to numerous prediction problems and often performs well (Bishop, 1994; Fausett, 1994). However, obtaining a good prediction network requires substantial effort to identify BPNN's parameters, such as the number of hidden layers, number of hidden units, learning rate and momentum. Furthermore, the BPNN model has certain problems such as local optimal solutions, overtraining and undertraining. Compared with BPNN, the General Regression Neural Network (GRNN) model has numerous advantages: learning is fast; only one parameter is required; there are no overtraining or undertraining problems; and the likelihood of obtaining a global optimal solution is higher than with BPNN.

This study presents a novel yield model which employs GRNN to predict wafer yield with clustered defects. The proposed GRNN yield model utilizes five relevant variables as input variables to predict the wafer yield: number of defects; chip size; mean number of defects per chip; mean number of defects per unit area; and clustering index. The prediction accuracy of the proposed approach is compared with those of the negative binomial yield model and the BPNN yield model. A simulated case is presented to demonstrate the effectiveness of the proposed approach.

2. Yield models

The Poisson yield model (Ferris-Prabhu, 1992), which is based on the Poisson distribution, is

$$Y_1 = P(k = 0) = e^{-\lambda_0}, \tag{1}$$

where $k$ represents the number of defects in a chip and $\lambda_0$ represents the mean number of defects per chip. The Poisson yield model is sufficiently effective for small chip sizes but tends to underestimate yields for larger chip sizes (Cunningham, 1990).
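As a small worked illustration of Eq. (1), the following Python sketch evaluates the Poisson yield for a given mean number of defects per chip. The function name `poisson_yield` and the example figures (150 defects over 292 chips) are hypothetical and are not taken from the study.

```python
import math


def poisson_yield(mean_defects_per_chip: float) -> float:
    """Eq. (1): probability that a chip has zero defects under a Poisson model."""
    if mean_defects_per_chip < 0:
        raise ValueError("mean number of defects per chip must be non-negative")
    return math.exp(-mean_defects_per_chip)


# Hypothetical example: 150 defects spread over 292 chips on one wafer.
lambda_0 = 150 / 292
print(f"lambda_0 = {lambda_0:.4f}, Poisson yield = {poisson_yield(lambda_0):.4f}")
```

Because Eq. (1) depends only on the mean, two wafers with the same defect count but very different spatial patterns receive the same predicted yield, which is the limitation the clustering-aware models address.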
To capture the clustering properties of defects in the yield model, some spatial distributions, including compound Poisson distributions, have been considered (Raghavachari, Srinivasan, & Sullo, 1997). The compound Poisson yield model replaces the defect density, which is assumed to be constant in the Poisson yield model, with a probability density function. The compound Poisson yield model can be described as

$$Y = \int_{0}^{\infty} e^{-DA} f(D)\, dD, \tag{2}$$

where $D$ represents the defect density, $A$ represents the chip size and $f(D)$ represents the probability density function of the defect density. The compound Poisson yield model is complex and only considers relations between chip size and yield.

The negative binomial yield model, which is a widely applied yield model, employs a gamma distribution for the defect density (Okabe, Nagata, & Shimada, 1972; Stapper, 1973). The negative binomial distribution can be described as

$$P(k) = \frac{\Gamma(k + \alpha)\,(\bar{\lambda}/\alpha)^{k}}{k!\,\Gamma(\alpha)\,(1 + \bar{\lambda}/\alpha)^{k+\alpha}}, \quad k = 0, 1, 2, \ldots, \tag{3}$$

where $\bar{\lambda}$ and $\alpha$ are parameters of the negative binomial distribution. The negative binomial yield model is

$$Y_2 = \frac{1}{(1 + \bar{\lambda}/\alpha)^{\alpha}}. \tag{4}$$

Parameter $\alpha$, called the cluster parameter, can be calculated as

$$\alpha = \frac{\bar{\lambda}^{2}}{\sigma^{2} - \bar{\lambda}}, \tag{5}$$

where $\bar{\lambda}$ is the mean number of defects per chip and $\sigma^{2}$ is the variance. The negative binomial model has been shown to be a powerful prediction model in IC manufacturing (Cunningham, 1990). However, reports also show that the cluster parameter $\alpha$ in the negative binomial model can be very scattered and even negative when the model is used to predict yield (Cunningham, 1990).

Langford, Liou, and Raghavan (2001) present a simple robust windowing method for the Poisson yield model to extract the systematic and random components of yield from wafer probe bin map data. Liou et al. (2002) present a statistical model of MOS devices for parametric yield prediction. Skinner et al.
(2002) discuss two classes of traditional multivariate statistical methods and a classification and regression tree (CART) method for modeling and analysis of wafer probe test data to determine the cause of low yield wafers. Meyer and Park (2003) present a center-satellite model for predicting defect-tolerant yield in the embedded core context. Dupret and Kielbasa (2004) present partial least squares (PLS) regression methods to model the yield from measurements obtained during production. Hong, Milor, Choi, and Lin (2005) utilize two models, derived from the Poisson yield model and the negative binomial yield model, to study the effect of area scaling on IC reliability. Kim and Baldwin (2005) present a theoretical yield model for the assembly process of area array solder interconnect packages. Other yield models used in various companies are summarized in Stapper and Rosner (1995).

In summary, existing wafer yield models have significant limitations: clustered defects cause the conventional Poisson yield model to underestimate wafer yield; the compound Poisson yield model is too complex; the cluster parameter $\alpha$ of the negative binomial model can be substantially scattered and sometimes negative; and many parameters must be set when applying the BPNN model. Such drawbacks affect performance when these models are employed to predict yield. The most accurate models are the negative binomial (Cunningham, 1990; Stapper, 1973) and BPNN (Bishop, 1994; Fausett, 1994) yield models. Only these two models, therefore, were selected for comparison in this study.

3. General regression neural network implementation

The major difference between GRNN and other supervised neural networks is that GRNN can handle continuous-valued outputs as well as categorize data, and that it requires fewer training parameters than BPNN, which needs the number of hidden layers, number of hidden units, learning rate and momentum. Moreover, GRNN can be used for any regression problem in which a linearity assumption is violated, and it converges quickly to the optimal regression surface as the number of samples becomes large. The GRNN model, then, is used in this study to predict wafer yield.

Fig. 1 shows the three-layer network of the GRNN model (Specht, 1991). Input units are merely distribution units which forward measurement variables to the pattern units in the second (hidden) layer. This hidden layer consists of one neuron for each pattern in the training set. The GRNN is essentially trained after one pass of the training patterns, and its activation function is normally an exponential function. The unique parameter of GRNN is the smoothing factor $\sigma$, which influences the output value; that is, higher smoothing factors produce smoother, more relaxed surface fits through the data.

Fig. 1. GRNN block diagram (Specht, 1991).

Unlike the conventional regression model, GRNN is defined through the joint continuous probability density function of the data, rather than through a specified functional form that must be determined in advance. Assume that $f(x, y)$ represents the known joint continuous probability density function of a vector random variable, $x$, and a scalar random variable, $y$; the regression of $y$ on a particular measured value $X$ of $x$, then, is

$$E[y \mid x = X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}. \tag{6}$$

When the density $f(x, y)$ is unknown, it must be estimated from observations of $x$ and $y$. The GRNN model utilizes a Parzen (1962) window, which is a nonparametric approach to estimating the joint continuous probability density function $f(x, y)$. The estimator can be represented as

$$\hat{f}(X, Y) = \frac{1}{(2\pi)^{(p+1)/2}\,\sigma^{(p+1)}} \cdot \frac{1}{n} \sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right] \exp\left[-\frac{(Y - Y_i)^{2}}{2\sigma^{2}}\right], \tag{7}$$

where $p$ is the dimension of $x$, $\sigma$ is the smoothing parameter, $X_i$ and $Y_i$ are sample values of the observations $x$ and $y$, and $n$ is the number of sample observations. Combining Eqs. (6) and (7), Eq. (8) can be obtained as

$$\hat{E}(y \mid X) = \hat{Y}(X) = \frac{\sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right] \int_{-\infty}^{\infty} y \exp\left[-\frac{(y - Y_i)^{2}}{2\sigma^{2}}\right] dy}{\sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right] \int_{-\infty}^{\infty} \exp\left[-\frac{(y - Y_i)^{2}}{2\sigma^{2}}\right] dy}. \tag{8}$$

The inner integrals evaluate to $Y_i \sqrt{2\pi}\,\sigma$ in the numerator and $\sqrt{2\pi}\,\sigma$ in the denominator, so Eq. (8) can be further simplified as Eq. (9), which is given by

$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left[-\frac{D_i^{2}}{2\sigma^{2}}\right]}{\sum_{i=1}^{n} \exp\left[-\frac{D_i^{2}}{2\sigma^{2}}\right]}, \tag{9}$$

where $D_i^{2} = (X - X_i)^{T}(X - X_i)$. The GRNN model utilizes Eq. (9) to estimate $y$. Typically, the activation function of the GRNN network is exponential, as shown in Eq. (10):

$$f(D_i^{2}) = \exp\left[-\frac{D_i^{2}}{2\sigma^{2}}\right]. \tag{10}$$

When a new vector $X$ enters the network, it is subtracted from each stored pattern vector. The squares of the differences are summed and input into the activation function in Eq. (10). The values which pass through the activation function are the pattern unit outputs and are forwarded to the summation units. One summation unit sums the dot product between a weight vector of ones and the pattern unit outputs to generate the estimate shown in Eq. (11):

$$\sum_{i=1}^{n} \exp\left[-\frac{D_i^{2}}{2\sigma^{2}}\right]. \tag{11}$$

The other summation unit sums the dot product between the samples $Y_i$ and the pattern unit outputs to generate the estimate shown in Eq. (12):

$$\sum_{i=1}^{n} Y_i \exp\left[-\frac{D_i^{2}}{2\sigma^{2}}\right]. \tag{12}$$

Finally, the output unit divides Eq. (12) by Eq. (11) to obtain the desired estimate of $y$, which is the same as Eq. (9).

GRNN measures how far a given sample pattern is from the patterns in the training set. When a new pattern is presented to the network, the input pattern is compared to all of the patterns in the training set to determine how far it is from those patterns. The output predicted by the network is a weighted combination of all of the outputs in the training set, with weights based upon how far the new pattern is from each pattern in the training set.
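The estimator in Eq. (9) can be sketched in a few lines of Python. This is a minimal illustration with a single overall smoothing factor, not the NeuroShell 2 implementation used in the study; the function name `grnn_predict` and the toy data are hypothetical.

```python
import math
from typing import Sequence


def grnn_predict(
    x: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    sigma: float,
) -> float:
    """Eq. (9): kernel-weighted average of the training outputs."""
    if sigma <= 0:
        raise ValueError("smoothing factor sigma must be positive")
    numerator = 0.0    # Eq. (12): sum of Y_i * exp(-D_i^2 / (2 sigma^2))
    denominator = 0.0  # Eq. (11): sum of exp(-D_i^2 / (2 sigma^2))
    for x_i, y_i in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, x_i))  # squared distance D_i^2
        weight = math.exp(-d2 / (2.0 * sigma ** 2))     # Eq. (10)
        numerator += y_i * weight
        denominator += weight
    if denominator == 0.0:
        # Every weight underflowed to zero: x is too far from all patterns for this sigma.
        raise ZeroDivisionError("all pattern unit outputs are zero; increase sigma")
    return numerator / denominator


# Toy data (hypothetical): one input variable, two stored patterns.
patterns = [[0.0], [1.0]]
yields = [0.0, 1.0]
print(grnn_predict([0.5], patterns, yields, sigma=0.5))  # equidistant from both patterns
print(grnn_predict([0.0], patterns, yields, sigma=0.1))  # dominated by the first pattern
```

A small `sigma` makes the prediction follow the nearest stored pattern, while a large `sigma` pulls it toward the mean of all training outputs, which is why the smoothing factor is the one parameter that needs tuning.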
GRNN uses an algorithm to find appropriate individual smoothing factors for each input as well as an overall smoothing factor. The algorithm proceeds in two parts. The first part trains the network with the data in the training set. The second part tests a whole range of smoothing factors and retains those that perform best, which yields networks that work much better on the test set.

The performance of neural networks can be measured by the root-mean-squared error (RMSE), which can be calculated as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (A_i - O_i)^{2}}{n}}, \tag{13}$$

where $n$ represents the number of patterns, $A_i$ represents the actual value of the output and $O_i$ represents the predicted value. Another indicator for measuring the strength of the relationship between the actual and predicted outputs is Pearson's linear correlation coefficient $r$. In this study, RMSE and $r$ are applied to evaluate the performance of the negative binomial, BPNN and the proposed GRNN yield models.

4. Proposed approach

4.1. Defect clustering patterns

A major factor affecting yield is the degree to which defects are clustered (Friedman, Hansen, Nair, & James, 1997; Stapper, Armstrong, & Saji, 1983). Hence, the defect clustering phenomenon must be taken into account when constructing a yield model. In this study, the Borland Delphi programming language is employed to simulate a variety of defect clustering patterns for 8-in. wafers. Fig. 2 presents a simple representation of the defect clustering patterns. Three design factors are employed in this study to simulate defect clustering patterns: the cluster pattern; the percentage of defects located in grey regions; and chip size. The following is a brief description of these three design factors.

Fig. 2. A simple representation of the defect clustering patterns.

(1) Cluster pattern: Fig. 2 presents one random pattern and four clustering patterns (Friedman et al., 1997). The defects in a random pattern are distributed randomly over the entire wafer. The distribution of defects in the four clustering patterns depends on the percentage of defects located in the grey regions, which represent the defect-dense areas.
(2) Percentage of defects located in grey regions: In the four clustering patterns, four percentages, 60%, 70%, 80% and 90%, of the total number of defects are located in grey regions, and the remaining defects are distributed randomly.
(3) Chip size: Six chip sizes are considered: 1 (1 × 1), 1.44 (1.2 × 1.2), 1.96 (1.4 × 1.4), 2.56 (1.6 × 1.6), 3.24 (1.8 × 1.8), and 4 (2 × 2) cm².

From these three design factors, a total of 102 simulation trials can be obtained. For each simulation trial, up to 292 defects are generated randomly. Drawing the number of defects at random keeps the simulations close to real IC manufacturing conditions. The maximum number of chips that can be cut from an 8-in. wafer is 292; therefore, at most 292 defects are simulated, which reduces the possible influence of outliers. Each simulation trial represents one defect clustering pattern.

4.2. Procedure of the proposed approach

Models for predicting yield can be classified into macro yield modeling and micro yield modeling. Macro yield modeling uses die size, device density, and other large-scale factors to predict yields for new designs. Micro yield modeling uses critical device area, parametric sensitivity, redundancy effect, and other factors to predict yields (Mullenix, Zalnoski, & Kasten, 1997).
Gruber's general yield model (Gruber, 1994) is the most recognized model in macro yield modeling, and can be described as

$$Y = Y_0(D, A, \theta)\, L(Y), \tag{14}$$

where $Y_0$ represents the asymptotic yield, which is a real-valued function of $D$, $A$ and $\theta$; $D$ represents the point defect density per unit area; $A$ represents the chip area; $\theta$ represents a set of parameters unique to the specific yield model; and $L(Y)$ represents a real-valued function describing learning effects.

In this study, five attributes are obtained from each defect clustering pattern by simple calculation: number of defects; chip size; mean number of defects per chip; mean number of defects per unit area; and clustering index CI (Jun, Hong, Kim, Park, & Park, 1999). The clustering index CI can be calculated as

$$\mathrm{CI} = \min\left\{ \frac{s_v^{2}}{\bar{v}^{2}},\; \frac{s_w^{2}}{\bar{w}^{2}} \right\}, \tag{15}$$

where $v_i$ and $w_i$ are sequences of defect intervals on the x axis and y axis, defined as

$$v_i = x_{(i)} - x_{(i-1)}, \quad i = 1, 2, \ldots, n,$$
$$w_i = y_{(i)} - y_{(i-1)}, \quad i = 1, 2, \ldots, n,$$

where $x_{(i)}$ and $y_{(i)}$ denote the $i$th smallest defect coordinates on the x axis and y axis, respectively; $\bar{v}$ and $s_v^{2}$ represent the sample mean and the sample variance of $v_i$, respectively; and $\bar{w}$ and $s_w^{2}$ denote the sample mean and the sample variance of $w_i$, respectively. The value of CI is close to 1 if the defects are randomly scattered, and is expected to be greater than 1 if defects are clustered. These attribute values are input into the GRNN yield model, whereas the only output of the GRNN yield model is the actual wafer yield. The number of replications for each simulation is 10; hence, a total of 1020 pairs of input–output data are obtained to train and test the GRNN yield model.

The Poisson yield model is easy to apply; however, the conventional model does not consider the effect of defect clustering. This study proposes a GRNN yield model to predict wafer yield. The proposed approach assumes that wafer yield is affected by each wafer defect.
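The clustering index of Eq. (15) can be computed directly from the defect coordinates, as in the Python sketch below. The text does not state the starting value $x_{(0)}$, so the sketch assumes $x_{(0)} = y_{(0)} = 0$ (the coordinate origin) through the hypothetical `origin` parameter; the function name and the sample coordinates are likewise illustrative only.

```python
import statistics
from typing import Sequence


def clustering_index(
    xs: Sequence[float], ys: Sequence[float], origin: float = 0.0
) -> float:
    """Eq. (15): min of (sample variance / squared mean) of the defect intervals per axis."""

    def variance_to_squared_mean(coords: Sequence[float]) -> float:
        ordered = sorted(coords)
        # Intervals between consecutive ordered coordinates; x_(0) is ASSUMED to be `origin`.
        previous = origin
        intervals = []
        for c in ordered:
            intervals.append(c - previous)
            previous = c
        mean = statistics.mean(intervals)
        if mean == 0:
            raise ValueError("mean interval is zero; the index is undefined")
        return statistics.variance(intervals) / mean ** 2  # sample variance (n - 1)

    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two defects with matching x and y coordinates")
    return min(variance_to_squared_mean(xs), variance_to_squared_mean(ys))


# Evenly spaced defects: all intervals are equal, so the index is 0.
print(clustering_index([1, 2, 3, 4], [1, 2, 3, 4]))
# Three defects close together and one far away: the index exceeds 1.
print(clustering_index([1.0, 1.1, 1.2, 9.0], [1.0, 1.1, 1.2, 9.0]))
```

Taking the minimum over the two axes means a pattern is flagged as clustered only when the intervals are irregular along both axes.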
Under the assumption that every defect on a wafer affects yield, the proposed approach to wafer yield prediction in IC manufacturing can be described as follows:

Step 1: Determine the defect clustering pattern. Obtain the simulated defect wafer map. Utilize the Borland Delphi programming language to simulate all possible defect clustering patterns for 8-in. wafers.
Step 2: Calculate all attribute values for each pattern. For each defect clustering pattern on a wafer, calculate the following attribute values: number of defects; chip size; mean number of defects per chip; mean number of defects per unit area; and clustering index CI.
Step 3: Build a GRNN yield model. Input the attribute values from Step 2 into the GRNN yield model. The actual yield of the wafer is the only output of the GRNN yield model. The percentage of chips without defects on a wafer is used as the actual yield value of the wafer. In this study, the neural network package NeuroShell 2 is employed to train and test the GRNN network. A trained GRNN network is obtained after the training patterns have been input. Finally, the trained GRNN network produces the network's prediction for each pattern in the test set.
Step 4: Calculate predicted yields. Input the attribute values from Step 2 into the negative binomial yield model to derive the predicted yields of that model. Then build the BPNN yield model as in Step 3 to obtain the predicted yields for the BPNN yield model.
Step 5: Predict and analyze the wafer yield. Compare the yields predicted by the negative binomial yield model, the BPNN yield model and the proposed GRNN yield model with the actual yields of the wafers.

5. Implementation

5.1. A simulation study

This section presents a simulation study to demonstrate the effectiveness of the proposed approach.
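Two quantities from Steps 3 and 4 can be illustrated with a short Python sketch: the actual yield (the fraction of chips without defects) and the negative binomial prediction from Eqs. (4) and (5). The function names and the per-chip defect counts are hypothetical, and raising an error when the variance does not exceed the mean is a choice made for this sketch rather than something the study prescribes.

```python
import math
import statistics
from typing import Sequence, Tuple


def actual_yield(defects_per_chip: Sequence[int]) -> float:
    """Fraction of chips on the wafer that contain no defect."""
    return sum(1 for k in defects_per_chip if k == 0) / len(defects_per_chip)


def negative_binomial_yield(defects_per_chip: Sequence[int]) -> Tuple[float, float]:
    """Return (alpha, Y_2) using Eq. (5) for alpha and Eq. (4) for the yield."""
    mean = statistics.mean(defects_per_chip)          # mean number of defects per chip
    variance = statistics.variance(defects_per_chip)  # sample variance
    if variance <= mean:
        # Eq. (5) would give an undefined or negative alpha, the problem noted in the text.
        raise ValueError("variance must exceed the mean for a positive cluster parameter")
    alpha = mean ** 2 / (variance - mean)
    return alpha, (1.0 + mean / alpha) ** (-alpha)


# Hypothetical wafer with 8 chips: all 8 defects fall on a single chip.
counts = [0, 0, 0, 0, 0, 0, 0, 8]
alpha, nb_yield = negative_binomial_yield(counts)
print(f"actual yield            = {actual_yield(counts):.3f}")
print(f"negative binomial yield = {nb_yield:.3f} (alpha = {alpha:.3f})")
print(f"Poisson yield           = {math.exp(-statistics.mean(counts)):.3f}")
```

In this toy case the Poisson prediction falls well below the actual yield, and the negative binomial prediction lies between the two, which mirrors the underestimation under clustering described in Section 2.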
The data required in this simulation study are obtained by employing the Borland Delphi programming language to simulate a variety of defect clustering patterns for 8-in. wafers. A total of 102 combinations for simulation trials are obtained by combining the three design factors outlined in Section 4.

Fig. 3. The relationships between the predicted and actual yields for the negative binomial, BPNN and proposed yield models.

Table 1
The comparisons of RMSE and correlation coefficients between predicted and actual yields

| Yield model | RMSE | Correlation coefficient |
|---|---|---|
| Negative binomial yield model | 0.1203 | 0.8838 |
| BPNN yield model | 0.0960 | 0.9030 |
| Proposed yield model | 0.0914 | 0.9127 |

The number of replications is 10 in each simulation; hence, a total of 1020 pairs of input–output wafer attribute values are obtained and used to train and test the GRNN network. These wafers are divided into two parts: one part contains 816 wafers, which are used to train the GRNN network; the second part contains 204 wafers, which are employed to test the accuracy of the GRNN network. The attribute values and the actual yields of the 1020 wafers are utilized as the inputs and the output, respectively, of the proposed GRNN network. The percentage of chips without defects on a wafer is used as the actual yield value of the wafer. NeuroShell 2 is utilized to train and test the GRNN network. The trained GRNN network is obtained after inputting the 816 training patterns. Finally, the trained GRNN network is utilized to produce the network's prediction of yields for the 204 test patterns. In this study, the unique parameter of the GRNN network, that is, the smoothing factor $\sigma$, is set at 0.076, and training is terminated when the error no longer improves by 1%.
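The trial count can be checked with a short enumeration in Python. The sketch assumes, as one reading of the design that reproduces the stated total, that the random pattern has no grey-region percentage while each of the four clustering patterns is combined with all four percentages; the pattern names are taken from Table 2 and the variable names are illustrative.

```python
from itertools import product

chip_sizes_cm2 = [1.0, 1.44, 1.96, 2.56, 3.24, 4.0]
clustering_patterns = ["bull's eye", "crescent moon", "bottom", "edge"]
grey_percentages = [60, 70, 80, 90]

# ASSUMPTION: the random pattern has no grey-region percentage level.
pattern_configs = [("random", None)] + list(product(clustering_patterns, grey_percentages))
trials = list(product(pattern_configs, chip_sizes_cm2))

replications = 10
n_wafers = len(trials) * replications
n_train = 816
n_test = n_wafers - n_train

print(len(pattern_configs), "pattern configurations")
print(len(trials), "simulation trials")
print(n_wafers, "simulated wafers:", n_train, "for training,", n_test, "for testing")
```

Under this reading, 17 pattern configurations times 6 chip sizes give the 102 trials, and 10 replications give the 1020 wafers, of which 816 (80%) are used for training and 204 (20%) for testing.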
The attribute values of the same 204 simulated wafers are substituted into the negative binomial yield model to calculate the predicted yields of that model. A BPNN yield model is then built to obtain the predicted yields for the same wafers. The BPNN network in this study is constructed with three layers (Hornick, Stinchcombe, & White, 1990), with input and output units the same as those in the proposed GRNN network and 17 hidden units (Widrow, Winter, & Baxter, 1987). The learning rate and momentum are 0.6 and 0.9, respectively, which are the defaults in NeuroShell 2. The number of learning epochs is 10,000.

To evaluate the performance of these yield models, the relationships between the predicted and actual yields are evaluated. The proposed GRNN yield model (Fig. 3) effectively estimates the actual yield. Table 1 presents the comparisons of RMSE and correlation coefficients $r$ for these three predicted yields and the actual yields. A lower RMSE value and a higher $r$ value indicate that a yield model performs better than the other models. The RMSE of the proposed approach is 0.0914, which is the smallest of the three RMSEs, and the correlation coefficient is 0.9127, which is the largest of the three correlation coefficients. These findings reveal that the proposed approach predicts wafer yield more accurately than the other two models.

The influences of each of the following three design factors on the wafer yield are analyzed: cluster pattern; percentage of defects located in grey regions; and chip size. The RMSE and correlation coefficient $r$ are applied to analyze the performance of the negative binomial, BPNN and the proposed GRNN yield models. Table 2 shows the RMSE and correlation coefficients of these three yield models for the three design factors. Table 2 reveals that the proposed approach produces the best prediction for wafer yield of the three yield models at most factor levels.
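The two performance measures used in Tables 1 and 2 can be computed as in the following Python sketch, which implements Eq. (13) and the standard Pearson correlation coefficient. The function names and the sample yields are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (13): root-mean-squared error between actual and predicted outputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - o) ** 2 for a, o in zip(actual, predicted)) / len(actual))


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between actual and predicted outputs."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least two")
    mean_a = sum(actual) / len(actual)
    mean_o = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (o - mean_o) for a, o in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_o = sum((o - mean_o) ** 2 for o in predicted)
    if var_a == 0 or var_o == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_a * var_o)


# Hypothetical actual and predicted yields for four test wafers.
actual_yields = [0.55, 0.72, 0.40, 0.81]
predicted_yields = [0.50, 0.70, 0.47, 0.78]
print(f"RMSE = {rmse(actual_yields, predicted_yields):.4f}")
print(f"r    = {pearson_r(actual_yields, predicted_yields):.4f}")
```

The two measures are complementary: RMSE penalizes any systematic offset between predicted and actual yields, whereas $r$ only reflects how closely the two move together.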
This study also varies the simulated wafer size from 6 to 12 in. and compares the prediction results for wafer yield from the three yield models. The simulation procedure is the same as that in the previous simulation. Table 3 summarizes the prediction results for the three wafer sizes. Table 3 reveals that the proposed approach performs best of the three yield models regardless of wafer size. This simulation obtains the same result as the 8-in. wafer simulation; that is, the proposed model performs best, followed by the BPNN yield model and the negative binomial yield model. The performances of the proposed approach and the BPNN yield model are very close, and both are better than that of the negative binomial yield model. Although the performance of the proposed GRNN yield model is only slightly better than that of the BPNN yield model, the proposed GRNN yield model has numerous advantages over the BPNN yield model. The GRNN yield model learns quickly and requires only one parameter.
Furthermore, the proposed GRNN yield model has no overtraining or undertraining problems, and the likelihood of obtaining a global optimal solution is higher than that of a BPNN yield model.

Table 2
The RMSE and correlation coefficients of these three yield models for three design factors

| Design factor | Level | Negative binomial RMSE | Negative binomial correlation coefficient | BPNN RMSE | BPNN correlation coefficient | Proposed GRNN RMSE | Proposed GRNN correlation coefficient |
|---|---|---|---|---|---|---|---|
| Cluster pattern | Random | 0.0989 | 0.9465 | 0.0506 | 0.9716 | 0.0214 | 0.9947 |
| | Bull's eye | 0.1225 | 0.7785 | 0.0622 | 0.927 | 0.0386 | 0.9732 |
| | Crescent Moon | 0.1315 | 0.8221 | 0.0884 | 0.847 | 0.0848 | 0.8599 |
| | Bottom | 0.1218 | 0.7833 | 0.0977 | 0.7951 | 0.0977 | 0.8066 |
| | Edge | 0.1463 | 0.831 | 0.0722 | 0.9371 | 0.0691 | 0.9466 |
| Percentage | 60% | 0.1136 | 0.9239 | 0.0433 | 0.9801 | 0.0464 | 0.9807 |
| | 70% | 0.1312 | 0.8686 | 0.0511 | 0.9687 | 0.0496 | 0.9701 |
| | 80% | 0.1287 | 0.8332 | 0.0481 | 0.9511 | 0.0462 | 0.9558 |
| | 90% | 0.1359 | 0.9009 | 0.0622 | 0.9256 | 0.0577 | 0.9324 |
| Chip size (cm²) | 1 | 0.1626 | 0.9529 | 0.0671 | 0.9191 | 0.0626 | 0.9255 |
| | 1.44 | 0.098 | 0.9018 | 0.1191 | 0.7168 | 0.084 | 0.8766 |
| | 1.96 | 0.0628 | 0.9503 | 0.1075 | 0.8212 | 0.0839 | 0.9002 |
| | 2.56 | 0.0819 | 0.9197 | 0.1381 | 0.7185 | 0.0939 | 0.8824 |
| | 3.24 | 0.1526 | 0.8155 | 0.1118 | 0.8047 | 0.0848 | 0.9039 |
| | 4 | 0.153 | 0.9078 | 0.1476 | 0.7409 | 0.1128 | 0.8616 |

Table 3
The prediction results for three wafer sizes

| Wafer size | Actual yield, average | Actual yield, std. dev. | Yield model | Predicted yield, average | Predicted yield, std. dev. | RMSE | Correlation coefficient |
|---|---|---|---|---|---|---|---|
| 6-in. | 0.6610 | 0.2070 | Negative binomial | 0.6350 | 0.2348 | 0.1225 | 0.8597 |
| | | | BPNN | 0.6538 | 0.1939 | 0.0854 | 0.9114 |
| | | | Proposed GRNN | 0.6593 | 0.1906 | 0.0836 | 0.9145 |
| 8-in. | 0.5662 | 0.2207 | Negative binomial | 0.5784 | 0.2561 | 0.1203 | 0.8838 |
| | | | BPNN | 0.5599 | 0.2142 | 0.0960 | 0.9030 |
| | | | Proposed GRNN | 0.5666 | 0.1852 | 0.0914 | 0.9127 |
| 12-in. | 0.7007 | 0.1584 | Negative binomial | 0.7517 | 0.1772 | 0.0902 | 0.9073 |
| | | | BPNN | 0.6955 | 0.1365 | 0.0667 | 0.9085 |
| | | | Proposed GRNN | 0.7045 | 0.1393 | 0.0529 | 0.9448 |

6. Conclusion

As wafer size increases, the clustering of defects increases.
Under this scenario, the conventional Poisson yield model cannot accurately predict wafer yield. In this study, a neural network-based approach is presented for predicting wafer yield in IC manufacturing in the presence of defect clustering patterns. The GRNN network is used to construct a yield model that can accurately predict wafer yield. The merits of the proposed approach are as follows:

1. The proposed approach utilizes five relevant variables as input variables to predict the wafer yield, rather than utilizing only some of those variables as do the Poisson yield model, compound Poisson yield models and the negative binomial yield model. Therefore, the proposed model is more accurate than both the negative binomial and BPNN yield models.
2. The influences of each of the three design factors on the wafer yield are analyzed. The RMSE and correlation coefficients of the three yield models for the three design factors reveal that the proposed approach produces the best prediction for wafer yield of the three yield models at most factor levels.
3. This study varies the simulated wafer size from 6 to 12 in. and compares the prediction results for wafer yield from the three yield models. The ranking of the three models is the same as in the 8-in. wafer simulation, regardless of wafer size.
4. The proposed GRNN yield model learns quickly and requires only one parameter to be identified for learning.
5. The proposed approach does not require the construction of a complex mathematical yield model or advanced statistical skills; it only requires a neural network package to predict wafer yield.

References

Albin, S. L., & Friedman, D. J. (1991). Clustered defects in IC fabrication: Impact on process control charts. IEEE Transactions on Semiconductor Manufacturing, 4(1), 36–42.
Bishop, C. M. (1994). Neural networks and their applications. Review of Scientific Instruments, 65(6), 1803–1832.
Cunningham, J. A. (1990). The use and evaluation of yield models in integrated circuit manufacturing. IEEE Transactions on Semiconductor Manufacturing, 3(2), 60–71.
Dupret, Y., & Kielbasa, R. (2004). Modeling semiconductor manufacturing yield by test data and partial least squares. In Proceedings of 16th International Conference on Microelectronics (pp. 404–407). France.
Fausett, L. (1994). Fundamentals of neural networks: Architectures, algorithms, and applications. Englewood Cliffs, NJ: Prentice Hall.
Ferris-Prabhu, A. V. (1992). Introduction to semiconductor device yield modeling. Boston: Artech House.
Friedman, D. J., Hansen, M. H., Nair, V. N., & James, D. A. (1997). Model-free estimation of defect clustering in integrated circuit fabrication. IEEE Transactions on Semiconductor Manufacturing, 10(3), 344–359.
Gruber, H. (1994). Learning and strategic product innovation: Theory and evidence for the semiconductor industry. Amsterdam, Netherlands: Elsevier.
Hong, C., Milor, L., Choi, M., & Lin, T. (2005). Study of area scaling effect on integrated circuit reliability based on yield models. Microelectronics Reliability, 45(9–11), 1305–1310.
Hornick, K., Stinchcombe, M., & White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5), 551–560.
Jun, C. H., Hong, Y., Kim, S. Y., Park, K. S., & Park, H. (1999). A simulation-based semiconductor chip yield model incorporating a new defect cluster index. Microelectronics Reliability, 39(4), 451–456.
Kim, C., & Baldwin, D. F. (2005). A theoretical yield model for assembly process of area array solder interconnect packages with experimental verification. IEEE Transactions on Electronics Packaging Manufacturing, 28(4), 344–354.
Langford, R. E., Liou, J. J., & Raghavan, V. (2001). The application and validation of a new robust windowing method for the Poisson yield model. In Advanced Semiconductor Manufacturing Conference, IEEE/SEMI (pp. 157–160). Germany.
Liou, J. J., Zhang, Q., McMacken, J., Thomson, J. R., Stiles, K., & Layman, P. (2002). Statistical modeling of MOS devices for parametric yield prediction. Microelectronics Reliability, 42(4), 787–795, 9.
Meyer, F. J., & Park, N. (2003). Predicting defect-tolerant yield in the embedded core context. IEEE Transactions on Computers, 52(11), 1470–1479.
Mullenix, P., Zalnoski, J., & Kasten, A. J. (1997). Limited yield estimation for visual defect sources. IEEE Transactions on Semiconductor Manufacturing, 10(1), 17–23.
Okabe, T., Nagata, M., & Shimada, S. (1972). Analysis of yield of integrated circuits and a new expression of the yield. Electrical Engineering in Japan, 92(12), 135–141.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065–1076.
Raghavachari, M., Srinivasan, A., & Sullo, P. (1997). Poisson mixture yield models for integrated circuits: A critical review. Microelectronics Reliability, 37(4), 565–580.
Skinner, K. R., Montgomery, D. C., Runger, G. C., Fowler, J. W., McCarville, D. R., Rhoads, T. R., et al. (2002). Multivariate statistical methods for modeling and analysis of wafer probe test data. IEEE Transactions on Semiconductor Manufacturing, 15(4), 523–530.
Specht, D. F. (1991). A general regression neural network. IEEE Transactions on Neural Networks, 2(6), 568–576.
Stapper, C. H. (1973). Defect density distribution for LSI yield calculations. IEEE Transactions on Electron Devices (Correspondence), 20(7), 655–657.
Stapper, C. H. (1985). The effects of wafer to wafer defect density variations on integrated circuit defect and fault distributions. IBM Journal of Research and Development, 29(1), 87–97.
Stapper, C. H. (1991). On Murphy's yield integral. IEEE Transactions on Semiconductor Manufacturing, 4(4), 294–297.
Stapper, C. H., Armstrong, F. M., & Saji, K. (1983). Integrated circuit yield statistics. Proceedings of the IEEE, 71(4), 453–470.
Stapper, C. H., & Rosner, R. J. (1995). Integrated circuit yield management and yield analysis: Development and implementation. IEEE Transactions on Semiconductor Manufacturing, 8(2), 95–102.
Tyagi, A., & Bayoumi, A. M. (1992). Defect clustering viewed through generalized Poisson distribution. IEEE Transactions on Semiconductor Manufacturing, 5(3), 196–206.
Widrow, B., Winter, R. G., & Baxter, R. A. (1987). Learning phenomena in layered neural networks. In Proceedings of the First IEEE International Conference on Neural Networks (pp. 411–429). San Diego.