key: cord-0045471-00od3ff9 authors: García-Rueda, Graciela; Valdovinos, Rosa M.; Valdés-González, Jesús; Alejo, Roberto; González-Ruiz, J. Leonardo; Marcial-Romero, José R. title: Analysis of Repair Costs of Scholar Buildings Affected by Earthquakes Using Data Mining. Case Study: Earthquakes of 2017 in Mexico date: 2020-04-29 journal: Pattern Recognition DOI: 10.1007/978-3-030-49076-8_5 sha: 7469a5c9557e501ec689aa18d9e6a3572014df95 doc_id: 45471 cord_uid: 00od3ff9

Earthquakes cannot be predicted, and when they occur they have devastating economic, social and structural consequences, among others. In this paper, association rules are mined in order to estimate the repair cost required by schools affected by the earthquakes of September 7th and 19th, 2017, in Mexico. For this purpose, we use the public data collected by the Mexican FONDEN.

Throughout history, structural collapse has been the factor that generates the most material and human losses when earthquakes occur. This is mainly due to the use of low-quality materials, deficiencies in construction processes and noncompliance with standards, among other causes [1]. Data Mining (DM) supports the diagnosis and analysis of structural performance. Given the characteristics of the data collected after an earthquake, the most widely used DM technique is Association Rules (AR) [2]. An example of its application to earthquakes is presented by Martínez-Álvarez et al. [3], who apply descriptive techniques to obtain Quantitative AR (QAR) and use a regression method (the M5P algorithm) to predict the occurrence of earthquakes based on the relationship between frequency and magnitude, in order to observe how earthquakes vary.
The QAR obtained showed that earthquakes of moderate magnitude (between 3.5 and 4.4) occur after short time intervals, while for high magnitudes (4.4 to 6.2) the time intervals, in relation to frequency and magnitude, decreased significantly. Similarly, Galán Montaño F. [4] uses DM techniques to describe the behavior of earthquakes according to their magnitude. In that study, Montaño uses a genetic algorithm capable of finding frequent patterns to obtain behavior models of the time series of earthquake occurrences. The results showed that, before an earthquake of magnitude greater than 4.5 occurs, there is a high probability that an earthquake of magnitude 4.4 occurs.

On September 7th, 2017, an earthquake of magnitude 8.2 was registered in Oaxaca, Mexico, causing damage to 57,621 homes, 1,988 schools, 102 cultural buildings and 104 public buildings [5]. A few days later, on September 19th, 2017, another earthquake of magnitude 7.1 occurred with its epicenter in Puebla; it damaged more than 150 thousand homes in Oaxaca, Chiapas, Guerrero, Puebla, Morelos and the State of Mexico, with an estimated repair cost of up to 38,150 million Mexican pesos [6]. After these two earthquakes, 12,931 damaged schools were registered in the education sector: 577 will require total reconstruction, 1,847 partial reconstruction, and the remainder suffered minor damage. The repair cost was estimated at around 13,650 million Mexican pesos [7]. Other affected structures were historically and culturally valuable buildings, such as the archaeological zone of Chiapa de Corzo, the Zócalo of Mexico City and the National Museum of Art, among others, whose repair has an estimated cost of 8,000 million Mexican pesos [8].
In this paper we apply association rule mining to estimate the repair cost required to reconstruct school buildings damaged by the earthquakes of September 7th and 19th, using the data provided by the Fund of Budget Transparency (the Mexican FONDEN). It is important to note that the databases obtained after the earthquakes of September 7th and 19th are the first of their type to be collected in Mexico. For structural engineering purposes the databases are not fully complete, but the data they contain is valuable because it allows, for the first time, a study of earthquake engineering based on DM. One of the main contributions of this work is to explore the use of DM in earthquake engineering. The rules obtained are valuable because they describe the distribution of earthquake damage costs as a function of some basic structural characteristics of the buildings. These rules are specific to the earthquakes of September 7th and 19th in Mexico and to the structures studied. Currently, there is no parallel structural engineering study that would allow the rules to be validated.

AR is a technique for discovering interesting associations or correlations in a transactional data set, where the rows represent the transactions and the columns the items [9]. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ attributes or items and $D = \{t_1, t_2, \ldots, t_n\}$ a set of transactions in a data set. For every $t_i \in D$ there exists a unique identifier $T_{id}$, where $T_{id} \subset I$ [10]. An association rule can be defined as an implication of the form $X \Rightarrow Y$, where $X$ is the antecedent and $Y$ is the consequent of the rule. For example, the rule {Bread, Cheese} ⇒ {Ham} means that when Bread and Cheese occur, Ham also occurs.
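To make the transactional view concrete, the following minimal Python sketch encodes a small set of shopping transactions as a binary matrix (rows are transactions, columns are items) and counts how often the rule {Bread, Cheese} ⇒ {Ham} holds. The transactions and the helper names `to_binary_matrix` and `count_rule` are illustrative assumptions; they are not taken from the data or code of this study.

```python
# Hypothetical toy transactions; these are NOT taken from the ES17M data.
TRANSACTIONS = [
    frozenset({"Bread", "Cheese", "Ham"}),
    frozenset({"Bread", "Cheese"}),
    frozenset({"Bread", "Ham"}),
    frozenset({"Bread", "Cheese", "Ham", "Milk"}),
    frozenset({"Milk"}),
]


def to_binary_matrix(transactions):
    """Return (items, matrix): one row per transaction, one 0/1 column per item."""
    items = sorted(set().union(*transactions))
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, matrix


def count_rule(transactions, antecedent, consequent):
    """Count transactions containing X, and those containing both X and Y."""
    with_x = [t for t in transactions if antecedent <= t]
    with_x_and_y = [t for t in with_x if consequent <= t]
    return len(with_x), len(with_x_and_y)


if __name__ == "__main__":
    items, matrix = to_binary_matrix(TRANSACTIONS)
    print(items)
    for row in matrix:
        print(row)
    print(count_rule(TRANSACTIONS, frozenset({"Bread", "Cheese"}), frozenset({"Ham"})))
```

In this toy data the antecedent {Bread, Cheese} appears in three transactions and Ham accompanies it in two of them, so the rule holds often but not always.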
To validate the quality of the rules and the probability that they reflect real relationships, two of the most widely used metrics are [11]: Support, the probability $P$ that a set of items appears in a transaction, $\mathrm{support}(X \Rightarrow Y) = P(X \cup Y)$; and Confidence, the fraction of the transactions containing $X$ in which $Y$ also appears, $\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}$.

There are different algorithms for obtaining AR; in this paper we use the Apriori algorithm and the PSO-GES metaheuristic. The Apriori algorithm is one of the first methods developed for association rule mining and is currently one of the most widely used. Apriori consists of two stages [11]: first the algorithm identifies all the frequent itemsets, and then it converts them into AR (see Algorithm 1). PSO-GES (see Algorithm 2) is a metaheuristic that generates quality AR with relatively low execution times. The algorithm consists of two stages. In the first stage, the dataset is transformed into a binary matrix and non-frequent items are eliminated. In the second stage, rule mining is carried out through a Guided Exploration Strategy, in which only those items that positively influence the fitness of a rule are added during the particle evolution process. This is done by computing fast estimates, using the summary matrix, of the support and confidence of the rule represented by a new particle [9].

The dataset used in this work corresponds to the school infrastructure affected during the earthquakes of September 2017 in Mexico (ES17M dataset) 1. ES17M consists of 19,194 records and 76 features, which were designed by structural experts in seismic risk analysis [1]. From the full set of features in ES17M, only those most relevant to seismic risk analysis were chosen, leaving 36 of the 76 available features (Table 1). The last four rows (funding source) have six values: cost, total amount of attention, amount exercised and description, which gives a total of 36 items.
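The support and confidence metrics and the two Apriori stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the toy transactions, the thresholds and the function names (`support`, `frequent_itemsets`, `rules`) are hypothetical, and candidate generation is kept deliberately simple. Exact fractions are used so that the ratios are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy transactions; these are NOT taken from the ES17M data.
TRANSACTIONS = [
    frozenset({"Bread", "Cheese", "Ham"}),
    frozenset({"Bread", "Cheese"}),
    frozenset({"Bread", "Ham"}),
    frozenset({"Bread", "Cheese", "Ham", "Milk"}),
    frozenset({"Milk"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def frequent_itemsets(transactions, min_support):
    """Stage 1: level-wise search for itemsets with support >= min_support."""
    level = [frozenset({i}) for i in sorted(set().union(*transactions))]
    frequent = {}
    k = 1  # size of the itemsets in the current level
    while level:
        kept = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        k += 1
        # Join: unite pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune: keep a candidate only if all its (k-1)-subsets are frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Stage 2: turn frequent itemsets into rules X => Y with enough confidence."""
    found = []
    for itemset, s_xy in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                # confidence(X => Y) = support(X u Y) / support(X)
                confidence = s_xy / frequent[x]
                if confidence >= min_confidence:
                    found.append((x, itemset - x, s_xy, confidence))
    return found


if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, Fraction(1, 2))
    for x, y, s, c in rules(freq, Fraction(7, 10)):
        print(sorted(x), "=>", sorted(y), "support", s, "confidence", c)
```

With these toy thresholds the itemset {Cheese, Ham} is discarded in the first stage, so the three-item candidate {Bread, Cheese, Ham} is pruned without ever being counted; this is the property that lets Apriori avoid scanning the data for most candidates.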
Table ← CreateTable(D, K)
POP ← CreatePopulation(N)
while t