A hybrid computational approach for seismic energy demand prediction This is a repository copy of A hybrid computational approach for seismic energy demand prediction. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/156298/ Version: Accepted Version Article: Gharehbaghi, S, Gandomi, AH, Achakpour, S et al. (1 more author) (2018) A hybrid computational approach for seismic energy demand prediction. Expert Systems with Applications, 110. C. pp. 335-351. ISSN 0957-4174 https://doi.org/10.1016/j.eswa.2018.06.009 © 2018, Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. eprints@whiterose.ac.uk https://eprints.whiterose.ac.uk/ Reuse This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) licence. This licence only allows you to download this work and share it with others as long as you credit the authors, but you can’t change the article in any way or use it commercially. More information and the full terms of the licence here: https://creativecommons.org/licenses/ Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing eprints@whiterose.ac.uk including the URL of the record and the reason for the withdrawal request. mailto:eprints@whiterose.ac.uk https://eprints.whiterose.ac.uk/ A HYBRID COMPUTATIONAL APPROACH FOR 1 SEISMIC ENERGY DEMAND PREDICTION 2 3 S. Gharehbaghia, A.H. Gandomib,*, and S. Achakpourc, M.N. Omidvard 4 5 aDepartment of Civil Engineering, Behbahan Khatam Alanbia University of Technology, Behbahan, Iran 6 bSchool of Business, Stevens Institute of Technology, Hoboken, NJ 07030, USA 7 cDepartment of Civil Engineering, Iran University of Science and Technology, Narmak, Tehran, Iran 8 dCenter of Excellence for Research in Computational Intelligence and Applications, School of Computer 9 Science, University of Birmingham, Birmingham B15 2TT, U.K. 10 11 Abstract. In this paper, a hybrid genetic programming (GP) with multiple genes is implemented for 12 developing prediction models of spectral energy demands. A multi-objective strategy is used for 13 maximizing the accuracy and minimizing the complexity of the models. Both structural properties and 14 earthquake characteristics are considered in prediction models of four demand parameters. Here, the 15 earthquake records are classified based on soil type assuming that different soil classes have linear 16 relationships in terms of GP genes. Therefore, linear regression analysis is used to connect genes for 17 different soil types, which results in a total of sixteen prediction models. The accuracy and 18 effectiveness of these models were assessed using different performance metrics and their performance 19 was compared with several other models. The results indicate that not only the proposed models are 20 simple, but also they outperform other spectral energy demand models proposed in the literature. 21 Keywords: Evolutionary computation; Genetic programming; Regression Analysis; Input energy; 22 Hysteretic energy; Seismic energy spectra. 23 24 *Corresponding author. Email:a.h.gandomi@stevens.edu Tel.: +1 (201) 216-5029; Fax: 201-216-5385 URL: http://gandomi.beacon-center.org/ https://mail.google.com/mail/u/0/h/mur9z0mq4ulr/?&cs=wh&v=b&to=a.h.gandomi@stevens.edu http://gandomi.beacon-center.org/ Table of Contents 25 26 1. Introduction 27 2. Seismic Energy Concept and Formulation 28 3. Genetic Programming 29 3.1. Multi-Gene Symbolic Regression 30 3.2. Multi-Objective Genetic Programming (MOGP) 31 3.3. Accelerating GP Process 32 4. Predicting Seismic Energy Demand Spectra Using MOGP 33 4.1. Preparing Exact Data 34 4.1.1. Inelastic SDOF Systems 35 4.1.2. Earthquake Ground Motions 36 4.2. Model Development Using MOGP 37 5. Results and Discussion 38 5.1. Mathematical Model for EDP1 39 5.2. Mathematical Model for EDP2 40 5.3. Mathematical Model for EDP3 41 5.4. Mathematical Model for EDP4 42 5.5. Accuracy of the Models 43 5.6. Comparative Study 44 5.6.1. Assumptions 45 5.6.2. Comparison 46 6. Summary and Conclusion 47 48 1. Introduction 49 The approaches currently used for the seismic analysis and design of structures need to be improved 50 through considering appropriate engineering demand parameters that would represent the 51 characteristics of a structure and the design earthquake. In current approaches, either the structural 52 members are designed based on satisfying the balance between the force demand and the corresponding 53 strength supply while providing an adequate level of ductility (based on e.g. ASCE/SEI7 2010), or 54 based on the concept whether they are force- or deformation-controlled (based on e.g. FEMA-356 55 2000). Such approaches disregard the frequency content and duration of earthquake ground motion, as 56 well as the velocity response and hysteretic behavior (Gupta 1990). The importance of considering 57 these factors lies in evidences suggesting that, for instance, dissipated hysteretic energy due to repeated 58 inelastic excursions could result in a certain amount of seismic damage (Fajfar and Vidic 1994). In fact, 59 in addition to the force and deformation, the energy demand is of great importance in capturing the 60 mentioned seismic factors as the inelastic behavior is expected to occur due to the design and maximum 61 earthquakes. Housner (1956) was first to introduce these factors through defining the energy concept. 62 This concept requires that the energy dissipation capacitybe less than the input energy demand. 63 Both structural properties and earthquake characteristics affect the seismic energy demand. 64 Determining the spectral values of energy demand is beneficial due to its connection tothe amount of 65 structural damage (Fajfar and Vidic 1994, Gharehbaghi 2018). The design codes have not explicitly 66 implemented the energy demand parameter in predicting seismic demands yet. Moreover, the priority 67 of the energy-based design approach compared with the conventional strength-based design approach 68 needs further studies. Although, previous studies (e.g. Housner 1956, Fajfar and Vidic 1994, Kalkan 69 and Kunnath 2007, Akiyama 1985, Bertero and Uang 1988, Decanini and Mollaioli 2001, Manfredi 70 2001) have stipulated that the seismic energy parameters are of great importance in seismic design of 71 structures. It was shown that the hystertic energy demand is directly connectedto the structural damage 72 (Fajfar and Vidic 1994, Kalkan and Kunnath 2007, Akiyama 1985, Bertero and Uang 1988, Decanini 73 and Mollaioli 2001, Manfredi 2001, Benavent-Climent et al. 2010, Gharehbaghi 2018). After more than 74 two decades that the Housner’s proposal was almost neglected, it was received considerable attention 75 among researchers (Akiyama 1985, Kuwamura and Galambos 1989), and became the key issue of a 76 conference held in Bled city of Slovenia (Fajfar and Krawinkler 1992). It was recognized that input 77 energy and hysteretic energy are the indicators of ground motion and have a correlation with the 78 structural damage, and the quantity related to cumulative damage is the hysteretic energy (Fajfar and 79 Vidic 1994, Bertero and Uang 1988, Decanini and Mollaioli 2001). Most recently, Deniz et al. (2017) 80 also found that the most appropriate and reliable intensity measure for the seismic fragility analysis of 81 buildings is the seismic energy demand. 82 Estimation of input and hysteretic energy demands using mathematical models could be considered 83 as one of the important steps aligned with the extension of the energy-based seismic analysis and design. 84 As previously mentioned, both structural and earthquake characteristics need to be accounted for the 85 issue. Some earthquake characteristics such as soil type, earthquake magnitude, peak ground 86 acceleration (PGA), peak ground velocity (PGV), cumulative energy index, fault type, distance from 87 the hypocenter, were used by researchers in determining the energy spectra (e.g. Zahra and Hall 1984, 88 Uang and Bertero 1988, Fajfar et al. 1989, Uang and Bertero 1990, Sucuoğlu and Nurtuğ 1995, 89 Khashaee 2004). In addition to the earthquake characteristics, ductility ratio, damping ratio, and 90 hysteretic behavior model (e.g. elastic-perfectly plastic, bilinear, pinching, Takeda, and Clough models) 91 were the influential structural properties involved in the estimation of seismic energy demand spectra 92 (e.g. Sucuoğlu and Nurtuğ 1995, Decanini and Mollaioli 2001, Benavent-Climent et al. 2011).Several 93 works have been carried out on the estimation of the seismic energy demand parameters. Housner 94 (1956) presented a model to determine input energy based on the spectral velocity of SDOF system. 95 Kuwamura and Galambos (1989) presented energy demand spectra considering the soil type and 96 dominant period of the earthquake. Chou and Uang (2000) estimated absorbed energy for an inelastic 97 system by using an attenuation relation. They used nonlinear regression analysis considering both 98 structural and earthquake variables. Manfredi (2001) proposed simple/efficient mathematical models 99 to estimate input and hysteretic energy spectra. A dimensionless seismic index that is a function of 100 PGA, PGV and cumulative energy was proposed to estimate the seismic energy spectra. Although the 101 estimation models were simple and effective, the effect of soil behavior was not considered, and the 102 number of earthquake ground motions was rather limited. Decanini and Mollaioli (2001) proposed the 103 formulation of elastic seismic energy spectra. They also presented a comprehensive study to propose 104 the design inelastic energy spectra by introducing the response modification factor for the input energy. 105 Several structural variables (e.g. ductility ratio and hysteretic behavior) and earthquake characteristics 106 such as soil type, source-to-site distance, and earthquake magnitude were considered in the proposed 107 spectra. Arroyo and Ordaz (2007) estimated the hysteretic energy demand spectra from elastic response 108 parameters in accordance with the earthquake events recorded in Mexico City. Their mathematical 109 models were a function of pseudo-acceleration, velocity and displacement spectra. Elastic design input 110 energy spectra based on Iranian earthquakes were also presented by Ghodrati Amiri et al. (2008). 111 Recently, Dindar et al. (2015) proposed two regression-based simple mathematical models to estimate 112 the input and hysteretic energy spectra. A database of earthquake ground motion records composed of 113 near- and far-fault ones, PGA, soil types, earthquake magnitude, ductility ratio, and hysteretic behavior 114 model was included in the proposed models. Using the regression analysis, Quinde et al. (2016) also 115 proposed mathematical models to estimate the seismic energy spectra of inelastic systems located on 116 the soft soil for Mexico City. They captured the effect of ductility ratio of inelastic systems and 117 dominant period of the probable earthquakes on the presented models. More recently, Zhai et al. (2016) 118 proposed an expression to account for the effect of after-shock on the input energy spectra using an 119 equivalent velocity. Alıcı and Sucuoğlu (2016) carried out a regression analysis to estimate inelastic 120 input energy spectrum. The prediction equations for the input energy spectra were expressed in terms 121 of an equivalent velocity. Some crucial earthquake characteristics including soil type, epicentral 122 distance, moment magnitude, and the fault type were considered in the proposed models. All the 123 previously mentioned works use conventional regression methods to estimate their energy parameters 124 of interest. 125 Based on the capability of soft computing approaches and their recent advances, it is worthwhile to 126 use such efficient approaches for seismic demand prediction. Computational complexity of the 127 conventional methods and their limitations has made soft computing techniques, such as evolutionary 128 algorithms, artificial neural networks, support vector machines, and fuzzy logic, popular for solving 129 complex engineering problems. A common application of these tools is in predictive analysis for 130 modeling the nonlinear dependency of the input parameters to the output value(s) where the 131 conventional approaches (e.g. regression analysis) fail or perform poorly (Khan et al. 2003, Gandomi 132 and Roke 2015). Despite the success of artificial neural networks (ANNs) in prediction, they are 133 inappropriate to develop practical intelligible equations. In addition to ANNs, support vector machines 134 (SVMs) are another primary class of soft computing methods used to discover patterns and approximate 135 relationships when large quantities of data is available. Although both ANNs and SVMs have received 136 significant attention (e.g. Salajegheh and Heidari 2005, Gholizadeh and Salajegheh 2009, Papadopoulos 137 et al. 2012, Gharehbaghi and Khatibinia 2015, Khatibinia et al. 2015, Yazdani et al. 2016), they require 138 a pre-defined and initial structure for the equation and network architecture to be determined by the 139 user. Genetic programming (GP), a learning algorithm originated from genetic algorithms, is another 140 well-known and successful technique for developing nonlinear mathematical models for the complex 141 problems. GP and its variants have been effectively used for solving various problems in civil 142 engineering (e.g. Kayadelenet al. 2009, Alavi et al. 2011, Mirzahosseini et al. 2011, Gandomi et al. 143 2012, Vardhan et al. 2016; Lim et al. 2016). Several variants of GP have been proposed in the literature, 144 such as gene expression programming (Ferreira 2006) and multi-stage genetic programming (Gandomi 145 and Alavi 2011).One of the robust variants of GP is multi-gene genetic programming (MGGP) that 146 adds the capability of conventional regression to the standard GP ability in parameter estimation. The 147 effectiveness of MGGP has been proved in the works reported by Gandomi et al. (e.g. Gandomi and 148 Alavi 2012a,b, Gandomi et al. 2013, Babanajad et al. 2013, Gandomi et al. 2016). 149 Structural and earthquake engineering has benefited from the soft computing techniques in different 150 applications. For instance, ANNs and SVMs have been widely used for risk assessment, seismic 151 response prediction, control and health monitoring (Tsompanakis and Topping 2011). In this paper, 152 MOGP is used for predicting the seismic energy demand spectra considering both typical structural and 153 earthquake characteristics. For this purpose, eighteen set of the single-degree-of-freedom (SDOF) 154 systems with the structural properties of different hardening ratios of bilinear hysteretic behavior model, 155 damping ratios, and ductility ratios are used to determine the energy demand spectra mentioned. Also, 156 four different sets of earthquake ground motion records based on their soil types (soft, firm, stiff and 157 rock) with the source-to-site distances of more than 17.5 km and the magnitudes of greater than 5.5 158 were used. It was assumed that the different soil classes have linear relationships in terms of GP genes 159 which help to find one equation with different coefficients for different soil types. The records were 160 scaled to two PGA levels 0.5g and 1.0g. Finally, four mathematical models corresponding to the four 161 engineering demand parameters (EDPs) of spectral input and hysteretic energy, spectral hysteretic to 162 input energy ratio, and spectral energy modification factor, are proposed using MOGP. Then, the 163 effectiveness of the models is revealed using the performance metrics compared with those of available 164 in the literature. 165 In this study, section 2 describes the seismic energy concept and its formulation. Also this section 166 introduces the seismic energy based EDPs which can be useful in seismic design of inelastic structures. 167 Section 3 express a hybrid computational approach based on genetic programming used as a predictive 168 tool herein. Section 4 describes a framework for prediction of the EDPs. A set of mathematical models 169 are proposed and their accuracy are examined using some performance metrics in section 5. Finally, 170 the developed models are discussed and compared with some other methods proposed in the literature. 171 172 2. Seismic Energy Concept and Formulation 173 Housner (1956) first proposed the idea of the energy-based seismic design approach. When ground 174 motion transmits energy into a structure, some of the energy is dissipated through the damping and 175 inelastic behavior. The remained energy of the structure is stored in the form of kinetic energy and 176 elastic strain energy. Housner stipulated that the energy supply should be more than the energy demand 177 during an earthquake in the form of this principle that energy supply < energy demand for controlling 178 and avoiding the structural collapse (Housner 1956). 179 When a structure is subjected to earthquake excitation, its governing equation for the dynamic 180 behavior of an inelastic SDOF system could be written as (Chopra 2012): 181 ( ) ( ) ( ( ), ( )) ( )s gmu t cu t f u t u t mu t+ + = − (1) where m, c and fs represent the mass, damping coefficient and lateral resisting force of SDOF system, 182 respectively; ü(t), ( )u t and u(t) are acceleration, velocity and displacement relative to the ground with t 183 representing time in SDOF system, respectively; and üg(t) is the earthquake ground acceleration. 184 Governing equation of energy equilibrium is obtained by the integration of Eq. (1) with respect to u 185 (Uang and Bertero 1990, Chopra 2012): 186 ( ) ( ) ( ( ), ( )) ( )s gmu t du cu t du f u t u t du mu t du+ + = −    (2) In fact, Eq. (2) expresses the energy balance for a structural system during an earthquake while Eq. 187 (1) explains the force balance. Substituting displacement unit (du) by velocity term and integrating it 188 over the time of the earthquake ground motion t, Eq. (2) is expressed as: 189 ( ) ( ) ( ) ( ) ( ( ), ( )) ( ) ( ) ( )s gmu t u t dt cu t u t dt f u t u t u t dt mu t u t dt+ + = −    , (3) where t is the time of interest across the earthquake ground motion. Eq. (3) can also be written in a 190 general form as follows: 191 ( ) ( ) ( ) ( )K D A IE t E t E t E t+ + = , (4) where EI(t), EK(t) and ED(t) are the input energy demand, kinetic energy, and damping energy, 192 respectively; EA(t) is encompassed the recoverable elastic strain energy ES(t) and the irrecoverable 193 plastic hysteretic energy EH(t). The amount of EH(t) is equal to zero in the elastic systems and is 194 appeared in the inelastic systems. Therefore, Eq. (4) can be expressed as: 195 ( ) ( ) ( ) ( ) ( )K D S H IE t E t E t E t E t+ + + = , (5) where EK(t) and ES(t) are cumulative during the earthquake ground motion and are vanished at the end 196 of motion of the inelastic systems. In effect, these two terms are very small in comparison with ED(t) 197 and EH(t). Since the most portion of energy demand is dissipated through the damping energy and 198 hysteretic energy, the Eq. (5) can be approximately written as (Uang and Bertero 1990): 199 ( ) ( ) ( )D H IE t E t E t+  . (6) According to Eq. (6), the main portion of input energy demand is converted to the damping energy 200 and hysteretic energy. Besides, if the structural system remains in the elastic range during an 201 earthquake, EH(t) is trivial, and the energy-based analysis is not useful for seismic design (Uang and 202 Bertero 1990).As mentioned before, for design basis earthquakes, it is expected that a structure will 203 experience inelastic cyclic deformations resulting in hysteretic energy dissipation. As previously 204 mentioned, the hysteretic energy dissipation (EH) is directly attributed to the structural damage where 205 the EH /EI ratio has been introduced as a good indicator of expected damage (Fajfar and Vidic 1994, 206 Sucuoğlu and Nurtuğ 1995, Decanini and Mollaioli 2001, Manfredi 2001). For a given ductility ratio 207 (), EH /EI ratio is defined as follow: 208 H I E HI E    = , (7) where the EI and EH are the input and hysteretic energy corresponding to the ductility ratio of . 209 Another parameter that is of crucial importance for earthquake resistant design based on the energy 210 concept could be the response modification factor of the input energy that can be expressed as follow: 211 I I I E RE E   = . (8) The literature suggests that to have a practical energy based seismic design, the computation of the 212 input and hysteretic energy (EI and EH), hysteretic energy to input energy ratio (HI), response 213 modification factor of input energy (REI), and energy dissipation capacity is useful. Since the design 214 method requires inelastic dynamic analyses resulting in the expensive computational efforts, the use of 215 soft computing techniques is of great importance in predicting the practical mathematical models 216 (formulations). Except for the energy dissipation capacity that needs comprehensive experimental and 217 theoretical studies, the spectral values of the mentioned energy-based EDPs were predicted using 218 MOGP. The next section describes the MOGP. 219 3. Genetic Programming 220 There are two groups of models which can be used for modeling the complex nonlinear engineering 221 systems: phenomenological and behavioral (Gandomi et al. 2016). Phenomenological models need a 222 predefined structure obtained from the physical laws requiring a previous understanding about the 223 system. Concerning the complex systems, sometimes it is hard to find such models. Unlike the 224 phenomenological models, behavioral models can be simply generated by finding a reasonable 225 approximate relation between input variables and the output value(s) for a collection of data 226 (experimental or theoretical) irrespective of their governing physical principles. Although one of the 227 main advantages of behavioral modeling techniques is their independence on prior knowledge about 228 the governing physical relationships of input and outputs (Walter and Pronzato 1997, Metenidis 2004), 229 most of these models need the user to pre-assign a formulation pattern requiring optimization of its 230 unknown coefficients. Concerning the complex engineering systems, the use of conventional 231 techniques such as regression analysis cannot be guaranteed to find an accurate and reliable behavioral 232 model (Gandomi et al. 2016). It has been well recognized that most of the structural earthquake 233 engineering problems such as determining earthquake response of inelastic structures could be 234 considered as such complex problems. 235 Genetic programming (GP) (Koza 1990) is a novel behavioral modeling methodology with 236 completely new characteristics. GP is an extension of the genetic algorithm capable of functionalizing 237 data using tree structures. In fact, unlike classic regression models and ANNs, GP is capable of 238 generating a prediction equation irrespective of a predefined structure. The successful application of 239 GP and its variants have been reported in solving the various real-world problems (e.g. Gandomi and 240 Alavi 2012a,b, Gandomi et al. 2013, Babanajad et al. 2013, Gandomi et al. 2016). One of the robust 241 variants of GP is MGGP that adds the capability of conventional regression to the standard GP ability 242 in parameter estimation, which has been proposed recently (Searson et al. 2007).The initial MGGP 243 studies show that it can outperform other GP variants (Gandomi and Alavi 2012a,b). MGGP is 244 described in the next subsection. 245 3.1. Multi-Gene Symbolic Regression 246 GP can generally be defined as a supervised machine learning technique that searches a program space 247 instead of a parameter space. One of the useful variants of GP is Multi-gene genetic programming 248 (MGGP) (Searson et al. 2007, Gandomi and Alavi2012a). MGGP is used to design mathematical model 249 predictions which are inherently multi-gene, i.e., those models consisting of linear combinations of low 250 order nonlinear transformations of the input variables. Unlike conventional GP that is based on an 251 evaluation of single tree expression, MGGP uses a single GP particle swarm model selection program 252 constructed from a number of genes where each gene has a tree expression (Searson et al. 2010). In 253 effect, the model development procedure is decomposed by MGGP consisted of some relatively simple, 254 fixed-depth sub-models. 255 To develop a population of trees, GP typically uses symbolic regression. Compared with the 256 conventional GP, a weighted linear combination of outputs from a number of GP trees is considered as 257 a symbolic model in which each of these trees could be considered as a “gene” (Searson et al. 2010). 258 The number of genes and tree depth of any gene can be specified by the user which their maximum 259 values depend on the complexity of the models developed by MGGP. The evolved models are linear 260 combinations of low-order nonlinear transformations of the predictor variables (Searson et al. 2010). 261 The ordinary least squares method is used to estimate the linear coefficients for each of the evolved 262 genes of an individual. A more detailed explanation of MGGP is available in Refs. (Searson et al. 2007, 263 Searson et al. 2010). 264 A fixed linear illustration associated with binary encoding of all parameters is used in GA, which 265 results in a string of numbers as output. In GP however, the optimization problems are solved 266 irrespective of a pre-defined solution structure. Depending on the problem domain, the first generation 267 consisting of a population of possible solutions is randomly created by GP, and variation operators 268 generate new candidate solutions then. During the evolutionary process, crossover selects a node from 269 the parental individuals and exchanges the subtrees under the selected nodes randomly and creates a 270 new individual. Mutation truncates and replaces one node of a tree with another randomly generated 271 node from the same set, and then creates a new individual from an existing tree in the population. The 272 individuals with higher fitness values have higher survival likelihood in the successive generation. To 273 find the best fitting solution, the individual solution of a population is updated after executing a number 274 of runs in each generation, and the parental individuals are selected from the population based on a 275 good fitness function. To develop a population of genes, symbolic regression method can be 276 implemented using standard GP in which a symbolic mathematical expression is directly encoded by 277 each of the genes. Based on Figure 1 showing a typical multi-gene model, known as multi-gene 278 symbolic regression, three input variables (x1, x2, and x3) are used to predict the response. As shown in 279 the figure, although the nonlinear terms such as “sin” and “log” are used, the overall model is a weighted 280 linear combination of each gene utilizing the coefficients go, g1, and g2. Mathematically, the general 281 formulation of the multi-gene symbolic regression model can be written as (Gandomi et al. 2016): 282 1 ˆ( , , ) ( ( )) n o i i i y g g G = = + x g x  (9) where go is a bias term; gi is the gene weight, and Gi(ș,x) is the outputs vector from the ith gene 283 encompassing a multi-gene individual; ș is the vector of the unknown parameters for each gene; and n 284 is the number of genes. It should be noted that the algorithmic structure of MGGP, and standard GP is 285 the same, except for crossover and mutation of multi-gene individuals. MGGP does not necessitate any 286 simplifying assumption in the model development process, and it is more accurate and efficient than 287 the standard GP for modeling complex nonlinear problems (Gandomi and Alavi 2012a,b). To construct 288 an initial population in MGGP, random individuals are created by using different nonlinear functions, 289 input variables, and a range of random constants and each individual includes 1 and Gmax number of 290 genes. The algorithm attempts to maximize diversity by ensuring that no individuals contain duplicate 291 genes. The genes are randomly selected and the least squares normal equation is used to estimate the 292 vector of unknown coefficients g as follows (Searson et al. 2007, Searson et al. 2010, Hii et al. 2011, 293 Searson 2014): 294 ( ) 1T T−=g G G G y (10) where G = [1 G1…Gn] is the gene response matrix. Since the columns of matrix G can be collinear, the 295 Moore–Penrose pseudo-inverse (GTG)# can be computed by means of the singular value decomposition 296 instead of the standard matrix inverse (GTG)-1.Genes can be employed or eliminated using a tree 297 crossover operator (high-level crossover) during the MGGP run which is in addition to the standard GP 298 sub-tree crossover (a low-level crossover). The low-level crossover chooses a gene randomly from each 299 parent individual. Next, the standard sub-tree crossover is employed and the generated trees replace the 300 parent trees in the otherwise unaltered individual in the next generation. The high-level crossover allows 301 the exchange of one or more genes with another selected individual subject to the Gma x constraint. The 302 maximum number of genes of an individual is limited to Gmax.If any individual contains more genes, 303 the additional genes are randomly selected and deleted (Searson et al. 2007, Searson et al. 2010, Hii et 304 al. 2011, Searson 2014). 305 3.2. Multi-Objective Genetic Programming (MOGP) 306 Generally, both tree-based GP and MGGP deal with a single objective optimization problem for each 307 individual considering the defined fitness function in which, for symbolic regression, the goodness-of-308 fit to the training data is considered as the only objective to be maximized. Although MGGP yields 309 more compacted models compared with standard GP (Searson 2014), ineffective genes may be acquired 310 by multi-gene models and the single objective optimization problem results in the evolution of overly 311 complex, impractical and non-robust models (Gandomi et al. 2016). The simplest solution to eliminate 312 such shortcoming can be provided by limiting G in a model to Gmax which is a hard-to-determine unique 313 value for any given problem (Searson 2014). One good solution is the use of multi-objective concepts 314 into symbolic regression which is commonly referred to as multi-objective genetic programming 315 (MOGP). Using this methodology, both the goodness-of-fit and the complexity of the developed models 316 can be optimized simultaneously by searching the so-called Pareto front (non-dominated solutions) set. 317 Herein, the GPTIPS 2 toolbox (Searson 2014) associated with the related subroutines coded in 318 MATLAB (2013) is used to solve the MOGP by using a non-dominated sorting technique (Deb et al. 319 2002). To sort the non-dominant solutions by their complexity and precision, the non-dominated sorting 320 method is applied at the end of each generation of the MGGP algorithm. At first, the individuals are 321 classified from both the new and old population based on their position on the Pareto front. Pareto front 322 of each level encompasses a set of Pareto optimal solutions which other solutions do not dominate. In 323 addition, the solutions that include Pareto front of each level are not dominated by any other solution, 324 apart from those of in its previous Pareto front level. After that, a “crowding factor” (i.e., the average 325 distance of a solution from the nearest solutions (either side) on the same Pareto front) is computed for 326 separately all individual to increase population diversity, giving lower priority to the solutions that are 327 crowded together during the ranking process. Finally, the position of solutions is used to rank them 328 (those on level 1 are ranked above those on level 2, and so on),and a crowding factor of each solution 329 is used to rank those within the same level. The top 50% of the population is remained to participate in 330 the next generation, whilst the rest are eliminated (Searson 2014). 331 3.3. Accelerating GP Process 332 Typically, the data sets used in engineering studies are complex and do not include a very large number 333 of records particularly for experimental studies (Gani et al. 2016). While the successful application of 334 GP in modeling engineering systems has been reported in the literature (e.g., (Sajjadi et al. 2016)), it 335 can be difficult to model the systems with big data using GP. The evolutionary approaches are often 336 slower than statistical data mining. Since GP is usually used to find the structure of solution(s), it is one 337 of the slowest evolutionary algorithms. In addition, the extra process of non-dominated sorting of a 338 multi-objective GP magnifies the problem. To improve this weakness, in this paper, two strategies are 339 used in the prediction process: 340 • 60% of data (2160 samples) were randomly selected and used for training process, and the rest 341 (1440 samples) used as training set for each run; 342 • The final Pareto front was determined from merging the Pareto fronts for all runs. 343 In general, there are two classes of machine learning algorithms including trajectory based 344 algorithms and population-based algorithms. ANNs and regression analysis are two well-known 345 examples of trajectory algorithms. In contrast, GP is one of the mostly used population-based 346 algorithms which it deals with a set of the solution in each generation. This feature makes it flexible to 347 adopt with parallel processing, therefore, the paralleled computations can be used to accelerate MOGP 348 procedure using a distributed computing machine in order to deal with the Big Data issue in GP. 349 Although only twelve cores were used to evolve and evaluate new models herein, the number of cores 350 can be increased up to the population size using this framework. The schematic of parallel processing 351 in the GP process is shown in Figure 2. 352 4. Predicting Seismic Energy Demand Spectra Using MOGP 353 4.1. Preparing Exact Data 354 Since the inelastic responses of an SDOF system highly depend on both structural and earthquake 355 ground motion variables, the most influential ones are contributed in predicting spectral seismic energy 356 demand. The input variables are described in the next subsections in detail. 357 4.1.1. Inelastic SDOF Systems 358 The structural variables used in this study are the hardening ratio of bilinear hysteretic behavior model 359 (), damping ratio (), and the displacement ductility ratio () which are the structural properties used 360 to determine the energy demand spectra mentioned. The values assumed for  were 0.0, corresponding 361 to the elastic-perfectly plastic model, and 0.1 indicating bilinear model. Three values of 0.05, 0.10 and 362 0.15 were also used for  of the inelastic SDOF systems. In addition, three common values of 2, 4 and 363 6 were taken into account for . The periods range studied for the prediction of the energy spectra was 364 between 0.01 to 5.0 second for every 0.05 second. These considered variables (, , and ) resulted in 365 18 inelastic SDOF systems used for the prediction. 366 4.1.2. Earthquake Ground Motions 367 Three factors of the site class, source-to-site distance, and PGA are the three variables considered for 368 the earthquake ground motion records used. Based on the shear wave velocities corresponding to the 369 30 m in depth (Vs,30) of more than 750, 360 to 750, 180 to 360 and less than 180 m/s, four soil types of 370 S1, S2, S3, and S4 were assumed for the records used. Site types of S1, S2, S3, and S4 are respectively 371 representing the soft, firm, stiff and rock soil types. To consider the source-to-site distance, the records 372 having the Joyne-Boor distance (RJB) in the range of more than 17.5km and less than 150km were used 373 (known as far-fault records). In addition, two values of 0.5g and 1.0g were used for the PGA of the 374 records. All records are non-pulse-like and have the magnitude (M) of greater than 5.5. All the records 375 were downloaded from NGA-West-II project of PEER ground motion database (2017). The diversity 376 of M, RJB and Vs,30, and the number of records are shown in Figure 3. As shown in this figure, the records 377 are selected in a way that they have a large variety of the properties mentioned. The individual pseudo-378 spectral acceleration of each record of each soil type and their mean spectra are also shown in Figure 379 4. 380 Four main seismic energy-based EDPs were chosen to be predicted: (i) EDP1: spectral EI/m; (ii) 381 EDP2: spectral EH/m; (iii) EDP3: spectral HI; and (iv) EDP4: spectral REI For this purpose, based 382 on the values assumed for the structural variables, the 18 SDOF systems were modeled and subjected 383 to the mentioned earthquake records. A large number of inelastic time history dynamic analyses (more 384 than 1 million) were carried out, and the exact EDPs were determined. The entire process was simulated 385 in MATLAB platform (2013). 386 For each EDP, individual spectral energy responses of the SDOF systems under each set of 387 earthquake records were obtained. Then, based on the normal distribution, the mean plus one standard 388 deviation (mean+) for each EDP under each set of the earthquake records were determined to be 389 predicted. 390 4.2. Model Development Using MOGP 391 To develop powerful models, the suitable parameters ought to be utilized as a part of the MOGP 392 predictive algorithm. To obtain the optimum MOGP models, basic arithmetic operators (+, -, ×, /) and 393 mathematical functions (e.g. tanh, ln) were used. The models are formed by randomly combining the 394 components from the functional set and the terminal set. The number of programs (solutions) in the 395 population is determined by the population size, and the number of levels that the calculation would 396 apply before the run ends are resolved as per the number of generations.The nature of thedata set, 397 problem complexity, and the number of variables are the three determining factors for the population 398 size and the number of generations. Note that, the upper bounds of an individual (Gmax) and the 399 maximum tree depth (Dmax) need to be defined in order to restrict the complexity. To conduct a trade-400 off between the running time and the complexity of the evolved solutions, optimal values of 3 and 5 401 were respectively assumed for Gmax and Dmax. The parameter settings used for the MOGP 402 implementation listed in Table 1 are based on the previously suggested values in the literature (Searson 403 et al. 2007, Searson et al. 2010, Hii et al. 2011, Searson 2014) and employing a case-dependent trial-404 and-error process. 405 In order to generate new genes for individuals as well as to decrease the overall number of genes for 406 one model and increase the total number of genes for the other, a “rate-based high-level crossover” 407 through the use of a crossover rate parameter (CR) is employed herein. A uniform random number 408 between 0 and 1 with a default value of 0.5 is generated separately for each gene in the parents. If r 409 isless than CR, the corresponding gene is moved to the other individual. In case the exchanging process 410 results in offspring that contain more genes than the Gmax, the gene is randomly eliminated such that the 411 constraint is no longer violated (Searson 2014). Two data sets are needed for the analyses. Therefore, 412 data are randomly divided into two subsets for training and validation. The training dataset is used for 413 learning and the validation set for determining the quality of the evolved programs on unseen data. 414 Several combinations of training and validation sets were considered to determine a consistent data 415 division. To evaluate the evolved expressions and finding the best-encoded one, the minimum of the 416 root mean square error (RMSE) is used as the fitness function. RMSE can be expressed as follows: 417 2 1 1 ( ) n i ii RMSE e p n = = − , (11) where pi and ei are the predicted, and exact output values for the ith output, respectively; and n is the 418 number of samples. Two objectives of maximizing the correlation coefficient (R) and minimizing the 419 model complexity are used in MOGP approach in order to select the best final model. 420 5. Results and Discussion 421 Using MOGP, all EDPs (EDP1, EDP2, EDP3, and EDP4) were predicted, and their optimal 422 mathematical models (formulations) were determined. Four cases (S1 to S4) based on different soil 423 types of the earthquake records were considered in the prediction. Although it is quite possible that the 424 obtained model formulations be different for different soil types, it is more practical to develop a unique 425 mathematical model for an EDP with different coefficients. Therefore, in this paper, the complete 426 database (which includes four soil types of S1 to S4) is employed to develop a unique prediction model 427 for each EDP. The final mathematical model is selected based on a compromise between the prediction 428 accuracy (as measured by the correlation coefficient R) and the model complexity (as measured by the 429 number of input variables). After that, the complete database is divided into four groups based on the 430 four soil types. Using each group of data, the predicted coefficients of the final model (gi) are re-431 evaluated by conducting the regression analysis to reflect the influence of the soil type. Finally, four 432 mathematical models including structural variables (and PGA of earthquake records) with four different 433 groups of coefficients corresponding to the four cases mentioned above (S1 to S4) obtained are 434 presented herein. The contribution of each input variable in the mathematical prediction models and the 435 Pareto front obtained by using a nondominated sorting method at the end of an MOGP run are presented. 436 The results of all EDP models developed by MOGP for EDP1 to EDP4 have been shown in Figures 437 5(a)-(d). The Pareto front sets are shown in green circles and the rest of the models are shown in solid 438 blue circles. As mentioned earlier, the Pareto front set is obtained by using a non-dominated sorting of 439 populations at the end of all MOGP runs. This process simultaneously optimizes the accuracy and the 440 complexity of all developed models. The final model in each Pareto front set is selected and highlighted 441 in a red circle. 442 In order to benchmark the MOGP models, they were compared with gene expression programming 443 (GEP) models as a well-known and widely used GP algorithm. The GEP model also uses multiple gene 444 structure, which makes it similar to the MGGP in this respect. GEP requires 10 times more generations 445 to converge. It is because the MOGP algorithm converges quickly since it uses regression analysis 446 beside the evolutionary process. GEP’s parameter setting is similar to that of MOGP (shown in Table 447 1). The final results of GEP are presented in Figure 5. The results show that none of the models found 448 by GEP are among the Pareto front sets for any of the EDP1 to EDP4 problems. 449 The contribution of each input variable in the mathematical prediction models can be investigated 450 through their frequencies where a frequency value of 1.0 for a variable indicates that it has the maximum 451 contribution within the best-generated models (Gandomi et al. 2010). It was assumed that the models 452 with R2> 0.8 are the best-generated models. The frequency histograms of the input variables for all 453 predicted EDPs are shown in Fig. 6. As shown, for the selected database of EDP1, PGA and 454  respectively show the most and the least statistically significant contributions in the best-generated 455 MOGP models. For the collected database of EDP2, although all the input variables almost have 456 significant contributions in the best generated MOGP models, T and  are the most and the least 457 influential variables on the EDP2 prediction model. According to the literature (e.g. Kuwamura and 458 Galambos 1989, Manfredi 2001, Dindar et al. 2015), there is no report about the role of PGA on EDP3 459 https://en.wikipedia.org/wiki/Gene_expression_programming (hysteretic to input energy ratio). Moreover, based on the physics of the problem, when the damping 460 ratio is increased, the portion of hysteretic energy dissipation of the imparted seismic input energy is 461 decreased. These issues have been confirmed by the frequencies of  and PGA for EDP3 where they 462 have the largest and smallest statistically significant contributions, respectively. Moreover, as can be 463 seen in the figure, for the collected database relevant to EDP4, T and PGA have the largest and smallest 464 statistically significant contributions in the best-generated MOGP models, respectively. 465 5.1. Mathematical Model for EDP1 466 The mathematical model obtained for EDP1 is expressed as follows: 467 ( )0 1 1 2 2 I V E a a X a X S m  = + + , (12-a) 1 .tanh(tanh( ))X PGA T T  = , (12-b) 1.5 2 ( ) e T PGA X     − = , (12-c) where a0, is the bias term, a1 and a2 are the gene weight for the EDP1 prediction model. These 468 coefficients are listed in Table 2. SV is the spectral velocity of the elastic SDOF system. 469 5.2. Mathematical model for EDP2 470 The mathematical model derived for EDP2 is expressed as follows: 471 ( )0 1 1 2 2 H D E b b Y b Y S m  = + + , (13-a) 1 Y PGA= , (13-b) 2 2 ( 2 )( ) T Y e PGA   −= − + − , (13-c) where b0, is the bias term, b1 and b2 are the gene weight for EDP2. These coefficients are listed in Table 472 3. SD is the spectral displacement of the elastic SDOF system. 473 5.3. Mathematical model for EDP3 474 The mathematical model derived for EDP3 is expressed as follows: 475 0 1 1 2 2 H I E HI c c Z c Z E    = = + + , (14-a) 2 0.151 1 TZ e e  − − − −= , (14-b) 2 log ( 7.75) tanh( ) T Z T    = +    , (14-c) where c0 is the bias term, c1 and c2 are the gene weight for EDP3. These coefficients are listed in Table 476 4. 477 5.4. Mathematical model for EDP4 478 The mathematical model obtained for EDP4 is expressed as follows: 479 0 1 1 2 2 I I I E RE d d U d U E   = = + + , (15-a) 1 ( 0.422)( 1.4074) tanh( ) T U e T T   −= + + + − − , (15-b) 2 2 1 ln( )U T T   = + + , (15-c) where d0 is the bias term, d1 and d2 are the gene weight for EDP4. These coefficients are listed in Table 480 5. 481 It should be noted that EDP2 can also be determined by the following relationship: 482 I H I E E H m   = (16) In fact, there are two ways to compute EDP2: one is by using Eq. (13) directly (called as EDP2-1), and 483 another is by using Eq. (16) (called EDP2-2). 484 5.5. Accuracy of the Models 485 It should be noted that all the models are only valid in the range of actual database (discussed in section 486 4.1). In order to investigate the effectiveness and accuracy of the EDP models in the range of our 487 database, the common performance metrics, including the mean absolute percentage error (MAPE), the 488 relative root mean square error (RRMSE), the linear correlation coefficient (R), the performance index 489 (PI), coefficient of determination (r2), coefficient of efficiency (E), and the index of agreement (d) 490 corresponding to the predicted formulation of each EDP are obtained. MAPE, RRMSE, R, PI, r2, E, and 491 d are expressed as follows: 492 1 1 n i i i i e p MAPE n e= − =  , (16) 2 1 ( )1 n i ii e p RRMSE e n = − =  , (17) 1 2 2 1 1 ( )( ) ( ) ( ) n i i i ii n n i i i ii i e e p p R e e p p = = = − − = − −    , (18) 1 RRMSE PI R = + , (19) 2 2 1 2 1 ( ) 1 ( ) n i ii n ii e p r p = = − = −   , (20) 2 1 2 1 ( ) 1 ( ) n i ii n i ii e p E e e = = − = − −   , (21) 2 1 2 1 ( ) 1 ( ) n i ii n i i i ii e p d e e p e = = − = − − + −   , (22) where ie and ip are the average values of the exact and predicted outputs, respectively; and n, ei, and 493 pi were defined before. Lower MAPE and RRMSE, higher R (or R2), r2, E and d indicate the accuracy 494 and effectiveness of the prediction model used. Based on Eq. (19), higher R values and lower RRMSE 495 values result in lower PI and, subsequently, indicate a more precise model. It should be noted that PI 496 varies from 0 to ∞ and its values close to 0 indicate the model fits very well to the exact (actual) values. 497 It is worth noting that two sources of complexity affect the accuracy of the models. First, the structures 498 used are inelastic which leads to high nonlinearity, and second, the earthquake ground motion records 499 have some influential characteristics, such as frequency content, which make a structure to experience 500 different cyclic excursions associated with complex behavior. These problems are accentuated when 501 PGA is increased. 502 The abovementioned performance metrics of the predicted EDP1, EDP2 (including EDP2-1 and 503 EDP2-2), EDP3 and EDP4 models using MOGP are listed in Table 6. As can be seen in this table, 504 EDP1 and EDP2-2 have MAPE values respectively less than almost 16% and 17%, and it is less than 505 33.1% for EDP2-1 all of which are in an acceptable/reasonable range of the performance metric for 506 inelastic and complex systems. The MAPE values are very low for both EDP3 and EDP4. Lower 507 RRMSE values (near-zero usually less than 50%) also confirm the accuracy of the mathematical models. 508 According to Table 6, except for EDP2-1 with soil type of S4 which has an approximate RRMSE of 509 54%, EDP1, EDP2-1 and EDP2-2 respectively have RRMSE values less than 36.3%, 44%, and 42.1%, 510 and the remaining EDPs have RRMSE values less than 6.4%, indicating the accuracy of the predicted 511 models. The higher accuracy of predicted EDP3 and EDP4 is evident. The R2, E, d and r2 values near 512 1.0 (e.g. more than 0.8) indicate a good correlation, efficiency and agreement of the predicted values 513 with the exact ones. Based on Table 6, the R2, E, d, and r2 values are more than 0.8 for EDP1, EDP2-514 2, EDP3, and EDP4. The minimum values of R2, E, d, and r2 of EDP2-1are almost equal to 0.68, 0.55, 515 0.83 and 0.65 which belong to the soil type S4 whilst for other soil types almost all of them are more 516 than 0.8. Regarding the PI, that is a combination of R and RRMSE, the values less than 0.3 and 0.2 517 indicate a good and an excellent prediction, respectively. The PI values are less than 0.2 for EDP1, less 518 than 0.3 for EDP2-1, and less than 0.22 for EDP2-2 which indicate a good prediction capability of their 519 corresponding proposed MOGP models. This index is less than 0.04 for both EDP3 and EDP4, 520 confirming the excellent prediction capability of the proposed MOGP models. 521 To make a more informative and general evaluation of the proposed MOGP models, the average 522 values of the performance metrics of all soil types, for each of EDPs, are presented in Table 6. The 523 average results of EDP2 demonstrate that EDP2-2 has a better performance than EDP2-1. In fact, 524 MAPE, RRMSE, and PI of EDP2-2 model have about 39%, 34.5%, and 37% smaller values as well as 525 R2, E, d and r2 have almost 17%, 21%, 5.6%, and 13% larger values as compared to those of EDP2-1. 526 Finally, considering the stochastic nature of earthquake engineering problems and the nonlinear 527 relations governing the inelastic SDOFs behavior, the results indicate the good capability of MOGP 528 prediction models for EDP1, EDP2-1 and EDP2-2, and excellent capability of MOGP prediction 529 models for EDP3 and EDP4. 530 5.6. Comparative study 531 5.6.1. Assumptions 532 In the previous section, it was shown that all the mathematical models precisely predict each EDP of 533 interest. In this section, the accuracy of the proposed mathematical models of all EDPs (EDP1 to EDP4) 534 is compared with those of some available models from the relevant literature. The works presented by 535 Housner (1956), Kuwamura and Galambos (1989), Fajfar and Vidic (1994), Manfredi (2000), Bakhshi 536 and Tavallali (2006), and Dindar et al. (2015) are selected to make the comparison. All the works 537 selected herein have dealt with the inelastic SDOF systems having elastoplastic behavior model ( = 0) 538 and damping ratio () of 0.05. The PGAs of 0.5 and 1.0g and displacement ductility ratios () of 2, 4 539 and 6, are considered for the comparison. The well-known performance metrics including MAPE, 540 RRMSE, R2, PI, E, d, and r2 are used for comparison. To make an informative comparison, all the 541 performance metrics are listed in separated Tables for EDP1, EDP2, (including EDP2-1 and EDP2-2), 542 EDP3 and EDP4 (Tables 7-10 respectively). 543 In order to show the varying trend of the predicted models using MOGP and the best model from 544 literature compared with the exact results, a graphical comparison is also made. As mentioned in the 545 Introduction section and as can be concluded in the above comparison, Manfredi (2000) model is one 546 of the best models presented in the literature. Therefore, this is the only model selected to make a more 547 clear comparison for EDP1, EDP2 (including EDP2-1 and EDP2-2) and EDP3. Moreover, Bakhshi and 548 Tavallali (2006) model is also used for making a comparison for EDP4. In addition, the inelastic SDOF 549 systems having  = 0,  = 6 and  = 0.05 are used and are subjected to earthquake records corresponding 550 to all soil types S1 to S4 with PGAs of 0.5 and 1.0g. The exact and predicted energy-based spectrum 551 using MOGP and another selected model from the literature are shown in Figures 7-10 respectively for 552 EDP1 to EDP4. 553 5.6.2. Comparison 554 The mathematical model of Eq. (12), proposed for prediction of EDP1 using MOGP, is compared with 555 the seismic input energy equation presented by Housner (1956), Kuwamura and Galambos (1989), 556 Manfredi (2000), and Dindar et al. (2015). As shown in Table 7, Eq. (12) and Kuwamura and Galambos 557 (1989) models almost have similar performance for the soil type of S1 which are better than the other 558 models. Although Eq. (12) works very well for the soil type of S2, performance metric values of 559 Manfredi (2000) model, excepting MAPE, show that it works better than Eq. (12) and the other models. 560 Concerning the soil types of S3 and S4, Eq. (12) has an outperformance compared with the other models 561 in total. Based on the average values of performance metrics listed in Table 7, Eq. (12) has lower MAPE, 562 RRMSE and PI, higher R2 and E compared with the other models (and slightly lower d and r2 compared 563 with Manfredi (2000) model), indicating that the predicted model using MOGP totally outperforms the 564 other models available in the literature. The varying trend of the Eq. (12) and Manfredi (2000) model 565 compared with the exact results is also shown in Figure 7. As shown, the varying trend of Eq. (12) is 566 almost conform to the exact results trend excepting a difference for soil type of S1 at medium-periods 567 (see Fig. 7(a)) which is not considerable. This is also true for Manfredi (2000) model except for the 568 long-periods of soil types of S2 and S4 shown in Fig. 7(d). 569 The mathematical models of Eq. (13) and Eq. (16), proposed for prediction of EDP2 using MOGP, 570 are compared with the presented models by Kuwamura and Galambos (1989), Manfredi (2000), and 571 Dindar et al. (2015). The results of the comparison are listed in Table 8. As shown in this table, MOGP-572 2 model (Eq. (16)) significantly outperforms MOGP-1 model (Eq. (13)). Eq. (16) is resulted to lower 573 MAPE, RRMSE and PI, and higher R2, E, d and r2 compared with those of for other models with the 574 exception of RRMSE, PI, R2, E and r2 of soil type of S2 and r2 of soil type of S3 for Manfredi (2000) 575 model and soil type of S1 for Kuwamura and Galambos (1989) which are slightly better than those of 576 for Eq. (16). To make an overall comparison, the average values of the performance metrics are listed 577 in Table 8. These results confirm the superiority of the EDP2 prediction model using MOGP-2 (Eq. 578 (16)) as compared to those from the literature. The varying trend of the Eq. (13), Eq. (16) and Manfredi 579 (2000) models compared with the exact results is also shown in Figure 8. As shown, all models 580 relatively have a consistent varying trend to the exact results except for Eq. (13) at medium- and long-581 periods and for Manfredi model at long-periods of soil types of S1 and S4. Moreover, as shown in Figs. 582 7 and 8, the effect of PGA on both EDP1 and EDP2 is obvious as it is confirmed by their frequencies 583 in MOGP models depicted in Fig. 6(a) and Fig.6 (b). 584 A similar comparison was also carried out for EDP3 prediction model of Eq. (14) using MOGP. 585 The predicting equations presented in the works by Kuwamura and Galambos (1989), Fajfar and Vidic 586 (1994), Manfredi (2000), and Dindar et al. (2015) were selected for comparison. The results of this 587 comparison are shown in Table 9. According to the table, although some of the mentioned works show 588 relatively appropriate results, the performance metrics of the prediction model based on Eq. (14) show 589 lower MAPE, RRMSE and PI, and higher R2, E, d and r2 as compared to the other presented models, 590 except for R2 for soil type S1 of the model used in Manfredi (2000). This indicates the novel capability 591 of MOGP for accurate prediction of the EDP3 as compared with those presented in the literature. To 592 make a general comparison on the performance of the MOGP prediction model of EDP3, the average 593 values of the performance metrics are listed in Table 9. The results show that Eq. (14) highly 594 outperforms the other presented models in the literature. Figure 9 also shows the varying trend of Eq. 595 (14) and Manfredi (2000) model as compared to exact values. As depicted in the figure, the varying 596 trend of Eq. (14) is in good agreement with exact values except for soil type S1 which has inconsiderable 597 differences. Despite the exact values and Eq. (14), the Manfredi (2000) model has a constant trend 598 which has significant differences in the majority of periods. It should be noted that, as depicted in Fig. 599 9, EDP3 is not affected by PGA as evidenced by its very low frequency in MOGP prediction model for 600 EDP3 shown in Fig. 6(c). 601 EDP4 prediction model of Eq. (15) using MOGP was also compared with the Bakhshi and Tavallali 602 (2006) model. The comparative results are shown in Table 10. According to the table, for all soil types, 603 lower MAPE, RRMSE and PI, and higher R2, E, d and r2 is obtained for Eq. (15) with respect to the 604 Bakhshi and Tavallali (2006) model. As shown in the table, average performance metrics are computed 605 for making a general comparison, demonstrating the superiority of Eq. (15) using MOGP in order to 606 predict EDP4 rather than the Bakhshi and Tavallali (2006) model. Figure 10 shows the varying trend 607 of Eq. (15) and Bakhshi and Tavallali (2006) model compared with exact values. The shown graphs 608 confirm the considerable differences between the Bahshi and Tavalali (2006) model and the exact 609 values. In contrast, the MOGP model of Eq. (15) has the closest fit to the exact results. In addition, as 610 shown in this figure, EDP4 is not influenced by PGA as evidenced by its very low frequency in MOGP 611 prediction model for EDP4 shown in Fig. 6(d). 612 It should be noted that most models in the literature are developed for a system with a limited 613 number of SDOF and a low number of earthquake ground motion records. However, the proposed 614 MOGP-based models can deal with SDOF systems with a wide range of structural features subjected 615 to moderate-to-severe earthquake ground motions.. 616 6. Summary and Conclusion 617 Formulation of the seismic energy demand of inelastic SDOF systems is one of the main steps of 618 extending the energy-based seismic analysis and design approach. A comprehensive study was carried 619 out to propose accurate and simple mathematical models for predicting seismic energy demand spectra. 620 Multi-objective genetic programming (MOGP) was employed to formulate some main energy-based 621 EDPs, i.e., spectral input and hysteretic energy, spectral hysteretic to input energy ratio, and spectral 622 energy modification factor. Maximizing the accuracy and minimizing the complexity of the predictive 623 models were considered as two objectives of the multi-objective optimization procedure. Both 624 structural and earthquake characteristics were included in the proposed mathematical models. 625 Regarding each EDP, one equation with different coefficients was proposed for various soil types 626 assuming that different soil types have linear relationships. The frequency of each input variable in the 627 best-generated models was also presented to measure the importance of the variable. 628 Finally, the capability of the proposed models was examined using several common performance 629 metrics. The results indicate the accuracy and effectiveness of the proposed mathematical models in 630 predicting the seismic energy demand spectra compared with some of the models presented in the 631 literature. 632 633 References 634 Akiyama, H., 1985. Earthquake-resistant limit-state design for buildings, The University of Tokyo Press, Tokyo, 635 Japan. 636 Alavi, A.H., Aminian, P., Gandomi, A.H., and Esmaeili, M.A., 2011. Genetic-based modeling of uplift capacity 637 of suction caissons, Expert Systems with Applications 10, 12608–12618. 638 Alıcı, F.S., and Sucuoğlu, H., 2016. Prediction of input energy spectrum: attenuation models and velocity 639 spectrum scaling, Earthquake Engineering and Structural Dynamics 45, 2137–2161. 640 Amiri, G.G., Darzi, G.A., and Amiri J.V., 2008. Design elastic input energy spectra based on Iranian earthquakes, 641 Canadian Journal of Civil Engineering 35, 635–646. 642 ANSI/AISC, 2010. Specification for Structural Steel Buildings, American Institute of Steel Construction, 643 Chicago, IL. 644 Arroyo, D., and Ordaz, M., 2007. On the estimation of hysteretic energy demands for SDOF systems, Earthquake 645 Engineering and Structural Dynamics 36, 2365–82. 646 Babanajad, S.K., and Gandomi, A.H., Mohammadzadeh, D., Alavi, A.H., 2013. Numerical modeling of concrete 647 strength under multiaxial confinement pressures using linear genetic programming, Automation in Construction 648 36, 136–144. 649 Benavent-Climent, A., and Lopez-Almansa, F., and Bravo-Gonzalez, D.A., 2010. Design energy input spectra for 650 moderate-to-high seismicity regions based on Colombian earthquakes, Soil Dynamics and Earthquake 651 Engineering 30, 1129–1148. 652 Bertero, V.V., and Uang, C.M., 1988. Implications of recorded earthquake ground motions on seismic design of 653 building structures, Research Report, UCB/EERC-88/13, University of California at Berkeley, Los Angeles, CA. 654 Cabalar, A. F., and Cevik, A., 2011. Triaxial behavior of sand–mica mixtures using genetic programming. Expert 655 Systems with Applications 38, 10358–10367. 656 Chopra, A.K., 2012. Dynamics of structures: theory and applications to earthquake engineering, Fourth Edition, 657 Prentice Hall, CA. 658 Chou, C.C., and Uang, C.M., 2000 Establishing absorbed energy spectra – an attenuation approach,Earthquake 659 Engineering and Structural Dynamics 29, 1441–1455. 660 Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T., 2002. A fast and elitist multi objective genetic algorithm: 661 NSGA-II, IEEE Transactions on Evolutionary Computation6, 182–197. 662 Decanini, L.D., and Mollaioli, F., 2001. An energy-based methodology for the assessment of seismic demand, 663 Soil Dynamic and Earthquake Engineering 21, 113–137. 664 Deniz, D., Song, J., and Hajjar, J.F., 2017. Energy-based seismic collapse criterion for ductile planar structural 665 frames, Engineering Structures 141, 1–13. 666 Dindar, A.A., Yalçın, C., Yüksel, E., Özkaynak, H., and Büyüköztürke, O., 2015. Development of earthquake 667 energy demand spectra, Earthquake Spectra 31, 1667–1689. 668 Fajfar, P., and Vidic, T., 1994. Consistent inelastic design spectra: hysteretic and input energy, Earthquake 669 Engineering and Structural Dynamics 23, 523–537. 670 Fajfar, P., Vidic, T., and Fischinger, M., 1989. Seismic demand in medium- and long-period structures, 671 Earthquake Engineering and Structural Dynamics 18, 1133–1144. 672 Fajfar. P., and Krawinkler. H., 1992. Nonlinear seismic analysis and design of reinforced concrete buildings, 673 Elsevier Applied Science, New York. 674 Federal Emergency Manage Agency (FEMA), 2000. Prestandard and commentary for the seismic rehabilitation 675 of buildings. FEMA-356, prepared by the American Society of Civil Engineers, Washington, D.C. 676 Ferreira, C., 2006. Gene expression programming: mathematical modeling by an artificial intelligence (Vol. 21). 677 Springer. 678 Gandomi, A. H., and Alavi, A. H., 2011. Multi-stage genetic programming: a new strategy to nonlinear system 679 modeling. Information Sciences 181, 5227-5239. 680 Gandomi, A. H., Babanajad, S. K., Alavi, A. H., and Farnam, Y. (2012). Novel approach to strength modeling of 681 concrete under triaxial compression. Journal of Materials in Civil Engineering 24, 1132–1143. 682 Gandomi, A.H., Alavi, A.H., Mirzahosseini, M.R., and Nejad, F.M., 2010. Nonlinear genetic-based models for 683 prediction of flow number of asphalt mixtures, Journal of Materials in Civil Engineering 23, DOI: 684 10.1061/(ASCE)MT.1943-5533.0000154. 685 Gandomi, A.H., and Alavi, A.H., 2012a. A new multi-gene genetic programming approach to nonlinear system 686 modeling. Part I: materials and structural engineering problems, Neural Computing and Applications 21, 171–687 187. 688 Gandomi, A.H., and Alavi, A.H., 2012b. A new multi-gene genetic programming approach to nonlinear system 689 modeling. Part II: geotechnical and earthquake engineering problems, Neural Computing and Applications 21, 690 189–201. 691 Gandomi, A.H., and Roke, D.A., 2015. Assessment of artificial neural network and genetic programming as 692 predictive tools, Advances in Engineering Software 88, 63–72. 693 http://ascelibrary.org/journal/jmcee7 https://doi.org/10.1061/(ASCE)MT.1943-5533.0000154 Gandomi, A.H., Roke, D.A., and Sett, K., 2013. Genetic programming for moment capacity modeling of 694 ferrocement members, Engineering Structures 57, 169–176. 695 Gandomi, A.H., Sajedi, S., Kiani, B., and Huang, Q., 2016. Genetic programming for experimental big data 696 mining: A case study on concrete creep formulation, Automation in Construction 70, 89–97. 697 Gani, A., Siddiqa, A., Shamshirband, S.,and Hanum, F., 2016. A survey on indexing techniques for big data: 698 taxonomy and performance evaluation, Knowledge and Information Systems 46, 241–284. 699 Gharehbaghi, S., 2018. Damage controlled optimum seismic design of reinforced concrete framed structures, 700 Structural Engineering and Mechanics 65, 53–68. 701 Gharehbaghi, S., and Khatibinia, M., 2015. Optimal seismic design of reinforced concrete structures subjected to 702 time–history earthquake loads using an intelligent hybrid algorithm, Earthquake Engineering and Engineering 703 Vibration 14, 97–109. 704 Gholizadeh, S., and Salajegheh, E., 2009. Optimal design of structures for time history loading by swarm 705 intelligence and an advanced metamodel, Computer Methods in Applied Mechanics and Engineering 198, 2936–706 49. 707 Gupta, A.K., 1990. Response spectrum method in seismic analysis and design of structures. CRC Press, Boca 708 Raton, FL. 709 Hii, C., Searson, D.P., and Willis, M., 2011. Evolving toxicity models using multigene symbolic regression and 710 multiple objectives, International Journal of Machine Learning and Computing 1, 30–35. 711 Housner, G.W., 1956. Limit design of structures to resist earthquakes, in Proceedings, of the First World 712 Conference on Earthquake Engineering, 5, 1–13. 713 Kalkan, E., and Kunnath, S.K., 2007. Effective cyclic energy as a measure of seismic demand effective cyclic 714 energy as a measure of seismic demand, Journal of Earthquake Engineering 11,725–751. 715 Kayadelen, C., Günaydın, O., Fener, M., Demir, A., &Özvan, A. (2009). Modeling of the angle of shearing 716 resistance of soils using soft computing systems. Expert Systems with Applications 36, 11814-11826. 717 Khan, S.A., Shahani, D.T., and Agarwala, A.K., 2003. Sensor calibration and compensation using artificial neural 718 network, ISA Trans 42, 337–352. 719 Khashaee, P., 2004. Energy–based seismic design and damage assessment for structures, Ph.D. Dissertation, 720 Department of Civil Engineering, Southern Methodist University, USA. 721 Khatibinia, M., Gharehbaghi, S., and Moustafa, A., 2015. Seismic reliability-based design optimization of 722 reinforced concrete structures including soil-structure interaction effects, Earthquake Engineering- From 723 Engineering Seismology to Optimal Seismic Design of Engineering Structures, Chapter 11, Moustafa A (ed.), 724 InTech, 267–304. 725 Koza, J. R., 1990. Genetic programming: A paradigm for genetically breeding populations of computer programs 726 to solve problems (Vol. 34). Stanford, CA: Stanford University, Department of Computer Science. 727 Kuwamura, H., and Galambos, T.V., 1989. Earthquake load for structural reliability, Journal of Structural 728 Engineering 115, 1446-1462. 729 Manfredi, G., 2001. Evaluation of Seismic Energy Demand, Earthquake Engineering and Structural Dynamics 730 30, 485–499. 731 MATLAB, 2010. The language of technical computing, Math Works Inc. 732 Metenidis, M.F., Witczak, M., and Korbicz, J., 2004. A novel genetic programming approach to nonlinear system 733 modelling: application to the DAMADICS benchmark problem, Engineering Applications of Artificial 734 Intelligence17, 363–370. 735 Mirzahosseini, M.R., Aghaeifar, A., Alavi, A.H., Gandomi, A.H., and Seyednour, R., 2011. Permanent 736 deformation analysis of asphalt mixtures using soft computing techniques, Expert Systems with Applications 38, 737 6081–6100. 738 Papadopoulos, V., and Giovanis, D.G., Lagaros, N.D., Papadrakakis, M., 2012. Accelerated subset simulation 739 with neural networks for reliability analysis, Computer Methods in Applied Mechanics and Engineering 223–224, 740 70–80. 741 PEER Strong Motion Database. 2017; http://ngawest2.berkeley.edu/. 742 Quinde, P., and Reinoso, E., Terán-Gilmore, A., 2016. Inelastic seismic energy spectra for soft soils: Application 743 to Mexico City, Soil Dynamics and Earthquake Engineering 89,198–207. 744 Sajjadi, S., Shamshirband, S., Alizamir, M., Yee, L., Mansor, Z., Manaf, A.A. et al. 2016, Extreme learning 745 machine for prediction of heat load in district heating systems, Energy and Buildings 122, 222–227. 746 Salajegheh, E., and Heidari, A., 2005. Optimum design of structures against earthquake by wavelet neural network 747 and filter banks, Earthquake Engineering and Structural Dynamic 34, 67挑82. 748 Searson, D., Willis, M., and Montague, G., 2007. Co-evolution of non-linear PLS model components, Journal of 749 Chemometrics 21, 592–603. 750 Searson, D.P., 2014, GPTIPS 2: an open-source software platform for symbolic data mining, (arXiv preprint 751 arXiv:14124690). 752 https://www.infona.pl/resource/bwmeta1.element.elsevier-14dffa12-d307-3f3e-a661-a4637cd14bf6/tab/jContent https://www.infona.pl/resource/bwmeta1.element.elsevier-14dffa12-d307-3f3e-a661-a4637cd14bf6/tab/jContent http://www.sciencedirect.com/science/article/pii/S0045782512000552 http://www.sciencedirect.com/science/article/pii/S0045782512000552 http://ngawest2.berkeley.edu/ Searson, D.P., Leahy, D.E., and Willis, M.J., 2010. GPTIPS: an open source genetic programming toolbox for 753 multigene symbolic regression, in Proceedings of the International Multiconference of Engineers and Computer 754 Scientists: Citeseer , 77–80. 755 Sucuoğlu, H., and Nurtuğ, A., 1995. Earthquake ground motion characteristics and seismic energy dissipation, 756 Earthquake Engineering and Structural Dynamics 24, 1195–1213. 757 Tsompanakis, Y., and Topping, B.H.V., 2011. Soft computing methods for civil and structural engineering, Saxe-758 Coburg Publications, Stirlingshire, UK. 759 Uang, C.M., and Bertero, V.V., 1988. Implications of recorded earthquake ground motions on seismic design of 760 building structures, Earthquake Engineering Research Center , Report No. UCB/EERC-88/13, University of 761 California, Berkeley. 762 Uang, C.M., and Bertero, V.V., 1990. Evaluation of seismic energy in structures, Earthquake Engineering and 763 Structural Dynamics 19, 77–90. 764 Vardhan, H., Garg, A., Li, J., and Garg, A., 2016. Measurement of stress dependent permeability of unsaturated 765 clay, Measurement 91, 371–376. 766 Walter, E., and Pronzato, L., 1997. Identification of parametric models: from experimental data , 767 Communications and Control Engineering, Springer, 413 pp. 768 Yazdani, H., Khatibinia, M., Gharehbaghi, S., and Hatami, K., 2016. Probabilistic optimum seismic design of 769 reinforced concrete structures considering soil-structure interaction effects, ASCE-ASME Journal of Risk and 770 Uncertainty in Engineering Systems, Part A: Civil Engineering 3, G4016004–1–12; DOI: 771 10.1061/AJRUA6.0000880. 772 Zahra, T.F., and Hall W.J., 1984. Earthquake energy absorption in SDOF structures, Journal of Structural 773 Engineering 110, 1757–1772. 774 Zhai, C., Ji, D., Wen, W., Lei, W., Xie, L., and Gong, M., 2016. The inelastic input energy spectra for main shock–775 aftershock sequences Earthquake Spectra 32, 2149–2166. 776 A HYBRID COMPUTATIONAL APPROACH FOR SEISMIC ENERGY DEMAND PREDICTION S. Gharehbaghia, A.H. Gandomib, , and S. Achakpourc, M.N. Omidvard 1. Introduction 2. Seismic Energy Concept and Formulation 3. Genetic Programming 3.1. Multi-Gene Symbolic Regression 4. Predicting Seismic Energy Demand Spectra Using MOGP 5. Results and Discussion