key: cord-0043223-hgowgq41 authors: Zhang, Ruixi; Zen, Remmy; Xing, Jifang; Arsa, Dewa Made Sri; Saha, Abhishek; Bressan, Stéphane title: Hydrological Process Surrogate Modelling and Simulation with Neural Networks date: 2020-04-17 journal: Advances in Knowledge Discovery and Data Mining DOI: 10.1007/978-3-030-47436-2_34 sha: e917605c98c65b60221c55462649cc0e7910f399 doc_id: 43223 cord_uid: hgowgq41

Environmental sustainability is a major concern for urban and rural development. Actors and stakeholders need economic, effective and efficient simulations in order to predict and evaluate the impact of development on the environment and the constraints that the environment imposes on development. Numerical simulation models are usually computationally expensive and require expert knowledge. We consider the problem of hydrological modelling and simulation. With a training set consisting of pairs of inputs and outputs from an off-the-shelf simulator, we show that a neural network can learn a surrogate model effectively and efficiently and thus can be used as a surrogate simulation model. Moreover, we argue that the neural network model, although trained on some example terrains, is generally capable of simulating terrains of different sizes and spatial characteristics.

An article in the Nikkei Asian Review dated 13 September 2019 warns that both the cities of Jakarta and Bangkok are sinking fast. These iconic examples are far from being the only human developments under threat. The United Nations Office for Disaster Risk Reduction reports that the lives of millions were affected by the devastating floods in South Asia and that around 1,200 people died in Bangladesh, India and Nepal [30]. Climate change, increasing population density, weak infrastructure and poor urban planning are the factors that increase the risk of floods and aggravate their consequences in those areas.
Under such scenarios, urban and rural development stakeholders are increasingly concerned with the interactions between the environment and urban and rural development. In order to study such complex interactions, stakeholders need effective and efficient simulation tools. A flood occurs with a significant temporary increase in the discharge of a body of water. Among the variety of factors leading to floods, heavy rain is one of the most prevalent [17]. When heavy rain falls, water overflows from river channels and spills onto the adjacent floodplains [8]. The hydrological process from rainfall to flood is complex [13]. It involves nonlinear, time-varying interactions between rain, topography, soil types and other components associated with the physical process. Several physics-based hydrological numerical simulation models, such as HEC-RAS [26], LISFLOOD [32] and LISFLOOD-FP [6], are commonly used to simulate floods. However, such models are usually computationally expensive, and expert knowledge is required both for their design and for accurate parameter tuning. We consider the problem of hydrological modelling and simulation. Neural network models are known for their flexibility, efficient computation and capacity to deal with nonlinear correlations in data. We propose to learn a flood surrogate model by training a neural network with pairs of inputs and outputs from the numerical model. We empirically demonstrate that the neural network can be used as a surrogate model to simulate floods effectively and efficiently. The neural network model that we train learns a general model. Once trained on a given data set, the neural network can directly simulate spatially different terrains. Moreover, while a neural network is generally constrained to a fixed input size, the model that we propose is able to simulate terrains of different sizes and spatial characteristics. This paper is structured as follows.
Section 2 summarises the main related works regarding physics-based hydrological and flood models as well as statistical machine learning models for flood simulation and prediction. Section 3 presents our methodology. Section 4 presents the data set, parameter settings and evaluation metrics. Section 5 describes and evaluates the performance of the proposed models. Section 6 presents the overall conclusions and outlines future directions for this work.

Current flood models simulate the fluid movement by solving equations derived from physical laws with many hydrological process assumptions. These models can be classified into one-dimensional (1D), two-dimensional (2D) and three-dimensional (3D) models depending on the spatial representation of the flow. The 1D models, such as HEC-RAS [1] and SWMM [25], treat the flow as one-dimensional along the river and solve the 1D Saint-Venant equations. The 2D models receive the most attention and are perhaps the most widely used models for floods [28]. These models solve different approximations of the 2D Saint-Venant equations. The two-dimensional model HEC-RAS 2D [9] has been used to simulate floods in the Assiut plateau in southwestern Egypt [12] and in Bolivian Amazonia [23]. Another 2D flow model, LISFLOOD-FP, solves the dynamic wave model while neglecting the advection term, which reduces the computational complexity [7]. The 3D models are more complex and mostly unnecessary, as 2D models are adequate [28]. Therefore, we focus our work on 2D flow models. Instead of conceptual physics-based models, several statistical machine learning based models have been utilised [4, 21]. One state-of-the-art machine learning model is the neural network model [27]. Tompson [29] uses a combination of neural network models to accelerate the simulation of fluid flow. Bar-Sinai [5] uses neural network models to study the numerical partial differential equations of fluid flow in two dimensions.
Raissi [24] developed physics-informed neural networks for solving general partial differential equations and tested them on the scenario of incompressible fluid movement. Dwivedi [11] proposes a distributed version of physics-informed neural networks and studies it on the Navier-Stokes equations for fluid movement. Besides accelerating the computation of partial differential equations, some neural networks have been developed in an entirely data-driven manner. Ghalkhani [14] develops a neural network for a flood forecasting and warning system in the Madarsoo river basin in Iran. Khac-Tien [16] combines a neural network with a fuzzy inference system for daily water level forecasting. Other authors [31, 34] apply neural network models to predict floods from collected gauge measurements. Those models, which implement neural networks for one dimension, do not take spatial correlations into account. The authors of [18, 35] use combinations of convolutional and recurrent neural networks as surrogate models of fluid models based on the Navier-Stokes equations in higher dimensions. The recent work [22] develops a convolutional neural network model to predict floods in two dimensions by taking spatial correlations into account. The authors focus on one specific region of the Colorado River. They use a convolutional neural network and a conditional generative adversarial network to predict the water level at the next time step. The authors conclude that neural networks can achieve high approximation accuracy while being a few orders of magnitude faster. Instead of focusing on one specific region and learning a model specific to the corresponding terrain, our work focuses on learning a general surrogate model applicable to terrains of different sizes and spatial characteristics with a data-driven machine learning approach.

We propose to train a neural network with pairs of inputs and outputs from an existing flood simulator.
The output provides the necessary supervision. We choose the open-source Python library Landlab, which is based on LISFLOOD-FP. We first define our problem in Subsect. 3.1. Then, we introduce the general ideas of the numerical flood simulation model and Landlab in Subsect. 3.2. Finally, we present our solution in Subsect. 3.3.

We first introduce the representation of the three hydrological parameters that we use in the two-dimensional flood model. A digital elevation model (DEM) D is a w × l matrix representing the elevation of a terrain surface. A water level H is a w × l matrix representing the water elevation on the corresponding DEM. A rainfall intensity I generally varies spatially and should be a matrix representing the rainfall intensity. However, the current simulator assumes that the rainfall does not vary spatially. In our case, I is therefore a constant scalar. Our work intends to find a model that can represent the flood process. A flood happens because rain drives the water level to change over the terrain region. The model receives three inputs: a DEM D, the water level $H^t$ and the rainfall intensity $I^t$ at the current time step t. The model outputs the water level $H^{t+1}$ as the result of the rainfall $I^t$ on DEM D. The learning process can be formulated as learning the function L such that $H^{t+1} = L(D, H^t, I^t)$.

Physics-driven hydrology models for floods in two dimensions are usually based on the two-dimensional shallow water equations, which are a simplified version of the Navier-Stokes equations averaged over the depth direction [28]. By ignoring the diffusion of momentum due to viscosity, turbulence, wind effects and Coriolis terms [10], the two-dimensional shallow water equations include two parts: conservation of mass and conservation of momentum shown in Eqs.
1 and 2, where h is the water depth, g is the gravitational acceleration, (u, v) are the velocities in the x and y directions, Z(x, y) is the topography elevation function and $S_{fx}$, $S_{fy}$ are the friction slopes [33], which are estimated with the friction coefficient η.

For the two-dimensional shallow water equations, there are no analytical solutions. Therefore, many numerical approximations are used. LISFLOOD-FP is a simplified approximation of the shallow water equations, which reduces the computational cost by ignoring the convective acceleration term (the second and third terms of the two equations in Eq. 2) and utilising an explicit finite difference numerical scheme. LISFLOOD-FP first calculates the flow between pixels with mass [20]. For simplicity, we use the 1D version of the equations in the x-direction, shown in Eq. 3. The 1D result is directly transferable to 2D owing to the uncoupled nature of those equations [3]. Then, for each pixel, its water level h is updated as in Eq. 4. To sum up, for each pixel at location (i, j), the solution derived from LISFLOOD-FP can be written in the form shown in Eq. 5, where $H^t_{i,j}$ is the water level at location (i, j) at time step t, or in general as $H^{t+1} = \Theta(D, H^t, I^t)$. However, the numerical solution Θ is computationally expensive and embeds assumptions about the hydrological process in a flood. Moreover, tuning the parameters of the numerical solution Θ is highly demanding once high-resolution two-dimensional water level measurements, such as those mentioned in [36], are involved. Therefore, we use such a numerical model to generate pairs of inputs and outputs for the surrogate model. We choose the LISFLOOD-FP based open-source Python library Landlab [2], since it is a popular simulator in regional two-dimensional flood studies. Landlab includes tools and process components that can be used to create hydrological models over a range of temporal and spatial scales.
In Landlab, the rainfall and friction coefficients are considered to be spatially constant, and evaporation and infiltration are both temporally and spatially constant. The inputs to Landlab are a DEM and a time series of rainfall intensity. The output is a time series of water levels.

We propose here that a neural network model can provide an alternative solution for such a complex hydrological dynamic process. Neural networks are well known as collections of nonlinear connected units, which are flexible enough to model the underlying complex nonlinear mechanisms [19]. Moreover, a neural network can easily be implemented on general-purpose Graphics Processing Units (GPUs) to boost its speed. In the numerical solution of the shallow water equations shown in Subsect. 3.2, the two-dimensional spatial correlation is important for predicting the water level in a flood. Therefore, inspired by the capacity of neural networks to extract spatial correlation features, we investigate whether a neural network model can learn the flood model L effectively and efficiently. We propose a small and flexible neural network architecture. In the numerical solution of Eq. 5, the water level of each pixel at the next time step is only correlated with the surrounding pixels. Therefore, we use, as input, a 3 × 3 sliding window on the DEM with the corresponding water levels and rain at each time step t. The output is the corresponding 3 × 3 water level at the next time step t + 1. The pixels at the boundary have different hydrological dynamic processes. Therefore, we pad both the water level and the DEM with zero values. We expect the neural network model to learn the different hydrological dynamic processes at the boundaries. One advantage of our proposed architecture is that the neural network is not restricted by the size of the terrain, for either training or testing. Therefore, it is a general model that can be used with any terrain size.
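The sliding-window input described above can be sketched in NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function name `make_patches`, the stride of one pixel, the centring of each window on a pixel and the stacking of DEM, water level and rain as three channels are all hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_patches(dem, water, rain, k=3):
    """Cut zero-padded k x k windows, one per terrain pixel (assumed stride 1).

    Returns an array of shape (w * l, 3, k, k) whose channels are
    DEM, water level and the scalar rain broadcast to k x k.
    """
    dem = np.asarray(dem, dtype=float)
    water = np.asarray(water, dtype=float)
    pad = k // 2
    # Zero padding so that boundary pixels also get a full window.
    dem_p = np.pad(dem, pad, mode="constant", constant_values=0.0)
    water_p = np.pad(water, pad, mode="constant", constant_values=0.0)
    dem_w = sliding_window_view(dem_p, (k, k)).reshape(-1, k, k)
    water_w = sliding_window_view(water_p, (k, k)).reshape(-1, k, k)
    # The rainfall intensity is a scalar, so it is repeated over the window.
    rain_w = np.full_like(dem_w, rain)
    return np.stack([dem_w, water_w, rain_w], axis=1)


# The same function works for any terrain size.
small = make_patches(np.zeros((6, 6)), np.zeros((6, 6)), rain=0.5)
large = make_patches(np.zeros((128, 128)), np.zeros((128, 128)), rain=0.5)
print(small.shape, large.shape)
```

Because the windows have a fixed 3 × 3 shape regardless of the terrain, a network that consumes them never sees the terrain size, which is the property that lets a trained model be reused on larger or smaller DEMs.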
Figure 1 illustrates the proposed architecture on a region of size 6 × 6.

In this section, we empirically evaluate the performance of the proposed model. In Subsect. 4.1, we describe how to generate synthetic DEMs. Subsect. 4.2 presents the experimental setup to test our method on synthetic DEMs as a micro-evaluation. Subsect. 4.3 presents the experimental setup for the Onkaparinga catchment. Subsect. 4.4 presents details of our proposed neural network. Subsect. 4.5 presents the evaluation metrics of our proposed model.

In order to generate synthetic DEMs, we modify Alexandre Delahaye's work. We arbitrarily set the size of the DEMs to 64 × 64 and their resolution to 30 metres. We generate three types of DEMs in our data set that resemble real-world terrain surfaces, as shown in Fig. 2a, namely, a river in a plain, a river with a mountain on one side and a plain on the other, and a river in a valley with mountains on both sides.

We evaluate the performance in two cases. In Case 1, the network is trained and tested with one DEM. This DEM has a river in a valley with mountains on both sides, as shown in Fig. 2a (right). In Case 2, the network is trained and tested with 200 different synthetic DEMs. The data set is generated with Landlab. For all the flood simulations in Landlab, the boundary condition is set to be closed on all four sides. This means that rainfall is the only source of water in the whole region. The roughness coefficient is set to 0.003. We control the initial process, rainfall intensity and duration for each sample. The different initial processes ensure different initial water levels across the region. After the initial process, the system runs for 40 h with no rain for stabilisation. We then run the simulation for 12 h and record the water levels every 10 min. Therefore, for one sample, we record a total of 72 time steps of water levels. Table 1 summarises the parameters for generating samples in both Case 1 and Case 2.
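The recording schedule above fixes the length of each sample: 12 h sampled every 10 min gives 72 water-level snapshots. A minimal sketch of how such a series could be turned into supervised (current, next) pairs follows. The helper `consecutive_pairs` is hypothetical, and pairing strictly consecutive snapshots is an assumption about how the training pairs are formed.

```python
import numpy as np


def consecutive_pairs(levels):
    """Pair each recorded water level with the one at the next time step."""
    levels = np.asarray(levels, dtype=float)
    return levels[:-1], levels[1:]


hours, interval_min = 12, 10
steps = hours * 60 // interval_min  # number of recorded snapshots per sample

# Placeholder series standing in for one simulated sample on a 64 x 64 DEM.
series = np.zeros((steps, 64, 64))
h_now, h_next = consecutive_pairs(series)
print(steps, h_now.shape, h_next.shape)
```

Under this assumption, a sample of 72 snapshots yields 71 input and target pairs, each sharing the sample's DEM and rainfall intensity.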
The Onkaparinga catchment, located on the Lower Onkaparinga river, south of Adelaide, South Australia, has experienced many notable floods, especially in 1935 and 1951. Many studies and reports have been produced on this region [15]. We obtain two DEMs of sizes 64 × 64 and 128 × 128 from the Australia Intergovernmental Committee on Surveying and Mapping's Elevation Information System. Figure 2b shows the DEM of the Lower Onkaparinga river. We implement the neural network model under three cases. In Case 3, we train and test on the 64 × 64 Onkaparinga river DEM. In Case 4, we test the 64 × 64 Onkaparinga river DEM directly with the model trained in Case 2. In Case 5, we test the 128 × 128 Onkaparinga river DEM directly with the model trained in Case 2. We generate the data sets for both the 64 × 64 and the 128 × 128 DEM with Landlab. For both DEMs, the initial process, rainfall intensity and rain duration are controlled in the same way as in Case 1.

The architecture of the neural network model is visualised in Fig. 1. It first upsamples the rain input to 3 × 3 and concatenates it with the 3 × 3 water level input. This is followed by several batch normalisation and convolutional layers. The activation functions are ReLU, and all convolutional layers use "same" padding. The neural network has 169 parameters in total. The model is trained with Adam with a learning rate of $10^{-4}$. The batch size for training is 8. The data set is split with a ratio of 8:1:1 for training, validation and testing. We train for 10 epochs in Case 1 and Case 3 and for 5 epochs in Case 2. We train the neural network model on a machine with a 3 GHz AMD Ryzen™ 7-1700 8-core processor. It has 64 GB of DDR4 memory and an NVIDIA GTX 1080Ti GPU card with 3584 CUDA cores and 11 GB of memory. The operating system is Ubuntu 18.04.

In order to evaluate the performance of our neural network model, we use global metrics for the overall flood in the whole region.
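Global error measures of this kind, such as a mean squared error and the mean absolute percentage error (MAPE) reported in the results, can be sketched as follows. These are the standard textbook definitions averaged over every pixel of the region. The function names and the `eps` guard against division by zero on dry pixels are assumptions, and the exact definitions used in the paper may differ.

```python
import numpy as np


def global_mse(pred, true):
    """Mean squared error over all pixels (and time steps, if present)."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.mean((pred - true) ** 2))


def global_mape(pred, true, eps=1e-8):
    """Mean absolute percentage error, in percent, over all pixels.

    eps is a hypothetical safeguard for pixels whose true level is zero.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    denom = np.maximum(np.abs(true), eps)
    return float(100.0 * np.mean(np.abs(pred - true) / denom))


true = np.array([[1.0, 2.0], [4.0, 5.0]])
pred = np.array([[1.1, 1.8], [4.0, 6.0]])
print(global_mse(pred, true), global_mape(pred, true))
```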
These metrics are global mean squared error:

Case 5 tests the scalability of our model to a DEM of a different size. In Table 2b, for global performance, the MAPE of Case 5 is around 50% less than that of both Case 3 and Case 4, and for local performance, the MAPE of Case 5 is 34.45%. Similarly, without retraining the existing model, the trained neural network from Case 2 can be applied directly to a DEM of a different size with good global performance. We present the time needed for the flood simulation of one sample in Landlab and in our neural network model (without the training time) in Table 3. The average time of the neural network model for a 64 × 64 DEM is around 1.6 s, while Landlab takes 47 s. Furthermore, for a 128 × 128 DEM, Landlab takes 110 times longer than the neural network model. Though the training of the neural network model is time consuming, the model can be reused without further training or tuning on terrains of different sizes and spatial characteristics. It remains effective and efficient (Fig. 4).

We propose a neural network model, which is trained with pairs of inputs and outputs of an off-the-shelf numerical flood simulator, as an efficient and effective general surrogate model for the simulator. The trained network yields a mean absolute percentage error of around 20%. However, the trained network is at least 30 times faster than the numerical simulator that is used to train it. Moreover, it is able to simulate floods on terrains of different sizes and spatial characteristics not directly represented in the training. We are currently extending our work to take into account other meaningful environmental elements such as land coverage, geology and weather.

HEC-RAS river analysis system, user's manual, version 2
The Landlab v1.0 OverlandFlow component: a Python tool for computing shallow-water flow across watersheds
Improving the stability of a simple formulation of the shallow water equations for 2-D flood modeling
A review of surrogate models and their application to groundwater modeling
Learning data-driven discretizations for partial differential equations
A simple raster-based model for flood inundation simulation
A simple inertial formulation of the shallow water equations for efficient two-dimensional flood inundation modelling
Rainfall-Runoff Modelling: the Primer
HEC-RAS river analysis system hydraulic user's manual
Numerical solution of the two-dimensional shallow water equations by the application of relaxation methods
Distributed physics informed neural network for data-efficient solution to partial differential equations
Integrating GIS and HEC-RAS to model Assiut plateau runoff
Flood hydrology processes and their variabilities
Application of surrogate artificial intelligent models for real-time flood routing
Extreme flood estimation-guesses at big floods? Water Down Under 94: Surface Hydrology and Water Resources Papers
The data-driven approach as an operational real-time flood forecasting model
Analysis of flood causes and associated socio-economic damages in the Hindukush region
Deep fluids: a generative network for parameterized fluid simulations
Fully convolutional networks for semantic segmentation
Optimisation of the two-dimensional hydraulic model LISFOOD-FP for CPU architecture
Neural network modeling of hydrological systems: a review of implementation techniques
Physics informed data driven model for flood prediction: application of deep learning in prediction of urban flood development
Application of 2D numerical simulation for the analysis of the February 2014 Bolivian Amazonia flood: application of the new HEC-RAS version 5
Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations
Storm water management model-user's manual v. 5.0. US Environmental Protection Agency
Hydrologic engineering center hydrologic modeling system, HEC-HMS: interior flood modeling
Decentralized flood forecasting using deep neural networks
Flood inundation modelling: a review of methods, recent advances and uncertainty analysis
Accelerating Eulerian fluid simulation with convolutional networks
Comparison of the ARMA, ARIMA, and the autoregressive artificial neural network models in forecasting the monthly inflow of Dez dam reservoir
Lisflood: a GIS-based distributed model for river basin scale water balance and flood simulation
Real-time water-level forecasting using dilated causal convolutional neural networks
Latent space physics: towards learning the temporal evolution of fluid flow
In-situ water level measurement using NIR-imaging video camera

Acknowledgment. This work is supported by the National University of Singapore Institute for Data Science project WATCHA: WATer CHallenges Analytics.
Abhishek Saha is supported by National Research Foundation grant number NRF2017VSG-AT3DCM001-021.