FloodGuard: Integrating Rainfall Time Series and GIS Data for Flood Prediction and Waterbody Forecasting in Bangladesh

March 22, 2024


Table of Contents

  • Introduction
  • Related Work
  • Data Collection
  • Exploratory Data Analysis
  • Model Development
  • Model Deployment
  • Conclusion
  • References

1. Introduction

Floods stand as one of the most devastating natural disasters, posing significant threats to lives, livelihoods, infrastructure, and the environment across the globe. Bangladesh, in particular, is susceptible to flooding due to its geographical location, topography, and dependency on the monsoon season. The nation’s vulnerability to floods is exacerbated by the confluence of major rivers such as the Ganges, Brahmaputra, and Meghna, which overflow during the monsoon, affecting millions of inhabitants annually. Recent advancements in technology have opened new avenues for flood forecasting and management, offering hope for more effective mitigation strategies.

This report outlines the development of a flood forecasting model by the Omdena Bangladesh Local Chapter team, aimed at improving disaster management through the use of rainfall time series and GIS data. The project’s goal is to provide accurate flood predictions to enhance preparedness and resource allocation. It follows a comprehensive workflow that includes collecting data, exploratory analysis, model training, and deployment, where the final outcome serves as a working prototype of an alerting system designed to empower disaster managers with actionable insights.

2. Related Work

2.1. Deep Learning Applications in Flood Forecasting

The comprehensive review by Kumar et al. (2023) underscores the significant potential of deep learning applications in flood forecasting and management. The study evaluates various deep learning models, including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), showcasing their effectiveness in predicting river flow and rainfall with high accuracy. The integration of these models with data from diverse sources, such as satellite imagery and weather stations, illustrates a paradigm shift in how flood forecasting can be approached, moving towards more dynamic and real-time prediction systems.

2.2. Geospatial Technologies for Flood Hazard Zonation

Uddin and Matin (2021) provide valuable insights into the use of Sentinel-1 Synthetic Aperture Radar images for flood hazard zonation in Bangladesh. By leveraging geospatial technologies, the study identifies flood-prone areas and suitable locations for flood shelters, offering a strategic framework for disaster risk mitigation. This approach emphasizes the importance of spatial analysis in understanding flood dynamics and underscores the necessity of incorporating geospatial data into flood management practices.

2.3. Machine Learning for Flood Prediction

The application of machine learning algorithms in flood prediction presents a promising avenue for enhancing early warning systems. For instance, Hossain and Zeyad (2023) explore the use of the k-Nearest Neighbors algorithm, decision trees, and random forests in predicting flood probabilities based on rainfall indices. These models, trained on historical weather data, demonstrate the capability to forecast flood events with considerable accuracy, thereby enabling preventive actions to reduce flood-related damages.

2.4. Integration of Machine Learning and GIS

The synergy between machine learning and Geographic Information Systems (GIS) for flood forecasting is highlighted in the work by Saikh and Mondal (2023). This integrated approach combines the spatial analytical power of GIS with the predictive accuracy of machine learning models, offering a nuanced understanding of flood risk factors and facilitating targeted intervention strategies. Such multidisciplinary methods are crucial for developing comprehensive flood management systems that are both effective and adaptable.

3. Data Collection

3.1. Historical Data

Historical data on river discharge and rainfall across Bangladesh’s 10 divisions were gathered from the websites of meteorological agencies, such as the Bangladesh Climate Data Portal, and from open-source repositories such as Kaggle, either by programmatic scraping or by manual download. The data were systematically compiled and stored in CSV format.

3.2. GIS Data

Using the Google Earth Engine platform, geospatial data covering all divisions of Bangladesh were acquired. These data covered critical factors such as Elevation, Slope, Curvature, Land Use Land Cover, Soil Types, and Land Surface Temperature, among others. The datasets were downloaded in raster format.

4. Exploratory Data Analysis

4.1. Historical Data

An analysis of flooding patterns across the divisions of Bangladesh indicates that Chittagong and Barisal experience the highest levels of flooding, as shown in Figure 1. The bar chart shows the mean daily precipitation for each division, averaged over the whole year, over rainy periods, and over the monsoon season specifically.

Figure 1: Distribution of Mean Daily Precipitations Across Divisions in Bangladesh
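The three averages plotted in Figure 1 could be computed along the following lines. This is a minimal sketch, not the team's code: the record layout (division, date, rainfall in mm), the function name, and the month ranges used for "rainy periods" and "monsoon" are all assumptions, since the article does not define them, and the sample values are illustrative only.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumed season definitions; the article does not state the exact months.
RAINY_MONTHS = {5, 6, 7, 8, 9, 10}
MONSOON_MONTHS = {6, 7, 8, 9}


def mean_daily_precipitation(records):
    """Return {division: {"annual": x, "rainy": y, "monsoon": z}} in mm/day.

    `records` is an iterable of (division, date, rainfall_mm) tuples.
    A season with no observations is reported as None.
    """
    buckets = defaultdict(lambda: {"annual": [], "rainy": [], "monsoon": []})
    for division, day, rain_mm in records:
        buckets[division]["annual"].append(rain_mm)
        if day.month in RAINY_MONTHS:
            buckets[division]["rainy"].append(rain_mm)
        if day.month in MONSOON_MONTHS:
            buckets[division]["monsoon"].append(rain_mm)
    return {
        division: {name: (mean(vals) if vals else None) for name, vals in groups.items()}
        for division, groups in buckets.items()
    }


# Illustrative values only, not project data.
sample = [
    ("Chittagong", date(2010, 1, 15), 0.0),
    ("Chittagong", date(2010, 5, 20), 10.0),
    ("Chittagong", date(2010, 7, 10), 30.0),
    ("Dhaka", date(2010, 1, 15), 0.0),
    ("Dhaka", date(2010, 7, 10), 12.0),
]
summary = mean_daily_precipitation(sample)
```

Keeping the season definitions as named constants makes it easy to rerun the comparison under a different definition of the monsoon window.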

Complementing this, the line graph in Figure 2 represents the annual mean rainfall from 1985 to 2016. Several fluctuations are evident over the years, with peaks that likely correspond to years with more flood events. The variance in annual rainfall can be associated with climatic cycles and potentially with significant flood events, especially in districts with higher average rainfall.

Figure 2: Trends in Annual Mean Rainfall in Bangladesh from 1985 to 2016

4.2. GIS Data

Several GIS layers, including land surface, soil water content, sand content, and land surface temperature, were analyzed, yielding insightful findings. Figure 3 categorizes the land cover across Bangladesh, showing a predominance of water and flooded vegetation. This distribution is critical for understanding flood behavior, as areas with significant water bodies and flooded vegetation are likely more susceptible to flooding events.

Figure 3: Distribution of Land Cover Types in Bangladesh

Figures 4 and 5 depict the average soil/sand water content across different districts, providing insights into soil composition and water retention capability, which are influential factors in flood risk assessment. Districts with higher soil water content may indicate a greater risk of flooding due to lower infiltration rates, while high sand content could suggest better drainage and potentially lower flood risk.

Figure 4: Average Soil Water Content

Figure 5: Average Sand Water Content

Figure 6 represents the temperature distribution, which can influence evapotranspiration rates and, consequently, the hydrological cycle. Understanding these patterns can aid in predicting the saturation levels of the land and the potential for flooding.

Figure 6: Land Surface Temperature Distribution in Bangladesh

5. Model Development

5.1. Flood Prediction Model using Historical Data

Using the same mechanism described in Section 5.2, we prepared a balanced dataset from historical flood events that occurred from 2003 to 2023. The dataset contained both topographical features (such as a Digital Elevation Model) and weather/meteorological features (such as precipitation and wind direction). Because the dataset contains both static and dynamic features, one of the key experimental approaches was to develop two separate models: one that considers the temporal aspects and one that does not.

Before model development, the weather features were explored extensively to engineer features that better capture weather dynamics, such as wind vectorization, cyclical time encoding to capture seasonality, precipitation analysis, and soil saturation trends. We then developed the two models, and the key findings were:

  • Spatial & Weather Condition Model: A Random Forest using topographical and weather-related features was developed, prioritizing recall and the Cohen Kappa Score. We found that topographical features alone, especially elevation, were strong predictors of flood occurrence.
  • Temporal Model: An XGBoost model that uses only weather-related features to model the time series was also developed, achieving high precision, balanced recall, and an exceptional AUC of 0.99. A key insight was that temporal factors such as the day of the year and precipitation patterns were integral to the model’s success.
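Two of the feature engineering steps mentioned above, cyclical time encoding and wind vectorization, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's implementation: the function names are hypothetical, and the wind direction is assumed to follow the meteorological convention (degrees clockwise from north, indicating where the wind blows from).

```python
import calendar
import math
from datetime import date


def cyclical_day_of_year(day):
    """Encode the day of year as (sin, cos) so 31 December sits next to 1 January."""
    days_in_year = 366 if calendar.isleap(day.year) else 365
    angle = 2 * math.pi * (day.timetuple().tm_yday - 1) / days_in_year
    return math.sin(angle), math.cos(angle)


def wind_to_uv(speed, direction_deg):
    """Convert wind speed and direction to eastward (u) and northward (v) components.

    Assumes `direction_deg` is the direction the wind blows FROM, measured
    clockwise from north, so a 270 degree (westerly) wind has positive u.
    """
    theta = math.radians(direction_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


jan_1 = cyclical_day_of_year(date(2021, 1, 1))
dec_31 = cyclical_day_of_year(date(2021, 12, 31))
jul_1 = cyclical_day_of_year(date(2021, 7, 1))
westerly = wind_to_uv(10.0, 270.0)
```

Both transformations remove artificial discontinuities: a raw day-of-year value jumps from 365 to 1 at the new year, and a raw direction jumps from 359 to 0 degrees, even though the underlying conditions are nearly identical.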

However, while the modeling approach above represents a step in the right direction towards an effective predictive system for disaster management, we recognize the need to improve the data collection strategy before deploying these models, aiming for a more nuanced approach to identifying flood-prone areas based on historical data.

Section 6.1 describes the modeling approach the team adopted for deployment.

5.2. Flood Susceptibility Model using GIS Data

To prepare the GIS data, a Flood Inventory derived from historical flood extents in Bangladesh, consisting of 2,766 sample points (1,408 flood points and 1,358 non-flood points), was used to form a binary classification problem. With the help of GIS software, 13 flood conditioning factors were extracted for all the points: slope, aspect, curvature, elevation, Stream Power Index (SPI), flow accumulation, Topographic Wetness Index (TWI), soil permeability, soil texture, land use/land cover (LULC), geology, distance to rivers, and drainage density.
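Two of the conditioning factors above, TWI and SPI, are derived from other terrain quantities. The sketch below uses their commonly cited definitions, TWI = ln(a / tan β) and SPI = a · tan β, where a is the specific catchment area and β is the local slope. The article does not say which GIS tool or formula variant the team used, so the function names and the minimum-slope clamp are assumptions made for illustration.

```python
import math


def topographic_wetness_index(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(a / tan(beta)).

    The slope is clamped to `min_slope_deg` (an assumed value) so that
    perfectly flat cells do not cause a division by zero.
    """
    slope = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(slope))


def stream_power_index(specific_catchment_area, slope_deg):
    """SPI = a * tan(beta): the erosive power of flowing water at a cell."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Same upslope area, different slopes: the flatter cell is "wetter".
twi_flat = topographic_wetness_index(100.0, 1.0)
twi_steep = topographic_wetness_index(100.0, 10.0)
spi_steep = stream_power_index(100.0, 45.0)
```

Flat cells with large upslope areas get high TWI values, which is why the index is widely used as a proxy for where water tends to accumulate.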

For model development, the dataset was partitioned into an 80:20 training-test split. The training set, constituting 80% of the data, was employed to train a Random Forest classifier for predicting flood scenarios, while the test set comprised the remaining 20%.

The model’s performance is notably strong, reflected by an AUC score of 96%. It demonstrates high precision, recall, and an F1 score of 90% for classifying both flooded and non-flooded scenarios. Additionally, the model achieved a Cohen Kappa Score of 80%, indicating a significant level of agreement beyond chance. This model stands out as robust and reliable for flood susceptibility assessment.
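For readers less familiar with these metrics, the sketch below shows how precision, recall, F1, and Cohen's Kappa follow from a binary confusion matrix. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the project's actual confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and Cohen's Kappa from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's Kappa compares observed agreement with the agreement
    # expected by chance given the row and column totals.
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (observed - expected) / (1 - expected)

    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical counts for a balanced test set of 200 points.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=90)
```

With these illustrative counts, precision, recall, and F1 each come to 0.9 and Kappa to 0.8. On a balanced dataset, chance agreement is 0.5, so Kappa rescales accuracy to the range above chance.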

6. Model Deployment

The finalized models were then integrated into a Streamlit application, which was enhanced with supporting information to help users interpret the results.

6.1. Flood Prediction Model using Historical Data

Having learned how strongly precipitation influences flooding, the team decided to focus on forecasting average daily rainfall from weather-related features for a selected number of cities in Bangladesh, with the aim of improving flood preparedness. We therefore developed several machine learning models, each carefully tuned on a city’s historical weather data. As an example, an XGBoost model achieved a mean absolute error of 2.8471 mm in its daily rainfall predictions and an R-squared of 72.98%, meaning that the model explains roughly 73% of the variance in the observed rainfall. We found similarly encouraging results for the other cities.
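The two evaluation metrics quoted above can be computed as in the following sketch. The rainfall values are illustrative only and are not taken from the project's data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative daily rainfall in mm, not project data.
observed_mm = [0.0, 5.0, 10.0, 20.0]
forecast_mm = [1.0, 4.0, 12.0, 18.0]

mae = mean_absolute_error(observed_mm, forecast_mm)
r2 = r_squared(observed_mm, forecast_mm)
```

The MAE is expressed in the same unit as the target (mm of rainfall), while R-squared is unitless, so the two are best read together: one gives the typical size of an error and the other the share of variability the model captures.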

6.2. Flood Susceptibility Model using GIS Data

The outcome of the Random Forest classifier is rendered into a spatial format using Inverse Distance Weighting (IDW) interpolation to produce a continuous raster surface. Utilizing the leafmap library, these results are then transformed into an interactive map interface. This enables users to visualize flood susceptibility across Bangladesh, with the flexibility to select a specific division or view the entire country, as depicted in the figures below.

The flood susceptibility map of Bangladesh indicates that the Khulna Division, particularly near the coastline, exhibits over 50% high risk of flooding, aligning with scientific studies that identify the Satkhira district within Khulna as one of the country’s most flood-prone regions. In stark contrast, the Dhaka Division is shown to be the least susceptible, which can be attributed to its inland location and higher elevation, providing some natural protection against flooding.

However, while Dhaka might be less prone to large-scale natural flooding, urban flood risks due to infrastructural challenges remain a concern. This analysis underscores the varying degrees of flood risks across regions and emphasizes the need for region-specific flood mitigation and adaptation strategies to safeguard vulnerable communities and infrastructure.
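The IDW interpolation step described at the start of this section can be illustrated with a small sketch. It assumes the classifier output is a flood probability at each sample point, that coordinates are planar, and that the distance exponent is 2; none of these details are given in the article, and the function names are hypothetical.

```python
import math


def idw_value(samples, query, power=2.0):
    """Estimate a value at `query` (x, y) from `samples` of (x, y, value).

    Each sample is weighted by 1 / distance**power, so nearby points dominate.
    If the query coincides with a sample, that sample's value is returned.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0:
            return value
        weight = 1.0 / distance**power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(samples, xs, ys, power=2.0):
    """Interpolate onto a regular grid, returned as rows (one per y) of values."""
    return [[idw_value(samples, (x, y), power) for x in xs] for y in ys]


# Two hypothetical sample points with predicted flood probabilities 0.0 and 1.0.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0)]
surface = idw_grid(points, xs=[0.0, 1.0, 2.0], ys=[0.0])
```

Because IDW is a weighted average, interpolated values always stay between the minimum and maximum of the sample values, which suits a probability surface bounded between 0 and 1.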

7. Conclusion

“FloodGuard: Integrating Rainfall Time Series and GIS Data for Flood Prediction and Waterbody Forecasting in Bangladesh” represents a significant step forward in addressing the perennial challenge of flooding in Bangladesh. Through meticulous data collection, analysis, and the application of machine learning, this project provides a prototype that can guide disaster managers in making informed decisions. The use of historical rainfall data and GIS layers has enabled a nuanced understanding of flood dynamics, informing the development of robust flood prediction and susceptibility models with high accuracy.

8. References

  • Hossain, M. S., & Zeyad, M. (2023). Prediction of Flood in Bangladesh Using Different Classifier Model. AIUB Journal of Science and Engineering (AJSE), 22(1), 45-52.
  • Kumar, V., Azamathulla, H. M., Sharma, K. V., Mehta, D. J., & Maharaj, K. T. (2023). The state of the art in deep learning applications, challenges, and future prospects: A comprehensive review of flood forecasting and management. Sustainability, 15(13), 10543.
  • Saikh, N. I., & Mondal, P. (2023). GIS-based machine learning algorithm for flood susceptibility analysis in the Pagla river basin, Eastern India. Natural Hazards Research.
  • Siam, Z. S., Hasan, R. T., Anik, S. S., Noor, F., Adnan, M. S. G., Rahman, R. M., & Dewan, A. (2022). National-scale flood risk assessment using GIS and remote sensing-based hybridized deep neural network and fuzzy analytic hierarchy process models: a case of Bangladesh. Geocarto International, 37(26), 12119-12148.
  • Uddin, K., & Matin, M. A. (2021). Potential flood hazard zonation and flood shelter suitability mapping for disaster risk mitigation in Bangladesh using geospatial technology. Progress in Disaster Science, 11, 100185.

Prepared by Zainab Akhtar and Kyaw Htet Paing Win.

