Flood Risk Assessment Using Analytical Hierarchy Process (AHP) and Machine Learning Models
October 15, 2021
A step-by-step case study on using the analytical hierarchy process (AHP) and machine learning models to help decision makers with flood risk assessment. The models quantify flood risk and estimate the percentage of damage or destruction to assets such as buildings and crops.
This project was hosted with impact-driven startup Finz.
Introduction
Natural disasters are among the most pressing issues that must be addressed at the global, regional, and local levels. Climate change may increase the frequency and magnitude of catastrophic events like floods, droughts, and wildfires.
Togo, Africa, is highly vulnerable to these natural calamities. Flooding and drought are common occurrences in the country, having negative socioeconomic consequences for the inhabitants, the environment, and the economy. Floods have been extremely devastating in recent years, wrecking infrastructure and destroying cultivated land.
While excessive rainfall is the primary cause of flooding, numerous other factors contribute to it, including deforestation, land degradation, rapid population growth, urbanization, poor land use planning, and inadequate drainage and discharge management.
Monitoring and predicting flood risk is critical for providing appropriate flood and environmental management solutions. Flood risk mapping is an important part of land use planning and mitigation techniques.
Analytical Hierarchy Process (AHP)
An Analytical Hierarchy Process (AHP) model was built to identify and map areas of high flood risk in Togo. AHP is a multi-criteria decision-making method that integrates several conditioning factors, such as drainage density, slope, soil type, precipitation, population density, Euclidean distance and land use, to map flood risk. A vulnerability map and a hazard map were generated from these factors.
Hazard map
- Drainage density (D): Drainage density is the total length of all channels within the basin divided by the area of the basin.
- Drainage Density = Length of all channels / Area of basin
- If the drainage network is dense at any area, it is a good indicator that the area is more likely to get flooded as it would have a high flow accumulation path.
- Precipitation (P), mapped as isohyets: It is a major determining factor when creating hazard maps.
- Slope (S): Slope is one of the important conditioning factors in floods. The danger from floods increases as the slope increases.
- Soil type (ST): The type and texture of the soil are very important in determining the infiltration and water holding capacity of an area, which affects flood susceptibility. Runoff from intense rainfall is likely to be more rapid on clay soils than on sand.
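As a minimal sketch of the drainage density definition above, the following Python function divides the total channel length by the basin area. The function name and the sample numbers are hypothetical and are not taken from the Togo data.

```python
def drainage_density(channel_lengths_km, basin_area_km2):
    """Return drainage density in km of channel per square km of basin."""
    if basin_area_km2 <= 0:
        raise ValueError("basin area must be positive")
    return sum(channel_lengths_km) / basin_area_km2


# Hypothetical basin: three channels totalling 42 km in a 120 km^2 basin.
density = drainage_density([20.0, 15.0, 7.0], 120.0)
print(f"Drainage density: {density:.2f} km/km^2")
```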
Vulnerability map
- Euclidean distance (ED): Areas located close to the main channel and flow accumulation path are more likely to be flooded.
- Land use land cover (LULC): The landscape is classified by the amount and type of vegetation, which reflects its use, environment, cultivation, and seasonal phenology.
- Population density (PD): Rapid population growth drives severe land use change and uncontrolled urbanization.
The Analytical Hierarchy Process uses hierarchical structures to represent a problem and then develops priorities for alternatives based on user judgement (Saaty, 1980). The process consists of the following steps:
- Break down the problem into its component factors
- Develop the hierarchy
- Develop the paired comparison matrix based on subjective judgements
- Calculate the relative weights of each criterion
- Check consistency of subjective judgement
In this project, the AHP workflow was broadly divided into the following stages:
- Data collection
- Data pre-processing
- AHP modelling
Data Collection
The following datasets were used to create the vulnerability and hazard maps.
- The country boundary shapefile is downloaded from the DIVA-GIS website. [3]
- The Digital Elevation Model (DEM) is derived from the Advanced Land Observing Satellite (ALOS) PALSAR, ALOS Global Digital Surface Model "ALOS World 3D", which has a resolution of 30 m. [4]
- Land Use Land Cover (LULC) data is obtained from the Copernicus Global Land Service website. [5]
- Precipitation data is obtained from the Center for Hydrometeorology and Remote Sensing (CHRS) data portal at the University of California, Irvine (UCI). [6]
- Population density data is obtained from Facebook's Data for Good program. [7]
- The soil map is downloaded from the Food and Agriculture Organization of the United Nations (FAO) website. [8]
- The stream network shapefile is downloaded from Stanford University. [9]
Data Pre-processing
Data pre-processing includes:
- Generating the layers from collected data: The layers are generated using QGIS/ArcGIS. The slope map is created from the DEM, and the Euclidean distance and drainage density maps are created from the river network.
- Reclassifying layers: The generated layers cannot be combined directly because they are expressed in different units. They must be reclassified and converted to a common scale before further analysis.
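The reclassification step can be illustrated with a short NumPy sketch. It assumes a slope layer already loaded as an array and ranks it into classes 1 to 5. The class breaks and the sample values are hypothetical, since the article does not list the thresholds that were used.

```python
import numpy as np

# Hypothetical slope values in degrees for a tiny 2x3 raster.
slope_deg = np.array([[0.5, 3.0, 8.0],
                      [15.0, 25.0, 40.0]])

# Hypothetical class breaks; a real study would take these from literature
# or from the data distribution.
breaks = [2.0, 5.0, 12.0, 20.0]

# np.digitize returns 0..len(breaks); adding 1 gives class ranks 1..5.
slope_class = np.digitize(slope_deg, breaks) + 1
print(slope_class)
```

Applying the same idea to every layer puts them all on one dimensionless 1 to 5 scale, so they can be combined in a weighted sum.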
AHP modelling
Creating hierarchy:
In AHP, there are different levels set up as a hierarchy:
- Level 0: main objective, which in our case is the flood risk map
- Level 1: Different Criteria which are hazard map and vulnerability map
- Level 2: Elements (Parameters) to be considered in each criterion. We try to measure their influence on the criteria
Pairwise Comparison Matrix:
- Generating pairwise comparison matrix and checking consistency ratio
For each criterion, a pairwise comparison matrix is created. The scores to be used in the matrix are based on the Saaty scale (Saaty 1980) as shown below:
Scale | Meaning |
1 | Equally important |
3 | Moderately important |
5 | Important |
7 | Very strongly important |
9 | Extremely important |
2, 4, 6, 8 | Intermediate values between adjacent scales |
For every pair in the hazard comparison matrix, the more important factor is assigned a value between 1 (equally important) and 9 (extremely more important), while the other factor is assigned the reciprocal of that value. For example, for the pair D (row)-ST (column), we assign a value of 3, while for the pair ST (row)-D (column), we assign a value of 1/3. Here D is drainage density, ST is soil type, S is slope and P is precipitation. Applying this operation to each pair gives the matrix:
Factor | D | ST | S | P |
D | 1 | 3 | 1/3 | 1/5 |
ST | 1/3 | 1 | 1/3 | 1/5 |
S | 3 | 3 | 1 | 1/3 |
P | 5 | 5 | 3 | 1 |
Please note the values used are based on literature.
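One way to avoid typing errors in such a matrix is to enter only the upper triangle and fill in the reciprocals programmatically. The sketch below does this for the hazard matrix above. The helper name `build_pairwise_matrix` is hypothetical, and exact fractions are used only to keep the reciprocals readable.

```python
from fractions import Fraction

criteria = ["D", "ST", "S", "P"]

# Judgements for the upper triangle only, (row, column) -> Saaty score,
# copied from the hazard matrix above.
upper = {
    ("D", "ST"): Fraction(3),
    ("D", "S"): Fraction(1, 3),
    ("D", "P"): Fraction(1, 5),
    ("ST", "S"): Fraction(1, 3),
    ("ST", "P"): Fraction(1, 5),
    ("S", "P"): Fraction(1, 3),
}


def build_pairwise_matrix(names, judgements):
    """Fill the diagonal with 1 and the lower triangle with reciprocals."""
    n = len(names)
    matrix = [[Fraction(1)] * n for _ in range(n)]
    for (row, col), score in judgements.items():
        i, j = names.index(row), names.index(col)
        matrix[i][j] = score
        matrix[j][i] = 1 / score
    return matrix


hazard_matrix = build_pairwise_matrix(criteria, upper)
for name, row in zip(criteria, hazard_matrix):
    print(name, [str(v) for v in row])
```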
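Then, for each row, the eigenvector component Vp is estimated as the geometric mean of the row's elements:
Vp = (W1 × … × Wk)^(1/k)
Vp = eigenvector component for row p, Wk = k-th element of the row, k = number of elements
For example, for row D: Vp = (1 × 3 × 1/3 × 1/5)^(1/4) = 0.2^(1/4) ≈ 0.67.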
We then get the following:
Factor | D | ST | S | P | Vp |
D | 1 | 3 | 1/3 | 1/5 | 0.67 |
ST | 1/3 | 1 | 1/3 | 1/5 | 0.39 |
S | 3 | 3 | 1 | 1/3 | 1.32 |
P | 5 | 5 | 3 | 1 | 2.94 |
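The Vp column can be reproduced with a few lines of Python. This is only a sketch of the row geometric mean, using the hazard matrix values from the table. The helper name `geometric_mean` is an assumption made for illustration.

```python
import math

# Hazard pairwise comparison matrix, one row per factor.
hazard_matrix = {
    "D":  [1, 3, 1/3, 1/5],
    "ST": [1/3, 1, 1/3, 1/5],
    "S":  [3, 3, 1, 1/3],
    "P":  [5, 5, 3, 1],
}


def geometric_mean(values):
    """k-th root of the product of k positive values."""
    return math.prod(values) ** (1 / len(values))


vp = {name: geometric_mean(row) for name, row in hazard_matrix.items()}
for name, value in vp.items():
    print(f"{name}: {value:.2f}")
```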
We then calculate the weighting coefficients Cp using the equation below:
Cp = Vp / (Vp1 + … + Vpk)
The Cp values of all the parameters must sum to 1. We then get the following:
Factor | D | ST | S | P | Vp | Cp |
D | 1 | 3 | 1/3 | 1/5 | 0.67 | 0.13 |
ST | 1/3 | 1 | 1/3 | 1/5 | 0.39 | 0.07 |
S | 3 | 3 | 1 | 1/3 | 1.32 | 0.25 |
P | 5 | 5 | 3 | 1 | 2.94 | 0.55 |
Sum | | | | | 5.32 | 1 |
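The normalization is a one-line operation. The sketch below starts from the rounded Vp values in the table and checks that the resulting weights sum to 1.

```python
# Row geometric means (Vp) as rounded in the table above.
vp = {"D": 0.67, "ST": 0.39, "S": 1.32, "P": 2.94}

total = sum(vp.values())
cp = {name: value / total for name, value in vp.items()}

for name, weight in cp.items():
    print(f"{name}: {weight:.2f}")
print(f"Sum of weights: {sum(cp.values()):.2f}")
```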
Check Consistency:
Now that we have our weights, we need to check if the weights are correct. In other words, we need to check whether the scores we assigned to the pairwise comparison matrix based on our subjective judgement are acceptable.
We create a matrix, let's call it A3, by multiplying the pairwise matrix (a 4×4 matrix) by the Vp column (a 4×1 matrix). We then create another matrix, call it A4, by dividing every value of A3 by the corresponding Vp value. For example, for row 'D', we divide 2.87 by 0.67. Using the normalized weights Cp instead gives the same ratios, because Cp is only a rescaled Vp. We get the following matrix:
D | 4.28 |
ST | 4.21 |
S | 4.15 |
P | 4.15 |
We then average the above values to get 4.1975. This value is known as the maximum eigenvalue (λmax).
We then calculate the consistency index (CI) using the formula:
CI = (λmax - k)/(k - 1), k = number of parameters
CI = (4.1975-4)/(4-1) = 0.066
We then determine the consistency ratio (CR) by the formula:
CR = CI/RI, RI = random index
The random index value is taken from the following table:
(Saaty, 1980)
Number of parameters | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
RI | 0 | 0 | 0.58 | 0.9 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49 |
CR = 0.066/0.9 = 0.073
If the consistency ratio is less than or equal to 10%, the weights are acceptable. If it is greater than 10%, we need to revise our subjective judgements. Here CR = 0.073 (7.3%), so the weights are acceptable.
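The whole consistency check can be sketched in plain Python as follows. The function name `consistency_ratio` is hypothetical, and the sketch uses unrounded geometric means, so intermediate values differ slightly from the rounded figures in the tables while the final results agree.

```python
import math

A = [
    [1, 3, 1/3, 1/5],    # D
    [1/3, 1, 1/3, 1/5],  # ST
    [3, 3, 1, 1/3],      # S
    [5, 5, 3, 1],        # P
]

# Random index values from the table above, keyed by number of parameters.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.9, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def consistency_ratio(matrix):
    """Return (lambda_max, CI, CR); only meaningful for 3 to 10 parameters."""
    k = len(matrix)
    vp = [math.prod(row) ** (1 / k) for row in matrix]
    # A3: pairwise matrix times the priority vector.
    a3 = [sum(a * v for a, v in zip(row, vp)) for row in matrix]
    # A4: element-wise ratio; its mean estimates lambda_max.
    a4 = [x / v for x, v in zip(a3, vp)]
    lambda_max = sum(a4) / k
    ci = (lambda_max - k) / (k - 1)
    return lambda_max, ci, ci / RANDOM_INDEX[k]


lambda_max, ci, cr = consistency_ratio(A)
print(f"lambda_max={lambda_max:.4f}, CI={ci:.3f}, CR={cr:.3f}")
```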
- Generating vulnerability and hazard map
AHP Hazard Map: A hazard is defined as a natural or man-made phenomenon that occurs with an intensity capable of causing harm, in this case a stream overflow.
The hazard map can identify all regions that are at risk of flooding. The spatial extent of climatic threats that can induce floods, and the locations potentially exposed to them, are mapped by combining the conditioning factors, each with its own weight.
We can calculate the hazard map using the formula:
Hazard index = 0.13*D + 0.07*ST + 0.25*S + 0.55*P
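On raster data, this formula is a cell-by-cell weighted sum. The sketch below applies it with NumPy to small made-up layers. The 2×2 arrays are hypothetical reclassified ranks, not values from the Togo layers.

```python
import numpy as np

# Hypothetical reclassified layers (ranks 1-5) on a tiny 2x2 grid.
layers = {
    "D":  np.array([[1, 2], [3, 4]]),
    "ST": np.array([[2, 2], [3, 5]]),
    "S":  np.array([[1, 3], [3, 4]]),
    "P":  np.array([[2, 4], [5, 5]]),
}

# AHP weights from the hazard comparison matrix.
weights = {"D": 0.13, "ST": 0.07, "S": 0.25, "P": 0.55}

# Cell-by-cell weighted sum of the four layers.
hazard_index = sum(weights[name] * layers[name] for name in weights)
print(hazard_index)
```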
AHP vulnerability map
Vulnerability represents the extent of expected repercussions of a natural phenomenon, while risk is the most important component of vulnerability because it decides whether or not someone is exposed to a hazard.
Flood vulnerability mapping is the process of determining a given area’s flooding susceptibility and exposure.
Using the same process that we used for the hazard map, we calculate the weights for the vulnerability factors and use the following formula to get the vulnerability map:
Vulnerability index = 0.26*PD + 0.64*LULC + 0.1*ED
Flood risk map
The flood risk map is a combination of hazard map and vulnerability map.
Flood risk = Hazard index * Vulnerability index
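A minimal NumPy sketch of the last two formulas is shown below. All input arrays are hypothetical placeholders for the reclassified PD, LULC and ED layers and for a previously computed hazard index.

```python
import numpy as np

# Hypothetical hazard index and reclassified vulnerability layers (2x2 grid).
hazard_index = np.array([[1.6, 3.2], [4.1, 4.6]])
pd_rank = np.array([[1, 2], [4, 5]])
lulc_rank = np.array([[2, 3], [3, 5]])
ed_rank = np.array([[1, 1], [4, 5]])

# Weighted sum with the AHP vulnerability weights from the article.
vulnerability_index = 0.26 * pd_rank + 0.64 * lulc_rank + 0.1 * ed_rank

# Element-wise product: a cell is high risk only if both indices are high.
flood_risk = hazard_index * vulnerability_index
print(flood_risk)
```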
After creating the flood risk maps, training and testing datasets were generated using stratified random sampling in order to build machine learning models. Class imbalance was corrected and outliers were removed. The AutoML library MLjar was used to find the best model. MLjar is a state-of-the-art automated machine learning library used to create an end-to-end machine learning pipeline. The following machine learning models were built using the MLjar library [11]:
- Linear regression model
- Decision tree
- Random forest
- XGBoost
- Neural network
- Ensemble model
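The article does not show how the stratified random sampling was implemented. As a stand-alone illustration, independent of MLjar, the sketch below splits sample indices so that each risk class keeps roughly the same share in the training and testing sets. The function name, the labels and the 25% test fraction are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one test sample per class, so rare classes are evaluated.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, imbalanced risk labels for 20 sampled pixels.
labels = ["low"] * 12 + ["moderate"] * 4 + ["high"] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Sampling within each class matters here because the moderate and high-risk classes are rare, and a purely random split could leave them out of the test set.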
Model validation is done using the ROC curve. An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
- True Positive Rate
- False Positive Rate
The results show that the ensemble model performed best when training time is taken into account.
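To make the two rates plotted on a ROC curve concrete, the sketch below computes them at every score threshold for a binary problem, for instance "high risk" versus the rest. The labels and scores are invented for illustration and are not results from this project.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, y_true) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = high flood risk) and model scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
points = roc_points(y_true, scores)
print(points, round(auc(points), 3))
```

A model whose curve lies closer to the top-left corner, and so has a larger area under it, separates the classes better.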
Comparison between AHP and Machine Learning models:
As seen from the above diagrams, the machine learning model was able to classify high flood risk areas more accurately than the AHP model. Although we had few samples for the moderate and high-risk categories, the model is still on par with the AHP technique. The map produced with machine learning is smoother and more accurate. We can predict risk with half the number of features used in the AHP model.
Conclusion
While generating the hazard map, the team found that precipitation is the dominant factor. The hazard map shows how prone each region is to floods. Regions in red are more prone to floods because of high precipitation. Regions in green are much less prone to floods because of low precipitation.
While generating the vulnerability map, land use land cover was given the highest weight. When we combined the hazard map and the vulnerability map to generate the flood risk map, the dominant factors were precipitation, land use land cover, and population density.
The study shows that stringent action needs to be taken. Proper land use planning and drainage and discharge management are necessary to mitigate flood risk.
These risk maps can be further improved by adding more relevant information, such as flow accumulation and lithology. There is a wide spectrum of research opportunities in which AHP modelling could be applied.
Alternatively, the AHP model could be used for target countries, and a machine learning model could then create a risk score for the surrounding areas without the need to build a separate AHP model for them. Applying a machine learning model directly to the inputs of the AHP model reduces the computational steps needed to create a risk score. Expert knowledge can be used to set up a regional AHP model to refine the scoring in areas where the machine learning model does not produce credible scores.
References
- worldbank.org. 2021. World Bank Climate Change Knowledge Portal. [online] Available at: https://climateknowledgeportal.worldbank.org/country/togo/climate-data-historical [Accessed 26 September 2021].
- diva-gis.org. 2021. Download data by country | DIVA-GIS. [online] Available at: http://diva-gis.org/gdata [Accessed 26 September 2021].
- eorc.jaxa.jp. 2021. Advanced Land Observing Satellite. [online] Available at: https://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm [Accessed 26 September 2021].
- land.copernicus.eu. 2021. Copernicus Global Land Service. [online] Available at: https://land.copernicus.eu/en/products/global-dynamic-land-cover [Accessed 26 September 2021].
- chrsdata.eng.uci.edu. 2021. CHRS Data Portal Service. [online] Available at: https://chrsdata.eng.uci.edu/ [Accessed 26 September 2021].
- data.humdata.org. 2021. HDX Facebook data Service. [online] Available at: https://data.humdata.org/ [Accessed 26 September 2021].
- fao.org. 2021. Food and Agriculture Organization of the United Nations Service. [online] Available at: http://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/faounesco-soil-map-of-the-world/en/ [Accessed 26 September 2021].
- maps.princeton.edu. 2021. Princeton University Library Digital Maps and Geospatial Data. [online] Available at: https://maps.princeton.edu/catalog/stanford-jr133wm5800 [Accessed 26 September 2021].
- https://geoenvironmental-disasters.springeropen.com/articles/10.1186/s40677-016-0044-y
- MLjar library: https://mljar.com/automl/