Flood Risk Assessment Using Analytical Hierarchy Process (AHP) and Machine Learning Models
October 15, 2021
A step-by-step case study on using the Analytical Hierarchy Process (AHP) and machine learning models to support decision makers in flood risk assessment. The models quantify flood risk and estimate the extent of damage to assets such as buildings and crops.
This project was hosted with impact-driven startup Finz.
Introduction
Natural disasters are among the most pressing issues that must be addressed at the global, regional, and local levels. Climate change may increase the frequency and magnitude of catastrophic events like floods, droughts, and wildfires.
Togo, in West Africa, is highly vulnerable to these natural calamities. Flooding and drought are common occurrences in the country, with negative socioeconomic consequences for its inhabitants, environment, and economy. Floods have been extremely devastating in recent years, wrecking infrastructure and destroying cultivated land.
While excessive rainfall is the primary cause of flooding, numerous other factors contribute to it, including deforestation, land degradation, rapid population growth, urbanization, poor land use planning, and inadequate drainage and discharge management.
Monitoring and predicting flood risk is critical in order to provide appropriate flood and environmental management solutions. Flood risk mapping is an important part of land use planning and mitigation techniques.
Analytical Hierarchy Process (AHP)
The Analytical Hierarchy Process (AHP) model has been built to identify and map areas of high flood risk in Togo. AHP is a multi-criteria decision-making method that integrates several conditioning factors, such as drainage density, slope, soil type, precipitation, population density, Euclidean distance, and land use, to map flood risk. A vulnerability map and a hazard map have been generated from these factors.
Hazard map
- Drainage density (D): the length of all channels within the basin divided by the area of the basin.
- Drainage Density = Length of all channels / Area of basin
- A dense drainage network in an area is a good indicator that the area is more likely to flood, as it carries a high flow accumulation path.
- Precipitation (isohyet): a major determining factor when creating hazard maps.
- Slope: one of the important conditioning factors for floods; the danger from floods increases as the slope increases.
- Soil type: the type and texture of the soil are very important factors in determining the infiltration and water-holding capacity of an area, which affects flood susceptibility. Runoff from intense rainfall is likely to be more rapid on clay soils than on sand.
Vulnerability map
- Euclidean distance (ED): areas located close to the main channel and the flow accumulation path are more likely to flood.
- Land use land cover (LULC): the amount and type of vegetation, which reflects land use, environment, cultivation, and seasonal phenology, is used to classify the landscape.
- Population density (PD): rapid population growth drives severe land use change and uncontrolled urbanization.
Analytical Hierarchy Process uses hierarchical structures to represent a problem and then develop priorities for alternatives based on user judgement (Saaty, 1980). The process consists of the following steps:
- Break down the problem into its component factors
- Develop the hierarchy
- Develop the paired comparison matrix based on subjective judgements
- Calculate the relative weights of each criterion
- Check consistency of subjective judgement
The AHP process is broadly divided into the following steps:
- Data collection
- Data pre-processing
- AHP modelling
Data Collection
The following datasets were used to create the vulnerability and hazard maps.
- Country boundary shapefile is downloaded from the DIVA-GIS website [3]
- Digital Elevation Model (DEM) is generated from the Advanced Land Observing Satellite (ALOS) Global Digital Surface Model "ALOS World 3D" (AW3D30), which has a resolution of 30 m [4]
- Land Use Land Cover (LULC) is generated from the Copernicus Global Land Service website [5]
- Precipitation data is generated from the Center for Hydrometeorology and Remote Sensing (CHRS) at the University of California, Irvine (UCI) [6]
- Population density data is generated from Facebook's Data for Good program [7]
- Soil map is downloaded from the Food and Agriculture Organization of the United Nations (FAO) website [8]
- Stream network shapefile (a Stanford dataset) is downloaded from the Princeton University Library geospatial catalog [9]
Data Pre-processing
Data pre-processing includes:
- Generating the layers from collected data: The layers are generated using QGIS/ArcGIS
The slope map is created from the DEM, and the Euclidean distance and drainage density maps are created from the river network.
- Reclassifying layers
The generated layers cannot be compared directly because they are defined in different units. They must be reclassified onto a common scale before further analysis.
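As a minimal sketch of this reclassification step, a continuous layer such as slope (in degrees) can be mapped onto a common 1-5 scale with NumPy's `digitize`; the class breaks below are illustrative, not the ones used in the project:

```python
import numpy as np

# Illustrative slope raster in degrees (in practice read from a GeoTIFF).
slope = np.array([[2.0, 7.5, 18.0],
                  [31.0, 0.5, 12.0]])

# Hypothetical class breaks; each layer gets its own breaks so that all
# layers end up on the same 1-5 scale before weighting.
breaks = [5, 10, 20, 30]

slope_class = np.digitize(slope, breaks) + 1  # digitize gives 0..4; shift to 1..5
print(slope_class)
```

In QGIS/ArcGIS the same step is done with the raster reclassify tools; the point is only that every layer ends up on the same ordinal scale.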
AHP modelling
Creating hierarchy:
In AHP, there are different levels set up as a hierarchy:
- Level 0: main objective, which in our case is the flood risk map
- Level 1: the criteria, namely the hazard map and the vulnerability map
- Level 2: the elements (parameters) considered within each criterion, whose influence on the criteria we try to measure
Pairwise Comparison Matrix:
- Generating pairwise comparison matrix and checking consistency ratio
For each criterion, a pairwise comparison matrix is created. The scores to be used in the matrix are based on the Saaty scale (Saaty 1980) as shown below:
Scale | Meaning |
1 | Equally important |
3 | Moderately important |
5 | Strongly important |
7 | Very strongly important |
9 | Extremely important |
2, 4, 6, 8 | Intermediate values between adjacent scales |
For every pair in the hazard comparison matrix, the more important factor is assigned a value between 1 (equally important) and 9 (extremely more important), while the mirror entry is assigned the reciprocal of that value. For example, for the pair D (row)-ST (column) we assign a value of 3, so the pair ST (row)-D (column) receives 1/3. Applying this operation to each pair gives the matrix:
D | ST | S | P | |
D | 1 | 3 | 1/3 | 1/5 |
ST | 1/3 | 1 | 1/3 | 1/5 |
S | 3 | 3 | 1 | 1/3 |
P | 5 | 5 | 3 | 1 |
Note that the values used here are based on the literature.
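The reciprocal structure of such a matrix can be built programmatically from the upper-triangular judgements alone; a small sketch, with the factor order D, ST, S, P as above:

```python
import numpy as np

factors = ["D", "ST", "S", "P"]
# Saaty-scale judgements for the upper triangle only:
# upper[(i, j)] = importance of factors[i] relative to factors[j].
upper = {(0, 1): 3,   (0, 2): 1/3, (0, 3): 1/5,
         (1, 2): 1/3, (1, 3): 1/5,
         (2, 3): 1/3}

n = len(factors)
A = np.ones((n, n))          # diagonal stays 1 (each factor vs itself)
for (i, j), v in upper.items():
    A[i, j] = v
    A[j, i] = 1 / v          # mirror entry gets the reciprocal

print(np.round(A, 2))
```

This reproduces the 4x4 matrix shown above and guarantees the reciprocal property by construction.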
Then, for each row, the eigenvector component Vp is determined as the geometric mean of the row, using the formula below:
Vp = (W1 × W2 × … × Wk)^(1/k)
where Vp = eigenvector component, Wk = k-th element of the row, k = number of elements
We then get the following:
D | ST | S | P | Vp | |
D | 1 | 3 | 1/3 | 1/5 | 0.67 |
ST | 1/3 | 1 | 1/3 | 1/5 | 0.39 |
S | 3 | 3 | 1 | 1/3 | 1.32 |
P | 5 | 5 | 3 | 1 | 2.94 |
We then calculate the weighting coefficients Cp using the equation below:
Cp = Vp / (Vp1 + …. + Vpk)
The sum of the Cp values of all the parameters must equal 1. We then get the following:
D | ST | S | P | Vp | Cp | |
D | 1 | 3 | 1/3 | 1/5 | 0.67 | 0.13 |
ST | 1/3 | 1 | 1/3 | 1/5 | 0.39 | 0.07 |
S | 3 | 3 | 1 | 1/3 | 1.32 | 0.25 |
P | 5 | 5 | 3 | 1 | 2.94 | 0.55 |
Sum |  |  |  |  | 5.32 | 1 |
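Both steps can be reproduced in a few lines of NumPy; the rounded results match the Vp and Cp columns above:

```python
import numpy as np

# Hazard pairwise comparison matrix (rows/columns: D, ST, S, P).
A = np.array([[1,   3,   1/3, 1/5],
              [1/3, 1,   1/3, 1/5],
              [3,   3,   1,   1/3],
              [5,   5,   3,   1  ]])

Vp = A.prod(axis=1) ** (1 / A.shape[1])  # geometric mean of each row
Cp = Vp / Vp.sum()                       # normalised weights, summing to 1

print(np.round(Vp, 2))  # [0.67 0.39 1.32 2.94]
print(np.round(Cp, 2))  # [0.13 0.07 0.25 0.55]
```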
Check Consistency:
Now that we have our weights, we need to check whether they are acceptable. In other words, we need to check whether the scores we assigned to the pairwise comparison matrix, based on our subjective judgement, are consistent.
We create a matrix, call it A3, by multiplying the pairwise matrix (a 4×4 matrix) by the Vp column vector (a 4×1 matrix). We then create another matrix, call it A4, by dividing every value of A3 by the corresponding Vp value (since Cp is just Vp rescaled, using the weights instead gives the same ratios). For example, for row 'D', we divide 2.87 by 0.67. We get the following matrix:
D | 4.29 |
ST | 4.21 |
S | 4.15 |
P | 4.15 |
We then average the above values to get 4.1975. This value is known as the maximum eigenvalue (λmax).
We then calculate the consistency index (CI) using the formula:
CI = (λmax − k)/(k − 1), where k = number of parameters
CI = (4.1975 − 4)/(4 − 1) = 0.066
We then determine the consistency ratio (CR) by the formula:
CR = CI/RI, RI = random index
The random index value is taken from the following table (Saaty, 1980):
Number of parameters | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
RI | 0 | 0 | 0.58 | 0.9 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49 |
CR = 0.066/0.9 = 0.073
If the value of the Consistency Ratio is less than or equal to 10%, the weights are acceptable. If the value is greater than 10%, we need to revise our subjective judgment.
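The whole consistency check can be scripted; with the hazard matrix above, this reproduces λmax ≈ 4.1975 and CR ≈ 0.073:

```python
import numpy as np

# Hazard pairwise comparison matrix (D, ST, S, P).
A = np.array([[1,   3,   1/3, 1/5],
              [1/3, 1,   1/3, 1/5],
              [3,   3,   1,   1/3],
              [5,   5,   3,   1  ]])

Vp = A.prod(axis=1) ** (1 / A.shape[1])  # row geometric means (eigenvector)
ratios = (A @ Vp) / Vp                   # A3 divided element-wise by Vp
lam_max = ratios.mean()                  # estimate of the maximum eigenvalue

k = A.shape[0]
RI = {1: 0, 2: 0, 3: 0.58, 4: 0.9, 5: 1.12}[k]  # Saaty's random index
CI = (lam_max - k) / (k - 1)             # consistency index
CR = CI / RI                             # consistency ratio

print(round(lam_max, 4), round(CR, 3))   # 4.1975 0.073
assert CR <= 0.1, "revise the subjective judgements"
```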
- Generating vulnerability and hazard map
AHP hazard map: a hazard is defined as a natural or man-made phenomenon occurring with an intensity that can cause harm, in this case through stream overflow.
The hazard map identifies all regions at risk of flooding. The spatial extent of locations potentially vulnerable to flood-inducing climatic threats is mapped by combining the conditioning factors, with a different weight assigned to each factor.
We can calculate the hazard map using the formula:
Hazard index = 0.13*D + 0.07*ST + 0.25*S + 0.55*P
AHP vulnerability map
Vulnerability represents the extent of the expected repercussions of a natural phenomenon; exposure is its most important component, as it determines whether or not people and assets lie in the path of a hazard.
Flood vulnerability mapping is the process of determining a given area’s flooding susceptibility and exposure.
Using the same process we used for generating the hazard map, we calculate the weights for the vulnerability criteria and apply the formula below to obtain the vulnerability map:
Vulnerability index = 0.26*PD + 0.64*LULC + 0.1*ED
Flood risk map
The flood risk map is a combination of hazard map and vulnerability map.
Flood risk = Hazard index * Vulnerability index
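On the reclassified layers, the three formulas above reduce to weighted sums and an element-wise product. A sketch with tiny made-up 2×2 arrays standing in for the real rasters:

```python
import numpy as np

# Made-up reclassified layers on a common 1-5 scale (stand-ins for rasters).
D  = np.array([[1, 2], [3, 4]])    # drainage density
ST = np.array([[2, 2], [1, 5]])    # soil type
S  = np.array([[3, 1], [4, 2]])    # slope
P  = np.array([[5, 4], [2, 1]])    # precipitation
PD   = np.array([[1, 3], [2, 4]])  # population density
LULC = np.array([[2, 5], [1, 3]])  # land use land cover
ED   = np.array([[3, 1], [5, 2]])  # Euclidean distance

hazard = 0.13 * D + 0.07 * ST + 0.25 * S + 0.55 * P
vulnerability = 0.26 * PD + 0.64 * LULC + 0.10 * ED
risk = hazard * vulnerability      # element-wise product per pixel

print(np.round(risk, 2))
```

In practice these operations are applied to full rasters (e.g. via the QGIS/ArcGIS raster calculator), but the arithmetic per pixel is exactly this.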
After creating the flood risk maps, training and testing datasets were generated using stratified random sampling to build the machine learning models. Class imbalance was corrected and outliers were removed. The AutoML library MLJAR, a state-of-the-art automated machine learning library for creating end-to-end pipelines, was used to find the best model. The following machine learning models were built with MLJAR [11]:
- Linear regression model
- Decision tree
- Random forest
- XGboost
- Neural network
- Ensemble model
Model validation is done using the ROC curve. An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:
- True Positive Rate
- False Positive Rate
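As a sketch of this validation step (on synthetic data, not the project's dataset, and with scikit-learn's RandomForestClassifier standing in for the MLJAR models):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the sampled flood-risk dataset.
X, y = make_classification(n_samples=500, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]   # predicted probability of "at risk"

fpr, tpr, _ = roc_curve(y_te, scores)      # TPR vs FPR at every threshold
auc = roc_auc_score(y_te, scores)          # area under the ROC curve
print(round(auc, 2))
```

The same ROC/AUC comparison across the candidate models is what the MLJAR leaderboard reports.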
The ROC curves show that the ensemble model performed best when training time is also taken into account.
Comparison between AHP and Machine Learning models:
As seen from the above diagrams, the machine learning model classified high flood risk areas more accurately than the AHP model. Although we had few samples for the moderate- and high-risk categories, the model is still on par with the AHP technique. The machine learning map is smoother and more accurate, and risk can be predicted with half the number of features used in the AHP.
Conclusion
While generating the hazard map, the team found that precipitation is the most dominant factor. The hazard map shows how prone each region is to floods: regions in red are more flood-prone because of high precipitation, while regions in green are far less prone because of low precipitation.
While generating vulnerability maps, land use land cover maps were given highest weight-age. When we combined hazard map and vulnerability map to generate flood risk map, the most dominating factors were precipitation, land use, land cover and population density.
The study shows that stringent action needs to be taken: proper land use planning and better drainage and discharge management are necessary to mitigate flood risk.
These risk maps can be further improved by adding more relevant information like flow accumulation and lithology etc. There is a wide spectrum of research opportunities available in which AHP modelling could be applied.
Alternatively, the AHP model could be applied to a set of target countries, with a machine learning model producing risk scores for the surrounding areas without building a new AHP model for them. Using a machine learning model directly on the inputs of the AHP model reduces the computational steps needed to create a risk score. Expert knowledge can then be used to set up a regional AHP model to refine the scoring in areas where the machine learning model does not estimate credible scores.
References
- worldbank.org, 2021. World Bank Climate Change Knowledge Portal. [online] Available at: https://climateknowledgeportal.worldbank.org/country/togo/climate-data-historical [Accessed 26 September 2021].
- diva-gis.org, 2021. Download data by country | DIVA-GIS. [online] Available at: http://diva-gis.org/gdata [Accessed 26 September 2021].
- eorc.jaxa.jp, 2021. Advanced Land Observing Satellite. [online] Available at: https://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm [Accessed 26 September 2021].
- land.copernicus.eu, 2021. Copernicus Global Land Service. [online] Available at: https://land.copernicus.eu/en/products/global-dynamic-land-cover [Accessed 26 September 2021].
- chrsdata.eng.uci.edu, 2021. CHRS Data Portal. [online] Available at: https://chrsdata.eng.uci.edu/ [Accessed 26 September 2021].
- data.humdata.org, 2021. HDX Facebook Data for Good. [online] Available at: https://data.humdata.org/ [Accessed 26 September 2021].
- fao.org, 2021. Food and Agriculture Organization of the United Nations. [online] Available at: http://www.fao.org/soils-portal/data-hub/soil-maps-and-databases/faounesco-soil-map-of-the-world/en/ [Accessed 26 September 2021].
- maps.princeton.edu, 2021. Princeton University Library Digital Maps and Geospatial Data. [online] Available at: https://maps.princeton.edu/catalog/stanford-jr133wm5800 [Accessed 26 September 2021].
- Geoenvironmental Disasters journal article. Available at: https://geoenvironmental-disasters.springeropen.com/articles/10.1186/s40677-016-0044-y
- MLJAR AutoML library: https://mljar.com/automl/
Ready to test your skills?
If you’re interested in collaborating, apply to join an Omdena project at: https://www.omdena.com/projects