Crop Yield Prediction Using Deep Neural Networks and LSTM

Crop yield prediction using deep neural networks to increase food security in Senegal, Africa. The case study covers leveraging vegetation indices with land cover satellite images from Google Earth Engine and applying deep learning models combined with ground truth data from the IPAR dataset.

 

By Margaux Masson-Forsythe

 

As part of the COVID-19: Data for a resilient Africa initiative with the UN Economic Commission for Africa, the Global Partnership brokered a collaboration between Omdena and stakeholders in Senegal to support the use of AI in addressing data gaps on food security. The Global Partnership contributed to shaping the initiative’s objectives, while Omdena worked with technical teams in Senegal to provide support in developing data-driven tools.

 

Problem Statement

AI can be used for crop yield prediction with deep neural networks to help ensure food security by guiding farmers, planning food storage and transport, and helping policymakers focus on the most vulnerable communities. Yield prediction in developing countries can help prevent famine, support the local economy, and improve sustainable agricultural practices.

Senegal has had strong and stable economic growth in recent years. However, more than one-third of the population still lives below the poverty line, and 75 percent of families continue to struggle financially.

Agricultural success is essential to fight poverty and malnutrition, but with 70 percent of the crops in Senegal being rain-fed, the increase in droughts caused by climate change threatens essential crops and has a direct impact on the availability and prices of food.

Therefore, in this project, we studied a low-cost approach that uses satellite imagery to predict crop yield. The main challenge we faced was the lack of ground truth data, i.e. surveys reporting yields in Senegal.

 

Our Approach

 

Identify literature used for crop yield prediction using deep neural networks

The first step was to find research papers that could guide us in starting this project efficiently, given that we only had two months to implement a solution. The most interesting papers we found and used are referenced throughout the approach below.

 

Get the data for the crop yield prediction

The first question these papers helped us answer was:

What data do we need to get crop yield prediction using a deep neural network?

Indeed, the authors used two types of raw data:

  • Remote sensing data downloaded with Google Earth Engine (GEE)
  • Ground truth crop yield data: we had yield data collected by IPAR for the production of maize, rice, and millet in 2014

So, we downloaded the MOD09A1.006 Terra Surface Reflectance 8-Day Global 500m and MYD11A2.006 Aqua Land Surface Temperature and Emissivity 8-Day Global 1km datasets for the regions and departments of Senegal using shapefiles. The first dataset has 7 bands of surface spectral reflectance that can be used to calculate the Normalized Difference Vegetation Index (NDVI), an indicator of vegetation health. The NDVI is calculated from the red light (which vegetation absorbs) and the near-infrared light (which vegetation strongly reflects).
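As a quick illustration, NDVI can be computed directly from the red and near-infrared reflectance bands. The minimal sketch below assumes the MOD09A1 band convention (band 1 = red, band 2 = near-infrared); the small epsilon is only an illustrative safeguard against division by zero.

import numpy as np

def ndvi(red, nir):
    """Compute NDVI from red (MOD09A1 band 1) and near-infrared (band 2) reflectance arrays."""
    red = red.astype(float)
    nir = nir.astype(float)
    # NDVI = (NIR - Red) / (NIR + Red); epsilon avoids division by zero over masked pixels
    return (nir - red) / (nir + red + 1e-9)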

 

NDVI indices — Source: NASA, illustration by Robert Simmon

 

The second dataset has two bands: temperature during the day and temperature during the night.

Which Crop Land cover should we use? What is the Crop Land Cover used for?

The Crop Land Cover dataset is used as a crop cover mask. This means that all pixels of the reflectance and temperature images that are not classified as cropland pixels will be removed from the images so that the model will only be exposed to data from the crops, and not from cities for example.
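As a rough sketch of how such a mask can be applied with the Google Earth Engine Python API (the asset IDs and the cropland class value 40 are assumptions to double-check against the GEE catalog, not taken from the project code):

import ee
ee.Initialize()

# Copernicus Global Land Cover, discrete classification band (assumed asset ID)
landcover = ee.Image("COPERNICUS/Landcover/100m/Proba-V/Global/2015") \
              .select("discrete_classification")
crop_mask = landcover.eq(40)  # assumed class 40 = cultivated and managed vegetation / cropland

# one 8-day MODIS surface reflectance composite over the growing season
reflectance = (ee.ImageCollection("MODIS/006/MOD09A1")
               .filterDate("2014-05-01", "2014-11-30")
               .first())

# keep only cropland pixels; everything else becomes masked (no data)
masked = reflectance.updateMask(crop_mask)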

In the end, we used a different land cover dataset than the papers above. After comparing the MCD12Q1.006 MODIS Land Cover Type Yearly Global 500m dataset that these papers used with the Copernicus Global Land Cover Layers, we decided to use the latter. This decision was made after comparing the datasets with cropland maps of Senegal.

 

Locations of crops from the IPAR study (black=Millet, Blue=Rice, Purple=Rice Irrigated, Yellow=Maize) — Source: Omdena

 

Source: IRD http://www.cartographie.ird.fr/SenegalFIG/

 

Here, we see that the crops are mostly located in the south-western and northern regions of Senegal. However, when we look at the MCD12Q1.006 MODIS dataset, where cropland is labeled in brown, a lot of the cropland is actually missing (the northern crops and most of the rice crops next to the Casamance river). This is not the case with the Copernicus dataset, where cropland is labeled in pink. So we concluded that the Copernicus dataset was the most accurate for Senegal.

 

Copernicus Land Cover 2015 vs. MCD12Q1.006 MODIS Land Cover 2014 in Senegal — Source: Google Earth Engine

 

The only problem with the Copernicus dataset is its time range, 2015–2020, whereas our ground truth yield data is from 2014. However, we assumed that the land cover in 2014 was close enough to that of 2015 and still better than the 2014 MCD12Q1.006 MODIS dataset, so we used the 2015 Copernicus land cover as a crop mask for images from 2014.

At the end of this step, we had collected data for the three GEE datasets presented above, for the entire country, the regions, departments, and GPS locations from the IPAR study.

 

Left to right: MYD11A2.006 Aqua Land Surface Temperature and Emissivity 8-Day Global 1km Senegal, MOD09A1.006 Terra Surface Reflectance 8-Day Global 500m Senegal, Copernicus Global Land Cover Layers Senegal — Source: Google Earth Engine

 

Preprocessing of the data

According to the papers cited previously, using 3-D pixel-count histograms (per band and per week) instead of raw satellite images for yield prediction helps prevent the model from overfitting (fitting too closely to a limited set of data points).
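A minimal sketch of this preprocessing step (the bin count, value range, and NaN-masking convention are illustrative assumptions):

import numpy as np

def band_histograms(image, n_bins=32, value_range=(0.0, 1.0)):
    """image: array of shape (height, width, bands), with NaN where the crop mask removed pixels."""
    hists = []
    for b in range(image.shape[-1]):
        band = image[..., b]
        band = band[~np.isnan(band)]                 # keep only cropland pixels
        counts, _ = np.histogram(band, bins=n_bins, range=value_range)
        hists.append(counts / max(counts.sum(), 1))  # normalize to pixel fractions
    return np.stack(hists, axis=-1)                  # shape: (n_bins, bands)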

 

First band of the masked images of reflectance (right) and temperature (left) and their corresponding 32-bin histograms (only showing the first band) for the department of Foundiougne in the Fatick region of Senegal, year 2015 — Source: Omdena

 

Which weeks of the year should we use?

 

Crop calendar of Senegal — Source: FAO

 

To focus on the growing season, we only studied the images from the weeks between planting and harvesting, which is weeks 19–30 for maize, for instance.

 

The deep learning model

We decided to use the deep learning architecture from this paper, a CNN-LSTM:

“CNN can learn the relevant features from an image at different levels similar to a human brain. An LSTM has the capability of bridging long time lags between inputs over arbitrary time intervals. The use of LSTM improves the efficiency of depicting temporal patterns at various frequencies, which is a desirable feature in the analysis of crop growing cycles with different lengths.”

 

Architecture of the CNN-LSTM model used — Source: https://www.mdpi.com/1424-8220/19/20/4363/htm
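The exact layer configuration is described in the paper; below is only a minimal Keras sketch of the idea (layer sizes, the number of weeks, and the band count are illustrative assumptions): a CNN applied to each weekly histogram, followed by an LSTM over the weekly sequence.

import tensorflow as tf
from tensorflow.keras import layers, models

n_weeks, n_bins, n_bands = 12, 32, 9   # e.g. weeks 19-30, 32 bins, 7 reflectance + 2 temperature bands

inputs = tf.keras.Input(shape=(n_weeks, n_bins, n_bands))
# CNN applied independently to each weekly histogram
x = layers.TimeDistributed(layers.Conv1D(64, 3, padding="same", activation="relu"))(inputs)
x = layers.TimeDistributed(layers.Conv1D(64, 3, padding="same", activation="relu"))(x)
x = layers.TimeDistributed(layers.GlobalAveragePooling1D())(x)
# LSTM over the weekly sequence of CNN features
x = layers.LSTM(128)(x)
outputs = layers.Dense(1)(x)           # predicted yield in T/ha

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")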

 

We also tried the CNN architecture proposed in this paper, but the results were not satisfying: the model settled on a single value for all data points that minimized the loss and then did not learn or improve afterward, even with different hyperparameters. Since we were getting better results with the CNN-LSTM, we decided to focus only on the latter.

 

Example of CNN stuck at ~1.2 T/ha — Source: Omdena

Transfer learning

We used transfer learning to improve the maize model. We had some yield data from South Sudan and Ethiopia (Source: deep-transfer-learning-crop-prediction) that we used to pre-train the model, which we then fine-tuned using the yield data from Senegal.
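In code terms, the transfer-learning step can look roughly like the sketch below. It builds on the model sketch above; X_source/y_source and X_senegal/y_senegal are hypothetical arrays, and the choice of layers to freeze and the learning rates are assumptions, not the project's actual settings.

import tensorflow as tf

# X_source/y_source, X_senegal/y_senegal: hypothetical histogram sequences and yields
# 1) pre-train on the source-country yields (South Sudan / Ethiopia)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
model.fit(X_source, y_source, epochs=50, validation_split=0.1)

# 2) freeze the per-week convolutional feature extractor, fine-tune the rest on Senegal data
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.TimeDistributed):
        layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
model.fit(X_senegal, y_senegal, epochs=30, validation_split=0.1)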

Data augmentation

We tried to do some data augmentation on the IPAR dataset by taking sliding windows around the point of origin (lat/long) and assuming the yield of the crops in these sliding windows was the same as at the point.

 

Figure 1 shows all the points (lat/long) from the IPAR dataset for the maize crops; the other three figures show the sliding windows around one of the points (illustrating the data augmentation explained above) — Source: Omdena

 

This method did not improve the maize and millet results but did improve the rice model.

Results

We ran several training runs with the different approaches explained previously and collected the resulting metrics:

 

Metrics of the predictions for several training runs and crop types (MSE in T/ha) — Source: Omdena

 

We can see that transfer learning for the maize model improved the MSE (mean squared error) and therefore gave us our best result. In comparison, the millet model did not do as well as the maize model, but we did not have any other data with which to perform transfer learning. Finally, the rice model could be improved using data from other countries in the same GitHub repository where we found the maize data. Note that the MSE is higher for the rice model because rice yields are higher than maize and millet yields in the first place (up to 14 T/ha).

Here are some visualizations of the maize results:

 

CNN-LSTM results (Maize) — Source: Omdena

 

CNN-LSTM results (Maize) — Source: Omdena

 

We also ran the predictions on every department of Senegal over 4 years (2015–2018) for maize, rice, and millet:

 

Yield prediction for three different types of crops over four years in each Senegalese department — Source: Omdena

 

Yield prediction for three different types of crops over four years in each Senegalese department — Source: Omdena

Final Product

We created an interactive notebook where the user can select the region for which they want to predict the yield, as well as the year and crop type. This notebook has several application areas: for instance, it can be used as a tool for policymakers to decide what food to import and export in order to maintain food security in the country, and it can help farmers make management and financial decisions. The notebook is interactive and can be adapted to other countries.

 

Feature settings used for crop predictions — Source: Omdena

 

Point of interest on Google Earth Engine for yield predictions — Source: Omdena

 

Crop yield prediction histogram — Source: Omdena

 

After the user selects a region on an interactive map, the notebook downloads the images for the region of interest, generates the 3D histograms, and uses them as input to the pre-trained CNN-LSTM model to predict the yield of the selected crop type — Source: Omdena

 

In the example above, the estimated yield is 1.035 tons per hectare.

We also implemented another notebook that takes GPS latitude and longitude as input instead of a selected area.

 

GPS latitude and longitude as input instead of a selected area — Source: Omdena

 

Yield prediction with GPS latitude/longitude as input — Source: Omdena

Conclusion

To conclude, in these two months, we were able to implement a Deep Learning model that predicts crop yield in Senegal following this schema:

 

Summary of the project’s structure — Source: Omdena

 

As mentioned in this article, the main limitation for crop yield prediction using deep neural networks was the lack of ground truth data, which made the models less efficient and accurate for Senegal than they could be. An easily implementable improvement would be to have ground-truth data like the 2014 IPAR dataset but covering several years, so that the model could see the fluctuations over the years, learn from them, and become more robust to variations in the data.

I want to thank all the Omdena collaborators with whom I have worked and learned so much in the past two months.

Getting to work on such a project with real-world data and so many collaborators from all around the world was a unique and amazing experience!

Neural Transfer Learning in NLP for Post-Traumatic Stress Disorder Assessment

The main goal of the project was to research and prototype technology and techniques suitable for creating an intelligent chatbot to assess and help mitigate PTSD in low-resource settings.

 

The Problem Statement

“The challenge is to build a chatbot where a user can answer some questions and the system will guide the person with a number of therapy and advice options.”

We were allocated to the ML modeling team of the challenge. Our initial task was narrowing the problem down to the most relevant specific use case. After some iterations and consultations within the team, we decided, among multiple possible avenues (e.g. conversational natural-language algorithms, an expert system, etc.), to tackle the problem with a binary risk assessment classifier based on labeled DSM-5 criteria. The working hypothesis was that the classifier could be used as the backend of a chatbot on a low-resource device that could detect risk and refer the user to more specialized information, or as a screening mechanism (in a refugee camp, in a resource-depleted health facility, etc.).

The frontend of the system would be a chatbot (potentially conversational, mixed with open-ended questions), and one of the classifiers would assess risk based on the conversation.

The tool is strictly informational/educational, and under no circumstances is the intent to replace health practitioners.

Our team psychologist guided the annotation process. After a couple of iterations, we ended up with a streamlined process that allowed us to annotate ~50 transcripts of conversations.

 

The Baseline

Baseline implementations by different team members showed that, without further data preprocessing, traditional ML methods reached an accuracy of around 75%. Given that we had a serious class imbalance issue, accuracy is definitely not a metric to rely on here. An article with the details of the baseline infrastructure and the traditional ML techniques applied to this text classification problem is in the works.

 

The Data

The annotation team ended up having access to 1,700 transcripts of sessions. After careful inspection, the team realized that only around 48 transcripts were for actual PTSD issues.

Training examples: 48 PTSD transcripts, each with an average of 2k+ lines

Example of an excerpt of a transcript available in [3]:

 

 

Target definition: No Risk Detected → 0, Risk Detected → 1

 

 

From an NLP/ML problem taxonomy perspective, the amount of data is extremely limited, so this would be classified as a few-shot classification problem [4].

Prior art on using these techniques when data is limited prompted the team to explore transfer learning in NLP, which has shown encouraging recent results in few-shot training, together with data augmentation through back-translation techniques.

The picture below shows the pandas DataFrame that resulted from an intense data munging process, target calculations (based on DSM-5 manual recommendations), and the amazing work of our annotation team:

 

 

 

The Solution

 

 

ULMFIT

The ULMFiT algorithm was one of the first techniques to provide effective neural transfer learning, with success on state-of-the-art NLP benchmarks [1].

The algorithm and the paper introduce a myriad of techniques to improve the efficiency of RNN training. We delve into the most fundamental ones below.

The underlying assumption of modern transfer learning in NLP is that all text inputs are transformed into numeric values based on word embeddings [8]. In that way, we ensure a semantic representation and, at the same time, numeric inputs to feed the neural network architecture at hand.

For context: traditional ML relies solely on the data you have for the learning task, while transfer learning trains on top of the weights of neural networks pre-trained on a large corpus (examples: Wikipedia, public-domain books). Successes of transfer learning in NLP and computer vision have been widespread over the last decade.

 

Copied from [5]

 

 

Transfer learning is a good candidate when you have few training examples and can leverage existing pre-trained powerful networks.

ULMFiT works as shown in the diagram below:

 

Copied from [5]

 

 

  • A language model is pre-trained (for example on Wikipedia data)
  • The language model is fine-tuned on your own corpus (not annotated)
  • A classifier layer is added to the end of the network.

A simple narrative for our case is the following: the model learns the general mechanics of the English language from the Wikipedia corpus. We specialize the model with the available transcripts, both annotated and not annotated, and in the end we turn it into a classifier by replacing the final sequence component with a regular softmax-based classifier head.

 

LSTM & AWD Regularization Technique

At the core of the ULMFiT implementation are LSTMs and a regularization technique called AWD (ASGD Weight-Dropped: Averaged Stochastic Gradient Descent with weight dropout).

LSTM (Long Short-Term Memory) networks are the basic building block of state-of-the-art deep learning approaches to sequence prediction problems in NLP. A sequence prediction problem consists of predicting the next word given the previous text:

 

Copied from [6]

 

 

LSTMs are ideal for language modeling and sequence prediction (and are increasingly being used in time series forecasting as well) because they maintain a memory of previous input elements. In our particular situation, each element X would be a token that generates an output and is passed to the next block, so it is considered during the calculation of the output h_t. Weights are then optimized by backpropagation through the network, driven by the appropriate loss function.

One component of this regularization technique (weight drop) introduces dropout on the weights of the hidden-to-hidden connections, which is quite different from the usual dropout techniques.

 

Copied from [7]

 

 

Another component of the regularization is Averaged Stochastic Gradient Descent, which, instead of using only the current step, also takes previous steps into consideration and returns an average [7]. More details about the implementation can be found here.

A more detailed ULMFiT diagram can be seen below, where the LSTM components are described along with the different steps of the algorithm’s implementation:

 

Copied from [5]

 

 

General-Domain LM (Language Model) Pretraining

This is the initial phase of the algorithm, where a language model is pre-trained on powerful machines with a public corpus. The language modeling problem is very simple: given a phrase, predict the probabilities of the next word (probably one of the most ubiquitous uses of deep learning in our daily lives):

 

 

To elucidate the process in practical terms, we will use the available fastai implementation of ULMFiT for this problem:

To choose the ULMFiT implementation in fastai, you have to specify the AWD_LSTM language model, as mentioned before.

 

language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)

 

The line above does a lot in typical fastai style. In the fuller snippet below, sklearn is used to produce the train and validation split, and fastai is used to instantiate a ULMFiT language model learner.

 

Target task LM Fine-Tuning

In the code presented in the LM section, we basically instantiate a pre-trained ULMFiT language model with the right configuration of the algorithm (there are other options for language models: Transformer-XL and QRNNs):

 

from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM
from sklearn.model_selection import train_test_split

# split data into training and validation sets, stratified on the label
df_trn, df_val = train_test_split(final_dataset,
                                  stratify=final_dataset['label'],
                                  test_size=0.3, random_state=12)

# language-model data bunch: only the text is needed, not the labels
data_lm = TextLMDataBunch.from_df(train_df=df_trn, valid_df=df_val, path="")

learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)

 

The (pseudo)code above retrieves our stratified training and validation datasets and creates a language_model_learner based on our own corpus. The important detail is that the language model does not need annotated data (perfect for our situation: limited annotated data but a bigger corpus of non-annotated transcripts). Basically, we are creating a language model for our own specialized domain on top of the huge, general Wikipedia-style language model.

 

learn.unfreeze()
learn.fit_one_cycle(1, 1e-2)

 

The code above unfreezes the pre-trained language model and executes one cycle of training on the new data with the specified learning rate.

Part of the ULMFiT process is applying discriminative learning rates through the different cycles of learning:
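In fastai this is expressed by passing a slice of learning rates, so earlier layers are updated with smaller rates than later ones (the values below are illustrative, not the ones used in the project):

learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-5, 1e-3))  # smaller LR for early layers, larger for later layers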

 

For a neural language model, an accuracy of around 30% is considered acceptable given the size of the corpus and the number of possibilities [1].

 

After this point we are able to generate text from our very specific language model:

 

Excerpt from text generated from our language model.
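For reference, generating such text with the fastai learner looks roughly like this (the prompt, word count, and temperature are illustrative):

# sample a continuation from the fine-tuned language model
print(learn.predict("The patient said", n_words=40, temperature=0.75))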

 

At this point, we have a reasonable text generator for our specific context. The ultimate value of ULMFiT lies in the ability to transform a language model into a relatively powerful text classifier.

learn.save('ft')
learn.save_encoder('ft_enc')

The code above saves the fine-tuned model and its encoder for later reuse.

 

Target Task Classifier:

The last step of the ULMFiT algorithm is to replace the final component of the language model with a softmax classifier “head” and train it on the specific labeled data of our project, i.e. the annotated PTSD transcripts.

 

# data_clas is a TextClasDataBunch built from the labeled transcripts,
# sharing the vocabulary of data_lm
classifier = text_classifier_learner(data_clas, AWD_LSTM,
                                     drop_mult=0.5).to_fp16()
classifier.load_encoder('ft_enc')   # load the fine-tuned language-model encoder
classifier.fit_one_cycle(1, 1e-2)
# unfreeze and train a bit more with discriminative learning rates
classifier.unfreeze()
classifier.fit_one_cycle(3, slice(1e-4, 1e-2))

 

The same technique of discriminative learning rates was used above for the classifier, with much better accuracy rates. Classifier-specific results were not the main goal of this article; a subsequent article will delve into fine-tuning ULMFiT, comparing and adding classifier-specific metrics, and using data augmentation techniques such as back-translation and different re-sampling techniques.

 

Initial results of the ULMFiT-based classifier.

 
 
 
 
 
 


Using Neural Networks to Predict Droughts, Floods and Conflict Displacements in Somalia

 

The Problem

 

Millions of people are forced to leave their current area of residence or community due to resource shortages and natural disasters such as droughts and floods. Our project partner, UNHCR, provides assistance and protection for those who are forcibly displaced inside Somalia.

The goal of this challenge was to create a solution that quantifies the influence of climate change anomalies on forced displacement and/or violent conflict through satellite imaging analysis and neural networks for Somalia.

 

The Data 

The UNHCR Innovation team provided the displacement dataset, which contains:

Month End, Year Week, Current (Arrival) Region, Current (Arrival) District, Previous (Departure) Region, Previous (Departure) District, Reason, Current (Arrival) Priority Need, and Number of Individuals. These internal displacements have been recorded weekly since 2016.

While searching for ways to extract the data, we learned about NDVI (Normalized Difference Vegetation Index) and NDWI (Normalized Difference Water Index).

Our focus was on finding a way to apply NDVI and NDWI to satellite imagery, together with neural networks, to anticipate climate-change-related disasters.

We considered Landsat (EarthExplorer) and MODIS imagery, hydrology data (e.g. river levels and river discharge, an indication of floods/drought), and settlement/shelter data (GEO portal). These images have 13 bands and take up around 1 GB of storage space per image.

Also, the National Environmental Satellite, Data, and Information Service (NESDIS) and the National Oceanic and Atmospheric Administration (NOAA) offer very interesting data, such as Somalia vegetation health screenshots taken from STAR — Global Vegetation Health Products.

 

 

 

By looking at the points in the picture above, I figured that the Vegetation Health Index (VHI) could be correlated with population displacement.

 

We found an interesting chart that captured my attention. To reproduce it:

  • Go to STAR's web page.
  • Click on "Data type" and select the kind of data you want.
  • Check the following image.

 

 

 

  •  Click on the region of interest and follow the steps below

 

 

 

 

Weekly VHI index since 1984

 

 

STAR's web page provides weekly SMN, SMT, VCI, TCI, and VHI indices since 1984, split by province.

SMN = provincial mean NDVI with noise reduced
SMT = provincial mean brightness temperature with noise reduced
VCI = Vegetation Condition Index (VCI < 40 indicates moisture stress; VCI > 60: favorable condition)
TCI = Thermal Condition Index (TCI < 40 indicates thermal stress; TCI > 60: favorable condition)
VHI = Vegetation Health Index (VHI < 40 indicates vegetation stress; VHI > 60: favorable condition)

Drought vegetation

VHI<15 indicates drought from severe-to-exceptional intensity

VHI<35 indicates drought from moderate-to-exceptional intensity

VHI>65 indicates good vegetation condition

VHI>85 indicates very good vegetation condition

In order to derive insights from the findings, the following questions needed to be answered.

Does vegetation health correlate with displacements? And is there a lag between vegetation health and observed displacement? The visualizations below provide answers.

 

Correlation between Vegetation Health Index values of Shabeellaha Hoose and the number of individuals registered due to Conflict/Insecurity.

 

 

Correlation between the Number of Individuals from Hiiraan Displacements caused by flood and VHI data.

 

 

Correlation between the number of individuals displaced from Sool due to drought.

 

 

The Solution: Building the Neural Network

We developed a neural network that predicts the weekly VHI of Somalia using historical data as described above. You can find the model here.

The model produces a validation loss of 0.030 and a training loss of 0.005. Below is the prediction of the neural network on test data.
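As a rough sketch of such a network (the architecture, window length, and feature set are assumptions for illustration, not the exact Omdena model):

import tensorflow as tf
from tensorflow.keras import layers, models

n_lag_weeks, n_features = 12, 3   # e.g. 12 past weeks of VHI, VCI, TCI per province

model = models.Sequential([
    layers.Input(shape=(n_lag_weeks, n_features)),
    layers.LSTM(64),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),              # next-week VHI (scaled to [0, 1])
])
model.compile(optimizer="adam", loss="mse")

# X: (samples, n_lag_weeks, n_features) and y: (samples,) built from the weekly series
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)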

 

Prediction versus the original value

 

 

 


 

Using Unsupervised Learning on Satellite Images to Identify Climate Anomalies

 

This work is a part of Omdena’s AI project with the United Nations High Commissioner for Refugees. The objective was to predict forced displacements and violent conflicts as a result of climate change and natural disasters in Somalia.

By Animesh Seemendra

Using unsupervised learning techniques on satellite images for capturing sudden environmental changes (after-effects of natural disasters or conflicts) to provide immediate relief to people affected. The solution functions as an alert system.

 

The problem

Somalia is a country in Africa that experiences many natural disasters and terrorist attacks, as a result of which its people go through mass displacements, leading to shortages of food and shelter.

This article shows how to build an anomaly detection system using Machine Learning. The system is capable of capturing sudden vegetation changes, which can be used as an alert mechanism to provide immediate relief to the people and communities in need.

 

 

What is Anomaly Detection?

Anomaly detection using satellite images is an area where a lot of research is happening to discover new and better methods.

We approached the problem with unsupervised learning techniques, i.e. Principal Component Analysis (PCA) and K-Means. In the case of anomaly detection, unsupervised learning takes multi-temporal images and finds the changes between them. The final output map highlights regions of change, which could be used to send an alert to representatives at UNHCR if a major deviation occurs between two consecutive temporal images.

 


Fig 2: In 2017 Bomb Attack in Mogadishu (Somalia) Kills 276

 

The approach

First try: Convolutional Neural Networks

The first approach that I came up with was to use deep learning techniques, namely CNN+LSTM, where the CNN could help extract relevant features from the images and the LSTM could help learn the sequential changes. This way, our model could learn the changes that occur gradually, and if any major change such as a natural disaster or conflict occurred in an area, the difference between the model's prediction and the actual value would be much greater than normal. This would signify that something major had happened and that an alert should be sent to UNHCR.

As is often the case in the real world, there was not enough data to apply deep learning. Therefore, we looked for an alternative.

The solution: Less shiny algorithms

The problem of anomaly detection can be solved with both supervised and unsupervised learning techniques. Since the data was not labeled, we went with unsupervised learning. Change detection can be addressed using NDVI values, PCA, image differencing methods, etc.

We went through some great methods for anomaly detection, including a split-based approach to unsupervised change detection [1] and methods that compare two images of the same geographical area at two different times pixel by pixel and then use algorithms such as thresholding or Bayesian theory to generate a change map [2]. After doing some research, I finally went with the PCA + K-Means technique [3], as the other methods either made a lot of assumptions or were applied directly to raw data, which could bring in a lot of noise.

 

The data

For this project, we needed satellite data of regions of Somalia. The images can be downloaded either from the EarthExplorer website or via the Google Earth Engine API. You must ensure that the downloaded data has as little cloud coverage as possible; this is a common problem when working with satellite images.


Fig 3: EarthExplorer Image

 

 

The solution: Unsupervised Learning

 


Fig 4: Satellite Image of an area from Somalia. Here you can see a lot of vegetation and greenery

 


Fig 5: Satellite image of the same area at a different time. Here you can see that there is less vegetation than in Fig 4.

 

Calculating the difference between both images

Differences between the two greyscale images were calculated through pixel-by-pixel subtraction. The computed values are such that pixels in areas associated with change have a much larger difference than those in unchanged areas.

Xd = |X1 – X2| where Xd is the absolute difference of the two image intensities.


Fig 6: The difference image of the bi-temporal images shown earlier.

 

Principal Component Analysis

The next step was to create an eigenvector space using PCA. First, the difference image is divided into h × h non-overlapping blocks, where h can be any value greater than 2; let's call this set of vectors Y. PCA is used to correct for decorrelation caused by atmospheric noise or striping: it drops the outlying components, and the remaining components can then be used for classification.

 

Creating a feature vector space

The next step was to create a feature vector space. A feature vector was constructed for each pixel of the difference image by projecting the pixel's neighborhood onto the eigenvector space. This was done by creating h × h overlapping blocks around each pixel, in order to maintain contextual information. We now have a clean, high-variance set of vectors that can be used for classification.

Clustering

This step involves generating two clusters from the feature vector space by applying K-Means: one cluster represents change and the other represents no change. The feature vectors already carry the information of whether a pixel has changed or not: when there is a change between two images in a region, the assumption is that the values of the difference vector over that region will be higher than in other regions. K-Means therefore partitions the data into two clusters based on the distance between each pixel's feature vector and the cluster means. Finally, the change map is constructed, with higher pixel values over regions of change.
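Putting the steps above together, a condensed sketch of the pipeline could look as follows (block size, component count, and implementation details are illustrative assumptions, not the exact project code):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def change_map(img1, img2, h=4, n_components=3):
    """img1, img2: greyscale images of the same area at two different times."""
    diff = np.abs(img1.astype(float) - img2.astype(float))   # Xd = |X1 - X2|

    # 1) eigenvector space from h x h non-overlapping blocks of the difference image
    H, W = diff.shape
    H, W = H - H % h, W - W % h
    blocks = (diff[:H, :W]
              .reshape(H // h, h, W // h, h)
              .swapaxes(1, 2)
              .reshape(-1, h * h))
    pca = PCA(n_components=n_components).fit(blocks)

    # 2) feature vector space: project the h x h neighborhood of every pixel
    pad = h // 2
    padded = np.pad(diff[:H, :W], pad, mode="reflect")
    feats = np.stack([padded[i:i + h, j:j + h].ravel()
                      for i in range(H) for j in range(W)])
    feats = pca.transform(feats)

    # 3) K-Means with two clusters: "change" vs "no change"
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
    labels = labels.reshape(H, W)

    # make the cluster with the larger mean difference the "change" class (label 1)
    if diff[:H, :W][labels == 0].mean() > diff[:H, :W][labels == 1].mean():
        labels = 1 - labels
    return labels   # binary change map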

 

Fig 7: The highlighted part depicts the difference between the two images. The image is flooded with white spots because there was a lot of loss of vegetation in the two images.

 

The highlighted areas could be further examined to assess the extent of change over a continuous sequence of time and could therefore help UNHCR take the necessary actions. Loss of vegetation to the extent shown in Fig 7 would only happen when sudden large conflicts or natural disasters occur, thus triggering an alarm.

 

Conclusion

In this project, we were able to develop an anomaly detection model using PCA and K-Means that highlights areas of change, which, as described above, could help UNHCR take the necessary actions in time.

Since cloud coverage is a common problem when working with satellite images (see the bottom-left region of the image), human intervention is still required; this remains an area for improvement.

 


 

Increasing Solar Adoption in the Developing World through Machine Learning and Image Segmentation

 

The problem

How can we increase solar adoption in the developing world through image segmentation? Applied in India.

 

The solution

Step 1: Identification of the Algorithm: Image Segmentation

We initially started with the goal of increasing solar adoption using image segmentation algorithms from computer vision. The goal was to segment each image into roofs and non-roofs by identifying the edges of the roofs. Our first attempt used the Watershed image segmentation algorithm. The Watershed algorithm is especially useful for extracting touching or overlapping objects in images. It is very fast and computationally inexpensive; in our case, the average computing time for one image was 0.08 s.

Below are the results from the Watershed algorithm.

 
 
 
Original image (left) and the output from the Watershed model (right) — rooftops in Delhi, India

 

As you can see, the output was not very good. Next, we implemented Canny edge detection. Like Watershed, this algorithm is widely used in computer vision and tries to extract useful structural information from visual objects. In the traditional Canny edge detection algorithm, there are two fixed global threshold values to filter out false edges. However, as images get more complex, different local areas need very different threshold values to accurately find the real edges. So there is a technique called auto-canny, where the lower and upper bounds are set automatically. Below is the Python function for auto-canny:
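The original snippet was shared as an image; a typical auto-canny implementation looks like the sketch below (the sigma value is a common default, used here as an assumption):

import cv2
import numpy as np

def auto_canny(image, sigma=0.33):
    # compute the median of the single-channel pixel intensities
    v = np.median(image)
    # set the lower and upper thresholds around the median
    lower = int(max(0, (1.0 - sigma) * v))
    upper = int(min(255, (1.0 + sigma) * v))
    return cv2.Canny(image, lower, upper)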

 

 

Snippet of the code for Image Segmentation

 

 
 

The average time taken by the Canny edge detector on one image is approximately 0.1 s, which is very good, and the results were better than with the Watershed algorithm. Still, the accuracy was not sufficient for practical use.

 
 
 
 
The output from the Canny edge detection algorithm (Delhi rooftops)

 

Both of the above techniques work without understanding the context and content of the object we are trying to detect (i.e. rooftops). We may get better results if we train an algorithm on what the objects (i.e. rooftops) look like. Convolutional Neural Networks are the state-of-the-art technology for understanding the context and content of an image, and we used them here to segment rooftops in support of increasing solar adoption awareness.

As mentioned earlier, we want to segment the image into two parts — a rooftop or not a rooftop. This is a Semantic segmentation problem. Semantic segmentation attempts to partition the image into semantically meaningful parts and to classify each part into one of the predetermined classes.

 
 
 
 
Semantic segmentation (picture taken from https://www.jeremyjordan.me/semantic-segmentation/)

 

In our case, each pixel of the image needs to be labeled as a part of the rooftop or not.

 
 
 
 
We want to segment the image into two segments — roof and not roof (left) — for a given input image (right).

 

Step 2: Generating the Training Data

To train a CNN model, we need a dataset of satellite images of Indian building rooftops and their corresponding masks. No such public dataset was available, so we had to create our own: a team of students tagged the images and created the masked images (as below).

And here are the final outputs after masking.

 
 
 

Rooftop satellite images converted into segmentation masks

 
 
 
 
 

Although the U-Net model is known to work with relatively few images, to begin with we had only about 20 images in our training set, which is far too few for any model, even U-Net, to give good results. One of the most popular techniques to deal with limited data is data augmentation, through which we can generate more images from those already in our dataset by applying a few basic alterations to the originals.

For example, in our case, any rooftop image rotated by a few degrees or flipped horizontally or vertically acts as a new rooftop image, provided exactly the same rotation or flip is applied to both the roof image and its mask. We used the Keras ImageDataGenerator on the already tagged images to create more images.
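A minimal sketch of this setup (directory paths, image size, and augmentation parameters are illustrative assumptions): give the image and mask generators the same transformations and the same seed, so each pair stays aligned.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = dict(rotation_range=15, horizontal_flip=True, vertical_flip=True)

image_gen = ImageDataGenerator(**aug).flow_from_directory(
    "data/images", class_mode=None, target_size=(256, 256), batch_size=8, seed=42)
mask_gen = ImageDataGenerator(**aug).flow_from_directory(
    "data/masks", class_mode=None, target_size=(256, 256), batch_size=8,
    color_mode="grayscale", seed=42)

# zip the generators so each batch yields (augmented images, matching augmented masks)
train_gen = zip(image_gen, mask_gen)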

 
 
 
 
Data augmentation

 

Step 3: Preprocessing input images

We tried to sharpen these images using two different sharpening filters: low/soft sharpening and high/strong sharpening. After sharpening, we applied a bilateral filter to reduce the noise produced by sharpening. Below are some lines of Python code for sharpening:
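The original snippets were embedded as images; the sketch below shows the general idea (the kernel values and bilateral-filter parameters are illustrative assumptions, not the exact filters used):

import cv2
import numpy as np

img = cv2.imread("rooftop.png")

# low/soft and high/strong sharpening kernels (both sum to 1 to preserve brightness)
low_sharpen = np.array([[-0.5, -0.5, -0.5],
                        [-0.5,  5.0, -0.5],
                        [-0.5, -0.5, -0.5]])
high_sharpen = np.array([[-1, -1, -1],
                         [-1,  9, -1],
                         [-1, -1, -1]])

low = cv2.filter2D(img, -1, low_sharpen)
high = cv2.filter2D(img, -1, high_sharpen)

# bilateral filter: reduces the noise introduced by sharpening while preserving edges
low_denoised = cv2.bilateralFilter(low, d=9, sigmaColor=75, sigmaSpace=75)
high_denoised = cv2.bilateralFilter(high, d=9, sigmaColor=75, sigmaSpace=75)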

 
 
 
 
 
Low sharpening filter

 
 
 
 
 
 
High sharpening filter

 

 

And below are the outputs.

 
 
 
 

Satellite view of buildings

 
 
 
 
 
 
 
 

Google Images

 
 
 
 
 
 

Step 4: Training and Validating the model

We generated a training set of 445 images. Next, we chose the U-Net architecture. U-Net was initially used for biomedical image segmentation, but because of the good results it was able to achieve, it is now applied to a variety of other tasks and is one of the best network architectures for image segmentation. In our first approach with the U-Net model, we used the RMSProp optimizer with a learning rate of 0.0001 and binary cross-entropy with Dice loss (implementation taken from here). We ran training for 200 epochs; the average (over the last 5 epochs) training Dice coefficient was 0.6750 and the validation Dice coefficient was 0.7168.
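For reference, a common Keras formulation of the binary cross-entropy + Dice loss (a sketch of the general idea, not the exact implementation linked above) is:

import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def bce_dice_loss(y_true, y_pred):
    bce = K.mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return bce + (1.0 - dice_coef(y_true, y_pred))

# model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-4),
#               loss=bce_dice_loss, metrics=[dice_coef])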

Here are the results of our first approach from the Validation set (40 images):

 
 
 
 
Predicted (left), Target (right)

 
 
 
 
 

Predicted (left), Target (right)

 

 

As you can see, the predicted masks contain some traces of 3D building structure in the middle and at the corners. We found that this was due to the Dice loss. Next, we used the Adam optimizer with a learning rate of 1e-4 and a decay rate of 1e-6 instead of RMSProp, IoU loss instead of BCE + Dice loss, and the binary accuracy metric from Keras. Training was performed for 45 epochs; the average (last 5 epochs) training accuracy was 0.862 and the average validation accuracy was 0.793. Below are some of the predicted masks on the validation set from the second approach:

And here are the results from the test data:

 

 

Test data

 

 

 

Test data

 

 

More About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.
