Omdena Spearheads Disaster Logistic Prediction Tool for Cyclones Commissioned by the World Food Programme


By Beth Seibel


Whether termed cyclone, typhoon or hurricane, these natural weather occurrences pack a serious punch and are responsible for approximately 10,000 deaths per year and, “in some cases, causing well over $100 billion in damage. There’s now evidence that the unnatural effects of human-caused global warming are already making hurricanes stronger and more destructive. The latest research shows the trend is likely to continue as long as the climate continues to warm (Berardelli, 2019).”

It is for these reasons that the World Food Programme teamed up with Omdena to more accurately predict the types and amount of aid required when disaster strikes. “Assisting almost 100 million people in around 83 countries each year, the World Food Programme (WFP) is the leading humanitarian organization saving lives and changing lives, delivering food assistance in emergencies and working with communities to improve nutrition and build resilience.”

Omdena gathered a team of 34 collaborators from 19 countries, all specializing in artificial intelligence and machine learning, for eight weeks. The goal was to develop a data-driven AI approach that helps the WFP and other humanitarian organizations know exactly what resources the people affected by cyclones (or any other disaster) will need, and to deploy those resources as quickly as possible. High on the team’s list were answers to questions such as: How much food and water is required? What sort of shelters are needed, and how many? What types of non-food essentials are appropriate, and in what quantity? Before AI models could be developed, relevant data had to be gathered for this disaster response problem.

The team collected data from a variety of sources, such as NOAA, to determine affected populations and critical features of these populations such as income level, injuries, deaths, and more. Important factors were determined about cyclones including wind speed, total hours on land, damage factors, and whether the populations were rural versus urban. Below we see the correlation mapped based on income level and the number of people affected revealing populations most in need of assistance.



Understanding the attributes of the people affected by a disaster helps to reveal the types of aid required. So that the WFP and other aid organizations can determine what and how much relief to send with precision, the team used mathematical models to create a tool that calculates the needs of the people in the targeted disaster zones. The tool calculates how much food, non-food items, shelter, etc., the population should need for a determined number of days.

Relief Package


This exciting AI prototype can be used as the basis to assist disaster response organizations around the world to accurately customize aid resources to the specific needs of the people impacted. The team identified a more precise way to allocate aid in times of disaster. This will allow the World Food Programme and other organizations to respond to the needs of affected people faster and more efficiently than ever before thus reducing suffering and saving lives.

Find all details about the project here


Berardelli, J. (2019, July 8). How climate change is making hurricanes more dangerous. Yale Climate Connections. Retrieved June 7, 2020, from

World Food Programme Overview. (2020). Retrieved June 07, 2020, from


More About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

| Demo Day Insights | AI for Disaster Response: World Food Programme Project


Helping affected populations during a disaster most effectively through AI. A collaborative Omdena team of 34 AI experts and data scientists worked with the World Food Programme to build solutions to predict affected populations and create customized relief packages for disaster prevention.

The entire data analysis and details about the relief package tool including a live demonstration can be found in the demo day recording at the end of the article.


The problem: Quick disaster response

When a disaster strikes, the World Food Programme (WFP), as well as other humanitarian agencies, need to design comprehensive emergency operations. They need to know what to bring and in which quantity. How many shelters? How many tons of food? These needs assessments are conducted by humanitarian experts, based on the first information collected, their knowledge, and experience.

The project goal: Building a disaster relief package tool for cyclones (applicable to other use cases and disaster categories)





Use Case: Cyclones (Solution applicable to other areas)

Tropical cyclones cost about 10,000 human lives a year. Many more people are injured, and homes and buildings are destroyed, causing financial damage of several billion USD. Due to changes in climate and extreme weather events, the impact is growing steadily.



Long Beach after Hurricane Katrina. Estimated damage of 168 billion dollars (Source: Wikipedia).


The data

The Omdena team gathered data from several sources:

  • IBTrACS – Tropical cyclone data that provides climatological speed and directions of storms (National Oceanic and Atmospheric Administration)
  • EmDAT – Geographical, temporal, human, and economic information on disasters at the country level. (Université Catholique de Louvain)
  • Socio-Economic Factors from World Bank
  • The Gridded Population of the World (GPW) collection – Models the distribution of the human population (counts and densities) on a continuous global raster surface

Missing data was collected manually, or partly automatically by scraping Wikipedia and cyclone reports.


Data exploration: Determining affected populations

All five data sets were aggregated. The combined data set includes more than 1,000 events and 45 features characterizing cyclones and affected populations.



Data Exploration



Impact Cyclones (Landing vs. No-landing)


Important correlation factors to determine affected populations:

  • Rural Population
  • Human Development Index
  • GDP per capita
  • Landing
  • Wind Speed
  • Exposed population
  • Total hours on land
  • Total damage
  • Total deaths


The team mapped the correlation factors to determine which populations are most in need. As an example, the figure below correlates income level with the number of people affected. Drawing on past data, the model predicts affected populations.



Predicting affected populations based on income level


The tool: Calculating relief packages

Once an affected population has been identified, humanitarian actors need to design comprehensive emergency operations, including how much food and what type of food is needed. The project team built a food basket tool that makes it easier to calculate the needs of affected populations. The tool takes into account features such as the number of days to be covered, the number of affected people, pregnancies, and children.



The relief package tool
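The calculation behind such a tool can be sketched in a few lines. This is only an illustration, not the team's implementation: the function name, the `ReliefPackage` structure, and every per-person quantity below are hypothetical placeholders rather than WFP planning figures.

```python
import math
from dataclasses import dataclass

# Hypothetical per-person quantities, for illustration only.
# They are placeholders, not WFP planning figures.
FOOD_KG_PER_PERSON_DAY = 0.5
WATER_L_PER_PERSON_DAY = 15.0
PEOPLE_PER_SHELTER = 5
EXTRA_FOOD_SHARE_PREGNANT = 0.2  # assumed extra food share per pregnant woman


@dataclass
class ReliefPackage:
    food_kg: float
    water_l: float
    shelters: int


def estimate_relief_package(affected_people, days, pregnant_women=0):
    """Scale per-person needs by population size and days to be covered."""
    if affected_people < 0 or days < 0:
        raise ValueError("affected_people and days must be non-negative")
    if not 0 <= pregnant_women <= affected_people:
        raise ValueError("pregnant_women must be between 0 and affected_people")
    base_food = affected_people * days * FOOD_KG_PER_PERSON_DAY
    extra_food = (
        pregnant_women * days * FOOD_KG_PER_PERSON_DAY * EXTRA_FOOD_SHARE_PREGNANT
    )
    water = affected_people * days * WATER_L_PER_PERSON_DAY
    # Shelters do not depend on the number of days, only on head count.
    shelters = math.ceil(affected_people / PEOPLE_PER_SHELTER)
    return ReliefPackage(base_food + extra_food, water, shelters)


package = estimate_relief_package(affected_people=1000, days=7, pregnant_women=50)
print(package)
```

Keeping the per-person quantities as named constants makes it easy to swap in figures agreed with humanitarian experts.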


The entire data analysis and details about the relief package tool including a live demonstration can be found in the video.



The team: Collaborators from 19 countries

This Omdena project, hosted by the WFP Innovation Accelerator, united 34 collaborators and changemakers across four continents. All team members worked together for two months on Omdena’s innovation platform to build AI solutions with the mission of improving disaster response. To learn more about the project, check out our project page.



Omdena Collaborators


All changemakers: Ali El-Kassas, Alolaywi Ahmed Sami, Anel Nurkayeva, Arnab Saha, Beata Baczynska, Begoña Echavarren Sánchez, Chinmay Krishnan, Dev Bharti, Devika Bhatia, Erick Almaraz, Fabiana Castiblanco, Francis Onyango, Geethanjali Battula, Grivine Ochieng, Jeremiah Kamama, Joseph Itopa Abubakar, Juber Rahman, Krysztof Ausgustowski, Madhurya Shivaram, Onassis Nottage, Pratibha Gupta, Raghuram Nandepu, Rishab Balakrishnan, Rohit Nagotkar, Rosana de Oliveira Gomes, Sagar Devkate, Sijuade Oguntayyo, Susanne Brockmann, Tefy Lucky Rakotomahefa, Tiago Cunha Montenegro, Vamsi Krishna Gutta, Xavier Torres, Yousof Mardoukhi





Using Neural Networks to Predict Droughts, Floods and Conflict Displacements in Somalia



The Problem


Millions of people are forced to leave their area of residence or community due to resource shortages and natural disasters such as droughts and floods. Our project partner, UNHCR, provides assistance and protection for those who are forcibly displaced inside Somalia.

The goal of this challenge was to create a solution that quantifies the influence of climate change anomalies on forced displacement and/or violent conflict through satellite imaging analysis and neural networks for Somalia.


The Data 

The UNHCR Innovation team provided the displacement dataset, which contains:

Month End, Year Week, Current (Arrival) Region, Current (Arrival) District, Previous (Departure) Region, Previous (Departure) District, Reason, Current (Arrival) Priority Need, Number of Individuals. These internal displacements have been recorded weekly since 2016.

While searching for how to extract the data we learned about NDVI (Normalized difference vegetation index), and NDWI (Normalized Difference Water Index).

Our focus was on finding a way to apply NDVI and NDWI on Satellite Imaging and Neural Networks to prevent Climate Change disasters.
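For readers new to these indices, both are normalized band differences. The sketch below assumes the common definitions: NDVI from the near-infrared and red bands, and NDWI in its green/near-infrared form (another NDWI variant uses near-infrared and shortwave infrared instead). The function names and sample reflectance values are illustrative only.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: high for dense green vegetation."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form): high for open water."""
    return normalized_difference(green, nir)


# Made-up reflectance values for a vegetated pixel.
print(ndvi(nir=0.5, red=0.1), ndwi(green=0.1, nir=0.5))
```

Both indices fall between -1 and 1 for non-negative reflectances, which makes them comparable across images.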

Candidate sources included Landsat (EarthExplorer) and MODIS imagery, hydrology data (e.g. river levels, river discharge, an indication of floods/drought), and settlement/shelter data (GEO portal). The satellite images have 13 bands and take up around 1 GB of storage space per image.

Also, the National Environmental Satellite, Data, and Information Service (NESDIS) and National Oceanic and Atmospheric Administration (NOAA) offer very interesting data like Somalia Vegetation Health print screens taken from STAR — Global Vegetation Health Products.




Looking at the picture above, I suspected that the Vegetation Health Index (VHI) could be correlated with the displacement of people.


We found an interesting chart that captured my attention. To reach it:

  • Go to STAR’s web page.
  • Click on Data type and select which kind of data you want
  • Check the following image




  •  Click on the region of interest and follow the steps below





Weekly VHI indices since 1984



STAR’s web page provides the SMN, SMT, VCI, TCI, and VHI indices weekly since 1984, split by province.

SMN = Provincial mean NDVI with noise reduced
SMT = Provincial mean brightness temperature with noise reduced
VCI = Vegetation Condition Index (VCI < 40 indicates moisture stress; VCI > 60 indicates favorable conditions)
TCI = Thermal Condition Index (TCI < 40 indicates thermal stress; TCI > 60 indicates favorable conditions)
VHI = Vegetation Health Index (VHI < 40 indicates vegetation stress; VHI > 60 indicates favorable conditions)

Drought vegetation

VHI<15 indicates drought from severe-to-exceptional intensity

VHI<35 indicates drought from moderate-to-exceptional intensity

VHI>65 indicates good vegetation condition

VHI>85 indicates very good vegetation condition
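The drought thresholds listed above translate directly into a small lookup. This is a sketch: the function name, the label strings, and the "intermediate" label for values between 35 and 65 are assumptions, and the input is assumed to lie on the 0 to 100 VHI scale.

```python
def classify_vhi(vhi):
    """Map a weekly VHI value (0-100) to the narrowest matching band."""
    if not 0 <= vhi <= 100:
        raise ValueError("VHI is expected to lie between 0 and 100")
    if vhi < 15:
        return "drought: severe-to-exceptional"
    if vhi < 35:
        return "drought: moderate-to-exceptional"
    if vhi > 85:
        return "very good vegetation condition"
    if vhi > 65:
        return "good vegetation condition"
    return "intermediate"  # assumed label; the list above leaves 35-65 unnamed


for value in (10, 30, 50, 70, 90):
    print(value, classify_vhi(value))
```

Because the bands overlap (a VHI of 10 is below both 15 and 35), the function checks the stricter threshold first.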

In order to derive insights from the findings, the following questions needed to be answered.

Does vegetation health correlate with displacements? And is there a lag between vegetation health and observed displacement? The visualizations below provide answers.


Correlation between Vegetation Health Index values of Shabeellaha Hoose and the number of individuals registered due to Conflict/Insecurity.



Correlation between the Number of Individuals from Hiiraan Displacements caused by flood and VHI data.



Correlation between the Number of Individuals from Sool Displacements caused by drought and VHI data.
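One way to probe both questions numerically is to correlate the VHI series with the displacement series shifted by a number of weeks. The sketch below is an illustration, not the analysis behind the charts: the helper names are hypothetical and the two series are synthetic, built so that displacement responds to VHI two weeks later.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(vhi, displaced, max_lag):
    """Correlate vhi[t] with displaced[t + lag] for lag = 0..max_lag (weeks)."""
    return {
        lag: pearson(vhi[: len(vhi) - lag], displaced[lag:])
        for lag in range(max_lag + 1)
    }


# Synthetic weekly data: displacement mirrors VHI with a two-week delay.
vhi = [60, 55, 50, 40, 30, 25, 30, 45, 55, 60]
displaced = [40, 45] + [100 - v for v in vhi[:-2]]

correlations = lagged_correlations(vhi, displaced, max_lag=4)
best_lag = min(correlations, key=correlations.get)  # most negative correlation
print(best_lag, correlations)
```

A strongly negative correlation at a positive lag would suggest that poor vegetation health precedes displacement by that many weeks.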



The Solution: Building the Neural Network

We developed a neural network that predicts the weekly VHI of Somalia using historical data as described above. You can find the model here.

The model produces a validation loss of 0.030 and a training loss of 0.005. Below is the prediction of the neural network on test data.


Prediction versus the original value
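The article does not describe how the historical series was fed to the network. A common setup, shown here purely as an assumption, is to slice the weekly series into fixed-length input windows with the following week as the target, split the samples chronologically, and score predictions with mean squared error. All names and numbers below are hypothetical.

```python
def make_windows(series, window):
    """Turn a series into (inputs, target) pairs: `window` past weeks -> next week."""
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window")
    return [
        (list(series[i : i + window]), series[i + window])
        for i in range(len(series) - window)
    ]


def chronological_split(samples, train_fraction=0.8):
    """Keep the earliest samples for training so validation data lies in the future."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def mean_squared_error(predictions, targets):
    """Average squared difference, a usual loss for regression networks."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


weekly_vhi = [0.42, 0.40, 0.37, 0.35, 0.36, 0.41, 0.47, 0.52, 0.55, 0.53, 0.50, 0.46]
samples = make_windows(weekly_vhi, window=4)
train, validation = chronological_split(samples)
print(len(train), len(validation))
```

Splitting by time rather than at random avoids leaking future weeks into the training set.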






Estimating Possible Undetected COVID-19 Infection Cases using Probability Analysis


Country-wide estimations for undetected Covid-19 cases and recommendations for enhancing testing facilities based on Probability Analysis

The Problem: Why is estimating undetected Covid-19 cases crucial?

An estimate of the undetected Covid-19 cases is important for authorities to plan economic policies, make decisions about the different stages of lockdown, and work towards expanding intensive care capacity.

As we have crossed the psychological mark of 1 million Covid-19 patients around the globe, more questions are popping up about the capability of our health care systems to contain the virus. One of the major worries is the systematic uncertainty in the number of citizens who have hosted the virus. The major contribution to this uncertainty is possibly the small fraction of the population being tested for Covid-19.

The main test to confirm whether someone has Covid-19 looks for signs of the virus’s genetic material in a swab from their nose or throat. This test is not yet available for most people. Healthcare workers are morally obliged to reserve the testing equipment for seriously ill patients in hospital.


The Solution


In this article, we will show a simple Bayesian approach, a part of Probability Analysis, to estimate the undetected Covid-19 cases. Bayes’ theorem can be written as:

P(A|B) = P(B|A) × P(A) / P(B)

where P(A) is the probability of event A, P(B) is the probability of event B, P(A|B) is the probability of observing event A if B is true, and P(B|A) is the probability of observing event B if A is true.

The quantity of interest for us is P(infected|notTested), i.e. the probability that a person who has not been tested is infected. This is equivalent to the percentage of the population infected by Covid-19 but not tested, and we can write it as:

P(infected|notTested) = P(notTested|infected) × P(infected) / P(notTested)

Here the other probabilities are:

  • P(notTested|infected): Probability that an infected person has not been tested, or the percentage of the infected population that is not tested.
  • P(infected): Prior probability of infection, or the known percentage of the infected population.
  • P(notTested): Probability of not being tested, or the percentage of people not tested.

The following plot shows the total Covid-19 tests per million people and the total number of confirmed cases per million people for several countries. This suggests a clear relation between the Covid-19 tests and confirmed positive detections.


Test per million vs Positive per million graph

Figure 1: Tests per million versus positive Covid-19 cases per million as of 20 March 2020 (data source).


Assuming that all countries follow this relation between Covid-19 tests and confirmed cases, we can make a rough estimate of the number of undetected cases in each country.


Let’s take Australia as an example. The plot gives the prior knowledge of infected cases,

P(infected) = 27.8/10⁶,

and the fraction of people not tested,

P(notTested) = (10⁶ − 4473)/10⁶.

To estimate P(notTested|infected), I used the relation between Covid-19 tests and confirmed cases shown in Figure 1. This is done by fitting a power law of the form y = a * x**b, where a is the normalization and b is the slope of the power law. The following plot shows a fit to the data points from the above plot, where the best fit is a = 0.060±0.008 and b = 0.966±0.014.


Test per million vs positive per million graph 2

Figure 2: The relation between Covid-19 tests and confirmed cases and a power-law best fit.
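The article does not say which fitting routine produced these parameters. One simple option, shown here as an assumption, is ordinary least squares on the logarithms of both axes, since y = a * x**b becomes the straight line log y = log a + b log x. The function name and the noise-free sample points are illustrative; this sketch also returns no uncertainties on a and b.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    log_x = [math.log(x) for x in xs]  # requires strictly positive data
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    spread = sum((u - mean_x) ** 2 for u in log_x)
    if spread == 0:
        raise ValueError("all x values are identical")
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y)) / spread
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic points generated from the best-fit values quoted above.
tests_per_million = [10, 100, 1000, 10000]
positives_per_million = [0.060 * x ** 0.966 for x in tests_per_million]
a, b = fit_power_law(tests_per_million, positives_per_million)
print(a, b)
```

With real, scattered data the recovered parameters depend on whether the fit is done in log space or on the raw values, so results may differ slightly from other routines.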


Using the best-fit parameters, P(notTested|infected) = [(10⁶ − 4473)/10⁶] / [a × (10⁶ − 4473)**b / 10⁶].

With the three probabilities above, I find P(infected|notTested) = 0.00073, i.e. about 0.073 per cent of the population of Australia. Multiplying this by the population of Australia indicates that there is a possibility of about 18,600 undetected Covid-19 cases in Australia (Probability Analysis report). The following plot shows possible undetected Covid-19 cases as a function of tests per million for different countries as of 20 March 2020.


Tests per million vs Undetected Covid-19 cases graph

Figure 3: Estimation of undetected Covid-19 cases (see assumptions in the text).
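The Australia estimate can be retraced step by step. The sketch below plugs the values quoted in the text into Bayes’ theorem; the function name is hypothetical, and the population of 25 million is an assumed round figure that does not come from the article.

```python
A, B = 0.060, 0.966         # best-fit power-law parameters quoted in the text
TESTS_PER_MILLION = 4473    # Australia, 20 March 2020
CASES_PER_MILLION = 27.8    # confirmed cases per million, same date
POPULATION = 25_000_000     # assumed round figure, not taken from the article


def undetected_fraction(tests_per_million, cases_per_million, a, b):
    """Return P(infected|notTested) following the steps described in the text."""
    not_tested = 1e6 - tests_per_million
    p_not_tested = not_tested / 1e6
    p_infected = cases_per_million / 1e6
    # Ratio the article uses for P(notTested|infected). With these inputs it
    # is larger than 1, so it acts as a scaling factor on the prior.
    p_not_tested_given_infected = p_not_tested / (a * not_tested ** b / 1e6)
    return p_not_tested_given_infected * p_infected / p_not_tested


fraction = undetected_fraction(TESTS_PER_MILLION, CASES_PER_MILLION, A, B)
cases = fraction * POPULATION
print(f"fraction = {fraction:.5f}, undetected cases = {cases:.0f}")
```

Under these assumptions the result lands close to the figures quoted above; small differences come from rounding in a and b and from the population value used.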


Note that several assumptions and considerations are made to estimate these undetected cases. For instance:

  • I assumed that all countries follow the same power-law relation when estimating P(notTested|infected). However, this is not a very good assumption, as there is huge scatter in this relation between different countries.
  • Our prior knowledge of the number of infections can itself be biased, as P(infected) depends on the number of tests performed as of 20 March 2020.
  • I haven’t considered the susceptibility of a country’s population to Covid-19, or the attack rate, i.e. the biostatistical measure of the frequency of morbidity, which for Covid-19 is estimated at around 50–80% (Verity et al. 2020).
  • The impact of government policies in these countries from 14 days before to 14 days after 20 March is not considered.
  • I haven’t considered how susceptible people are targeted for testing in different countries in the following days.

Figure 4 below shows the total number of confirmed cases versus the tests per million as of 5 April 2020 for several countries (data source).

Sixteen days later, on 5 April, the confirmed positive cases in countries like Ukraine, India, and the Philippines are consistent with the predictions in Figure 3. These countries had performed ≤ 10 tests per million people as of 20 March.

Note that the consistency between estimations as of 20 March and 5 April does not necessarily mean that all undetected cases as of 20 March are confirmed now. Several of the confirmed cases as of 5 April are expected to be new cases due to the spread between 20 March and 5 April (even in the presence of lockdowns).

The estimated undetected cases for countries like Colombia and South Africa are about twice as large (Figure 3) as compared to the total confirmed cases as of 5 April (i.e. about 1,500 for both). Both countries have performed about 100 tests per million people.

Countries like Taiwan, Australia, and Iceland, on the other hand, have confirmed an order of magnitude fewer cases than the numbers estimated in Figure 3.

This indicates that the countries that have not boosted their testing efficiency to more than 1,000 tests per million people have significantly larger uncertainties on the number of current confirmed cases.


Tests per million vs Total positive cases graph

Figure 4: The total number of confirmed cases versus the tests per million as of 5 April 2020.


Given the data in Figure 4 from 5 April 2020, I repeated the whole exercise to estimate the undetected Covid-19 cases for these countries, cities, and states. The following figure shows the best-fit power law and data points, similar to Figure 2 but for the data as of 5 April 2020.


Tests per million vs Positive per million graph

Figure 5: Best fit power law for data as of 5 April 2020.


The best-fit slope for the power-law relation in Figure 5 (b = 1.281±0.009) is consistent with the slope in Figure 2 at the 2-σ confidence level. This supports our assumption of estimating P(notTested|infected) from the best-fit power-law relation (the slope is not changing). However, the other caveats are the same as before.

Finally, the following plot shows the estimated undetected Covid-19 cases for different countries as of 5 April 2020.


Tests per million vs undetected covid-19 cases graph

Figure 6: Estimated Undetected Covid-19 cases as of 5 April 2020 (see assumptions in the text).


The comparison between the undetected estimates as of 20 March (Figure 3) and the confirmed cases as of 5 April (Figure 4) shows that more tests per million people are required to capture the possible undetected cases. It is high time that authorities raise testing efficiency in order to reduce the systematic uncertainty from undetected Covid-19 cases. This seems to be the only good way to reduce the death rate of Covid-19 patients, as indicated by the large amount of Covid-19 testing in Germany and South Korea.

To make this happen, all countries need at least one testing center within a radius of 20 km and should arrange more drive-through testing facilities as soon as possible.




Estimating Street Safeness after an Earthquake with Computer Vision And Route Planning


Is it possible to estimate with minimum expert knowledge if your street will be safer than others when an earthquake occurs?


We answered how to estimate the safest route after an earthquake with computer vision and route management.


The problem

The last devastating earthquake in Turkey occurred in 1999 (>7 on the Richter scale), around 150–200 kilometers from Istanbul. Scientists believe that the next earthquake will strike the city directly, and the magnitude is predicted to be similar.

The main motivation behind this AI project, hosted by Impacthub Istanbul, is to optimize earthquake aftermath management with AI and route planning.


Children need their parents!

After kicking off the project and brainstorming with the hosts, collaborators, and the Omdena team about how to better prepare the city of Istanbul for an upcoming disaster, we spotted a problem that is quite simple but really important for families: getting reunited ASAP in the aftermath of an earthquake!

Our target was to provide safe and fast route planning for families, considering not only time but also broken bridges, fallen debris, and other obstacles usually found in these scenarios.


Fatih, one of the most popular and crowded districts in Istanbul. Source: Mapbox API



We settled on two tasks: creating a risk heatmap that depicts how dangerous a particular area on the map is, and a path-finding algorithm that provides the safest and shortest path from A to B. The path-finding algorithm would rely on the heatmap to estimate safeness.

Challenge started! Deep Learning for Earthquake management by the use of Computer Vision and Route Management.


Source: Unsplash @loic


At this point, we optimistically trusted open data to address our problem. However, we soon realized that data describing building quality, soil composition, and pre- and post-disaster imagery was hard to find and, when available, complex to model and integrate.

Bridges over streets, building heights, 1000 types of soil, and eventually the interaction among all of them… Too many factors to control! So we focused on delivering an approximation.


Computer Vision and Deep Learning is the answer for Earthquake management

The question was: how can we accurately estimate street safeness during an earthquake in Istanbul without such a myriad of data? What if we could roughly estimate path safeness by embracing distance-to-buildings as a safety proxy? The farther the buildings, the safer the pathway.

For that crazy idea, we first needed building footprints laid out on the map. Some people suggested borrowing building footprints from OpenStreetMap, one of the most popular open map providers. However, we soon noticed that OpenStreetMap, though quite complete, has some blank areas in the building metadata relevant to our task. Footprints were also sometimes inaccurately placed on the map.


Haznedar area (Istanbul). Source: Satellite image from Google Maps.


Haznedar area too, but few footprints are shown. Blue boxes depict building footprints. Source: OpenStreetMap.


Estimating the effects of an earthquake on the population is a big problem, and Computer Vision came to the rescue! Using Deep Learning, we could rely on satellite imagery to detect buildings and then estimate how close pathways are to them.

The next stone on the road was obtaining high-resolution imagery of Istanbul, with enough resolution to allow an ML model to locate building footprints on the map as a visually agile human would. Likewise, we would also need some annotated footprints on these images so that our model could train gracefully.



First step: Building a detection model with PyTorch and fastai


SpaceNet dataset covering the area for Rio de Janeiro. Source:


Instead of labeling hundreds of square meters manually, we relied on SpaceNet (in particular, the images for Rio de Janeiro) as our annotated data provider. This dataset contains high-resolution satellite images and building footprints, nicely pre-processed and organized, which were used in a recent competition.

The modeling phase was really smooth thanks to the fastai software.

We used a Dynamic U-Net model with an ImageNet-pretrained resnet34 encoder as the starting point for our model. By default, this state-of-the-art architecture applies many advanced deep learning techniques, such as a one-cycle learning-rate schedule and the AdamW optimizer.

All these fancy advances in just a few lines of code.


fastai fancy plot advising you about learning rates.


We used a balanced combination of Focal Loss and Dice Loss as the loss function, with accuracy and Dice as evaluation metrics. After several training rounds with the encoder frozen and then unfrozen, we obtained predictions good enough for the next step.
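To make the loss concrete, here is a minimal pure-Python sketch of the two terms on a flat list of per-pixel building probabilities. It is not the project's training code: the function names, the equal 0.5 weighting, and the focusing parameter gamma = 2 are assumptions, and a real model would compute the same quantities on tensors.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """1 - Dice coefficient: penalizes poor overlap between prediction and mask."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)


def focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Cross-entropy down-weighted for easy pixels by the factor (1 - p_t)**gamma."""
    if not probs:
        raise ValueError("probs must not be empty")
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p       # probability given to the true class
        p_t = min(max(p_t, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def combined_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of both terms; 0.5 is an assumed 'balanced' weighting."""
    return dice_weight * dice_loss(probs, targets) + (1.0 - dice_weight) * focal_loss(
        probs, targets
    )


mask = [1, 0, 1, 0]  # 1 = building pixel
print(combined_loss([0.9, 0.1, 0.8, 0.2], mask))
print(combined_loss([0.1, 0.9, 0.2, 0.8], mask))
```

Dice rewards overlap with the footprint as a whole, while the focal term concentrates on hard, misclassified pixels, which is useful when buildings cover only a small part of each tile.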

For more information about working with geospatial data and tools with fastai, please refer to [1].



Where is my high-res imagery? Collecting Istanbul imagery for prediction.

Finding high-resolution imagery was the key to our model and, at the same time, a humongous stone blocking our path to victory.

For the training stage, SpaceNet let us avoid manual annotation and data collection. For prediction, however, obtaining high-res imagery of Istanbul was the only way.


Mapbox sexy logo


Thankfully, we stumbled upon Mapbox and its easy-peasy, almost-free download API, which provides high-res slippy map tiles for the whole world at different zoom levels. Slippy map tiles are 256 × 256 pixel files addressed by x, y, z coordinates, where x and y are the tile’s 2D coordinates in the Mercator projection and z is the zoom level. We chose zoom level 18, where each pixel corresponds to about 0.596 meters on the ground.
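The tile arithmetic can be sketched with the standard slippy-map formulas. The function names are illustrative, the Fatih coordinates are approximate, and the resolution uses the usual Web Mercator convention in which a pixel covers less ground away from the equator.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # equatorial circumference (WGS84)
TILE_SIZE_PX = 256


def ground_resolution(zoom, latitude_deg=0.0):
    """Meters covered by one pixel at the given zoom level and latitude."""
    meters_per_pixel_equator = EARTH_CIRCUMFERENCE_M / (TILE_SIZE_PX * 2 ** zoom)
    return meters_per_pixel_equator * math.cos(math.radians(latitude_deg))


def lat_lon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) indices of the slippy tile containing a point."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


# Approximate coordinates for the Fatih district (assumed for illustration).
print(lat_lon_to_tile(41.02, 28.94, zoom=18))
print(ground_resolution(18), ground_resolution(18, latitude_deg=41.02))
```

At the equator the zoom-18 value comes out near 0.6 m per pixel, in line with the figure quoted above.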


Slippy map tiles on the Mercator projection (zoom level 2). Source:


As mentioned on their webpage, Mapbox has a generous free tier that allows you to download up to 750,000 raster tiles a month for free. That was enough for us, as we only wanted to grab tiles for a couple of districts.


Slippy raster tile at zoom level 18 (Fatih, Istanbul).



Time to predict: Create a mosaic-like your favorite painter

Once all required tiles were stealing space from my Google Drive, it was time to switch on our deep learning model and generate prediction footprints for each tile.


Model’s prediction for some tile in Rio: sometimes predictions looked better than actual footprints.


Then, we geo-referenced the tiles by translating from the Mercator coordinates to the latitude-longitude tuple (the one used by mighty explorers). Geo-referencing the tiles was a required step to create our prediction piece of art with GDAL software.

Python snippet to translate from Mercator coordinates to latitude and longitude.
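The original snippet is not reproduced here. As a stand-in, the following sketch uses the standard slippy-map formulas to convert tile indices into the latitude and longitude of the tile's north-west corner, plus the bounding box needed to geo-reference a tile. The function names are hypothetical.

```python
import math


def tile_to_lat_lon(x, y, zoom):
    """Latitude and longitude (degrees) of the north-west corner of tile (x, y)."""
    n = 2 ** zoom
    lon_deg = x / n * 360.0 - 180.0
    lat_rad = math.atan(math.sinh(math.pi * (1.0 - 2.0 * y / n)))
    return math.degrees(lat_rad), lon_deg


def tile_bounds(x, y, zoom):
    """Bounding box (west, south, east, north) of a tile in degrees."""
    north, west = tile_to_lat_lon(x, y, zoom)
    south, east = tile_to_lat_lon(x + 1, y + 1, zoom)
    return west, south, east, north


print(tile_to_lat_lon(1, 1, 1))
print(tile_bounds(0, 0, 1))
```

The south-east corner of a tile is simply the north-west corner of its diagonal neighbor, which is why `tile_bounds` calls the same conversion twice.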

Concretely, the GDAL command allows us to glue tiles together using the geo-coordinates embedded in the TIFF images. After some math and computing time… voilà! Our high-res prediction map for the district is ready.


Raw predictions overlaid on Fatih. From a lower degree of building presence confidence (blue) to higher (yellow).



Inverse distance heatmap

Ok, I see my house but should I go through this street?

Building detection was not enough for our task. We had to determine the distance from a given position on the map to the closest building, so that a person standing there could know how safe it would be to cross the street. The larger the distance the safer, remember?

The path-finding team would overlay the heatmap below on their graph-based schema. By intersecting graph edges (streets) with heatmap pixels (user positions), they could calculate the average distance over the pixels on each edge and thus obtain a safeness estimate for each street. This would be our input when finding the best A-B path.


Distance-to-buildings heatmap in meters. Each pixel represents the distance from each point to the closest building predicted by our model. Blue means danger, yellow-green safeness.
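How the per-street safeness might feed the route search can be sketched with Dijkstra's algorithm. The article does not specify the path-finding team's cost function, so the one below (street length plus a penalty that shrinks as the average distance to buildings grows) is an assumption, as are all names, weights, and the toy street graph.

```python
import heapq
import math


def edge_safeness(pixel_distances_m):
    """Average distance-to-buildings over the heatmap pixels lying on a street."""
    return sum(pixel_distances_m) / len(pixel_distances_m)


def edge_cost(length_m, safeness_m, risk_weight=50.0):
    """Assumed cost: street length plus a penalty for streets close to buildings."""
    return length_m + risk_weight / (1.0 + safeness_m)


def safest_path(graph, start, goal, risk_weight=50.0):
    """Dijkstra over graph[node] = [(neighbor, length_m, safeness_m), ...]."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path, cost
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length_m, safeness_m in graph.get(node, []):
            if neighbor not in visited:
                step = edge_cost(length_m, safeness_m, risk_weight)
                heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None, math.inf


# Toy graph: A-B-D is shorter but hugs buildings; A-C-D is longer but open.
streets = {
    "A": [("B", 100.0, edge_safeness([1.0, 1.0])), ("C", 110.0, edge_safeness([8.0, 10.0]))],
    "B": [("D", 100.0, 1.0)],
    "C": [("D", 110.0, 9.0)],
}
print(safest_path(streets, "A", "D"))
print(safest_path(streets, "A", "D", risk_weight=0.0))
```

Setting `risk_weight` to zero recovers the plain shortest path, which makes the trade-off between speed and safety easy to inspect.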


But how do we produce this picture from the raw prediction map? Clue: computing pixel-to-building distances for each tile independently is sub-optimal (narrow view), whereas the same computation on the entire mosaic is extremely expensive (3.5M pixels multiplied by thousands of buildings).

Working directly on the mosaic with a sliding window was the answer. For each pixel (x, y), a square window spanning (x-pad, y-pad) to (x+pad, y+pad) is cut from the original plot. Pad sets the window size in pixels: the window extends pad pixels from the center in each direction.


Pixel-wise distance computation. Orange is the point, blue is the closest building around. Side length = 100 pixels.


If a pixel belongs to a building, the function returns zero. If not, it returns the minimum Euclidean distance from the center point to the building pixels in the window. This process, along with NumPy optimizations, was key to mitigating the quadratic complexity of this computation.

Repeat the process for each pixel and the safeness map comes up.
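The sliding-window idea can be written down in plain Python for a small binary mask. This is a readable sketch rather than the optimized NumPy version mentioned above; the function name, the pixel size parameter, and the value returned when no building falls inside the window are assumptions.

```python
import math


def distance_map(mask, pad, pixel_size_m=1.0):
    """Distance from every pixel to the nearest building pixel within its window.

    mask is a 2D list with 1 for building pixels and 0 elsewhere.
    """
    height, width = len(mask), len(mask[0])
    # Assumed fallback when the window holds no building: the largest
    # distance the window could have measured (its corner).
    cap = math.hypot(pad, pad)
    result = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            if mask[row][col]:
                continue  # pixel belongs to a building: distance stays zero
            best = cap
            # Only look at the (2 * pad + 1)-pixel window around the pixel.
            for r in range(max(0, row - pad), min(height, row + pad + 1)):
                for c in range(max(0, col - pad), min(width, col + pad + 1)):
                    if mask[r][c]:
                        best = min(best, math.hypot(r - row, c - col))
            result[row][col] = best * pixel_size_m
    return result


# 5 x 5 toy mask with a single building pixel in the center.
toy_mask = [[0] * 5 for _ in range(5)]
toy_mask[2][2] = 1
for line in distance_map(toy_mask, pad=3):
    print([round(value, 2) for value in line])
```

Bounding the search by `pad` keeps the work per pixel constant, so the total cost grows with the number of pixels rather than with pixels times buildings.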


Distance heatmap overlaid on the satellite image. Blue means danger, yellow-green safeness.




Detecting Wildfires Using CNN Model with 95% Accuracy


How wildfire detection company Sintecsys leveraged Omdena’s community to build a fire detection algorithm in two months using AI and a CNN model.



The Problem: Wildfires and a Convolutional Neural Network

2019 was marked by very big fires: not only Notre-Dame cathedral in Paris and the National Museum in my country, Brazil, but also entire complex ecosystems such as the Amazon forest and, more recently, Australia. Before we dive into our finished product, which shows how to detect and stop wildfires early on with our community-built AI tool, let us understand how forest fires start.

  • Natural fires: Generally, natural fires are started by lightning, with a small portion originated by spontaneous combustion.
  • Human-caused fires: Humans cause fires in multiple ways, such as smoking, recreation, and soil preparation for agriculture. Human-caused fires account for the largest share of fires, but naturally caused fires account for larger burned areas. This happens because human-caused fires are detected earlier, while natural fires can take hours to be identified by the competent authorities.

Regardless of the cause, when a forest like the Amazon starts to burn, the fire can spread at speeds of up to 23 km/h and reach temperatures of 800 °C (about 1470 °F), destroying plant and animal life within a few hours (sometimes even contributing to species extinction).

Even worse, fires release CO2, which contributes to global warming.

In addition to disrupting the climate, the smoke affects the sky and the air quality of a huge metropolis like São Paulo, the most important economic and productive center of my country.

At 3 pm on August 19th, 2019, a black sky appeared when a cold front met the fire particulates coming from the Amazon and midwest fires in my country.

The day became night, and the feeling was that we were living in a biblical plague as described in the Old Testament. Really scary!



Pictures of São Paulo’s sky at 3 pm on August 19th, 2019


Among much misinformation, one post from NASA stood out by shedding the fundamental light of science on the matter.

In the image below, you see a high-resolution color satellite image showing how the smoke from the fires spread to the southeastern states of my country.


Image from NOAA/NASA’s Suomi NPP satellite, captured with the VIIRS (Visible Infrared Imaging Radiometer Suite) instrument on August 20th, 2019


The Solution

Sintecsys’s growing customer base of farm and forest clients can confirm the value of early detection. The company installs cameras on top of communication towers to capture images that are sent to a monitoring center. Once fire (or smoke) is detected in the images, the center sends alerts and triggers firefighting actions. This saves lives and infrastructure costs.

Sintecsys is not alone: many other companies around the world pursue the same mission, also very successfully.

The company has installed 50 towers across Brazil (2019 data).

This is where Omdena’s AI capabilities come into play: extending the customer reach and scaling the business model to thousands of cameras capable of accurately and quickly detecting wildfire outbreaks.

Omdena is a global platform where organizations collaborate with a diverse AI community to build solutions for real problems in a faster and more effective way.



#1 Scoping the problem

To tackle this problem, Omdena and Sintecsys agreed to work with day images in their first joint challenge and, in a second challenge, to improve the solution by adding night images.

The main difference between day and night images for fire detection is that day images usually show smoke, while night images show live fire. Sunset and dawn, when smoke and live fire coexist in the images, represent boundary conditions for the problem.

#2 Working on the dataset

The dataset was really big, comprising footage and images from different cameras, with and without fire outbreaks. Combining the original images we were given, our team had almost 7,600 images of 1920 x 1080 pixels (day images without fire outbreaks, day images with fires, and around 16% night images) to start labeling.



Samples from the datasets



To add even more images, Gary Diana built an algorithm that extracts images from the footage while avoiding near-identical images of the same landscape (de-duplication). This initiative added another 1,150 images of 1280 x 720 pixels to our dataset.
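The article does not describe how the de-duplication works, so the following is only a hypothetical sketch of one simple way to do it: keep a frame only if it differs enough, on average, from every frame already kept. The function name `select_distinct_frames`, the mean-absolute-difference measure, and the `threshold` value are assumptions for illustration. Frames are assumed to be already decoded into same-sized NumPy arrays.

```python
import numpy as np

def select_distinct_frames(frames, threshold=10.0):
    """Return the indices of frames that are not near-duplicates.

    frames:    iterable of same-shaped image arrays (e.g. uint8 pixels).
    threshold: hypothetical cutoff on the mean absolute pixel difference;
               below it, two frames count as the same landscape.
    """
    kept_indices = []
    kept_frames = []
    for index, frame in enumerate(frames):
        # Convert to float so uint8 subtraction cannot wrap around.
        current = np.asarray(frame, dtype=np.float32)
        is_duplicate = any(
            np.abs(current - previous).mean() < threshold
            for previous in kept_frames
        )
        if not is_duplicate:
            kept_indices.append(index)
            kept_frames.append(current)
    return kept_indices

# Toy footage: two dark frames, two bright frames, then a dark one again.
toy_frames = [
    np.zeros((4, 4), dtype=np.uint8),
    np.ones((4, 4), dtype=np.uint8),
    np.full((4, 4), 100, dtype=np.uint8),
    np.full((4, 4), 101, dtype=np.uint8),
    np.zeros((4, 4), dtype=np.uint8),
]
distinct = select_distinct_frames(toy_frames)  # [0, 2]
```

Comparing against every kept frame, rather than only the previous one, also filters out a camera returning to a view it has already shown.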

#3 Labeling with Labelbox

With the datasets prepared for labeling, we gathered around 20 people dedicated to the task and created the environments on Labelbox, which we considered the best available tool for computer vision labeling because it supports labeling data, managing label quality, and operating a production training data pipeline. Then, at last, we ran tests and labeled the final datasets.

I managed the task, but I received huge support from Alyona Galyeva, who helped the whole team not only by labeling but also by reviewing and managing everyone’s work.

In her own words:

It always starts with a mess when a group of people collaborates on a labeling project. In our case, Labelbox saved us a lot of time and effort by not allowing multiple users to label the same data. On top of that, it made our lives easier by proposing 4 roles: Labeler, Reviewer, Team Manager, and Admin. So, nobody was able to mess with data sources, data formats, and, of course, the labels made by other people.


Labelbox interface for labeling, managing and reviewing labels


With both datasets labeled, the data pipeline team then generated the train, validation, and test files.
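The article does not say how the split was made. As a rough illustration, a reproducible random split could look like the sketch below, where the function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are all assumptions rather than the pipeline team's actual choices.

```python
import random

def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle labeled items and split them into train, validation, and test.

    The fractions and the seed are hypothetical defaults for illustration.
    """
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical file names standing in for the labeled images.
labeled_files = [f"img_{i}.jpg" for i in range(100)]
train_files, val_files, test_files = split_dataset(labeled_files)
```

In practice, images from the same camera or the same footage are often kept in a single subset so that near-identical landscapes do not leak from training into testing.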

#4 Building the models

From the start, the team searched for and studied several top-notch papers describing different techniques that could be applied to the problem.

The challenge team split into several task groups, each focused on a different approach: MobileNet, semantic segmentation, and Convolutional Neural Networks (CNNs), from simple architectures to more sophisticated ones.

Another great testimony of this step comes from Danielle Paes Barretto:

It was inspiring to see people eager to achieve great results. I tried to help in all tasks; from labeling the data to building CNN models and testing them on our dataset. We also had frequent discussions which in my opinion is one of the greatest ways of learning. All in all, it was an amazing opportunity to learn and to use my knowledge for the good while meeting great people!

In addition, several techniques were successfully applied to improve results, such as splitting the original images into patches of different sizes and training on those patches, data augmentation (e.g. horizontal and vertical flipping), and image denoising.
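Two of these techniques, patching and flipping, can be sketched with plain NumPy. This is a minimal illustration under stated assumptions: the function names are hypothetical, patches are taken on a non-overlapping grid, and partial patches at the right and bottom edges are simply dropped, which may differ from what the team did.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Cut an image into non-overlapping square patches.

    Partial patches at the right and bottom edges are dropped (an assumption).
    """
    height, width = image.shape[:2]
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches

def flip_augment(image):
    """Return the image plus its horizontal and vertical flips."""
    return [image, np.fliplr(image), np.flipud(image)]

# A blank stand-in for one 1920 x 1080 RGB camera image.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
patches = extract_patches(frame, patch_size=360)
augmented = [variant for patch in patches for variant in flip_augment(patch)]
```

Training on patches lets a model see small smoke plumes at a larger relative scale, and flipping multiplies the number of training examples without new labeling work.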

#5 Results

The final solutions reached a recall between 95% and 97% with a false positive rate between 20% and 33%, which means that they caught 95% to 97% of the real fire outbreaks. While the challenge partner Sintecsys is extremely happy with the results, in our second challenge we will improve the current models by adding nighttime images.
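For readers less familiar with these two metrics, the sketch below shows how they are conventionally computed from binary labels (1 = fire, 0 = no fire). It uses the standard definitions, recall = TP / (TP + FN) and false positive rate = FP / (FP + TN); the function name and the toy labels are made up for illustration, and the article does not state exactly how the team computed its numbers.

```python
def recall_and_false_positive_rate(y_true, y_pred):
    """Compute recall and false positive rate for binary labels.

    Assumes both classes are present in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
    false_neg = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    false_pos = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    true_neg = sum(1 for truth, pred in pairs if truth == 0 and pred == 0)

    recall = true_pos / (true_pos + false_neg)
    false_positive_rate = false_pos / (false_pos + true_neg)
    return recall, false_positive_rate

# Toy labels: 4 images with fire, 5 without.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0]
recall, fpr = recall_and_false_positive_rate(toy_truth, toy_predictions)
# recall is 0.75 (3 of 4 fires caught), fpr is 0.2 (1 of 5 clear images flagged)
```

For an early-warning system, a high recall is usually the priority, since a missed fire costs far more than an alert that an operator has to dismiss.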
