The Artificial Intelligence Imperative: Early Detection of Wildfires
Founded in 2016, Sintecsys is a young and growing Brazilian company with a passion for its customers, communities, and the climate. Its real-time fire outbreak detection and management system protects forests and agricultural land in 7 Brazilian states and 4 biomes from fire, helping companies reduce their asset losses, protecting local people from smoke-related respiratory illnesses, and reducing CO2 emissions that lead to global warming.
The company was in the midst of preparing for its next stage of growth and wanted to enhance its solution’s detection capability with artificial intelligence to scale its deployment and customer base.
Innovation Through Collaboration
Early and accurate detection of fires was a core system capability and therefore a key value feature, requiring a highly accurate artificial intelligence model to detect wildfires. The upgraded system’s speed to market was also critical from a competitive and financial perspective, so the development process needed to run quickly and efficiently and to meet deadlines. Finally, the Sintecsys internal technical team was in a state of transition as part of its overall company transformation, so effective team communication and coordination throughout the project was also a key success factor.
To best address these challenges and the exploratory nature of the project, Omdena used its 8-week Challenge approach and assembled a diverse 47-member team of skilled and motivated artificial intelligence collaborators from 22 countries. Together with the Sintecsys team, the collaborators first determined the problem to solve and the deliverable: build a model that can detect smoke from wildfire outbreaks using daytime camera footage from Sintecsys towers, with nighttime detection saved for a later development phase. The full team researched potential modeling techniques and organized themselves into several task groups, each led by a task manager and focused on testing a promising approach such as MobileNet, semantic segmentation, or Convolutional Neural Networks. About 20 collaborators also formed a task group to create two training datasets by labeling nearly 9,000 images that could be used by any of these modeling techniques.
The team had to solve two key data challenges affecting the model’s ability to accurately identify smoke. First, they used label smoothing to help the model distinguish real smoke from ‘smoke-like’ anomalies such as camera glare, fog, clouds, and smoke released from boilers. Second, the team applied an upsampling technique to help the model overcome the low quality of many of the images. After multiple rounds of model testing and validation, the final model was able to detect smoke in images an impressive 95% to 97% of the time, with a false positive rate of only 10% to 33%.
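To illustrate the label smoothing idea mentioned above, here is a minimal sketch. The article does not say which variant the team used, so this shows the common uniform form, where a hard 0/1 target is blended with a uniform distribution over the classes. The function name, the two-class setup, and the smoothing factor of 0.1 are assumptions made for illustration only.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Blend a one-hot target with a uniform distribution over the classes.

    epsilon is the smoothing strength (an assumed value, not from the article).
    """
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]


# Hypothetical two-class problem: [smoke, not smoke]
smoke_target = smooth_labels([1.0, 0.0], epsilon=0.1)
no_smoke_target = smooth_labels([0.0, 1.0], epsilon=0.1)

print("smoke image target:   ", smoke_target)
print("no-smoke image target:", no_smoke_target)
```

Softened targets of this kind discourage the model from becoming overconfident on ambiguous images such as fog or glare, which is the situation the team describes.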
The team used bottom-up collaboration to create a supportive and open environment that optimized innovation and speed. Team members organized themselves into groups that took leadership roles based on their qualifications and learning goals. Activities and communication were guided by an Agile approach with daily discussions between team members to explore multiple techniques simultaneously. Weekly calls were held with the Sintecsys team to report progress and solicit feedback to ensure a ‘best-fit’ solution.
AI’s Impact On Profit and Purpose
Omdena’s artificial intelligence solution increases the overall value of the Sintecsys fire detection and management system to customers, which also amplifies its positive impact on people and the planet.
Integration of the automated, high-quality artificial intelligence model helps reduce detection time drastically, resulting in 90% fewer lost acres of crops and trees and therefore lower CO2 emissions. Less smoke in the air means better respiratory health for farmers and local inhabitants. The model’s low false-positive rate reduces false alarms, translating into less time and money spent by customers, farmers, and fire brigades to investigate potential fires. A prototype system has been launched with one client. Full deployment to all clients is planned for end-2020/early-2021.
Sintecsys also plans to replicate this model in other countries and investigate with Omdena potential enhancements like detecting fire in nighttime images, incorporating satellite imagery, and predicting the most likely areas where fires will start.
Says Osmar Bambini, Sintecsys Head of Innovation, “Speed, accuracy, and power sum up my perception of Omdena. For Sintecsys, from now on Omdena is our official artificial intelligence partner.”
By Dev Bharti, Juber Rahman, Xavier Torres, and Rosana de Oliveira Gomes
AI technology can be used to predict the number and type of food and non-food items during a disaster. In this article, the focus lies on cyclones, but the applications can be applied in other disaster types.
On May 20, 2020, Cyclone Amphan struck the eastern Indian city of Kolkata and killed at least 84 people across India and Bangladesh. It lashed coastal areas with ferocious wind and rain. Thousands of trees were uprooted in the gales; electricity and telephone lines were brought down and houses were flattened. Many of Kolkata’s roads were flooded and its 14 million people were left without power.
The United Nations World Food Programme (WFP) is the world’s largest humanitarian organization addressing hunger and promoting food security. WFP is usually one of the first humanitarian agencies to arrive and provide support to the locals when a disaster such as Cyclone Amphan takes place. The below infographic depicts its analysis of Cyclone Amphan’s effects.
The question is how many resources to mobilize for a specific disaster. One of the most challenging tasks is to predict how many food and non-food items (such as blankets, hygiene items, and tools) are needed. According to the Needs Assessment Working Group (NAWG) in Bangladesh, more than 14.2 million people were in the path of Cyclone Amphan, of which 7.2 million were women and 1.4 million children.
While this information was helpful, it did not specify how many of the exposed people would actually be affected. This was a complex problem: there are many factors other than cyclone-specific properties that determine how many exposed people will end up affected and will need emergency food and non-food supplies. An underestimation might lead to food shortages and a humanitarian crisis in the affected area, whereas an overestimation could waste resources that could be used for support in other locations.
In March 2020, Omdena launched a project (hosted by the WFP Innovation Accelerator) that united 34 Data Science collaborators and changemakers across four continents. All team members worked together for two months on Omdena's innovation platform to build AI solutions to improve disaster response by estimating affected populations and the associated relief packages for forthcoming cyclone emergencies.
The AI challenge was broken down into the following phases:
Understanding the business need
The collaborators from Omdena, along with the WFP team, discussed the problem in order to understand its socio-economic impact. Following the bottom-up approach of Omdena challenges, the project was entirely virtual and had no single leader. Instead, various approaches and ideas were discussed, and within the first week task groups were created, with task managers responsible for ensuring progress.
Collecting the data
Data collection was the longest and most difficult task, occupying 75 percent of the project. This brought to light the real issue that technology faces today: a lack of open-source data, as well as data in a suitable format for the use of AI models. Bearing these limitations in mind, the project focused on acquiring data related to cyclone emergencies from publicly available sources, including:
IBTrACS – Tropical cyclone data that provides climatological speed and directions of storms (National Oceanic and Atmospheric Administration).
EmDAT – Geographical, temporal, human, and economic information on disasters at the country level (Université Catholique de Louvain).
World Bank – Socio-economic indicators to gauge the impact of cyclones such as GDP per capita and rural population.
The Gridded Population of the World (GPW) collection – Models the distribution of the human population (counts and densities) on a continuous global raster surface.
WFP Manual – Guidelines for disaster management from WFP. This document was used as the basis to prepare a list of food and non-food items, along with the required quantities following the minimum nutrition guideline by the World Health Organisation (WHO), taking into account vulnerable groups such as pregnant women, lactating mothers and the weather in the affected area. For instance, one bathing soap per person (non-food) and 2,100 Kcal of energy per person per day (food).
Python scripts were written to perform data analysis, data aggregation, and web scraping to fill in any missing values, along with some manual work to gather data that was difficult to automate (e.g. old PDF documents with scanned pictures).
This provided two files as outputs, the first being a cyclone data file with over 1,000 events and about 45 features that characterize the cyclone events, along with the target field to be predicted by the AI models (affected population). The second output was a food and non-food items file, which was later used to develop a mathematical model for final estimation of the number of relief commodities, given an affected population (from Phase i).
Figure 2: Cyclone Pabuk route
Understanding the data: Exploratory Data Analysis (EDA)
Once the data was ready, the next task focused on finding features that have the most impact on the AI models that have been developed. The team used several machine learning (ML) methods, such as clustering and random forest, to build the final list of relevant features for the problem. Some of these were:
Cyclone related features: wind speed, pressure, basin (location of formation), hours in land, total hours of the event, category level.
Socio-economic related features: Human Development Index, GDP per capita, % of rural area, the income level at the time of the event.
Features regarding total population under the cyclone path, taking into account maximum wind speed knots and using the aforementioned continuous global raster surface (see the example of cyclone Pabuk in Figure 2).
These results are summarised in Figure 3 below.
Building the AI and machine learning models
The core aim of the AI model was to predict the number of food and non-food items needed when a new cyclone or disaster strikes. From a technical perspective, the teams split the problem into two parts. First, using machine learning regression techniques based on the most relevant features, the model predicted the affected population. The second part was to use the affected population number as the basis for estimating food and non-food items in a linear fashion, following the WFP guidelines.
The team trialed the following ML regression approaches for the first part:
Support vector machines
Extreme gradient boosting
The performance metrics showed that the models perform better when estimating a smaller affected population, equivalent to about 150,000 people, which is compatible with the affected population of tropical storms. Such results stem from the fact that this analysis had a majority of tropical storms in the data set, which are the most common kinds of cyclone events. Efforts in improving the performance of models for a better description of higher damage events include collecting more data for such extreme events.
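The second part of the pipeline described above, the linear scaling from an affected population to relief items, can be sketched as follows. This is only an illustration under stated assumptions: it uses the two per-person figures quoted earlier in the article (2,100 Kcal per person per day and one bathing soap per person), while the function name, the dictionary keys, and the 7-day duration are hypothetical. The team's actual mathematical model also accounts for vulnerable groups and weather, which this sketch leaves out.

```python
# Per-person planning figures quoted in the article (WFP manual / WHO guideline).
KCAL_PER_PERSON_PER_DAY = 2100
SOAP_BARS_PER_PERSON = 1


def relief_package(affected_population, days):
    """Scale per-person requirements linearly with population and duration.

    Hypothetical helper: the real tool covers many more items and adjustments.
    """
    return {
        "food_kcal": affected_population * days * KCAL_PER_PERSON_PER_DAY,
        "soap_bars": affected_population * SOAP_BARS_PER_PERSON,
    }


# 150,000 people is the population scale at which the article says the models
# perform best; the 7-day duration is an assumed input.
package = relief_package(affected_population=150_000, days=7)
print(package)
```

In the full workflow, the affected population passed to such a function would come from the regression model of the first part rather than being typed in by hand.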
Deploying the models
The team used the Streamlit open-source application framework to build a user-centric application, based on its AI model, which determines the relief package for a given cyclone emergency. The application receives cyclone-specific data from the user and generates an estimate of the food and non-food items necessary for the emergency. The application goes a step further by refining the model once correct information (i.e. the affected population) has been fed into it. The output of the Streamlit application is fed into a Python GUI application, built on top of a mathematical model, to estimate the quantity of relief for food and non-food items.
Figure 4: Presented at the Omdena demo day
A demonstration of the relief package tool can be found here.
The documentation of the model ensures that it can be enhanced and deployed for the betterment of society. The developers also hope that AI can be deployed more on social and humanitarian initiatives such as these.
Whenever a disaster takes place, humanitarian agencies need to act fast in order to provide support to the affected populations. In such emergencies, the right information to the right person or organization is crucial to saving lives. The tool developed in this work has unfolded the potential of data-driven decision-making for humanitarian response, particularly by showing how AI can make disaster relief operations more efficient. This project focused on cyclones in particular, but a similar approach could be applied to many other natural disasters, such as earthquakes, tsunamis, and floods.
Natural disasters have a high degree of unpredictability, which is one of the reasons they are so deadly. Let’s say a cyclone is spotted near Madagascar, one of the most cyclone-prone countries in Africa, and also one of the countries provided with WFP assistance. Cyclones are tracked via satellite images and their paths are also forecasted by several institutions that provide open-source information.
By using the tool developed in this project, humanitarian agents from organizations such as WFP would only need to add the cyclone tracking information, such as wind speeds and pressure, as an input to the tool, alongside the number of days to be covered. The model would then cross this information with socio-economic data about Madagascar, including demographic information, such as the number of children and pregnant women, providing the most suitable relief package for this specific location.
The relief package includes both food and non-food items (NFI) to be delivered, estimated using WHO guidelines for nutrition, and obtained from a sophisticated model for the determination of affected populations using AI techniques. The use of the application would take a matter of minutes, having the potential of improving the efficiency of logistics and deployment phases of the operations. Moreover, since this relief package design is modeled based on data from previous cyclones, further updating the model with new data can improve the accuracy for future disaster management operations.
Whether termed cyclone, typhoon or hurricane, these natural weather occurrences pack a serious punch and are responsible for approximately 10,000 deaths per year and, “in some cases, causing well over $100 billion in damage. There’s now evidence that the unnatural effects of human-caused global warming are already making hurricanes stronger and more destructive. The latest research shows the trend is likely to continue as long as the climate continues to warm (Berardelli, 2019).”
It is for these reasons that the World Food Programme teamed up with Omdena to more accurately predict the types and amount of aid required when disaster strikes. “Assisting almost 100 million people in around 83 countries each year, the World Food Programme (WFP) is the leading humanitarian organization saving lives and changing lives, delivering food assistance in emergencies and working with communities to improve nutrition and build resilience.”
Omdena gathered a team of 34 collaborators specializing in artificial intelligence and machine learning, spanning 19 different countries, for eight weeks. The goal was to develop an AI data-driven way to help the WFP and other humanitarian organizations know exactly what resources the people affected by cyclones (or any other disaster) will need, and to expedite deployment. High on the team’s list were answers to questions such as: How much food and water is required? What sort of shelters are needed, and how many? What types of non-food essentials are appropriate, and in what quantities? Before AI models could be developed, relevant data had to be gathered for this disaster response problem.
The team collected data from a variety of sources, such as NOAA, to determine affected populations and critical features of these populations such as income level, injuries, deaths, and more. Important factors were determined about cyclones including wind speed, total hours on land, damage factors, and whether the populations were rural versus urban. Below we see the correlation mapped based on income level and the number of people affected revealing populations most in need of assistance.
Understanding the attributes of the people affected by a disaster helps to reveal the types of aid required. So that the WFP and other aid organizations can determine what and how much relief to send with precision, the team used mathematical models to create a tool that calculates the needs of the people in the targeted disaster zones. The tool calculates how much food, non-food items, shelter, etc., the population should need for a determined number of days.
This exciting AI prototype can be used as the basis to assist disaster response organizations around the world to accurately customize aid resources to the specific needs of the people impacted. The team identified a more precise way to allocate aid in times of disaster. This will allow the World Food Programme and other organizations to respond to the needs of affected people faster and more efficiently than ever before thus reducing suffering and saving lives.
Helping affected populations during a disaster most effectively through AI. A collaborative Omdena team of 34 AI experts and data scientists worked with the World Food Programme to build solutions to predict affected populations and create customized relief packages for disaster prevention.
The entire data analysis and details about the relief package tool including a live demonstration can be found in the demo day recording at the end of the article.
The problem: Quick disaster response
When a disaster strikes, the World Food Programme (WFP), as well as other humanitarian agencies, need to design comprehensive emergency operations. They need to know what to bring and in which quantity. How many shelters? How many tons of food? These needs assessments are conducted by humanitarian experts, based on the first information collected, their knowledge, and experience.
The project goal: Building a disaster relief package tool for cyclones (applicable to other use cases and disaster categories)
Use Case: Cyclones (Solution applicable to other areas)
Tropical cyclones cost about 10,000 human lives a year. Many more people are injured and homes and buildings are destroyed, which results in financial damage of several billion USD. Due to changes in climate and extreme weather events, the impact is growing steadily.
Long Beach after Hurricane Katrina. Estimated damage of 168 billion dollars (Source: Wikipedia).
The Omdena team gathered data from several sources:
IBTrACS – Tropical cyclone data that provides climatological speed and directions of storms (National Oceanic and Atmospheric Administration)
EmDAT – Geographical, temporal, human, and economic information on disasters at the country level. (Université Catholique de Louvain)
Socio-Economic Factors from World Bank
The Gridded Population of the World (GPW) collection – Models the distribution of the human population (counts and densities) on a continuous global raster surface
Missing data was collected manually or, where possible, by partially automated scraping of Wikipedia and cyclone reports.
Data exploration: Determining affected populations
All five data sets were aggregated, yielding more than 1,000 events and 45 features characterizing cyclones and affected populations.
Impact Cyclones (Landing vs. No-landing)
Important correlation factors to determine affected populations:
Human Development Index
GDP per capita
Total hours in Land
The team mapped the correlation factors to determine which populations are most in need. As an example, below the income level is correlated with the number of people affected. Taking advantage of past data, the data model predicts affected populations.
Predicting affected populations based on income level
The tool: Calculating relief packages
Once an affected population has been identified, humanitarian actors need to design comprehensive emergency operations, including how much food and what type of food is needed. The project team built a food basket tool, which facilitates calculating the needs of affected populations. The tool takes several inputs, such as the number of days to be covered, the number of affected people, and the number of pregnant women and children among them.
The relief package tool
The entire data analysis and details about the relief package tool including a live demonstration can be found in the video.
The team: Collaborators from 19 countries
This Omdena project, hosted by the WFP Innovation Accelerator, united 34 collaborators and changemakers across four continents. All team members worked together for two months on Omdena's innovation platform to build AI solutions with the mission to improve disaster response. To learn more about the project check out our project page.
All changemakers: Ali El-Kassas, Alolaywi Ahmed Sami, Anel Nurkayeva, Arnab Saha, Beata Baczynska, Begoña Echavarren Sánchez, Chinmay Krishnan, Dev Bharti, Devika Bhatia, Erick Almaraz, Fabiana Castiblanco, Francis Onyango, Geethanjali Battula, Grivine Ochieng, Jeremiah Kamama, Joseph Itopa Abubakar, Juber Rahman, Krysztof Ausgustowski, Madhurya Shivaram, Onassis Nottage, Pratibha Gupta, Raghuram Nandepu, Rishab Balakrishnan, Rohit Nagotkar, Rosana de Oliveira Gomes, Sagar Devkate, Sijuade Oguntayyo, Susanne Brockmann, Tefy Lucky Rakotomahefa, Tiago Cunha Montenegro, Vamsi Krishna Gutta, Xavier Torres, Yousof Mardoukhi
Millions of people are forced to leave their current area of residence or community due to resource shortages and natural disasters such as droughts and floods. Our project partner, UNHCR, provides assistance and protection for those who are forcibly displaced inside Somalia.
The goal of this challenge was to create a solution that quantifies the influence of climate change anomalies on forced displacement and/or violent conflict through satellite imaging analysis and neural networks for Somalia.
The UNHCR Innovation team provided the displacement dataset, which contains:
Month End, Year Week, Current (Arrival) Region, Current (Arrival) District, Previous (Departure) Region, Previous (Departure) District, Reason, Current (Arrival) Priority Need, Number of Individuals. These internal displacements have been recorded weekly since 2016.
While searching for how to extract the data we learned about NDVI (Normalized difference vegetation index), and NDWI (Normalized Difference Water Index).
Our focus was on finding a way to apply NDVI and NDWI to satellite imagery and neural networks in order to help prevent climate change disasters.
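For readers unfamiliar with the two indices, the sketch below shows how they are commonly computed from band reflectances. NDVI is (NIR - Red) / (NIR + Red). For NDWI this sketch assumes the green/near-infrared formulation, (Green - NIR) / (Green + NIR); another widely used variant relies on a shortwave-infrared band instead, and the article does not say which one the team applied. The function names and the reflectance values are made up for illustration.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = band_a + band_b
    return (band_a - band_b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index; higher means denser vegetation."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized Difference Water Index (green/NIR form assumed here)."""
    return normalized_difference(green, nir)


# Made-up surface reflectances for two single pixels.
vegetated_pixel = ndvi(nir=0.50, red=0.10)
water_pixel = ndwi(green=0.30, nir=0.10)
print(round(vegetated_pixel, 3), round(water_pixel, 3))
```

Both indices fall between -1 and 1, which makes them convenient per-pixel inputs when tracking drought or flooding over time.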
Landsat (EarthExplorer) and MODIS, Hydrology (e.g. river levels, river discharge, an indication of floods/drought), Settlement/shelters GEO (GEO portal). These images have 13 bands and take up around 1GB of storage space per image.
Country-wide estimations for undetected Covid-19 cases and recommendations for enhancing testing facilities based on Probability Analysis
The Problem: Why is estimating undetected Covid-19 cases crucial?
An estimation of the undetected Covid-19 cases is important for authorities to plan economic policies, make decisions around different stages of lockdown, and work towards the provision of intensive care units.
As we have crossed the psychological mark of 1 million Covid-19 patients around the globe, more questions are arising about the capability of our health care systems to contain the virus. One of the major worries is the systematic uncertainty in the number of citizens who have hosted the virus. The major contribution to this uncertainty is possibly the small fraction of Covid-19 tests being performed.
The main test to confirm whether someone has Covid-19 looks for signs of the virus’s genetic material in a swab of their nose or throat. This test is not yet available for most people. Healthcare workers are morally obliged to reserve the testing apparatus for seriously ill patients in hospital.
In this article, we will show a simple Bayesian approach, a part of Probability Analysis, to estimate the undetected Covid-19 cases. The Bayes theorem can be written as:
P(A|B) = P(B|A) × P(A) / P(B)
where P(A) is the probability of event A, P(B) is the probability of event B, P(A|B) is the probability of observing event A if B is true, and P(B|A) is the probability of observing event B if A is true.
The quantity of interest for us is P(infected|notTested), i.e. the probability of infections that are not tested. This is equivalent to the percentage of the population infected by Covid-19 but not tested, and from the Bayes theorem we can write it as:
P(infected|notTested) = P(notTested|infected) × P(infected) / P(notTested)
where:
1. P(notTested|infected): Probability of tests not done on people that are infected or percentage of the population not tested but infected.
2. P(infected): Prior probability of infection or known percentage of the infected population.
3. P(notTested): Probability or percentage of people not tested.
The following plot shows the total Covid-19 tests per million people and the total number of confirmed cases per million people for several countries. This suggests a clear relation between the Covid-19 tests and confirmed positive detections.
Assuming that all countries follow this relation between the Covid-19 tests and confirmed cases, we can use Probability Analysis to make a rough estimate of the number of undetected cases in each country.
Let’s take Australia as an example:
The plot gives the prior knowledge of infected cases and the fraction of people not tested:
P(infected) = 27.8/10⁶, and
P(notTested) = (10⁶ - 4473)/10⁶.
To estimate the P(notTested|infected), I used the relation between the Covid-19 tests and confirmed cases as in the above Figure 1. This is done by fitting a power law of the form: y = a * x**b, where a is normalization, and b is the slope of this power law. The following plot shows a fit to the data points from the above plot, where the best fit a = 0.060±0.008 and b = 0.966±0.014.
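A power law of the form y = a * x**b becomes a straight line in log-log space, so one simple way to fit it is ordinary least squares on the logarithms. The sketch below takes that route. The article does not state which fitting method was used, and this approach does not reproduce the quoted uncertainties. The function name is hypothetical, and the data points are synthetic, noise-free values generated from the article's best-fit parameters, not real country data.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b via a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                        # slope of the line = power-law exponent
    a = math.exp(mean_y - b * mean_x)    # intercept of the line = log of normalization
    return a, b


# Synthetic points generated from a = 0.060, b = 0.966 (no scatter added).
tests_per_million = [10, 100, 1_000, 10_000]
cases_per_million = [0.060 * x ** 0.966 for x in tests_per_million]

a, b = fit_power_law(tests_per_million, cases_per_million)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With real data the points scatter around the line, and a weighted or non-linear fit may be preferable; the log-log approach is chosen here only because it needs nothing beyond the standard library.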
Using the best fit parameters, P(notTested|infected) = [(10⁶ - 4473)/10⁶] / [a * (10⁶ - 4473)**b / 10⁶].
Combining these three probabilities in the Bayes theorem, I find P(infected|notTested) = 0.00073, i.e. about 0.073 per cent of the population of Australia. Multiplying this by the population of Australia indicates that there is a possibility of about 18,600 undetected Covid-19 cases in Australia (Probability Analysis report). The following plot shows possible undetected Covid-19 cases as a function of tests per million for different countries as of 20 March 2020.
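The Australian example can be retraced step by step with a few lines of arithmetic. The sketch below uses the rounded best-fit parameters and the figures of 4,473 tests and 27.8 confirmed cases per million quoted above; the population of roughly 25.5 million is an assumption, not a number from the article. Note that the likelihood term, as defined in the article, is a ratio that can exceed 1, so it acts as a scaling factor rather than a strict probability. Because the inputs are rounded, the result should land close to, but not exactly on, the article's figures.

```python
A, B = 0.060, 0.966           # best-fit power-law parameters quoted in the article
TESTS_PER_MILLION = 4473      # Australia, as of 20 March 2020
CASES_PER_MILLION = 27.8      # confirmed cases per million, same date
POPULATION = 25_500_000       # assumption: approximate population of Australia

not_tested_per_million = 1e6 - TESTS_PER_MILLION

p_infected = CASES_PER_MILLION / 1e6
p_not_tested = not_tested_per_million / 1e6

# Likelihood term as written in the article: the untested fraction divided by
# the power-law prediction evaluated at the untested count (per million).
p_not_tested_given_infected = p_not_tested / (A * not_tested_per_million ** B / 1e6)

# Bayes theorem: P(infected|notTested)
p_infected_given_not_tested = p_not_tested_given_infected * p_infected / p_not_tested

undetected_cases = p_infected_given_not_tested * POPULATION
print(f"P(infected|notTested) = {p_infected_given_not_tested:.5f}")
print(f"Estimated undetected cases = {undetected_cases:,.0f}")
```

Changing TESTS_PER_MILLION and CASES_PER_MILLION to another country's values repeats the same estimate under the same caveats.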
Note that several assumptions and considerations are made to estimate these undetected cases. For instance:
I assumed that all countries would follow the same power-law relation to estimate P(notTested|infected). However, this is not a very good assumption, as there is huge scatter in this relation between different countries.
Our prior knowledge of the number of infections can be biased itself as P(infected) depends on the number of tests performed as of 20 March 2020.
I haven’t considered the susceptibility of a country’s populations to Covid-19, and the attack rate i.e. the biostatistical measure of the frequency of morbidity, which for Covid-19 is estimated around 50–80% (Verity et al. 2020).
The impact of government policies of these countries from 14 days before 20 March and 14 days after is not considered.
I haven’t considered how susceptible people are targeted for testing in different countries in the next days.
Figure 4 below shows the total number of confirmed cases versus the tests per million as of 5 April 2020 for several countries (data source).
After 16 days, on 5 April, the confirmed positive cases in countries like Ukraine, India and the Philippines are consistent with the predictions in Figure 3. These countries performed ≤ 10 tests per million people as of 20 March.
Note that the consistency between estimations as of 20 March and 5 April does not necessarily mean that all undetected cases as of 20 March are confirmed now. Several of the confirmed cases as of 5 April are expected to be new cases due to the spread between 20 March and 5 April (even in the presence of lockdowns).
The estimated undetected cases for countries like Colombia and South Africa are about twice as large (Figure 3) as compared to the total confirmed cases as of 5 April (i.e. about 1,500 for both). Both countries have performed about 100 tests per million people.
Countries like Taiwan, Australia, and Iceland, on the other hand, have shown an order of magnitude fewer confirmed cases than the estimated numbers in Figure 3.
This indicates that the countries that have not boosted their testing efficiency to more than 1,000 tests per million people have significantly larger uncertainties on the number of current confirmed cases.
Given the data in Figure 4 from 5 April 2020, I repeated the whole exercise again to estimate the undetected Covid-19 cases for these countries, cities, and states. The following figure shows the best fit power-law and data points similar to Figure 2 but for the data as of 5 April 2020.
The best-fit slope for the power-law relation in Figure 5 (b = 1.281±0.009) is consistent with the slope in Figure 2 at the 2-σ confidence level. This supports our assumption of estimating P(notTested|infected) from the best fit power-law relation (the slope is not changing). However, the other caveats are the same as before.
Finally, the following plot shows the estimated undetected Covid-19 cases for different countries as of 5 April 2020.
The comparison between the undetected estimations as of 20 March (Figure 3) and the confirmed cases as of 5 April (Figure 4) shows that more tests per million people are required to capture the possible undetected cases. It is therefore high time that authorities raise the testing efficiency in order to reduce the systematics from undetected Covid-19 cases. This seems to be the only good way to reduce the death rate of Covid-19 patients, as indicated by the large amount of Covid-19 testing in Germany and South Korea.
To make this happen, all countries need at least one testing center within a radius of 20 km and should arrange more drive-through testing facilities as soon as possible.