More about Omdena
Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.
Now, how do we get to the elevation map? And what do we use it for?
Those were the main questions we asked ourselves as we set out, in this project, to understand topographical maps using deep learning.
First, what is an Elevation Map anyway?
An elevation map shows the various elevations in a region. Elevation is shown using contour lines (level curves), bands of the same color (via imagery processing), or numerical values giving the exact elevation. Elevation maps are generally known as topographical maps.
A Digital Elevation Model (DEM) is a specialized database that represents the relief of a surface between points of known elevation. It is the digital representation of the land surface elevation.
Contour lines are the most common method of showing relief and elevation on a standard topographical map. A contour line represents an imaginary line on the ground at a constant elevation above or below sea level. Contour lines form closed loops (or run off the edge of the map); the inside of a loop is the top of a hill.
We worked with the DEM to create the contour lines using open-source GIS software. In this case we used QGIS with a plugin called “Contour”, which uses the elevations of the DEM to define the level curves and obtain a contour line model of the study area (it is possible to define the interval between level curves; in our case, every two meters).
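As a rough illustration of the idea (not the project's QGIS workflow), the set of two-meter contour levels spanning a DEM can be computed in a few lines of NumPy; `contour_levels` is a hypothetical helper:

```python
import numpy as np

def contour_levels(dem, interval=2.0):
    # Elevation values at which level curves would be drawn,
    # spaced `interval` meters apart across the DEM's range.
    lo = np.floor(dem.min() / interval) * interval
    hi = np.ceil(dem.max() / interval) * interval
    return np.arange(lo, hi + interval, interval)

# Toy 3x3 DEM with elevations between 10.3 m and 17.8 m
dem = np.array([[10.3, 12.1, 13.0],
                [11.5, 14.2, 15.9],
                [13.3, 16.4, 17.8]])
print(contour_levels(dem))  # levels every 2 m: 10, 12, 14, 16, 18
```

A contouring tool then traces, for each of these levels, the curve where the interpolated surface crosses that elevation.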
Next, we need a Triangular Irregular Network (TIN) with vector-based lines and three-dimensional coordinates (x,y,z). The TIN model represents a surface as a set of contiguous, non-overlapping triangles.
We applied computational geometry (Delaunay Triangulation) to create the TIN.
Delaunay triangulations are widely used in scientific computing in many diverse applications. While there are numerous algorithms for computing triangulations, it is the favorable geometric properties of the Delaunay triangulation that make it so useful.
For modeling terrain or other objects with a set of sample points, the Delaunay Triangulation gives a good set of triangles to use as polygons in the model. In particular, the Delaunay Triangulation avoids narrow triangles (as they have large circumcircles compared to their area).
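A minimal sketch of building a TIN from sample points with SciPy's Delaunay triangulation; the coordinates and elevations below are made up for illustration:

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical sample points: (x, y) positions with known elevations z
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                   [1.0, 1.0], [0.5, 0.5]])
z = np.array([10.0, 12.0, 11.0, 15.0, 12.5])

tin = Delaunay(points)  # contiguous, non-overlapping triangles

# Each row of `tin.simplices` indexes three points forming one triangle;
# together with z, these (x, y, z) triangles form the TIN surface.
for tri in tin.simplices:
    print(points[tri], z[tri])
```

For these five points (four corners plus a center), the triangulation yields four triangles fanning out from the center, and none of them is needlessly narrow.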
A DTM is a vector dataset composed of regularly spaced points and natural features such as ridges and break lines. A DTM augments a DEM by including linear features of the bare earth terrain.
DTMs are typically created through stereophotogrammetry, but in this case we downloaded a point surface of the terrain.
The points are called LiDAR points: a collection of points that represents a 3D shape or feature. Each point has its own set of X, Y, and Z coordinates and, in some cases, additional attributes. We can think of a point cloud as a collection of many such points, which can be converted into a DTM using open-source GIS software.
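The conversion from point cloud to DTM can be sketched in NumPy: bin each point into a grid cell and keep the lowest return per cell as a rough bare-earth estimate. This is a simplified stand-in for what the GIS software does; `points_to_dtm` is a hypothetical helper:

```python
import numpy as np

def points_to_dtm(xyz, cell=1.0):
    # Rasterize a LiDAR point cloud onto a regular grid, keeping the
    # lowest z per cell as a rough bare-earth (DTM) estimate.
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    col = ((x - x.min()) // cell).astype(int)
    row = ((y - y.min()) // cell).astype(int)
    dtm = np.full((row.max() + 1, col.max() + 1), np.nan)
    for r, c, h in zip(row, col, z):
        if np.isnan(dtm[r, c]) or h < dtm[r, c]:
            dtm[r, c] = h
    return dtm

# Two points fall in the same cell; the lower one (ground) wins
cloud = np.array([[0.2, 0.3, 5.0], [0.7, 0.6, 9.0], [1.5, 0.5, 4.0]])
print(points_to_dtm(cloud))  # [[5. 4.]]
```

Real pipelines additionally classify ground versus vegetation returns and interpolate empty cells, but the min-per-cell idea is the core of the rasterization step.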
After applying previous techniques in the context of identifying trees, we got the following results.
These results are part of an overall solution to identify trees close to power stations. They allow us to determine the elevations of the forest cover as well as the aspect of the land, whether for geological purposes (determination of land use), forestry (control of protected natural areas), or fire prevention in high-risk areas (anthropic causes).
The elevation map can greatly help public agencies to perform spatial analysis, manage large amounts of spatial data, and produce cartographically appealing maps to aid decision making. It improves responders' efficiency and performance by giving them rapid access to critical data during an incident.
By James Tan
The power of deep learning paired with collaborative human intelligence to increase crop cultivation through super-resolution.
In order to accurately locate crop fields from satellite imagery, it is conceivable that images of a certain quality are required. Although deep learning is famous for being able to pull off miracles, we human beings will have a hard time labeling the data if we cannot clearly make out the details within an image.
The following is an example of what we would like to achieve.
If we can clearly identify areas within a satellite image that correspond to a particular crop, we can easily extend our work to evaluate the area of cultivation of each crop, which will go a long way in ensuring food security.
To get our hands on the required data, we explored a myriad of sources of satellite imagery. We ultimately settled on images from Sentinel-2, largely because the mission boasts the best image quality among open-source options.
Although I am no expert in satellite imagery, I believe that, having seen the above image, we can all agree that its quality is not quite up to scratch.
A real nightmare to label!
It is completely unfeasible to discern the demarcations between individual crop fields.
Of course, this isn’t completely unreasonable for open-source data. Satellite image vendors have to be especially careful when it comes to the distribution of such data due to privacy concerns.
How outrageous it would be if simply anyone can look up what our backyards look like on the Internet, right?
However, this inconvenience comes at a great detriment to our project. In order to clearly identify and label crops in an image that is relevant to us, we would require images of much higher quality than what we have.
Deep learning practitioners love to apply what they know to solve the problems they face. You probably know where I am going with this: if the quality of an image isn't good enough, we try to enhance it, of course! This process is called super-resolution.
This is one of the first things that we tried, and here are the results.
Quite noticeably there has been some improvement: the model has done an amazing job of smoothing out the rough edges in the photo. The pixelation problem has been pretty much taken care of, and everything blends in well.
However, in doing so the model has neglected finer details and that leads to an image that feels out of focus.
Naturally, we wouldn’t stop until we got something completely satisfactory, which led us to try this instead.
Now, it is quite obvious that this model has done something completely different from Deep Image Prior. Instead of attempting to ensure that the pixels blend in with each other, this model places great emphasis on refining each individual pixel. In doing so, it neglects how each pixel actually relates to its surrounding pixels.
Although it succeeds in injecting some life into the original image by making the colors more refined and attractive, the pixelation in the image remains an issue.
When we first saw this, we couldn't believe what we were seeing. We had come such a long way from our original image! And to think that the approach taken to achieve such results was such a silly one.
Since each of the previous two models were no good individually, but they clearly were good at getting different things done, what if we combined the two of them?
So we ran the original image through Deep Image Prior, and subsequently fed the results of that through the Decrappify model, and voila!
Relative to the original image, the colors now look incredibly realistic. The lucid demarcations of the crop fields will certainly go a long way in helping us label our data.
The way we pulled this off was embarrassingly simple. We used Deep Image Prior as found in its official GitHub repository. As for Decrappify, given our objectives, we figured that training it on satellite images would definitely help. With the two models set up, it's just a matter of feeding images into them one after the other.
For those of you that have made it this far and are curious about what the models actually are, here’s a brief overview of them.
This method hardly conforms to conventional deep learning-based super-resolution approaches.
Typically, we would create a dataset of low- and high-resolution image pairs, and then train a model to map a low-resolution image to its high-resolution counterpart. However, this particular model does none of the above and, as a result, does not have to be pre-trained prior to inference time. Instead, a randomly initialized deep neural network is trained on one particular image. That image could be one of your favorite sports stars, a picture of your pet, a painting that you like, or even random noise. Its task is then to optimize its parameters to map the input image to the image that we are trying to super-resolve. In other words, we are training our network to overfit to our low-resolution image.
Why does this make sense?
It turns out that the structure of deep networks imposes a ‘naturalness prior’ over the generated image. Quite simply, this means that when overfitting to (memorizing) an image, deep networks prefer to learn the natural, smoother concepts first before moving on to the unnatural ones. That is to say, the convolutional neural network (CNN) will first ‘identify’ the colors that form shapes in various parts of the image and then proceed to materialize various textures. As the optimization process goes on, the CNN latches on to finer details.
When generating an image, neural networks prefer natural-looking images as opposed to pixelated ones. Thus, we start the optimization process and allow it to continue to the point where it has captured most of the relevant details but has not learned any of the pixelation and noise. For super-resolution, we train it to the point where the resulting image closely resembles the original image when both are downsampled. There exist multiple super-resolution images that could have produced each low-resolution image.
And as it turns out, the most plausible image is also the one that doesn’t appear to be highly pixelated, this is because the structure of deep networks imposes a ‘naturalness prior’ on generated images.
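That ambiguity is easy to demonstrate: very different high-resolution images can downsample to exactly the same low-resolution image. A toy NumPy check (not the project's code), using 2x average-pooling as the downsampling operation:

```python
import numpy as np

def downsample(img):
    # 2x average-pooling, the kind of operation used to compare a
    # generated image against the low-resolution original.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Two different "high-resolution" images...
smooth = np.array([[4.0, 4.0], [4.0, 4.0]])
pixelated = np.array([[0.0, 8.0], [8.0, 0.0]])

# ...that downsample to the exact same low-resolution pixel.
print(downsample(smooth), downsample(pixelated))  # both [[4.]]
```

Both candidates are consistent with the low-resolution observation; the naturalness prior is what makes the network produce the smooth one rather than the pixelated one.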
We highly recommend this talk by Dmitry Ulyanov (who was the main author of the Deep Image Prior paper) to understand the above concepts in depth.
In contrast with the previous model, this one is trained beforehand to learn as much as possible about satellite images. As a result, when we give it a low-quality image as input, the model bridges the gap between the low- and high-quality versions by using its knowledge of the world to fill in the blanks.
The model has a U-Net architecture with a pre-trained ResNet backbone. The really interesting part is the loss function, which has been adapted from this paper. The objective is to produce an output image of higher quality such that, when it is fed through a pre-trained VGG16 model, it produces minimal ‘style’ and ‘content’ loss relative to the ground truth image. The ‘style’ loss matters because we want the model to create a super-resolution image whose texture is realistic for a satellite image. The ‘content’ loss encourages the model to recreate intricate details in its higher-quality output.
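A minimal NumPy sketch of the ‘style’ and ‘content’ parts of such a perceptual loss, assuming feature maps that would normally come from layers of a pre-trained VGG16; the helpers are illustrative, not the model's actual code:

```python
import numpy as np

def gram_matrix(features):
    # Style is captured by channel-wise feature correlations;
    # `features` has shape (channels, height, width).
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(generated, target):
    # Mean squared difference between Gram matrices.
    return np.mean((gram_matrix(generated) - gram_matrix(target)) ** 2)

def content_loss(generated, target):
    # Content loss compares the feature maps directly.
    return np.mean((generated - target) ** 2)

rng = np.random.default_rng(0)
feats_a = rng.standard_normal((8, 4, 4))  # stand-in VGG16 activations
feats_b = rng.standard_normal((8, 4, 4))
print(style_loss(feats_a, feats_a))  # identical features -> 0.0
print(style_loss(feats_a, feats_b))  # differing textures -> positive
```

Because the Gram matrix discards spatial layout, the style term penalizes unrealistic textures without demanding pixel-exact agreement, while the content term keeps the structure of the scene intact.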
Is it possible to estimate, with minimal expert knowledge, whether your street will be safer than others when an earthquake occurs?
We answered how to estimate the safest route after an earthquake with computer vision and route management.
The last devastating earthquake in Turkey occurred in 1999 (magnitude >7 on the Richter scale), around 150–200 kilometers from Istanbul. Scientists believe that this time the earthquake will strike directly in the city, and the magnitude is predicted to be similar.
The main motivation behind this AI project, hosted by Impacthub Istanbul, is to optimize the aftermath management of an earthquake with AI and route planning.
After kicking off the project and brainstorming with the hosts, collaborators, and the Omdena team about how to better prepare the city of Istanbul for an upcoming disaster, we spotted a problem quite simple but at the same time really important for families: getting reunited as soon as possible in an earthquake's aftermath!
Our target was set to provide safe and fast route planning for families, considering not only time factors but also broken bridges, landing debris, and other obstacles usually found in these scenarios.
We settled on two tasks: creating a risk heatmap that would depict how dangerous a particular area on the map is, and a path-finding algorithm providing the safest and shortest path from A to B. The latter would rely on the heatmap to estimate safeness.
Challenge started: deep learning for earthquake management using computer vision and route management.
By this time, we optimistically trusted open data to successfully address our problem. However, we soon realized that data describing building quality and soil composition, as well as pre- and post-disaster imagery, were hard to find, and complex to model and integrate when available.
Bridges over streets, building heights, 1000 types of soil, and eventually the interactions among all of them… too many factors to control! So we focused on delivering something more approximate.
The question was: how to accurately estimate street safeness during an earthquake in Istanbul without such a myriad of data? What if we could roughly estimate path safeness by embracing distance-to-buildings as a safety proxy? The farther the buildings, the safer the pathway.
For that crazy idea, we first needed building footprints laid on the map. Some people suggested borrowing building footprints from OpenStreetMap, one of the most popular open-source map providers. However, we soon noticed that OpenStreetMap, though quite complete, has some blank areas in terms of the building metadata relevant to our task. Footprints were also sometimes inaccurately laid out on the map.
Earthquakes and their effects on the population are a big problem, and we have computer vision to the rescue! Using deep learning, we could rely on satellite imagery to detect buildings and then estimate the closeness of pathways to them.
The next stone on the road was obtaining high-resolution imagery of Istanbul, with enough resolution to allow an ML model to locate building footprints on the map as a visually agile human would. Likewise, we would also need some annotated footprints on these images so that our model could train gracefully.
Instead of labeling hundreds of square meters manually, we relied on SpaceNet (in particular, the images for Rio de Janeiro) as our annotated data provider. This dataset contains high-resolution satellite images and building footprints, nicely pre-processed and organized, and was used in a recent competition.
The modeling phase was really smooth thanks to fast.ai software.
We used a Dynamic U-Net model with an ImageNet-pretrained ResNet-34 encoder as a starting point. This state-of-the-art architecture uses many advanced deep learning techniques by default, such as the one-cycle learning-rate schedule and the AdamW optimizer.
All these fancy advances in just a few lines of code.
We set up a balanced combination of Focal Loss and Dice Loss, with accuracy and Dice as evaluation metrics. After several frozen and unfrozen training stages, we came up with good-enough predictions for the next step.
For more information about working with geospatial data and tools with fast.ai, please refer to .
Finding high-resolution imagery was the key to our model and at the same time a humongous stone hindering our path to victory.
For the training stage, it was easy to avoid the manual annotation and data collection process thanks to SpaceNet, yet for prediction, obtaining high-resolution imagery of Istanbul was the only way.
Thankfully, we stumbled upon Mapbox and its easy-peasy, almost-free download API, which provides high-resolution slippy map tiles all over the world at different zoom levels. Slippy map tiles are 256 × 256 pixel files described by x, y, z coordinates, where x and y represent 2D coordinates in the Mercator projection and z the zoom level applied to the earth's globe. We chose zoom level 18, where each pixel corresponds to roughly 0.596 meters on the ground.
As they mentioned on their webpage, they have a generous free tier that allows you to download up to 750,000 raster tiles a month for free. Enough for us as we wanted to grab tiles for a couple of districts.
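The slippy-map arithmetic can be sketched in a few lines of standard Python; `lonlat_to_tile` and `meters_per_pixel` are hypothetical helpers following the Web Mercator tiling convention (the ~0.596 m figure quoted above is the equatorial resolution at zoom 18, before the cos(latitude) correction):

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    # Convert longitude/latitude to slippy-map tile x, y at a zoom
    # level (Web Mercator scheme used by Mapbox and OpenStreetMap).
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_r = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_r)) / math.pi) / 2.0 * n)
    return x, y

def meters_per_pixel(zoom, lat=0.0):
    # Ground resolution of a 256-pixel tile at a given latitude.
    earth_circumference = 40075016.686  # meters at the equator
    return earth_circumference * math.cos(math.radians(lat)) / (256 * 2 ** zoom)

print(lonlat_to_tile(28.98, 41.01, 18))  # a tile over Istanbul
print(round(meters_per_pixel(18), 3))    # ~0.597 m per pixel at the equator
```

Iterating x and y over a bounding box of tile coordinates is then enough to enumerate every tile covering a district.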
Once all required tiles were stealing space from my Google Drive, it was time to switch on our deep learning model and generate prediction footprints for each tile.
Then we geo-referenced the tiles by translating the Mercator coordinates to latitude-longitude tuples (those used by mighty explorers). Geo-referencing the tiles was a required step to create our prediction piece of art with the GDAL software.
The gdal_merge.py command allows us to glue tiles together using the geo-coordinates embedded in the TIFF images. After some math and computing time… voilà! Our high-resolution prediction map for the district is ready.
Ok, I see my house but should I go through this street?
Building detection was not enough for our task. We needed to determine the distance from a given position on the map to the closest building, so that a person at that spot could know how safe crossing the street would be. The larger the distance, the safer, remember?
The path-finding team would overlay the heatmap below on its graph-based schema, and by intersecting graph edges (streets) with heatmap pixels (user positions), they could calculate the average distance over the pixels on each edge and thus obtain a safeness estimate for each street. This would be our input when finding the best A-B path.
If a pixel belongs to a building, the computation returns zero; if not, it returns the minimum Euclidean distance from that pixel to the nearest building pixel. This process, along with NumPy optimizations, was the key to mitigating the quadratic complexity beneath this computation.
Repeat the process for each pixel and the safeness map comes up.
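The per-pixel rule above can be sketched with SciPy's Euclidean distance transform, which handles the whole grid in one call instead of the naive quadratic loop; the footprint mask here is made up for illustration:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Hypothetical binary footprint mask from the segmentation model:
# 1 = building pixel, 0 = free ground
footprints = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
])

# Distance from every non-building pixel to the nearest building
# pixel; building pixels themselves get 0, matching the rule above.
safeness = distance_transform_edt(footprints == 0)
print(safeness.round(2))
```

Higher values in `safeness` mark positions farther from any building, i.e. the safer spots under the distance-to-buildings proxy.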
The complexity of the task is increased due to the quality of satellite images from India (and most of the developing world). Similar solutions like the Google Sunroof project work only on high-resolution images and are not usable in the majority of the developing world.
Mask R-CNN was built by the Facebook AI Research team. Its working principle is quite simple: the researchers combined two previously existing models and played around with the linear algebra. The model can be divided into two parts, a region proposal network (RPN) and a binary mask classifier. Step one is to get a set of bounding boxes that possibly contain an object of relevance; step two is to predict a binary mask within each box.
Our goal of using this model is to segment or separate each roof-top instance in an image.
Our challenges started with the lack of data, as there are no ready-made datasets of labeled rooftops.
As a starting point for our model, we used weights from a Mask R-CNN network pre-trained on the COCO dataset, which covers 80 classes but does not include a roof class. ResNet-101 was used as the backbone architecture. We started by training only the “heads” layers, i.e. the RPN, classifier, and mask heads of the network, because training the entire network would need a lot of data, and a model pre-trained over many different classes was a good start. Then, to see the difference it made in prediction, we moved on to also training stages 4 and 5 of the ResNet-101 architecture.
We also tried different variations. The image size was changed from 1024 × 1024 to 320 × 320, because our training images are 300 × 300 and padding them up to 1024 didn't seem a good idea.
We compared results on another 20 rooftops, and they show that model 4 often performs best.
We then tried lowering the minimum detection confidence threshold from 0.9 to 0.7; regions of interest (ROIs) scoring below this value are skipped. We did this because 0.9 seemed a very high threshold for detecting roofs: roofs aren't intricate, well-defined, very specific regions, so any region that is a reasonable candidate for a rooftop should be considered. At 0.7 we got many more regions, but also many overlapping regions predicted as roofs, whereas at 0.9 we got a few big blobs taking all of the adjacent roofs as one region.
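The effect of the threshold can be illustrated with a toy filter; the scores are invented, and the comment references the Mask R-CNN `DETECTION_MIN_CONFIDENCE` config option this mirrors:

```python
# Hypothetical detection scores as produced by the region classifier;
# the threshold plays the role of DETECTION_MIN_CONFIDENCE in the
# Mask R-CNN configuration.
scores = [0.95, 0.91, 0.85, 0.74, 0.71, 0.55]

def keep_rois(scores, min_confidence):
    # Drop every region of interest scored below the threshold.
    return [s for s in scores if s >= min_confidence]

print(len(keep_rois(scores, 0.9)))  # 2 regions survive the strict threshold
print(len(keep_rois(scores, 0.7)))  # 5 regions, including weaker candidates
```

Lowering the threshold trades precision for recall: more candidate rooftops are kept, at the cost of overlapping detections.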
Beyond this, we also tried training the model with the Adam optimizer instead of Stochastic Gradient Descent (SGD).
Here are the resulting images for models trained with SGD and Adam. Training was done with 70% and 90% confidence, and testing was done with 70% and 90% confidence for each of the trained models.
1. Adam-trained models are not able to predict all the instances, and they cannot differentiate between adjacent roofs as well as SGD-trained models can.
2. SGD-trained models predict sharper masks than Adam-trained models. This shows SGD is better for our models, and we'll continue with it.
About SGD variations:
1. Training the model with 70% confidence increases the number of masks, as the model learns to consider low-probability roofs too, which leads to overlapping masks, whereas training with 90% gives cleaner and fewer masks. So training with 90% seems the better option.
2. Now, should we predict masks with 70% or 90% confidence? 90% is very precise and might remove other options, whereas 70% will include them, as seen in images 4 and 7. After training on more images (currently 95), we will be able to see which one to use in the end.
As a final step to identify the individual rooftops, we did some post-processing, creating regular shapes from the odd-shaped colored masks. The post-processing was done on a dataset of 607 images.
The project goal was to build a Machine Learning model for tree identification on satellite images. The solution will prevent power outages and fires sparked by falling trees and storms. This will save lives, reduce CO2 emissions, and improve infrastructure inspection. The project was hosted by the Swedish AI startup Spacept.
Four weeks ago, 35 AI experts and data scientists from 16 countries came together through the Omdena platform. The participants formed self-organized task groups, and each group picked up a part of the problem or an approach to solving the challenge.
Omdena’s platform is a self-organized learning environment and after the first kick-off call, the collaborators started to organize in task groups. Below are screenshots of some of the discussions that took place in the first days of the project.
Task Group 1: Labeling
We labeled over 1000 images. A large group of people makes labeling not only faster but also more accurate, thanks to our peer-to-peer review process.
Active collaborators: Leonardo Sanchez (Task Manager, Brazil), Arafat Bin Hossain (Bangladesh), Sim Keng Ying(Singapore), Alejandro Bautista Ramos (Mexico), Santosh Kumar Pydipalli (India), Gerardo Duran (Mexico), Annie Tran (USA), Steven Parra Giraldo (Colombia), Bishwa Karki (Nepal), Isaac Rodríguez Bribiesca (Mexico).
Task Group 2: Generating images through GANs
Given a training set, GANs can be used to generate new data with the same features as the training set.
Active Participants: Santiago Hincapie-Potes (Task Manager, Colombia), Amit Singh (Task Manager for DCGAN, India), Ramon Ontiveros (Mexico), Steven Parra Giraldo (Colombia), Isaac Rodríguez (Mexico), Rafael Villca (Bolivia), Bishwa Karki (Nepal).
Task Group 3: Generating elevation model
The task group is using a Digital Elevation Model and a triangulated irregular network. Knowing the elevation of the land as well as of the trees will help us assess the potential risk trees pose to overhead cables.
Active Participants: Gabriel Garcia Ojeda (Mexico)
Task Group 4: Sharpening the images
We built a set of image processing steps, tried different combinations of filters, and implemented a basic pipeline to automate testing the combinations, all in order to preprocess the labeled images and achieve more accurate results with the AI models.
Active Participants: Lukasz Kaczmarek (Task Manager, Poland) Cristian Vargas (Mexico), Rodolfo Ferro (Mexico), Ramon Ontiveros (Mexico).
Task Group 5: Detecting trees through Masked R-CNN model
Mask R-CNN was built by the Facebook AI Research team. The model generates a set of bounding boxes that possibly contain trees; the second step colors them based on certainty.
Active Participants: Kathelyn Zelaya (Task Manager, USA), Annie Tran (USA), Shubhajit Das (India), Shafie Mukhre (USA).
Task Group 6: Detecting trees through U-Net and Deep U-Net model
U-Net was initially used for biomedical image segmentation, but because of the good results it achieved, it is now applied to a variety of other tasks. It is one of the best network architectures for image segmentation. We applied the same architecture to identifying trees and got very encouraging results, even when training with fewer than 50 images.
Active Participants: Pawel Pisarski (Task Manager, Canada), Arafat Bin Hossain (Bangladesh), Rodolfo Ferro (Mexico), Juan Manuel Ciro Torre (Colombia), Leonardo Sanchez (Brazil).
The U-Net consists of a contracting path and an expansive path, which gives it its u-shaped format. The contracting path is a typical convolutional network consisting of repeated applications of convolutions, each followed by a rectified linear unit (ReLU) and a max-pooling operation.
One of the techniques caught our attention: the Deep U-Net. Like the U-Net, the Deep U-Net has both a contraction and an expansion side, and uses U-connections to pass high-resolution features from the contraction side to the upsampled outputs. Additionally, it uses Plus connections to better extract information with less loss error.
Having discussed the architecture, a basic Deep U-Net solution was applied to the 144 unique labeled images, divided into 119 images and masks for the training set, 22 for the validation set, and 3 for the test set. As the images and masks were 1,000 × 1,000 pixels, they were cropped into 512 × 512 tiles, generating 476 images and masks for the training set, 88 for the validation set, and 12 for the test set. Applying the Deep U-Net model for 10 epochs with a batch size of 4, using the Adam optimizer, a binary cross-entropy loss, and a GeForce GTX 1060 GPU, the results were quite encouraging, reaching 94% accuracy on validation.
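One cropping scheme consistent with the counts above (four 512 × 512 tiles per 1,000 × 1,000 image, so 119 → 476) is overlapping corner crops; a NumPy sketch, assuming that scheme:

```python
import numpy as np

def corner_crops(img, size=512):
    # Cut four overlapping `size`-square tiles from the corners of a
    # larger image, e.g. 1,000 x 1,000 -> four 512 x 512 crops.
    h, w = img.shape[:2]
    tops, lefts = (0, h - size), (0, w - size)
    return [img[t:t + size, l:l + size] for t in tops for l in lefts]

image = np.zeros((1000, 1000))  # stand-in for one labeled image
crops = corner_crops(image)
print(len(crops), crops[0].shape)  # 4 crops of (512, 512)
```

The same function applied to each mask keeps images and masks aligned, since identical slices are taken from both.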
Model Accuracy and Loss
Believing that accuracy could be improved further, the basic solution was expanded with data augmentation. Through rotations, we generated 8 augmented images per original image, giving 3,808 images and masks for the training set and 704 for the validation set.
We reproduced the previous model, keeping the base learning rate at 0.001 but applying a decay inversely proportional to the number of epochs, and increased the number of epochs to 100.
The Deep U-Net learned to distinguish trees in new images very well, even recognizing shadows within forests as non-trees, reproducing what we humans did during the labeling process with even better performance.
A few results can be seen below and were generated using new images completely unseen before by the Deep U-Net.