
Deep Learning Pipeline for Image Segmentation & Laser Weed Removal

May 20, 2021



A crop vs. weed segmentation pipeline on the edge: to improve the performance of the model that separates weeds from crops in images, we used different image segmentation techniques. Here we present a walkthrough of the preprocessing and pipelining for the Semantic Segmentation exploration.

Introduction

In partnership with Omdena and WeedBot, an impact-driven startup developing a laser weeding robot that recognizes and removes weeds with a laser beam to enable pesticide-free food production, a team of Machine Learning Engineers spread around the world worked for over two months to develop a high-speed, high-precision Image Segmentation model that runs in 12 milliseconds or faster on the NVIDIA Xavier edge device.

Photo by WeedBot

We explored two types of Image Segmentation — Instance & Semantic Segmentation. In this article, we shall be doing a walkthrough of the preprocessing and pipelining for the Semantic Segmentation exploration. A different article looks into the Instance Segmentation exploration here.

Photo by Markus Spiske on Unsplash

Image Segmentation

Semantic Segmentation was considered because it tends to be less computationally intensive than Instance Segmentation. One reason for this is that Semantic Segmentation simply performs pixel-wise classification of objects without trying to determine individual instances, whereas Instance Segmentation performs both pixel-wise semantic segmentation and object detection (bounding boxes).

We concluded that distinguishing pixels belonging to crops from pixels belonging to weeds does not necessarily require “instance awareness”. You can see more here on the difference between Instance & Semantic Segmentation.

Components

Feature Extraction

One of the feature extraction methods we decided on is based on this paper (Milioto et al., 2018). It details methods to generate 11 additional channels that could help improve the model's accuracy for crop vs. weed classification.

RGB + Additional Engineered channels

Engineered channels calculation (feature extraction)

Included below are some examples of the extracted channels — 

Original RGB Image

HSV

EXG

EXR

CIVE
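As an illustration, here is a minimal sketch of how a few of these channels can be computed from an RGB image with OpenCV and NumPy. The ExG, ExR, and CIVE formulas follow the common vegetation-index definitions; the exact normalization and full set of channels used in the project may differ.

import cv2
import numpy as np

def engineered_channels(rgb_uint8):
    # rgb_uint8: (H, W, 3) uint8 RGB image
    rgb = rgb_uint8.astype(np.float32) / 255.0
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Chromatic (normalized) coordinates
    s = R + G + B + 1e-6
    r, g, b = R / s, G / s, B / s

    exg = 2.0 * g - r - b                                  # Excess Green
    exr = 1.4 * r - g                                      # Excess Red
    cive = 0.441 * r - 0.811 * g + 0.385 * b + 18.78745    # Color Index of Vegetation

    hsv = cv2.cvtColor(rgb_uint8, cv2.COLOR_RGB2HSV).astype(np.float32) / 255.0

    # RGB (3) + HSV (3) + ExG, ExR, CIVE (3): a subset of the full 15-channel input
    return np.dstack([rgb, hsv, exg, exr, cive])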

Target Preprocessing

The dataset provided by WeedBot contained annotations in the COCO format for the carrots. For our purposes, each image's annotations were converted into a 3-channel binary mask corresponding to the Background, Carrot, and Weed classes. This mask forms the output the model learns to predict, and from it we could simply extract whichever channels we needed for further work.

3 channel mask

Original RGB Image

Binary Masks

Background Binary Mask

Carrot Binary Mask

Weed Binary Mask

Original RGB

Background-Red, Carrot-Green, Weed-Blue
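For illustration, a minimal sketch of this conversion using pycocotools is shown below; the annotation file name and category IDs are placeholders rather than the project's actual values.

import numpy as np
from pycocotools.coco import COCO

def build_mask(coco, img_id, carrot_cat_id, weed_cat_id):
    # Build an (H, W, 3) binary mask: channel 0 = background, 1 = carrot, 2 = weed
    info = coco.loadImgs(img_id)[0]
    h, w = info["height"], info["width"]
    carrot = np.zeros((h, w), dtype=np.uint8)
    weed = np.zeros((h, w), dtype=np.uint8)

    for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
        m = coco.annToMask(ann)                  # polygon/RLE annotation -> binary mask
        if ann["category_id"] == carrot_cat_id:
            carrot = np.maximum(carrot, m)
        elif ann["category_id"] == weed_cat_id:
            weed = np.maximum(weed, m)

    background = 1 - np.clip(carrot + weed, 0, 1)  # everything that is neither carrot nor weed
    return np.dstack([background, carrot, weed])

coco = COCO("annotations.json")  # placeholder path to the COCO annotation file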

Problem

The next step in the pipeline was to experiment with different existing open-source segmentation models, including U-Net, PSPNet, and Bonnet, as well as custom architectures. For this, we needed to split into multiple teams and compare results.

To compare the results of the different models directly, it was imperative that every team work with exactly the same dataset and run tests on the same validation images. The implication was that each team would spend time repeating the same preprocessing steps before reaching the segmentation part of the pipeline, and many of those steps might otherwise need to run inside the training loop.

Hub by Activeloop to the rescue

Activeloop was a technical partner on this project, and we made use of their platform to overcome this hurdle.

We performed the preprocessing half of the model pipeline, which resulted in a 15-channel input image and a 3-channel output image per sample, and uploaded the result to Activeloop. This meant the image preprocessing could be performed once, centrally.

For the modeling, everyone could call up the same image dataset without worrying about any of the preprocessing or image-loading steps that are typical in computer vision problems, since the image data is streamed directly to the model as arrays/tensors.

Let’s dive in. 

Install hub by Activeloop — 

pip3 install hub

Register for an account on Activeloop or on the command line using — 

hub register

To log in with a registered account, on the command line — 

hub login

Each of the masks was saved to disk with the same file names but in different folders.

In my_schema, we define the input type as “image”, and the target arrays as the output. We also define the dimensions of both the input and target. 
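A minimal sketch of such a schema with the Hub 1.x API is shown below; the key names, shapes, and dtypes are illustrative rather than the exact project code.

import hub
from hub.schema import Tensor

my_schema = {
    # 15-channel input: RGB plus the engineered channels
    "image": Tensor(shape=(None, None, 15), max_shape=(1024, 1024, 15), dtype="float32"),
    # 3-channel target: background / carrot / weed binary masks
    "mask": Tensor(shape=(None, None, 3), max_shape=(1024, 1024, 3), dtype="uint8"),
}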

We define a load_transform function that loads the different images and masks, combines them into the input and target formats, and uploads the array to Activeloop. 
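A sketch of the transform, assuming Hub 1.x's @hub.transform decorator; load_channels and load_masks are hypothetical names standing in for our own disk-loading helpers.

@hub.transform(schema=my_schema)
def load_transform(file_name):
    image = load_channels(file_name)  # reads the (H, W, 15) preprocessed input from disk
    mask = load_masks(file_name)      # reads the (H, W, 3) binary masks from disk
    return {"image": image, "mask": mask}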

Next, we pass in a list of the image names, and load_transform runs for each name in the list. The tag string combines our Hub username and the name we wish to store the dataset under.
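Putting it together, the upload could look like this (the tag below is a placeholder):

tag = "username/repo_name"
ds = load_transform(image_names)  # image_names: list of file names to process
ds.store(tag)                     # runs the transform and uploads the dataset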

Computing the transformation: 100%|██████████| 19.4k/19.4k [18:09<00:00, 17.8 items/s]

We can now load the image dataset simply with — 

tag = "username/repo_name"
ds = hub.load(tag)
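From here the dataset can be handed straight to a training framework; for example, with the PyTorch integration available in Hub 1.x (a sketch, not the project's exact training code):

pt_ds = ds.to_pytorch()  # wraps the Hub dataset for use with torch.utils.data.DataLoader
# each sample arrives as a dict of tensors keyed by the schema names ("image", "mask")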

Another advantage of using this method was that we could quickly set up the pipeline in Colab notebooks; each team could make a copy of the same notebook and get to experimenting without spending any time on environment setup.

Here’s a link to the initial Colab notebook shared to begin experimenting with different architectures. 

Activeloop offers a nice visualization dashboard to view both the input channels, as well as the target images. 

Dashboard display for input images

Toggle to view target

Dashboard display for Target

References

Milioto, A., Lottes, P., & Stachniss, C. (2018). Real-time Semantic Segmentation of Crop and Weed for Precision Agriculture Robots Leveraging Background Knowledge in CNNs. https://arxiv.org/pdf/1709.06764.pdf

Mishra, A. (2020). Faster Machine Learning Using Hub by Activeloop. https://towardsdatascience.com/faster-machine-learning-using-hub-by-activeloop-4ffb3420c005

This article was written by Sijuade Oguntayo, Pankaja Shankar, Shubham Gandhi, Marjan Gbohadian, Melania Abrahamian, and Nyan Swan Aung (Brian).

