Deep Learning Pipeline for Image Segmentation & Laser Weed Removal
May 20, 2021
To improve the performance of a crop vs weed segmentation model running on the edge, one that distinguishes weeds from crops in images, we used different image segmentation techniques. Here we present a walkthrough of the preprocessing and pipelining for the Semantic Segmentation exploration.
Introduction
In partnership with Omdena and WeedBot, an impact-driven startup developing a laser weeding robot that helps farmers recognize and remove weeds with a laser beam to facilitate pesticide-free food production, a number of Machine Learning Engineers spread around the world worked for over two months to develop a high-speed, high-precision Image Segmentation model that runs in 12 milliseconds or faster on the Nvidia Xavier edge device.
We explored two types of Image Segmentation — Instance & Semantic Segmentation. In this article, we shall be doing a walkthrough of the preprocessing and pipelining for the Semantic Segmentation exploration. A different article looks into the Instance Segmentation exploration here.
Image Segmentation
Semantic Segmentation was considered because it tends to be less computationally intensive than Instance Segmentation. One reason for this is that, unlike Instance Segmentation, Semantic Segmentation simply performs pixel-wise classification without aiming to distinguish individual instances of an object. Instance Segmentation, on the other hand, performs both pixel-wise semantic segmentation and object detection (bounding boxes).
We concluded that distinguishing pixels belonging to crops from those belonging to weeds does not necessarily require “instance awareness”. You can see more here on the difference between Instance & Semantic Segmentation.
Components
Feature Extraction
One of the feature extraction methods decided upon is based on this paper. It details methods to generate 11 additional channels that could aid in improving the model accuracy for crop vs weed classification.
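As a rough illustration of what such additional channels look like, here is a minimal sketch that computes a few vegetation-index channels commonly used in the literature (ExG, ExR, ExG−ExR, NDI) and stacks them onto the RGB input. The function name, file path, and the reduced channel set are illustrative only; the project used 11 additional channels, which are not all reproduced here.

```python
import cv2
import numpy as np

def extract_extra_channels(bgr):
    """Append a few vegetation-index channels to an RGB image (illustrative sketch)."""
    img = bgr.astype(np.float32) / 255.0          # normalise to [0, 1]
    b, g, r = img[..., 0], img[..., 1], img[..., 2]  # OpenCV loads images as BGR

    eps = 1e-6
    exg = 2.0 * g - r - b                          # Excess Green
    exr = 1.4 * r - g                              # Excess Red
    exgr = exg - exr                               # Excess Green minus Excess Red
    ndi = (g - r) / (g + r + eps)                  # Normalized Difference Index

    extra = np.stack([exg, exr, exgr, ndi], axis=-1)
    return np.concatenate([img, extra], axis=-1)

# Hypothetical usage on a sample image
channels = extract_extra_channels(cv2.imread("sample.png"))
print(channels.shape)  # (H, W, 7) in this reduced example
```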
Included below are some examples of the extracted channels —
Target Preprocessing
The dataset provided by WeedBot contained annotations in the COCO format for the carrots. For our purposes, each image was converted to a 3-channel binary mask corresponding to the Background, Carrot, and Weed classes; this mask formed the target output to be predicted. Using this method, we could simply extract the necessary channels for further work.
Binary Masks
Carrot
Weeds
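A minimal sketch of this conversion is shown below, using pycocotools to rasterize the COCO polygon annotations into a 3-channel mask. The annotation file name, category names (“carrot”, “weed”), and image id are placeholders; the exact label scheme used in the project may differ.

```python
import numpy as np
from pycocotools.coco import COCO

def coco_to_mask(coco, image_id, class_ids):
    """Build a 3-channel binary mask (background, carrot, weed) for one image."""
    info = coco.loadImgs(image_id)[0]
    h, w = info["height"], info["width"]
    mask = np.zeros((h, w, 3), dtype=np.uint8)

    # Channel 1 = first class, channel 2 = second class
    for channel, cat_id in enumerate(class_ids, start=1):
        ann_ids = coco.getAnnIds(imgIds=image_id, catIds=[cat_id])
        for ann in coco.loadAnns(ann_ids):
            mask[..., channel] |= coco.annToMask(ann)

    # Background channel is everything not covered by the other classes
    mask[..., 0] = 1 - np.clip(mask[..., 1] + mask[..., 2], 0, 1)
    return mask

# Hypothetical annotation file and category names
coco = COCO("annotations.json")
carrot_id = coco.getCatIds(catNms=["carrot"])[0]
weed_id = coco.getCatIds(catNms=["weed"])[0]
mask = coco_to_mask(coco, image_id=1, class_ids=[carrot_id, weed_id])
```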
Problem
The next step in the pipeline was to experiment with different existing open-source segmentation models, including U-Net, PSPNet, and Bonnet to name a few, as well as custom architectures. For this, we needed to split into multiple teams and compare results.
It became imperative that we work with the exact same datasets and run tests on the same validation images so that the results of the different models could be compared directly. An implication of this was that each team would have to spend time performing many of the same preprocessing steps just to get to the segmentation part of the pipeline, and many of those steps might need to run inside the training loop.
Hub by Activeloop to the rescue
Activeloop was a technical partner on this project and we made use of their platform to overcome this hurdle.
We performed the preprocessing half of the model pipeline, which resulted in a 15-channel input image and a 3-channel output image per sample, and uploaded the result to Activeloop. This meant the image preprocessing could be performed once, centrally.
For the modeling, everyone could call up the same image dataset without needing to worry about any of the preprocessing or image-loading steps that are typical in computer vision problems, as the image data is streamed directly to the model as arrays/tensors.
Let’s dive in.
Install hub by Activeloop —
pip3 install hub
Register for an account on Activeloop or on the command line using —
hub register
To log in with a registered account, on the command line —
hub login
Each of the masks was saved to disk with the same file names but in different folders.
In my_schema, we define the input type as “image”, and the target arrays as the output. We also define the dimensions of both the input and target.
We define a load_transform function that loads the different images and masks, combines them into the input and target formats, and uploads the array to Activeloop.
Next, we pass in a list representing the image names, and for each image name in the list, we run load_transform. The tag string takes our hub username and the name we wish to store the repository as.
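A minimal sketch of these three steps, based on the Hub 1.x API available at the time (hub.schema.Tensor, the @hub.transform decorator, and .store()), is shown below. The spatial resolution, file paths, and file format of the saved arrays are assumptions for illustration; the 15-channel input and 3-channel target match the pipeline described above.

```python
import numpy as np
import hub
from hub.schema import Tensor

# Assumed spatial resolution; the project's actual image size may differ
HEIGHT, WIDTH = 512, 512

my_schema = {
    "image": Tensor(shape=(HEIGHT, WIDTH, 15), dtype="float32"),  # 15-channel input
    "mask": Tensor(shape=(HEIGHT, WIDTH, 3), dtype="uint8"),      # background / carrot / weed
}

@hub.transform(schema=my_schema)
def load_transform(name):
    # Hypothetical loading step: preprocessed channels and masks are read from disk
    image = np.load(f"inputs/{name}.npy")   # (HEIGHT, WIDTH, 15)
    mask = np.load(f"masks/{name}.npy")     # (HEIGHT, WIDTH, 3)
    return {"image": image.astype("float32"), "mask": mask.astype("uint8")}

image_names = ["0001", "0002"]              # list of image file name stems
tag = "username/crop-weed-segmentation"     # "<hub username>/<repository name>"
ds = load_transform(image_names)
ds.store(tag)                               # computes the transform and uploads to Activeloop
```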
Computing the transformation: 100%|██████████| 19.4k/19.4k [18:09<00:00, 17.8 items/s]
We can now load the image dataset simply with —
tag = "username/repo_name"
ds = hub.load(tag)
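From there, the dataset can be streamed straight into a training loop. The sketch below assumes the Hub 1.x to_pytorch conversion and a PyTorch DataLoader; the tag, batch size, and key names mirror the hypothetical schema above.

```python
import hub
import torch

ds = hub.load("username/crop-weed-segmentation")   # hypothetical tag

# Stream samples directly into a PyTorch training loop
torch_ds = ds.to_pytorch()
loader = torch.utils.data.DataLoader(torch_ds, batch_size=4)

for batch in loader:
    images = batch["image"]   # (4, H, W, 15) input tensor
    masks = batch["mask"]     # (4, H, W, 3) target tensor
    # ... forward pass and loss computation would go here
    break
```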
Another advantage to using this method was that we could quickly set up the pipeline on Colab notebooks, and each team could make a copy of the same notebook and quickly get to experimenting without spending any time on environment setup.
Here’s a link to the initial Colab notebook shared to begin experimenting with different architectures.
Activeloop offers a nice visualization dashboard to view both the input channels, as well as the target images.
Toggle to view target
References
Milioto, A., Lottes, P., & Stachniss, C. (2018). Real-time Semantic Segmentation of Crop and Weed for Precision Agriculture Robots Leveraging Background Knowledge in CNNs. https://arxiv.org/pdf/1709.06764.pdf
Mishra, A. (2020). Faster Machine Learning Using Hub by Activeloop. https://towardsdatascience.com/faster-machine-learning-using-hub-by-activeloop-4ffb3420c005
This article is written by Sijuade Oguntayo, Pankaja Shankar, Shubham Gandhi, Marjan Gbohadian, Melania Abrahamian, Nyan Swan Aung (Brian).