Weed and Crop Detection using Computer Vision on Drone Imagery
Learn how drone imagery and computer vision enable accurate weed and crop detection, reducing herbicide use and improving precision agriculture.

This project shows how drone imagery and computer vision can reliably separate crops from weeds at field scale, making site-specific herbicide application practical in real farming conditions. By combining synthetic data generation, superresolution, detection and segmentation models with active learning, the pipeline delivers accurate weed maps while reducing manual annotation effort. The outcome is lower chemical use, reduced costs and a scalable foundation for precision agriculture.
Introduction
Weeds compete directly with crops for nutrients, water and light, and can significantly reduce yields if left unmanaged. Traditional weed control methods often rely on blanket herbicide spraying, which increases costs and causes long-term harm to soil, water and surrounding ecosystems. Precision agriculture addresses this challenge by enabling herbicides to be applied only where weeds are present, making early and accurate weed detection a critical requirement for sustainable farming.
Recent advances in unmanned aerial vehicles (UAVs) and artificial intelligence now enable low-cost capture and analysis of high-resolution field imagery using computer vision models. This progress highlights the growing role of AI in agriculture for precision farming and automated crop monitoring.
Public datasets such as DRONEWEED have accelerated early-season weed classification research using deep-learning approaches like convolutional neural networks and vision transformers. Building on this progress, the Omdena–SkyMaps project focused on mapping weeds and crops in beetroot and corn fields using drone imagery to support targeted interventions that reduce chemical use and improve farm efficiency.
Problem Statement
Traditional weed management relies on manual scouting or uniform herbicide applications. Both are inefficient: manual surveys are slow and labour-intensive, while spraying entire fields treats many areas with no weeds. SkyMaps and Omdena aimed to develop a system that automatically distinguishes crop plants from weeds and pinpoints their locations within drone images. The challenge combines data acquisition (obtaining enough annotated images across different resolutions) with model design (choosing architectures that handle class imbalance and small objects). Ultimately, the goal is to enable farmers to spray only where weeds grow, lowering costs and protecting soil and water quality.
Project Scope and Deliverables
The project spanned the full machine-learning pipeline, from data preparation to deployment, reflecting how modern computer vision pipelines increasingly rely on advanced instance segmentation and semantic understanding to move from experimentation to field-ready deployment. Five core deliverables were defined:
- Data augmentation: Because collecting large numbers of labelled images is expensive, the team generated synthetic training data. Weed and crop cut-outs were composited onto varied backgrounds, creating over 1,000 annotated images with masks and COCO annotations. The approach produced training, validation and test sets without further fieldwork.
- Superresolution: To enhance low-resolution drone images, the team investigated deep-learning models that convert low-resolution inputs into 512 × 512-pixel outputs. Improving image fidelity helps models recognise small weed seedlings.
- Object detection: Multiple architectures, including Faster R-CNN, Detectron2, YOLOv4 and YOLOv5, were benchmarked to locate and classify individual plants. Models were trained on both real and synthetic datasets and evaluated using mean average precision (mAP) at an IoU threshold of 0.5.
- Segmentation: U-Net variants and Mask R-CNN were used for pixel-level classification of images. Semantic segmentation labels each pixel as crop or weed, while instance segmentation delineates individual plants.
- Active learning and deployment: Using the OnePanel/CVAT platform, the team built an annotation workflow that combines model predictions with human corrections. An inference API was packaged and integrated into the SkyMaps platform, enabling real-time weed mapping for end users.
Methodology
Data Collection and Augmentation
SkyMaps provided orthophoto maps captured by drones at ground sampling distances of 5–30 mm per pixel for high-resolution flights and 10–100 mm per pixel for lower resolutions. These maps were tiled into 512 × 512 images and manually annotated to identify crops (beetroot and corn) and weeds (thistle).
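The tiling step can be sketched with NumPy as below; `tile_orthophoto` is a hypothetical helper, and a production pipeline would read georeferenced orthophotos with a library such as rasterio rather than a plain array.

```python
import numpy as np

def tile_orthophoto(image: np.ndarray, tile_size: int = 512):
    """Split an H x W x C orthophoto into non-overlapping square tiles.

    Edge tiles smaller than tile_size are dropped, mirroring the common
    practice of training only on full-size tiles.
    """
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(image[y:y + tile_size, x:x + tile_size])
    return tiles

# Example: a 1024 x 1536 RGB mosaic yields 2 x 3 = 6 full tiles.
mosaic = np.zeros((1024, 1536, 3), dtype=np.uint8)
print(len(tile_orthophoto(mosaic)))  # 6
```

Overlapping tiles (a stride smaller than `tile_size`) are a common variation that avoids cutting plants in half at tile borders.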

Figure 1. Annotated weed samples from drone imagery. Sample images from the SkyMaps dataset illustrate how crops and weeds are labeled for training. Such examples help readers visualise the classification task and the complexity of real field conditions.
To expand the dataset, the team created a synthetic augmentation pipeline:
- Dataset structure: The baseline dataset included 217 background images, 9 thistle foreground images and 51 annotated images (25 validation and 26 test). This small collection could not support robust model training.
- Synthetic image generation: Collaborators used GNU Image Manipulation Program (GIMP) to cut weed and crop plants from the SkyMaps images. These foreground cut-outs were randomly placed on background photos, and corresponding masks and COCO annotations were generated. The pipeline produced 1,000 training images, 100 validation images and 100 test images, each 512 × 512 pixels, dramatically increasing training diversity.
- Evaluation: A Faster R-CNN baseline model was trained on the synthetic dataset. Although it achieved reasonable performance on synthetic validation and test sets (mAP@0.5 ≈ 0.64 on validation and 0.66 on synthetic test images), its mAP on real test images was only 0.08, highlighting the challenge of domain transfer from synthetic to real data.
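The core compositing idea behind the synthetic pipeline can be sketched as follows; `composite` and its mask handling are illustrative stand-ins for the team's GIMP-based workflow, not its actual code.

```python
import numpy as np

rng = np.random.default_rng(42)

def composite(background, foreground, mask):
    """Paste one plant cut-out onto a background at a random position.

    mask is a binary H x W array marking the plant pixels; the returned
    label mask is the raw material for COCO annotations downstream.
    """
    bh, bw = background.shape[:2]
    fh, fw = foreground.shape[:2]
    y = int(rng.integers(0, bh - fh + 1))
    x = int(rng.integers(0, bw - fw + 1))
    out = background.copy()
    label = np.zeros((bh, bw), dtype=np.uint8)
    region = out[y:y + fh, x:x + fw]          # view into the copy
    region[mask > 0] = foreground[mask > 0]   # paste only plant pixels
    label[y:y + fh, x:x + fw] = mask
    return out, label

bg = np.zeros((512, 512, 3), dtype=np.uint8)
fg = np.full((64, 64, 3), 255, dtype=np.uint8)
m = np.ones((64, 64), dtype=np.uint8)
img, lab = composite(bg, fg, m)
print(int(lab.sum()))  # 4096 labelled pixels (64 * 64)
```

Repeating this with many cut-outs, backgrounds and random scales or rotations is what produces the diversity the synthetic dataset relies on.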
Superresolution
High-altitude flights yield fewer images and faster field coverage, but at the cost of lower resolution. To recover high-resolution detail, the superresolution team evaluated several models that upscale low-resolution RGB inputs to 512 × 512 outputs.

Figure 2. Superresolution workflow. Orthophoto maps are tiled into small images, fed into various models (DCSCN, auto-encoders, SRCNN, Pix2Pix and U-Net) and evaluated using structural similarity index (SSIM) and mean squared error (MSE).
Training data comprised high-resolution (HR) images and corresponding downscaled low-resolution (LR) images. The dataset included:
| Image type | Resolution | Train | Validation | Test |
|---|---|---|---|---|
| High-resolution (HR) | 512 × 512 | 224 | 20 | 30 |
| Low-resolution (LR) | 256 × 256 | 730 | 40 | 90 |
| Low-resolution (LR) | 128 × 128 | 200 | 10 | 30 |
Several architectures were assessed using three metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). The Deep CNN with Skip Connection and Network in Network (DCSCN) achieved the best balance of accuracy and visual fidelity (SSIM ≈ 0.765, MSE ≈ 821, PSNR ≈ 26.5). Although auto-encoders and generative adversarial networks (GANs) produced higher PSNR scores, they introduced discolouration and artefacts that degraded downstream detection. Consequently, the DCSCN model was selected for superresolution preprocessing.
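MSE and PSNR are simple to reproduce; the sketch below implements both with NumPy (SSIM involves local windowed statistics and is usually taken from a library such as scikit-image rather than hand-rolled).

```python
import numpy as np

def mse(hr: np.ndarray, sr: np.ndarray) -> float:
    """Mean squared error between high-resolution and super-resolved images."""
    return float(np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2))

def psnr(hr: np.ndarray, sr: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in decibels; higher is better."""
    err = mse(hr, sr)
    return float("inf") if err == 0 else float(10.0 * np.log10(max_val ** 2 / err))

# Toy 4 x 4 images that differ by a constant offset of 10 grey levels.
hr = np.full((4, 4), 100, dtype=np.uint8)
sr = np.full((4, 4), 110, dtype=np.uint8)
print(mse(hr, sr))                 # 100.0
print(round(psnr(hr, sr), 2))      # 28.13
```

Note that PSNR is a monotone function of MSE, which is why the two metrics usually rank models the same way while SSIM can disagree with both.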
Object Detection
The object-detection task requires locating plant instances and classifying them into four categories: thistle, small beetroot, large beetroot and corn. Collaborators trained and evaluated several models on real and synthetic datasets.

Figure 3. Example of YOLOv5 training metrics. Losses decrease while precision, recall and mAP improve over successive epochs, illustrating model convergence.
Performance was measured using mAP at IoU ≥ 0.5. The results are summarised below:
| Model | mAP@0.5 (test) | mAP@0.5 (validation) |
|---|---|---|
| YOLOv5 large | 71.2 % | 66 % |
| YOLOv5x | 66.5 % | 60.5 % |
| YOLOv4 | 65.3 % | 62.7 % |
| YOLOv4 (tiled images) | 63.7 % | 62.3 % |
| Detectron2 | 62.2 % | 64.1 % |
| YOLOv5 medium | 61.4 % | 60.2 % |
| Faster R-CNN (ResNeXt50 + FPN) | 58.6 % | 54.7 % |
| Faster R-CNN (ResNet152 + FPN) | 54.2 % | 52.2 % |
| Baseline Faster R-CNN (ResNet50 + FPN) | 48.4 % | 57.3 % |
The YOLOv5 large model offered the best balance of speed and accuracy, achieving an mAP above 70 % on the test set. Detectron2 provided competitive performance on the validation set but was slower during inference. Experiments comparing models trained exclusively on the synthetic dataset versus the real dataset revealed the limitations of synthetic data: for example, a Detectron2 model trained only on synthetic images achieved an mAP of 7.26 % on the real test set, compared with 49.05 % when trained on real data. These findings underscore the importance of collecting real images even when synthetic augmentation is used.
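The mAP@0.5 criterion rests on box IoU and greedy matching of predictions to ground truth; the sketch below shows that core logic, assuming axis-aligned `(x1, y1, x2, y2)` boxes already sorted by confidence.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_at_iou(preds, gts, thr=0.5):
    """Greedily match predictions to ground-truth boxes at an IoU threshold.

    Returns True (true positive) / False (false positive) per prediction,
    the raw material for a precision-recall curve and hence mAP@0.5.
    """
    unmatched = list(range(len(gts)))
    flags = []
    for p in preds:
        best, best_iou = None, thr
        for gi in unmatched:
            v = iou(p, gts[gi])
            if v >= best_iou:
                best, best_iou = gi, v
        if best is not None:
            unmatched.remove(best)
            flags.append(True)
        else:
            flags.append(False)
    return flags

gt = [(0, 0, 100, 100)]
preds = [(10, 10, 110, 110), (300, 300, 400, 400)]
print(match_at_iou(preds, gt))  # [True, False]
```

The first prediction overlaps the ground truth with IoU ≈ 0.68 and counts as a hit; the second has no overlap and counts as a false positive.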
Segmentation
Segmentation was tackled at two levels: semantic segmentation, which classifies each pixel as crop or weed, and instance segmentation, which delineates individual plants. In real agricultural settings, choosing between these approaches depends on data availability, plant density and operational goals, factors that strongly influence how segmentation outputs translate into actionable weed maps. Metrics included Intersection over Union (IoU) and the Dice coefficient for semantic segmentation, and mAP for instance segmentation. Results for U-Net variants and Mask R-CNN are summarised below:
| Type | Model (backbone & resolution) | IoU (crop) | IoU (weed) | Dice (crop) | Dice (weed) | mAP (instance) |
|---|---|---|---|---|---|---|
| Semantic | U-Net (896 × 896) | 0.937 | 0.808 | 0.893 | 0.893 | – |
| Semantic | U-Net (EfficientNet, 768 × 768) | 0.857 | 0.458 | 0.922 | 0.620 | – |
| Semantic | U-Net (512 × 512) | 0.850 | 0.250 | 0.920 | 0.400 | – |
| Semantic | U-Net (MobileNet, 256 × 256) | 0.563 | 0.055 | 0.486 | 0.043 | – |
| Instance | Mask R-CNN (ResNet101, 512 × 512) | – | – | – | – | 0.590 |
| Instance | Mask R-CNN (ResNet50 + FPN, 1024 × 768) | – | – | – | – | 0.396 |
The largest semantic U-Net (896 × 896) delivered high IoU and Dice scores for both crop and weed classes, confirming that spatial resolution matters when delineating small weed patches. For instance segmentation, Mask R-CNN with a ResNet101 backbone achieved an mAP of 0.59, outperforming the ResNet50-based variant.
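Per-class IoU and Dice follow directly from their definitions; a minimal NumPy sketch, assuming integer label masks (0 = background, 1 = crop, 2 = weed):

```python
import numpy as np

def iou_dice(pred: np.ndarray, target: np.ndarray, cls: int):
    """Per-class IoU and Dice coefficient for integer label masks."""
    p = pred == cls
    t = target == cls
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    total = p.sum() + t.sum()
    iou = inter / union if union else 1.0        # empty class counts as perfect
    dice = 2 * inter / total if total else 1.0
    return float(iou), float(dice)

# Toy 2 x 2 masks: one crop pixel is a false positive, the weed pixel matches.
pred = np.array([[1, 1], [0, 2]])
target = np.array([[1, 0], [0, 2]])
iou_crop, dice_crop = iou_dice(pred, target, cls=1)
print(iou_crop, round(dice_crop, 3))  # 0.5 0.667
```

Dice is always at least as large as IoU for the same masks, which is why the table's Dice columns sit above the corresponding IoU columns.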
Active Learning and Deployment
After training baseline detection and segmentation models, the team employed an active-learning loop using the OnePanel/CVAT platform:
- Baseline dataset preparation: The initial labelled dataset (bounding boxes for object detection and polygons for instance segmentation) was uploaded to CVAT.
- Model training: TensorFlow Object Detection and Mask R-CNN models were trained using OnePanel's built-in data augmentation and hyperparameter tuning features.
- Auto-annotation: The trained models were linked to CVAT to automatically label new, unlabelled images. This auto-annotation generated pseudo-labels that served as a starting point for human annotators.
- Human feedback: Annotators reviewed the pseudo-labels and corrected errors using CVAT's editing tools. The corrected labels were added to the training set.
- Retraining: The improved training set was used to fine-tune the models, and the cycle repeated. Each iteration reduced annotation time and improved model accuracy.
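The loop above can be sketched in miniature; `predict`, `correct` and `retrain` are hypothetical stand-ins for the OnePanel/CVAT workflow, reduced to toy functions just to show the control flow.

```python
# Minimal sketch of the active-learning loop; not the project's actual code.

def predict(model, images):
    """Auto-annotation: pseudo-labels with a confidence score per image."""
    return [{"image": im, "label": "weed", "score": 0.4 + 0.1 * i}
            for i, im in enumerate(images)]

def correct(pseudo_labels, threshold=0.6):
    """Human feedback: low-confidence labels are flagged for review."""
    return [dict(p, reviewed=p["score"] < threshold) for p in pseudo_labels]

def retrain(model, labelled):
    """Fine-tuning stand-in: here the 'model' just grows with its data."""
    return model + len(labelled)

model, train_set = 0, []
unlabelled = [f"tile_{i}.png" for i in range(4)]
for iteration in range(2):
    pseudo = predict(model, unlabelled)       # auto-annotation
    train_set.extend(correct(pseudo))         # human corrections
    model = retrain(model, train_set)         # fine-tune and repeat
print(model, len(train_set))
```

The key design point is the confidence threshold: only uncertain pseudo-labels need human attention, which is where the annotation-time savings come from.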
Finally, an inference API encapsulating the best models was deployed within the SkyMaps platform. Users can select crop type and view predicted weed locations on their fields in real time, enabling site-specific spraying.
Results and Discussion
The integrated pipeline achieved promising results. Object-detection accuracy exceeded 70 % mAP on real test images with the YOLOv5 large model, while semantic segmentation models attained high IoU and Dice scores for both crops and weeds. Data augmentation increased training diversity, and superresolution improved the clarity of low-resolution images. However, experiments showed that synthetic data alone cannot replace real data: models trained solely on augmented images performed poorly on real test sets. Active learning proved valuable for accelerating annotation and iteratively improving model performance.
The project's findings align with research on early-season weed classification, which notes that high-quality datasets enable advanced deep-learning models and can reduce pesticide use by enabling targeted control. By combining synthetic data generation, superresolution, detection, segmentation and active learning, the Omdena–SkyMaps collaboration created a robust workflow for weed mapping.
Challenges and Solutions
Several challenges emerged during the project:
- Limited annotated data: Manual labelling is expensive and time-consuming. Solution: The team generated over a thousand synthetic images and employed active learning to iteratively improve models with minimal human effort.
- Low-resolution imagery: High-altitude flights produce coarse images. Solution: Superresolution models restored fine details, boosting detection performance.
- Class imbalance: Weed instances were less common than crop instances. Solution: Oversampling through synthetic augmentation and balanced training sets helped models learn from under-represented classes.
- Model selection: Each architecture has trade-offs in speed and accuracy. Solution: By evaluating multiple models, the team identified YOLOv5 large and Mask R-CNN as the best performers for detection and instance segmentation, respectively.
Impact and Future Work
Precision weed detection has clear environmental and economic benefits. Applying herbicides only where weeds are detected reduces chemical usage, lowers costs and protects soil and water. Early weed classification allows farmers to take action when plants are most vulnerable, improving yields and supporting sustainable agriculture. The Omdena–SkyMaps project illustrates how deep learning and UAV imagery can deliver these benefits through an operational tool.
Future research could expand the dataset to include more weed species and crop types, incorporate multispectral or hyperspectral imagery to support plant health prediction using drone imagery, and develop lightweight models suitable for on-board processing.
Conclusion
The Omdena–SkyMaps project shows how computer vision and drone imagery can enable precise, scalable weed detection in real agricultural settings. By combining synthetic data augmentation, superresolution, deep-learning models, and active learning, the team built a practical pipeline that accurately distinguishes crops from weeds and integrates directly into farm workflows.
While real, high-quality data remains essential for strong performance, the results highlight the potential of AI-driven weed mapping to reduce chemical use, lower costs, and support more sustainable farming. With expanded datasets and continued model refinement, such systems can become a core component of precision agriculture.



