How AI Can Protect Our Water: Detecting The Invisible Threats Within

Omdena harnesses AI to reveal hidden water contaminants, empowering communities with innovative, low-cost, and scalable water protection.

April 5, 2024

Omdena’s AI-powered tool detects harmful microorganisms in drinking water using deep learning models like YOLOv8 and Detectron2. The low-cost system enables faster, accurate testing, helping communities and organizations ensure clean, safe water worldwide.

The Problem

Water is a source of life, and both its quantity and quality are of utmost importance to human health. The United States and many other countries are committed to providing clean and safe water for most of their residents, but waterborne diseases continue to be a challenge.

According to the CDC, about 7.5 million waterborne illnesses occur annually in the USA, resulting in approximately $3.3 billion in healthcare costs.

Lives are therefore continuously at risk, because contaminated water is closely linked to the transmission of diseases such as cholera, diarrhea, hepatitis A, typhoid fever, dysentery, and polio.

These diseases are mainly caused by microorganisms, viruses, and fecal matter in drinking water. Contributing factors include aging infrastructure, chlorine-resistant pathogens, and an increase in recreational water use. Water quality monitoring systems are already part of many water infrastructures, but several aspects still need improvement: timely availability of results, reliability of the data, performance and robustness of existing systems, and interpretability of the results for end users.

CDC infographic showing yearly waterborne illnesses, hospitalizations, and deaths caused by germs in water.

The Background

Rural water treatment plant and storage reservoir surrounded by fields and hills, representing water purification systems.

Each year, 4 billion cases of water-related diseases lead to 3.4 million deaths worldwide, making them one of the leading causes of mortality among children under five. The situation is even more severe in rural regions of developing countries, where access to clean water and sanitation is often limited.

Both developed and developing nations have established governmental and private systems to monitor drinking water quality and reduce the risks of waterborne illnesses. These systems typically rely on testing water samples from different sources to detect disease-causing microorganisms and other contaminants.

However, implementing such testing at scale demands significant investment and resources, which can limit the reach and efficiency of existing infrastructure.

The Goal

This project aimed to develop a low-cost, efficient method for detecting microorganisms (bacteria) in drinking water, helping to reduce the time required for water quality testing. It was designed to identify most of the microorganisms commonly found in drinking water while remaining user-friendly and practical at a local level.

The detection method includes a binary classification system that categorizes microorganisms as either harmful or not harmful. This approach ensures accessibility for users who may not have the technical knowledge to determine the safety of microorganisms by name alone.

The four main objectives of this project were:

  1. Develop a low-cost, easy-to-use method for detecting microorganisms (bacteria) in drinking water.

  2. Access and prepare suitable data of common microorganisms in water to train a deep learning model.

  3. Train a CNN (or equivalent) to recognize and classify bacteria using computer vision techniques.

  4. Deploy the trained model on a mobile device for real-world usability.

Our Approach

Coming up with the Design Concept

The goal was to develop an object detection deep learning model capable of identifying the presence of microorganisms in water samples. Since this method relies heavily on shape differentiation, it was ideal for this study—microorganisms typically vary in shape at least at the genus level, making visual detection feasible.

Object detection and classification are widely used techniques in computer vision. Object detection uses images, videos, or live camera feeds to recognize and locate objects within a frame in real time.

Object classification takes this process a step further. Once the object’s presence is confirmed, the system determines its class based on a predefined labeled dataset, enabling more precise categorization and analysis.

Planning out the project

Six-week AI project plan outlining stages from data research and model training to mobile deployment and evaluation.

Project Management

Good project management was key to keeping everything on track. Since our collaborators were spread across several countries and continents, clear communication and coordination were essential. The team needed a way to share responsibilities, manage documentation, and monitor progress effectively.

Omdena’s project management framework made this process smooth and inclusive. It focuses on using simple, publicly available tools that most contributors are already familiar with. This approach saves valuable time and avoids the long onboarding or training that many international collaborations require.

We used three main tools to manage the project:

  • Slack – for day-to-day communication and quick updates

  • Google Drive – for organizing and storing project files and documents

  • DagsHub – for data science collaboration and version control

Datasets

Finding a comprehensive dataset of microscopic images of microorganisms turned out to be one of the biggest challenges in this project. Building a deep learning model requires a large amount of data, yet detailed datasets in this field are rarely available to the public.

After some research, the team decided to use the Environmental Microorganism Image Dataset (EMDS). Over the years, several versions of this dataset have been released, and we used the two most recent ones — EMDS-6 and EMDS-7 — as our primary data sources.

EMDS-6 Dataset and Additional Data

While the EMDS-6 and EMDS-7 datasets were valuable, they didn’t include many of the common pathogenic microorganisms found in drinking water. To bridge that gap, the team searched for additional data, focusing on pathogens highlighted in WHO standards, such as E. coli, Salmonella, and Shigella.

In total, eleven new microorganism classes were identified and populated with images collected from reliable web sources, strengthening the dataset and making it more relevant to real-world water testing scenarios.

Pre-processing

Classification

The EMDS-7 dataset contained 41 classes of microorganisms, while EMDS-6 included 21. The research team reviewed each microorganism and labeled it as pathogenic (1) or non-pathogenic (0) based on its potential to cause disease.
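
The labeling scheme described above can be sketched as a simple lookup table. This is a minimal illustration, not the team's actual review table: the function names are hypothetical, and the non-pathogenic entries are placeholders chosen for the example. Only E. coli, Salmonella, and Shigella are named as pathogens in this article.

```python
# Illustrative label table: 1 = pathogenic, 0 = non-pathogenic.
# The entries are placeholders, not the team's actual review results.
PATHOGENIC_LABEL = {
    "Escherichia coli": 1,
    "Salmonella": 1,
    "Shigella": 1,
    "Paramecium": 0,
    "Euglena": 0,
}


def binary_label(class_name: str) -> int:
    """Return 1 for a pathogenic class and 0 for a non-pathogenic one."""
    try:
        return PATHOGENIC_LABEL[class_name]
    except KeyError:
        # Unreviewed classes are rejected rather than silently treated as safe.
        raise ValueError(f"class not reviewed: {class_name!r}") from None


def sample_is_harmful(detected_classes: list[str]) -> bool:
    """A sample counts as harmful if any detected class is pathogenic."""
    return any(binary_label(name) == 1 for name in detected_classes)
```

Rejecting unknown class names is a design choice made for this sketch: it keeps a missing review from being read as "not harmful".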

Annotation

Comparison of microorganism images before and after AI annotation highlighting object detection boundaries

To label the data efficiently, the team used Roboflow, a tool that streamlines the annotation process and ensures accuracy and consistency across the dataset. This helped save time while maintaining high-quality data for model training.

Post-processing

Before training, the images went through several additional processing steps to improve model performance. These included checking for balanced classes, consistent magnification levels, uniform color and hue, and standardized image dimensions.

  • Balanced dataset: Ensures fair model training and prevents bias toward one class.

  • Color and hue consistency: Uniform coloration improves the precision and reliability of predictions.

  • Image dimension and resolution: Smaller, standardized images help the model train faster by reducing the number of pixels it needs to process.
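
Two of the checks listed above, class balance and standardized image dimensions, can be sketched in plain Python. The helper names, the 1.5 imbalance threshold, and the 640-pixel target size are assumptions made for illustration and are not values reported by the team.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the ratio of the largest to the smallest class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio


def is_balanced(labels, max_ratio=1.5):
    """True when the largest class is at most max_ratio times the smallest."""
    _, ratio = class_balance(labels)
    return ratio <= max_ratio


def resized_dimensions(width, height, target=640):
    """Scale (width, height) so the longer side equals target, keeping aspect ratio."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `is_balanced([1, 1, 0, 0, 0])` passes under the assumed threshold, while a 1280 by 960 image would be reduced to 640 by 480 before training.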

Models and Their Metrics

The team developed two main types of models for this project:

1. Binary Classification Model

This model focuses on determining whether a water sample is contaminated or not. The dataset used for this task was carefully balanced, containing nearly equal numbers of samples from both classes — harmful and not harmful.

2. Object Detection

This model identifies and locates pathogens within water samples. Detecting microorganisms can be tricky since they are extremely small, so the team used models that perform well at small-object detection, including Faster R-CNN, YOLOv8, EfficientNet, Detectron2, and SSD.

Testing the Models

To identify the best-performing algorithm, the team developed and tested six different models — three focused on binary classification and three on object detection. Each model was evaluated for accuracy, memory usage, and inference time to determine its overall performance.

The models tested were:

  • YOLOv8-JCOLANO
  • YOLOv8m-cls
  • Rastogi’s object detection 1
  • Detectron2
  • Rastogi’s object detection 2
  • Rastogi’s image classification 1
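
The three evaluation criteria named above (accuracy, memory usage, and inference time) can be measured with a small harness like the one below. This is a sketch under stated assumptions, not the team's evaluation code: `benchmark` and `threshold_model` are hypothetical names, the model is a stand-in callable, and `tracemalloc` only tracks memory allocated by Python, so it would not capture GPU or native library memory.

```python
import time
import tracemalloc


def benchmark(predict, samples, labels):
    """Measure accuracy, mean inference time, and peak Python memory for predict."""
    if len(samples) != len(labels) or not samples:
        raise ValueError("samples and labels must be non-empty and equal in length")

    tracemalloc.start()
    start = time.perf_counter()
    predictions = [predict(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    correct = sum(p == y for p, y in zip(predictions, labels))
    return {
        "accuracy": correct / len(labels),
        "mean_inference_s": elapsed / len(samples),
        "peak_memory_bytes": peak_bytes,
    }


# Hypothetical stand-in for a trained classifier: flags a sample as
# contaminated (1) when its score exceeds 0.5.
def threshold_model(score):
    return 1 if score > 0.5 else 0


report = benchmark(threshold_model, [0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0])
```

Running every candidate through the same function keeps the comparison consistent. Because memory tracing adds overhead, timing and memory would ideally be measured in separate runs.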

The web application was developed using Streamlit and is currently deployed on Streamlit Community Cloud. Based on the testing, we selected two models, one for binary classification and one for individual microorganism detection:

YOLOv8m-cls

  • Purpose: Binary classification for water contamination detection.
  • Usage: Users can determine if water is contaminated through binary classification.

Detectron2 Model

  • Purpose: Individual microorganism detection.
  • Usage: Users can detect and identify various microorganisms present in the provided images.

Constraints

Deployment on GitHub Repository

To deploy the application on Streamlit’s Community Cloud, the code needed to be hosted on a GitHub repository separate from the DagsHub repository. At the time of documentation, the deployment code was hosted on a personal account and needed to be transferred to an account associated with the San Jose chapter.

Large File Storage (LFS)

Since the Detectron2 model file exceeded 100 MB, the team used Git Large File Storage (LFS) to store and version large files. This setup requires installing and configuring Git LFS. Each GitHub account provides 1 GiB of free LFS storage and 1 GiB of monthly LFS bandwidth, so large assets have to stay within these limits.
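
A quick way to find which files would need LFS tracking is to scan the repository for anything above the size limit. The sketch below is illustrative only: the function name is hypothetical, and the threshold interprets the 100 MB figure above as 100 MiB.

```python
from pathlib import Path

# Assumed threshold: the 100 MB figure mentioned above, taken as 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def files_needing_lfs(repo_root, limit_bytes=GITHUB_FILE_LIMIT_BYTES):
    """Return repo-relative paths of files larger than limit_bytes, skipping .git."""
    root = Path(repo_root)
    oversized = []
    for path in root.rglob("*"):
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and path.stat().st_size > limit_bytes:
            oversized.append(path.relative_to(root).as_posix())
    return sorted(oversized)
```

Each returned path could then be registered with `git lfs track` before committing.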

Detectron2 Installation Handling

Automatic installation of packages from the requirements.txt file is problematic for Detectron2, particularly when no pre-built package is used. To address this, the team used the latest pre-built Detectron2 packages, which are built on torch 1.10. Since YOLOv8 also relies on torch, torch 1.10 is kept throughout the system to prevent library conflicts, which limits compatibility to Python 3.9.
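
A startup guard can make this version constraint explicit instead of letting a mismatch surface as an obscure import error. The following is a minimal sketch based on the versions stated above. The helper names are hypothetical and the check is not part of the published application.

```python
# Versions stated in the text above; the helper names are hypothetical.
REQUIRED_PYTHON = (3, 9)
REQUIRED_TORCH = (1, 10)


def major_minor(version):
    """Parse 'X.Y...' (for example '1.10.0+cu113') into (X, Y)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def environment_matches(python_version, torch_version):
    """True when both versions match the pinned major.minor pair."""
    return (
        tuple(python_version[:2]) == REQUIRED_PYTHON
        and major_minor(torch_version) == REQUIRED_TORCH
    )
```

In an application this could be called as `environment_matches(sys.version_info, torch.__version__)` before any model is loaded.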

Future Steps

Further Discussion on App Functionality and Hosting

The team plans to hold additional discussions to decide on the desired functionality and the best hosting options for the application. For now, the project uses the Streamlit Community Cloud as a free hosting platform. While this setup works for demonstration purposes, it limits the overall functionality and scalability of the application.

Add REST Capabilities

There are also plans to add REST API capabilities to the application, which would make integration with other systems easier and enable external access to its features. These next steps will require more detailed planning and collaboration to ensure the project continues to grow and serve broader use cases.

What We Learned

Challenges & Limitations

The main goal of the project was to create an app capable of detecting common pathogens in drinking water. However, one of the biggest challenges was the lack of publicly available datasets for some of the most common bacteria, such as Salmonella, E. coli, and Shigella. Because of this, the current version serves as a prototype that will continue to evolve as more data becomes available.

Better Datasets

Companies that own bacterial image data rarely make it public, as collecting and labeling these images is both time-consuming and expensive. To move forward, the team identified two possible options:

  1. Create new environmental microorganism datasets — though this would be complex and resource-intensive.

  2. Partner with universities or research institutions that may already have relevant datasets available.

Performance of model on field data

Further testing is needed to evaluate how well the model performs under real-world conditions — specifically, comparing results from paper microscopes versus digital microscopes to understand how image quality affects detection accuracy.

User-friendly deployment

The deployment approach also needs refinement. The next step is to identify the best method to make the app more accessible — ideally enabling it to run on mobile devices and work offline, making it more practical for field use in remote areas.

Opportunities

A tool for researchers

The models and applications developed through this project can be valuable resources for researchers who wish to build upon or expand the work. With further development, these tools could support advanced studies in microbiology, environmental science, and public health.

Potential as a community-based tool

There’s also strong potential for this system to become a community-level tool. One of the main challenges in detecting microorganisms that require high magnification is the cost of microscopes capable of identifying them. With proper support from NGOs or government initiatives, funding could help make such equipment more accessible. This would allow communities to test their own water sources quickly and cost-effectively, empowering local efforts to improve water safety. In parallel, many innovators and organizations are making significant progress in global water purification and management.

Time Frame

The entire project was completed in a six-week period between September and October 2023. In this time, we achieved all of the following:

  • Sourcing and processing difficult-to-find datasets.
  • Testing and evaluating multiple AI models to find the best fit for this application.
  • Performing a thorough exploratory data analysis (EDA) to gain valuable insights.
  • Deploying the frontend as a publicly available demo.

Further Applications of This Technology

Scientist working with water sample testing equipment in laboratory for AI-based water contamination analysis

Access to clean drinking water is becoming increasingly scarce in many parts of the world. The system developed through this project can play a vital role in ensuring that this precious resource remains free from disease-causing microorganisms.

Beyond water quality testing, this technology and its underlying methodology have potential uses in several other fields:

1. Medicine

Waterborne diseases are not always easy to detect or diagnose. A quick, reliable testing system can help doctors identify the cause of illnesses faster and begin treatment sooner.

2. Agriculture

Contamination can easily spread to crops irrigated from contaminated water sources. This tool can help farmers ensure that the water they use for irrigation is clean and safe.

3. Epidemiology

Many outbreaks originate from contaminated shared water sources. Regular monitoring with systems like this can help detect contamination early and prevent widespread epidemics.

Conclusion

This project demonstrates the powerful role that AI-based systems can play in maintaining water quality. It shows how technology can help conduct large-scale testing efficiently while reducing the financial and operational burden on governments and organizations.

The tool achieved a high level of accuracy and reliability, though it was limited by factors such as the availability of comprehensive datasets. Even with these challenges, it stands as a strong proof of concept, highlighting what’s possible when AI is applied to public health and environmental monitoring. With the right data and resources, similar models could be developed into robust, real-world solutions.

We also hope this project serves as an inspiration for others working toward a world free from waterborne diseases. To support collaboration and progress in this area, we’ve made all project resources publicly available for anyone interested in exploring or building upon our work.

FAQs

  • The project aimed to create a low-cost, AI-powered tool to detect harmful microorganisms in drinking water, reducing testing time and enabling accessible, scalable water quality monitoring.
  • Using deep learning and computer vision, Omdena’s team trained models to identify microorganisms in water samples through image detection and classification, distinguishing between harmful and harmless bacteria.
  • Contaminated water can spread diseases like cholera, typhoid, and hepatitis A, which affect millions globally. Early detection saves lives and helps ensure sustainable access to safe drinking water.
  • The team tested and compared YOLOv8, Detectron2, and other deep learning models for both binary classification (contaminated/not contaminated) and object detection (specific microorganism identification).
  • Omdena’s model thrives on global collaboration, uniting AI engineers, data scientists, and domain experts from multiple countries to co-create impactful, real-world AI solutions.
  • The main challenges were the limited availability of high-quality datasets, ensuring model accuracy on real-world samples, and managing deployment in low-resource environments like mobile devices.
  • This solution can help governments, NGOs, and local communities monitor water safety affordably, reducing disease outbreaks and ensuring cleaner, safer water for millions.
  • Anyone passionate about AI, sustainability, or social impact can join Omdena’s collaborative projects through Omdena.com and help build AI solutions that make a difference worldwide.