Six Weeks in Open Source | Lessons, Challenges and Contribution to Growth

October 19, 2024


Urban gardening isn’t just a passing trend; it’s becoming a necessity in our increasingly crowded cities, where green spaces are treasures. Plants don’t just beautify our surroundings: they purify the air, cool our neighbourhoods, and create tranquil escapes amidst the urban hustle. But let’s be honest, choosing plants that suit the local environment can be daunting, especially when you want your garden to thrive without constant upkeep.

Here’s a fun tidbit: Picking the right plants can cut your watering needs by up to 50%! Not only does this conserve water, but it also ensures your garden flourishes with less effort.

That’s where Cultivate: Enhancing Urban Gardening with Geospatial Intelligence steps in, making urban gardening simpler and more rewarding. Using smart, data-driven insights, we provide gardeners with personalised plant recommendations perfectly tailored to their specific environmental conditions. In this blog, I’ll take you along on my journey as a machine learning engineer, working with an incredible team to turn this vision into reality.

Data Challenge and Problem Statement

As urban gardeners, we’ve all been there — standing in front of a garden center or browsing online, overwhelmed by the sheer variety of plants, wondering which ones will actually thrive in our little slice of the city. You plant something with high hopes, only to watch it struggle, wither, or barely produce the results you envisioned. It’s frustrating, and it’s not your fault. The truth is, gardening success in urban settings is a delicate balance of choosing the right plants for your specific environment, and that’s no easy feat.

The challenge lies in the lack of localised knowledge. Each urban garden is unique — whether it’s a rooftop in a densely packed city or a small patch of soil by your window. Soil quality, sunlight exposure, microclimates, and other environmental factors vary drastically from one spot to another. Without tailored guidance, even the most enthusiastic gardener can end up with poor plant health, low yields, and a growing sense of discouragement.

That’s the problem we set out to solve. We wanted to take the guesswork out of urban gardening by leveraging the power of data and technology. Imagine having a personal gardening assistant that understands not just plants, but your specific environment — whether you’re dealing with too much shade, unpredictable weather, or challenging soil.

Our mission was to create a solution that offers personalised plant recommendations based on your exact geolocation and environmental factors, ensuring that every plant you choose has the best possible chance to thrive.

We knew this could be a game-changer for urban gardeners, transforming gardening from a frustrating trial-and-error process into a rewarding, science-backed experience. It’s about more than just growing plants; it’s about cultivating success and making urban spaces greener, one well-chosen plant at a time.

Our Approach

The project aimed to address the challenge of selecting suitable plants for urban environments by leveraging a combination of data-driven insights and advanced machine learning techniques.

Below is a detailed breakdown of our approach.

1. Comprehensive Data Collection and Exploratory Data Analysis

Our first step was to gather high-quality data that would form the foundation of our recommendation system. This involved collecting diverse datasets, each critical to making accurate plant recommendations.

Plant Data

We sourced detailed plant information from the Royal Horticultural Society (RHS). This dataset encompassed 50,663 plant varieties and included key attributes such as:

  • Growth Characteristics: Information on plant height, spread, and growth habits.
  • Environmental Preferences: Details on sunlight requirements, hardiness zones, soil pH preferences, water needs, and drought tolerance.
  • Seasonal and Lifespan Data: Data on flowering seasons, foliage color changes, and plant longevity.

Soil Data

Soil plays a pivotal role in plant health, so we incorporated soil data from the ORNL DAAC Soil Data. This dataset provided extensive information on:

  • Soil Texture and Composition: Including sand, silt, clay proportions, and organic matter content.
  • Soil pH and Nutrient Availability: Crucial for determining the suitability of different plants for specific locations.
  • Drainage and Water Retention Properties: Factors influencing the moisture content available to plants.

After cleaning the continent names, we visualised the distribution of soil samples across continents.

Distribution of soil samples across continents — Source: Omdena

A bar chart was created to visualise the counts of various soil types. The chart included the top 20 soil types to focus on the most significant categories.

Count of soil types — Source: Omdena
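
The counting behind the bar chart and the pie chart can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code; the function names and the sample soil names are hypothetical.

```python
from collections import Counter


def top_soil_types(soil_types, n=20):
    """Return the n most frequent soil types as (name, count) pairs."""
    # Skip missing entries so they do not show up as a category of their own.
    counts = Counter(s for s in soil_types if s)
    return counts.most_common(n)


def percentage_share(pairs):
    """Convert (name, count) pairs into (name, percent) pairs, e.g. for a pie chart."""
    total = sum(count for _, count in pairs)
    if total == 0:
        return []
    return [(name, 100.0 * count / total) for name, count in pairs]


# Hypothetical sample values, not taken from the project data.
sample = ["Cambisol", "Luvisol", "Cambisol", None, "Podzol", "Cambisol", "Luvisol"]
top = top_soil_types(sample, n=2)
shares = percentage_share(top)
```

Restricting the charts to the top categories keeps them readable when the long tail of rare soil types is large.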

To further analyse the soil type distribution, a pie chart was generated, showing the percentage of the top 25 soil types.

Percentage distribution of top 25 soil types — Source: Omdena

An interactive scatter plot was created to map the geographical distribution of the top 25 soil types. This plot uses latitude and longitude coordinates to pinpoint where each soil type is found globally, offering a visual exploration of soil types in different regions.

Geographical distribution of soil types — Source: Omdena

The visualisations in this analysis lay a strong foundation for understanding the global distribution of soil types, which can be critical for further environmental studies and applications.

Climate Data

Understanding the climatic conditions was essential to tailoring recommendations. We utilised WorldClim Climate Data, which covered 28,438 global locations and included:

  • Temperature Data: Average, minimum, and maximum temperatures for different seasons.
  • Precipitation Patterns: Monthly and annual rainfall data, essential for matching plants to local water availability.
  • Solar Radiation and Wind Speed: These factors influence evaporation rates and plant water needs.

We explored the distribution of several crucial climate variables using histograms with kernel density estimates (KDE).

Distribution analysis of key climate variables — Source: Omdena

The correlation matrix is a key tool to understand the relationships between various climate variables.

Correlation Analysis — Source: Omdena

This heatmap visualises the correlation between important climate variables, such as temperature, precipitation, and seasonality. The colour intensity indicates the strength of the relationship between variables, with positive correlations in warmer colours and negative correlations in cooler ones. This analysis helps identify which variables are closely related, providing insights into how different aspects of climate interact.
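
For readers who want to see what sits behind such a heatmap, here is a small sketch of a Pearson correlation matrix in plain Python. The article does not say which correlation measure was used, so Pearson is an assumption, and the variable names and values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant column has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict {a: {b: r}} from a dict of named numeric columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical values chosen so that the two variables move in opposite directions.
climate = {
    "annual_mean_temp": [10.0, 15.0, 20.0, 25.0],
    "annual_precip": [400.0, 300.0, 200.0, 100.0],
}
matrix = correlation_matrix(climate)
```

Each cell of the resulting matrix is a value between -1 and 1, which is what the colour scale of the heatmap encodes.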

Geospatial plots provide a spatial perspective on key climate variables.

Panels: Annual Mean Temperature by City, Annual Precipitation by City, Temperature Seasonality by City, Precipitation Seasonality by City

Geospatial Analysis by key climate variables — Source: Omdena

This boxplot summarises the distribution of solar radiation across different months, providing a clear view of the variability in sunlight exposure throughout the year.

Monthly Solar Radiation Analysis — Source: Omdena

These visualisations collectively provide a comprehensive understanding of the key climate variables in the dataset, highlighting their distributions, correlations, and spatial patterns.

2. Rigorous Data Preprocessing

The success of any machine learning model relies heavily on the quality of the data used for training. Our initial exploration involved three datasets: Plant data, Soil data, and Climate data.

Here’s how we approached the preprocessing and feature engineering phase.

Initial Data Cleaning

We started by examining the plant dataset, which originally contained 50,751 plant names. However, many of these entries had missing values across various features. To ensure the dataset’s usability, we filtered out plants that had fewer than four feature categories with valid data. This resulted in a more manageable dataset.

Although this reduced dataset was more concise, it still contained many null values. We considered further trimming the dataset by removing remaining null values, but decided against it to preserve as much data as possible.
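
The filtering rule described above (keep a plant only if at least four feature categories hold valid data) could look roughly like this. It is a sketch under assumptions: the feature names, the notion of a "valid" value, and the sample records are all hypothetical.

```python
def count_valid(record, feature_names):
    """Count the features in a record that hold a usable (non-empty) value."""
    return sum(1 for name in feature_names if record.get(name) not in (None, "", []))


def filter_sparse(records, feature_names, min_valid=4):
    """Keep records with at least min_valid populated feature categories."""
    return [r for r in records if count_valid(r, feature_names) >= min_valid]


# Hypothetical feature categories and records, for illustration only.
FEATURES = ["sunlight", "soil_type", "ph", "hardiness", "height", "spread"]
plants = [
    {"name": "Plant A", "sunlight": "full", "soil_type": "loam", "ph": "neutral",
     "hardiness": "H5", "height": None, "spread": None},
    {"name": "Plant B", "sunlight": "shade", "soil_type": None, "ph": None,
     "hardiness": None, "height": "1m", "spread": ""},
]
kept = filter_sparse(plants, FEATURES)
```

Records that pass the threshold can still contain nulls, which matches the decision above to keep the remaining missing values rather than trim further.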

Feature Standardization

To maintain consistency, we standardized key features across the datasets. For example, plant spread, height, and hardiness were all converted into a uniform format, with binary encoding (0 and 1) where applicable. This made it easier to process and integrate the data across different files.
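
As an illustration of the binary encoding, the sketch below turns a multi-valued attribute such as sunlight into 0/1 columns. The column naming scheme and the helper name are assumptions made for this example, not the project's actual schema.

```python
SUNLIGHT_LEVELS = ("full", "partial", "shade")


def encode_multilabel(values, categories, prefix):
    """Return one 0/1 flag per category, set to 1 when the category is present."""
    # Normalise case and whitespace so "Full" and " full " map to the same flag.
    present = {v.strip().lower() for v in values}
    return {f"{prefix}_{c}": int(c in present) for c in categories}


# A plant that tolerates both full sun and partial shade.
row = encode_multilabel(["Full", "partial"], SUNLIGHT_LEVELS, "sunlight")
```

Encoding every attribute this way gives each plant a fixed-length numeric vector, which is convenient for the similarity search described later.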

We also introduced a new feature called “Habit” with its corresponding sub-features. Additionally, redundant features like Genus, Propagation, Pruning, Diseases, and Pests were removed from the main dataset. However, these features are stored separately and can be integrated into output descriptions after the model predicts the plant type.

Merging Datasets

We attempted to merge the cleaned datasets in order to enrich the dataset with additional features. However, it became clear that merging all the datasets into a single file would be impractical. The cleaned plant dataset already included encoded features such as sunlight (full, partial, shade), garden type, season, height, hardiness, pH, and soil type. On the other hand, the soil dataset primarily contained soil type information mapped to geographic coordinates (longitude and latitude).

Given the nature of the data, it was decided to treat the preprocessed plant dataset as the main dataset. The soil dataset would be used separately to provide soil type information during the prediction phase, based on user input.
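
One plausible way to use the soil dataset at prediction time is a nearest-point lookup on the user's coordinates. The article does not describe the exact lookup, so the sketch below is an assumption: it uses the haversine great-circle distance, and the coordinates and soil names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_soil_type(lat, lon, soil_points):
    """Return the soil type of the sample closest to the given location."""
    nearest = min(soil_points, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))
    return nearest["soil_type"]


# Hypothetical soil samples, for illustration only.
soil_points = [
    {"lat": 51.5, "lon": -0.1, "soil_type": "Luvisol"},
    {"lat": 48.9, "lon": 2.3, "soil_type": "Cambisol"},
]
soil = nearest_soil_type(51.4, 0.0, soil_points)
```

A linear scan like this is fine for a sketch; with many soil samples a spatial index would be the more usual choice.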

Some glimpses of the main dataset (preprocessed plant dataset) — Source: Omdena

Finalised Datasets

Main Dataset: Preprocessed Plant dataset, containing the core features required for model training and prediction.

Supplementary Dataset: Preprocessed Soil dataset, used for soil type mapping during predictions.

Additional Dataset: Climate dataset, which includes more climate-related data and other relevant information.

These steps ensured that our dataset was well-structured, consistent, and ready for the subsequent phases of model development.

3. Advanced Machine Learning Model Development

Given the complexity of matching plants to their optimal environments, we experimented with various machine learning models. The process involved several stages.

Developing the machine learning model for this project was a journey full of challenges, learning, and ultimately, innovation. Our goal was to create a model that could accurately recommend the top five plants most suited to a gardener’s specific location and environmental conditions. However, this wasn’t a straightforward task.

One of the biggest hurdles we encountered was class imbalance within our dataset. Many plant species had very few instances, making it difficult to train a robust model. Typically, in situations like this, we might turn to techniques like SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes. But here’s the catch: SMOTE wasn’t an option for us. Why? Because SMOTE creates synthetic samples by interpolating between a sample and its nearest neighbours within the same class, so every class needs more samples than the number of neighbours used, a condition that our dataset didn’t meet. This meant we had to think outside the box and find an alternative approach.
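
The SMOTE constraint can be checked before any resampling is attempted. The sketch below simply counts samples per class and flags the classes that are too small; the default of 5 neighbours mirrors the default in the imbalanced-learn implementation, and the labels are hypothetical.

```python
from collections import Counter


def classes_too_small_for_smote(labels, k_neighbors=5):
    """List the classes that have too few samples for SMOTE with k_neighbors.

    SMOTE needs k_neighbors other samples of the same class around each
    point, so a class must contain more than k_neighbors samples.
    """
    counts = Counter(labels)
    return sorted(cls for cls, n in counts.items() if n <= k_neighbors)


# Hypothetical labels: one well-populated class and two rare ones.
labels = ["rose"] * 6 + ["fern"] * 2 + ["ivy"]
too_small = classes_too_small_for_smote(labels)
```

When most classes end up in this list, as happens when each plant is effectively its own class, oversampling is not workable and a similarity-based approach becomes more attractive.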

Given these constraints, we explored algorithms that could work effectively with our data structure. We decided to focus on search algorithms, specifically those that excel in finding similarities in N-dimensional spaces. Algorithms like Cosine Similarity, Content-Based Search, and Nearest Neighbors Search emerged as strong contenders. These algorithms are particularly well-suited for building recommendation systems, which is exactly what we needed — a system that could identify the top five plants that are most likely to thrive in a given location based on multiple environmental factors.

After extensive discussions and testing of various algorithms, we decided to adopt the Cosine Similarity model to build our recommender system.

Decision to Use Cosine Similarity: This decision was guided by its ability to suggest the top 5 plants by measuring the similarity between different plant features and identifying which plants were most closely aligned with the environmental conditions provided by the user. The model provided a balanced trade-off between accuracy and the relevance of recommendations, making it the best fit for our needs.
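
To make the idea concrete, here is a minimal cosine similarity recommender in plain Python. It is a sketch, not the project's model: the feature order, plant names, and vectors are hypothetical, and a production system would more likely use a vectorised library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)


def recommend(user_vector, plant_vectors, top_n=5):
    """Rank plants by similarity to the user's conditions and keep the best top_n."""
    scored = [(name, cosine_similarity(user_vector, vec)) for name, vec in plant_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical feature order: sun_full, sun_partial, sun_shade, soil_loam, soil_clay
plants = {
    "Plant A": [1, 0, 0, 1, 0],
    "Plant B": [0, 0, 1, 0, 1],
    "Plant C": [1, 1, 0, 1, 0],
}
user = [1, 0, 0, 1, 0]  # Full sun on loam soil.
ranking = recommend(user, plants, top_n=2)
```

Because the score depends only on the direction of the vectors, a plant that tolerates extra conditions (like Plant C here) still ranks well, just below an exact match.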

In the end, our model became more than just a tool for matching plants to locations — it became a bridge between gardeners and the often-overwhelming world of plant selection. By providing tailored recommendations, our model empowers urban gardeners to make informed decisions, ensuring their gardens not only survive but thrive.

4. API and Service Development

To bring our vision to life and make it easily accessible for everyone, we crafted a powerful yet user-friendly service using FastAPI. This service is the heart of our solution, enabling gardeners to effortlessly input their specific parameters — think location, sunlight exposure, and garden type — and instantly receive personalized plant recommendations.

Imagine getting the perfect plant suggestions in a blink! Our API is optimized for real-time processing, ensuring that as soon as you input your gardening details, the recommendations are delivered to you immediately, with no waiting around. We knew that this service needed to cater to more than just a handful of users. So, we designed the API with scalability in mind. Whether it’s a single gardener seeking advice or a large organization looking to assist hundreds, our service handles requests seamlessly, ensuring that everyone gets the help they need, when they need it.
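
The kind of input checking such an endpoint needs can be sketched without the web framework. The example below is framework-agnostic and uses only the standard library; in FastAPI the same shape would typically be expressed as a Pydantic model. The field names and allowed values are assumptions, not the project's actual API contract.

```python
from dataclasses import dataclass

ALLOWED_SUNLIGHT = {"full", "partial", "shade"}


@dataclass(frozen=True)
class RecommendationRequest:
    """Validated inputs for one recommendation query (hypothetical schema)."""
    latitude: float
    longitude: float
    sunlight: str
    garden_type: str


def parse_request(payload):
    """Validate a raw dict payload and return a RecommendationRequest."""
    try:
        lat = float(payload["latitude"])
        lon = float(payload["longitude"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("latitude and longitude must be numbers") from exc
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    sunlight = str(payload.get("sunlight", "")).strip().lower()
    if sunlight not in ALLOWED_SUNLIGHT:
        raise ValueError("sunlight must be one of: full, partial, shade")
    garden_type = str(payload.get("garden_type", "")).strip()
    if not garden_type:
        raise ValueError("garden_type is required")
    return RecommendationRequest(lat, lon, sunlight, garden_type)


request = parse_request(
    {"latitude": "51.5", "longitude": -0.1, "sunlight": "Full", "garden_type": "container"}
)
```

Rejecting malformed input at the boundary keeps the recommendation logic itself free of defensive checks.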

5. User-Centric Dashboard Development

We didn’t stop at just delivering recommendations. We wanted users to truly interact with the data and make informed decisions, so we built a sleek, intuitive dashboard that brings all the information to your fingertips.

Interactive Maps

By using Folium, we crafted dynamic maps that not only show your location but also provide visual cues on which plants would thrive in your area. It’s like having a gardening expert guide you, but in map form!

Custom Filters and Options

We know that every garden is unique, and so are the preferences of every gardener. That’s why our dashboard is packed with customizable filters, allowing you to tailor the recommendations based on aesthetic preferences, maintenance levels, or specific garden conditions like container gardening. The power is in your hands to create the garden of your dreams.

Data Visualization

Making decisions becomes a breeze with our clear, insightful visualisations. We’ve translated key data points into easy-to-understand charts and graphs, so you can see at a glance which plants are your best bets and make confident choices.

Visuals of the application in the plant recommendation — Source: Omdena

We believe in growing and improving just like the gardens we help cultivate. That’s why we embraced an iterative approach, constantly refining our models and interfaces based on real user feedback.

Beta Testing: We didn’t just launch our solution and hope for the best. Instead, we conducted beta testing with a dedicated group of urban gardeners. Their insights were invaluable, helping us tweak and perfect the usability and accuracy of our recommendations.

User Feedback Loops: We’ve built feedback loops directly into the dashboard, so you can tell us what’s working and what’s not. This active engagement ensures that our system isn’t just static but evolves continuously to meet your needs, delivering even better recommendations as we go.

Application and Future Directions

This application is designed to make urban gardening more accessible and successful for city dwellers. Through personalised, data-driven recommendations, it helps individuals create thriving green spaces, enhancing the livability of urban environments and supporting a broader shift towards sustainability. By opening up gardening knowledge to everyone, it can inspire a new generation of urban gardeners and help build a supportive community committed to eco-friendly practices.

Looking to the future, the project aims to expand its reach globally, adapting to different climates, languages, and cultural gardening practices. We are committed to deepening our focus on sustainability by promoting environmentally friendly gardening techniques and reducing users’ environmental footprints. Additionally, the project envisions playing a pivotal role in education and influencing urban planning and environmental policies, further embedding green spaces into city life. Ultimately, this is more than just an app — it is a catalyst for a greener, more sustainable urban future.

Future Possibilities

The project has laid a strong foundation for revolutionising urban gardening, but the journey doesn’t stop here. Looking ahead, there are several exciting opportunities to enhance and expand the capabilities of this application.

1. Advanced Predictive Analytics

While our current model provides accurate plant recommendations, we can further refine this by incorporating advanced predictive analytics. By analysing historical gardening data, weather patterns, and long-term climate forecasts, we could offer gardeners insights into how their plants will perform over time, allowing them to plan for seasonal changes and potential climate shifts.

2. Integration with IoT Devices

The next logical step is to integrate our platform with Internet of Things (IoT) devices such as smart soil sensors, weather stations, and automated irrigation systems. These devices could feed real-time data directly into the application’s system, enabling even more precise recommendations and automating garden maintenance tasks like watering and fertilising based on actual conditions.

3. Expanding Plant Database

Currently, our database covers a wide variety of plants, but there’s always room for growth. By continually expanding the database to include more plant species, especially rare and native varieties, we can cater to an even broader audience. This expansion could also involve incorporating data on companion planting, which would allow the system to suggest plant combinations that thrive together.

4. Enhanced User Personalization

We can further improve the user experience by introducing more personalised features. For instance, by analysing user behaviour and preferences, the system could tailor recommendations not just based on environmental factors but also on the gardener’s past successes, aesthetic preferences, and gardening style. Over time, the application could become a truly personalised gardening assistant, learning and adapting to each user’s unique needs.

5. Augmented Reality (AR) Integration

Imagine using AR to visualise how different plants will look in your garden before you even plant them. By integrating AR technology, users could take a virtual tour of their future garden, experimenting with different layouts and plant choices to create the perfect design.

Conclusion

The Cultivate: Enhancing Urban Gardening with Geospatial Intelligence project successfully tackled the challenge of selecting suitable plants for urban environments by providing personalised, data-driven recommendations. This project was a testament to the power of collaboration, innovation, and perseverance.

Behind the success of the project, there’s a team of dedicated, talented, and diverse individuals who worked tirelessly over several weeks to bring this vision to life. We faced numerous challenges, from data integration to model selection, but through brainstorming, experimentation, and collaboration, we overcame them all. A special thanks to the team for their unwavering commitment and to the task team for their exceptional work on data resolution. This project not only enhances urban gardening practices but also contributes to the broader goal of environmental sustainability. I am grateful to have been a part of this impactful journey, and I look forward to seeing how Cultivate: Enhancing Urban Gardening with Geospatial Intelligence will help urban gardeners thrive.

Learn more about the projects at Omdena

This article is written by Nethmi de Silva.
