From Data Science Courses to Tackling Energy Shortage in Nigeria – by Takashi from Japan
September 25, 2022
Author: Takashi Daido, Omdena collaborator.
Read about the journey of Omdena collaborator Takashi Daido and how he went from MOOCs (Massive Open Online Courses) to real-world projects.
I joined the Omdena project “Providing Renewable Energy for African Communities” in 2019. Our client was RA365, an NGO that aims to improve energy availability for people who live without electricity, who number more than 100 million in Nigeria.
It was a great experience: I got to work for social good and learn invaluable lessons on a real project alongside 40 other collaborators from around the world, which is what makes an Omdena project unique.
My background and path to Omdena
I had taken several data science MOOCs, and I could work with the major Python libraries (Pandas, NumPy, Matplotlib, scikit-learn) and apply machine learning algorithms. I had also completed several hands-on course projects and was eager to apply my skills to real problems.
So, I found Omdena to be the perfect choice for me. Their projects run for eight weeks and involve people from different backgrounds and experience levels.
My first shock: The reality of a data science project
Although I had high aspirations to contribute to the project, its start quickly showed me how inexperienced I was.
After I attended the kickoff meeting, I was a little shocked by two facts:
- The client didn’t know exactly what data products it needed.
- The client had no dataset of its own and had not decided which of the vast number of open datasets to use for the project. It just had a vague intention to use satellite images for something!
This doesn’t happen in MOOC hands-on projects or Kaggle competitions, where we usually have analysis-ready datasets, specific goals, and analysis guidelines. I had heard that real datasets are often messy. But I hadn’t expected that we wouldn’t have any dataset at all at the start!
Fortunately, some members with experience in geospatial analysis had ideas about how to use satellite images. Still, we needed to communicate with our client, learn domain knowledge, discuss the direction and final outputs of the analysis, collect data, and master new tools for this project.
One of the leading collaborators of the project has provided a detailed article summarizing the project results and our analysis process.
So, I’ll share my three steps for future Omdena collaborators, or anyone else who wants to succeed in a real project!
Step 1: Understanding the business context
The first step is understanding the value creation of the project.
In other words, we needed to figure out what kind of value the client provides and to whom. Then we needed to imagine how a data product (anything from a simple visualization to an ML model) could assist its operations.
In our case, we looked at how the NGO intended to improve energy availability in Nigeria with solar power, and how far its operations had progressed.
Through frequent communication, we got the following information:
- The NGO planned to deploy “a solar container” that generates up to 50K kWp, works 24 hours a day thanks to electricity storage, and serves 4,000 people or 400 households. However, it costs about 300,000 euros. The project was at its initial stage, and the NGO had not yet raised the money to install any solar container.
- The NGO has to decide where to put the containers, considering factors such as demand and solar availability.
Note that this information was not presented in advance or in a straightforward manner. Rather, questions came up during the analysis, and we repeatedly asked them and got answers.
Based on this understanding, we could determine what our final product should be and which datasets we needed for it.
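As a rough illustration of how the information above could turn into a first data product, here is a minimal Python sketch. It computes the cost per person and per household from the figures the NGO gave us, and ranks candidate sites by a weighted score of demand and solar availability. The Site class, the 0-to-1 indices, the equal weighting, and the three candidate sites are all hypothetical placeholders, not what the project actually built.

```python
from dataclasses import dataclass
from typing import List

# Figures quoted by the NGO (see the list above).
CONTAINER_COST_EUR = 300_000
PEOPLE_SERVED = 4_000
HOUSEHOLDS_SERVED = 400


@dataclass
class Site:
    """A candidate location. Both indices are hypothetical, scaled to 0..1."""
    name: str
    demand: float  # unmet electricity demand index (assumption)
    solar: float   # solar availability index (assumption)


def cost_per_unit(total_cost: float, units: int) -> float:
    """Spread the cost of one container over the people or households it serves."""
    return total_cost / units


def site_score(site: Site, demand_weight: float = 0.5) -> float:
    """Weighted average of the two factors; the weight is an assumption."""
    return demand_weight * site.demand + (1.0 - demand_weight) * site.solar


def rank_sites(sites: List[Site], demand_weight: float = 0.5) -> List[Site]:
    """Return the sites ordered from highest to lowest score."""
    return sorted(sites, key=lambda s: site_score(s, demand_weight), reverse=True)


if __name__ == "__main__":
    print(cost_per_unit(CONTAINER_COST_EUR, PEOPLE_SERVED))      # 75.0 euros per person
    print(cost_per_unit(CONTAINER_COST_EUR, HOUSEHOLDS_SERVED))  # 750.0 euros per household

    # Made-up candidate sites, purely for illustration.
    candidates = [Site("A", 0.9, 0.4), Site("B", 0.6, 0.8), Site("C", 0.3, 0.9)]
    for site in rank_sites(candidates):
        print(site.name, round(site_score(site), 2))
```

In a real project, the indices would have to come from actual datasets (for example, population and irradiance layers), and the weighting would need to be agreed with the client.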
Step 2: Learning Minimum Viable Domain Knowledge
Many expert data scientists claim that domain knowledge is crucial for better analysis. (Without it, how could we collect data, engineer features, and explain outputs to clients and stakeholders?)
In my case, I needed to learn some vocabulary about solar power and the energy situation in Nigeria, which were completely unfamiliar topics to me. I wondered how much I should learn for data analysis. We weren’t experts in that domain after all.
What I did was the following:
- Read three to five introductory articles on the topic and write down the unfamiliar words that appear frequently.
- Look up the definitions of these words. (Sharing the vocabulary list is also an asset for project members.)
- Understand how each variable is measured and which unit is used (e.g., kWh for energy production, W/m² for solar irradiance).
- Understand, as far as possible, how the variables are interconnected.
You may think this is part of exploratory data analysis (EDA), and that is correct. However, domain experts usually already know, more or less, how the fundamental variables interact with each other. We should study that before exploring it from scratch through EDA.
- Note that some variables are derived from other variables by a mathematical equation, so their relationship is very clear.
If you feel that you have covered the major variables of the topic and have the big picture of how they are measured and how they interact, it’s a good sign that you can start your analysis.
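To make the points about units and derived variables concrete, here is a small Python sketch. It uses two standard physical relationships: power from irradiance (W/m² times area times conversion efficiency) and energy as power times time (kW times hours gives kWh). The function names and the example numbers (1000 W/m², 10 m², 20% efficiency, 5 hours) are illustrative assumptions, not values from the project.

```python
def panel_power_kw(irradiance_w_m2: float, area_m2: float, efficiency: float) -> float:
    """Electrical power in kW from irradiance (W/m^2), panel area (m^2),
    and conversion efficiency (a fraction between 0 and 1)."""
    watts = irradiance_w_m2 * area_m2 * efficiency
    return watts / 1000.0  # W -> kW


def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy in kWh is power in kW multiplied by time in hours."""
    return power_kw * hours


if __name__ == "__main__":
    # Illustrative numbers only.
    power = panel_power_kw(irradiance_w_m2=1000.0, area_m2=10.0, efficiency=0.2)
    print(power)                          # 2.0 (kW)
    print(energy_kwh(power, hours=5.0))   # 10.0 (kWh)
```

Writing the unit into each parameter name is a cheap way to avoid mixing up W and kW, or power and energy, when combining datasets.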
Step 3: Data Collection
There are many open-data platforms serving hundreds or thousands of datasets, and we can download them with the click of a button. However, we should select the most reliable and suitable datasets for a project.
Here is a set of questions that helps you with this:
1. Who created the dataset?
Public organizations, research institutes, and commercial bodies create open data. You need to check, at least to some extent, that the creator is credible before using its dataset.
2. Why was it created? How is it used?
Datasets are created for certain purposes. They may be biased, unreliable, or not aligned with your purpose. Checking the original purpose makes you more conscious of how the data may be biased.
3. Is it well documented?
If you can’t get enough documentation for a dataset, it is difficult to interpret variables, combine it with other datasets, and explain outputs.
4. Is it updated periodically?
Usually, we prefer datasets that are updated periodically so that we can update our model and our solution remains valid.
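The four questions above can be kept as a simple checklist in code, so that every candidate dataset is reviewed the same way. This is only a minimal Python sketch: the DatasetCandidate fields, the open_concerns function, the one-year freshness threshold, and the example dataset are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DatasetCandidate:
    """What we know about a dataset after a first look (hypothetical fields)."""
    name: str
    creator: Optional[str]         # 1. Who created it?
    stated_purpose: Optional[str]  # 2. Why was it created?
    has_documentation: bool        # 3. Is it well documented?
    last_updated: Optional[date]   # 4. Is it updated periodically?


def open_concerns(ds: DatasetCandidate, today: date, max_age_days: int = 365) -> List[str]:
    """List the checklist questions that are still unanswered or problematic."""
    concerns = []
    if not ds.creator:
        concerns.append("creator unknown")
    if not ds.stated_purpose:
        concerns.append("purpose unknown")
    if not ds.has_documentation:
        concerns.append("no documentation")
    # The one-year default is an arbitrary assumption; choose what fits the project.
    if ds.last_updated is None or (today - ds.last_updated).days > max_age_days:
        concerns.append("not recently updated")
    return concerns


if __name__ == "__main__":
    # A made-up candidate, purely for illustration.
    candidate = DatasetCandidate(
        name="example_irradiance_grid",
        creator="Example research institute",
        stated_purpose=None,
        has_documentation=True,
        last_updated=date(2019, 1, 1),
    )
    print(open_concerns(candidate, today=date(2022, 9, 25)))
```

An empty list does not prove that a dataset is reliable. It only means that none of the four basic questions is still open.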
Conclusion
The Omdena project got me out of my comfort zone and made me realize what I should learn to be competent in a real project.
Understanding the business context, learning minimum viable domain knowledge, and collecting data with reasonable criteria are crucial if you want to shine on a project.
I am looking forward to tackling the next challenge!