My journey from MOOC projects into the real world.
By Omdena Collaborator Takashi Daido
Recently, I joined the Omdena project “Providing Renewable Energy for African Communities”.
Our client was RA365, an NGO that aims to improve energy availability for people living without electricity, who number more than 100 million in Nigeria.
It was a great experience, not only because I was working for social good but also because I learned invaluable lessons on a real project alongside 40 other collaborators from around the world, which is what makes an Omdena project unique.
My background and path to Omdena
I had taken several Data Science MOOCs, could use the major Python libraries (Pandas, Numpy, Matplotlib, scikit-learn), and could apply machine learning algorithms. I had also completed several hands-on course projects and was eager to apply my skills to real problems.
I found Omdena to be the perfect choice for me: its projects run for eight weeks and involve people from different backgrounds and experience levels.
My first shock: The reality of a data science project
Although I had high aspirations to contribute to the project, its very start showed me how inexperienced I was.
After I attended the kickoff meeting, I was a little shocked by two facts:
- The client didn’t know exactly which data products it wanted.
- The client had no dataset of its own and had not decided which of the vast amount of available open data to use. It just had a vague intention to use satellite images… for something!
No defined goal? No dataset to analyze? Satellite images… for what? This doesn’t happen in MOOC hands-on projects or Kaggle competitions, where we usually have analysis-ready datasets, clear purposes, and analysis guidelines. I had heard that real datasets are often messy, but I hadn’t expected that we would have no dataset at all at the start!
Fortunately, some members were experienced with geospatial analysis and seemed to have ideas for using satellite images. Still, we needed to communicate with our client, learn domain knowledge, discuss the direction and final outputs of the analysis, collect data, and master new tools for this project.
One of the leading collaborators of the project has provided a detailed article summarizing the project results and the process of our analysis.
So, here are three steps that I think future Omdena collaborators, or anyone else who wants to succeed in a real project, should go through!
Step 1: Understanding the business context
The first step is understanding how the project creates value.
In other words, we needed to figure out what kind of value the client provides and to whom, and then imagine how a data product (from a simple visualization to an ML model) could assist its operations.
In our case, we examined how the NGO intended to improve energy availability in Nigeria with solar power, and how its operations had gone so far.
Through frequent communication, we got the following information:
- The NGO planned to deploy “a solar container”, which has a peak output of 50 kWp, supplies power 24 hours a day thanks to its electricity storage, and serves 4,000 people, or 400 households. However, it costs about 300,000 euros (about 336,645 dollars).
- The project was at an early stage: the NGO had not yet raised the money to install any solar containers.
- The NGO had to decide where to place the containers, considering factors such as demand and solar availability.
Note that this information was not presented up front or in a straightforward way. Rather, questions came up during the analysis, and we iteratively asked them and got answers.
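Even simple arithmetic on the client’s figures helps frame the business context. The sketch below uses only the euro cost and the people/household counts quoted by the NGO; it covers capital cost only (no operation or maintenance), and `containers_needed` is a hypothetical helper for illustration, not something the project built.

```python
import math

# Figures quoted by the NGO for one solar container
CONTAINER_COST_EUR = 300_000
PEOPLE_PER_CONTAINER = 4_000
HOUSEHOLDS_PER_CONTAINER = 400


def cost_per_beneficiary(cost=CONTAINER_COST_EUR,
                         people=PEOPLE_PER_CONTAINER,
                         households=HOUSEHOLDS_PER_CONTAINER):
    """Return the capital cost per person and per household, in euros."""
    return cost / people, cost / households


def containers_needed(households):
    """Hypothetical helper: containers and budget to cover a community (rounded up)."""
    n = math.ceil(households / HOUSEHOLDS_PER_CONTAINER)
    return n, n * CONTAINER_COST_EUR


per_person, per_household = cost_per_beneficiary()
n_containers, budget_eur = containers_needed(1_000)
print(per_person, per_household)   # 75.0 750.0
print(n_containers, budget_eur)    # 3 900000
```

Numbers like these make it easier to discuss with the client why site selection matters: every container is a large, indivisible investment.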
Based on this understanding, we could determine what our final product should be and which datasets are necessary for that.
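To make the idea of a data product concrete, here is a minimal sketch of one possible shape it could take: ranking candidate sites by a weighted mix of demand and solar availability, the two factors the NGO named. This is not the project’s actual method; the sites, values, and the 50/50 weighting are all made up, and min-max normalization is just one simple way to put both factors on the same scale.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    households_without_power: int   # hypothetical proxy for demand
    irradiation_kwh_m2_day: float   # hypothetical average daily solar irradiation


def min_max(values):
    """Scale values to [0, 1]; if all are equal, treat them as equally good."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_sites(sites, demand_weight=0.5):
    """Score each site as a weighted sum of normalized demand and solar availability."""
    demand = min_max([s.households_without_power for s in sites])
    solar = min_max([s.irradiation_kwh_m2_day for s in sites])
    scores = {
        s.name: demand_weight * d + (1 - demand_weight) * r
        for s, d, r in zip(sites, demand, solar)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up candidate sites, for illustration only
sites = [
    Site("Site A", 1200, 5.8),
    Site("Site B", 400, 6.4),
    Site("Site C", 900, 5.2),
]
ranking = rank_sites(sites)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Which factors to include, and how to weight them, is exactly the kind of question that has to be settled with the client before choosing datasets.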
Step 2: Learning Minimum Viable Domain Knowledge
Many expert data scientists claim that domain knowledge is crucial for good analysis (after all, how else could you collect data, engineer features, and explain outputs to the client and stakeholders?).
In my case, I needed to learn vocabulary about solar power and the energy situation in Nigeria, both completely unfamiliar topics to me. I wondered how much I needed to learn for the analysis; after all, we were not experts in that domain.
What I did was the following:
- Read three to five easy articles about a topic and write down words that appear frequently and are unfamiliar to you.
- Look up the definitions of these words. (Sharing the vocabulary list is also an asset for other project members.)
- Understand how each variable is measured and in what unit (e.g., kWh for energy production, W/m² for solar irradiance).
- Understand how variables are interconnected, as far as possible. You may think this is part of exploratory data analysis (EDA), and it is. However, domain experts usually already know, more or less, how the fundamental variables interact, so we should study that knowledge before exploring the data from scratch with EDA.
- Also note that some variables are derived from others through a mathematical equation, so their relationship is well defined.
Once you feel that you have covered the major variables on the topic and have the big picture of how they are measured and how they interact, you are ready to start your analysis.
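A small sketch of two of the habits above: a shared glossary that records each term’s unit, and a derived variable computed from others. The energy estimate uses the common simplified photovoltaic formula (irradiation × panel area × panel efficiency × performance ratio); the default performance ratio of 0.75 and the example inputs are assumptions for illustration, not values from the project.

```python
# Shared glossary: term -> (short definition, unit). Entries are illustrative.
GLOSSARY = {
    "solar irradiance": ("solar power received per unit area", "W/m²"),
    "solar irradiation": ("solar energy received per unit area over a period", "kWh/m²"),
    "energy production": ("electrical energy generated over a period", "kWh"),
    "peak power": ("rated output under standard test conditions", "kWp"),
}


def daily_energy_kwh(
    mean_irradiance_w_m2: float,
    sun_hours: float,
    panel_area_m2: float,
    panel_efficiency: float,
    performance_ratio: float = 0.75,  # assumed system losses; real values vary by installation
) -> float:
    """Rough daily PV output: irradiation x panel area x efficiency x performance ratio."""
    # Irradiance (W/m²) sustained for some hours gives Wh/m²; divide by 1000 for kWh/m².
    irradiation_kwh_m2 = mean_irradiance_w_m2 * sun_hours / 1000
    return irradiation_kwh_m2 * panel_area_m2 * panel_efficiency * performance_ratio


for term, (definition, unit) in GLOSSARY.items():
    print(f"{term} [{unit}]: {definition}")

print(daily_energy_kwh(500, 10, 100, 0.2))  # about 75 kWh
```

Writing the unit conversion down explicitly is a cheap way to catch mix-ups between power (W, kWp) and energy (kWh) before they reach the analysis.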
Step 3: Data Collection
There are many open-data platforms serving hundreds or thousands of datasets, and we can download them with a single click. However, we should select the most reliable and suitable datasets for the project.
Here is a set of questions that helps you with this:
1. Who created the dataset?
Public organizations, research institutes, and commercial bodies create open data. Before using a dataset, you should check, at least to some extent, that its creator is credible.
2. Why was it created? How is it used?
Datasets are created for specific purposes, and they may be biased or unreliable in ways that reflect those purposes. Checking why a dataset was made helps you stay aware of how it might be biased.
3. Are they well-documented?
If you can’t get enough documentation for a dataset, it is difficult to interpret variables, combine it with other datasets, and explain outputs.
4. Is it updated periodically?
Usually, we prefer datasets that are updated periodically, so that we can update our model and keep our solution valid.
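One way to apply these questions consistently across many candidate datasets is to record the answers in a small structure and list what is still unanswered. This is a hypothetical sketch, not a tool used in the project; the field names and the example dataset are made up.

```python
from dataclasses import dataclass


@dataclass
class DatasetReview:
    """Answers to the four screening questions for one candidate dataset."""
    name: str
    creator: str = ""              # 1. Who created the dataset? ("" = unknown)
    purpose: str = ""              # 2. Why was it created? How is it used?
    well_documented: bool = False  # 3. Is it well documented?
    update_frequency: str = ""     # 4. e.g. "monthly"; "" if one-off or unknown

    def open_questions(self) -> list[str]:
        """Return the questions this dataset does not yet answer satisfactorily."""
        checks = [
            (bool(self.creator), "Who created the dataset?"),
            (bool(self.purpose), "Why was it created? How is it used?"),
            (self.well_documented, "Is it well documented?"),
            (bool(self.update_frequency), "Is it updated periodically?"),
        ]
        return [question for passed, question in checks if not passed]


review = DatasetReview(
    name="candidate irradiance dataset",  # hypothetical example
    creator="a research institute",
    purpose="climate research",
    well_documented=True,
)
print(review.open_questions())  # ['Is it updated periodically?']
```

Keeping these reviews in one shared place also lets other collaborators see why a dataset was chosen or rejected.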
The Omdena project got me out of my comfort zone and made me realize what I need to learn to be competent on real projects.
Understanding the business context, learning minimum viable domain knowledge, and collecting data against sound criteria are crucial if you want to shine on a project.
I am looking forward to tackling the next challenge!