Three Key Steps To Be Successful In A Data Science Project

Apr 12, 2020

My journey from MOOC projects into the real world.


By Omdena Collaborator Takashi Daido


Recently, I joined the Omdena project “Providing Renewable Energy for African Communities”.

Our client was RA365, an NGO that aims to improve energy availability for people who live without electricity, a group of more than 100 million in Nigeria.

Off-grid, solar energy puts local people in control


It was a great experience, not only because we worked for social good but also because I learned invaluable lessons in a real project with 40 other collaborators from around the world, which is what makes an Omdena project unique.


My background and path to Omdena

I had taken several data science MOOCs, and I could work with the major Python libraries (Pandas, NumPy, Matplotlib, scikit-learn) and apply machine learning algorithms. I had also completed several hands-on course projects and was eager to apply my skills to real problems.

I found Omdena to be the perfect choice for me: its projects run for eight weeks and involve people from different backgrounds and experience levels.

Photo by Braden Collum on Unsplash


My first shock: The reality of a data science project

Although I had high aspirations to contribute, the start of the project quickly showed how inexperienced I was.

After I attended the kickoff meeting, I was a little shocked by two facts:

  • The client didn’t know exactly what data products it wanted.
  • The client had no dataset of its own, nor had it decided which of the vast amount of open data to use for the project. It just had a vague intention to use satellite images… for something!

No defined goal? No dataset to analyze? Satellite images… for what? This doesn’t happen in MOOC hands-on projects or Kaggle competitions, where we usually have analysis-ready datasets, clear purposes, and analysis guidelines. I had heard that real datasets are often messy, but I hadn’t expected that we would have no dataset at all at the start!

Fortunately, some members were experienced with geospatial analysis and seemed to have an idea of how to use satellite images. Still, we needed to communicate with our client, learn domain knowledge, discuss the direction and final outputs of the analysis, collect data, and master new tools for this project.

One of the leading collaborators of the project has provided a detailed article summarizing the project results and the process of our analysis.

So, for future Omdena collaborators or anyone else who wants to succeed in a real project, I’ll share the three steps you need to be involved in!


Step 1: Understanding the business context

The first step is understanding the value creation of the project.

In other words, we needed to figure out what kind of value the client provides and to whom, and then imagine how a data product (anything from a simple visualization to an ML model) could assist its operations.

In our case, we looked at how the NGO intended to improve energy availability in Nigeria with solar power, and how far its operations had progressed.

Through frequent communication, we got the following information:

  • The NGO planned to deploy “a solar container”, which generates a maximum of 50K kWp, works 24 hours a day thanks to electricity storage, and serves 4,000 people or 400 households. However, it costs about 300,000 euros (about 336,645 dollars).
  • The project was at its initial stage. The NGO hadn’t yet raised the money to install any solar container.
  • The NGO had to decide where to put the containers, considering factors such as demand and solar availability.
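
The figures in this list already allow a rough back-of-the-envelope check of what one container means per household. The sketch below uses only the quoted numbers, reading the dollar figure as 336,645. The variable names are my own, and the exchange rate is merely the one implied by the two quoted prices, not an official rate.

```python
# Figures quoted by the NGO for one solar container (from the list above).
cost_eur = 300_000
cost_usd = 336_645          # assumption: the quoted dollar figure read as 336,645
people_served = 4_000
households_served = 400

# Derived quantities: simple ratios of the quoted figures,
# not numbers supplied by the NGO.
people_per_household = people_served / households_served
cost_per_household_eur = cost_eur / households_served
cost_per_person_eur = cost_eur / people_served
implied_usd_per_eur = cost_usd / cost_eur

print(f"People per household: {people_per_household:.0f}")
print(f"Cost per household:   {cost_per_household_eur:.0f} EUR")
print(f"Cost per person:      {cost_per_person_eur:.0f} EUR")
print(f"Implied USD per EUR:  {implied_usd_per_eur:.3f}")
```

Ratios like these are the kind of quick sanity check that helped us turn the client's answers into concrete questions for the next conversation.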

Note that this information was neither presented in advance nor delivered in a straightforward manner. Rather, questions came up during the analysis, and we iteratively asked them and got answers.

Based on this understanding, we could determine what our final product should be and which datasets were necessary for it.

Photo by Paul Schafer on Unsplash

Step 2: Learning Minimum Viable Domain Knowledge

Many expert data scientists claim that domain knowledge is crucial for better analysis. (Without it, how could we collect data, engineer features, and explain outputs to the client and stakeholders?)

In my case, I needed to learn some vocabulary about solar power and the energy situation in Nigeria, topics that were completely unfamiliar to me. I wondered how much I should learn for the data analysis. We were not experts in that domain, after all.

What I did was the following:

  • Read three to five easy articles about the topic and write down words that appear frequently and are unfamiliar to you.
  • Look up the definitions of these words. (Sharing the vocabulary list is also an asset for project members.)
  • Understand how each variable is measured and what unit is used (e.g., kWh for energy production, W/m² for solar irradiance).
  • Understand as much as possible about how the variables are interconnected. You may think this is part of exploratory data analysis (EDA), and that’s correct. However, domain experts usually already know, more or less, how the fundamental variables interact with each other. We should study that before exploring the variables from scratch with EDA.
  • Also, note that some variables are derived from other variables through a mathematical equation, so their relationship is very clear.
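
As a small illustration of the last points in this list: energy is power accumulated over time, so a kWh value can be derived from a power reading in kW and a duration in hours. In the same way, a mean irradiance in W/m² converts to a daily irradiation in kWh/m². The function names and sample numbers below are made up for illustration and are not project data.

```python
def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy (kWh) delivered at a constant power (kW) over a duration (h)."""
    return power_kw * hours


def daily_irradiation_kwh_per_m2(mean_irradiance_w_per_m2: float) -> float:
    """Convert a 24-hour mean irradiance (W/m²) to daily irradiation (kWh/m²)."""
    watt_hours_per_m2 = mean_irradiance_w_per_m2 * 24  # W/m² * h = Wh/m²
    return watt_hours_per_m2 / 1000                    # Wh -> kWh


# Hypothetical sample values, chosen only to show the unit handling.
print(energy_kwh(power_kw=2.0, hours=5.0))    # 2 kW sustained for 5 h
print(daily_irradiation_kwh_per_m2(250.0))    # 250 W/m² averaged over a day
```

Writing the units into the function names and comments is a cheap way to catch mix-ups such as W versus kW before they reach a model.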

If you feel that you have covered the major variables of the topic and have the big picture of how they are measured and how they interact, it’s a good sign that you can start your analysis.


Step 3: Data Collection

There are many open-data platforms serving hundreds or thousands of datasets, and we can download them with the click of a button. However, we should select the most reliable and suitable datasets for a project.

Here is a set of questions that helps you with this:

1. Who created the dataset?

Public organizations, research institutes, and commercial bodies all create open data. Before using a dataset, you need to verify, at least to some extent, that its creator is a credible authority.

2. Why was it created? How is it used?

Datasets are created for certain purposes, and depending on that purpose they may be biased or unreliable. Checking the purpose makes you more conscious of how a dataset may be biased.

3. Is it well documented?

If you can’t get enough documentation for a dataset, it is difficult to interpret variables, combine it with other datasets, and explain outputs.

4. Is it updated periodically?

Usually, we prefer datasets that are updated periodically so that we can update our model and our solution stays valid.
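
One way to keep track of these four questions across many candidate datasets is a small checklist record. This is only a sketch under my own assumptions: the class name, field names, and the example dataset are hypothetical and were not part of the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCheck:
    """Answers to the four screening questions for one candidate dataset."""
    name: str
    creator_is_credible: bool      # 1. Who created the dataset?
    purpose_is_understood: bool    # 2. Why was it created? How is it used?
    is_well_documented: bool       # 3. Is it well documented?
    is_updated_periodically: bool  # 4. Is it updated periodically?

    def open_questions(self) -> List[str]:
        """Return labels of the questions that still lack a satisfactory answer."""
        checks = {
            "creator": self.creator_is_credible,
            "purpose": self.purpose_is_understood,
            "documentation": self.is_well_documented,
            "updates": self.is_updated_periodically,
        }
        return [label for label, ok in checks.items() if not ok]


# Hypothetical candidate, not a dataset from the project.
candidate = DatasetCheck(
    name="example-irradiance-grid",
    creator_is_credible=True,
    purpose_is_understood=True,
    is_well_documented=False,
    is_updated_periodically=True,
)
print(candidate.name, "->", candidate.open_questions())
```

Sharing such a record with the team makes it visible which datasets still need a closer look before anyone builds on them.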



The Omdena project got me out of my comfort zone and made me realize what I should learn to be competent in a real project.

Understanding the business context, learning minimum viable domain knowledge, and collecting data with reasonable criteria are crucial for you to shine on a project.

I am looking forward to tackling the next challenge!

