Why Kaggle Is Not Inclusive and How to Build a More Inclusive Data Science Platform
January 12, 2022
How to build a more inclusive Data Science platform?
‘If you want to be good at swimming in pools, that is fine, go for Kaggle. If you want to be good on the open sea, go for Omdena’ — Leonardo Sanchez, Omdena challenge collaborator from Brazil.
‘What I learned in the past couple of months in Omdena’s AI challenge is much more than what I learned from all the competitions combined’ — Murli Sivashanmugam, Senior Data Scientist, Omdena Challenge collaborator from India.
Don’t get me wrong, I think Kaggle is a great data science platform for data scientists to hone their skills and apply the lessons learned in theory.
But here is why it does not go far enough.
Top 5 problems with AI competitions and Kaggle
1. Lack of interpretability
If you are a data scientist, reaching the top of a Kaggle leaderboard means fighting for an extra 0.1% on the score. In real life, that amount of effort is rarely worthwhile. On a leaderboard, exploiting data leaks and adding stacking, boosting, and ensembling to gain that 0.1% is a must. For real-world use, it is usually not worth it, because each added layer makes the model harder to interpret.
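To make the trade-off concrete, here is a minimal sketch of a selection rule that prefers the simplest model whose score is within a small tolerance of the best one. Everything in it is an assumption for illustration: the `Candidate` class, the `pick_model` function, the complexity ranks, and the scores are hypothetical and do not come from any real competition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    score: float      # validation metric on a 0-1 scale, higher is better (made-up values)
    complexity: int   # rough ordinal rank chosen by hand: 1 = easiest to explain


def pick_model(candidates, tolerance=0.001):
    """Return the least complex candidate whose score is within `tolerance` of the best."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best = max(c.score for c in candidates)
    good_enough = [c for c in candidates if best - c.score <= tolerance]
    # Among acceptable models, prefer lower complexity, then higher score.
    return min(good_enough, key=lambda c: (c.complexity, -c.score))


# Hypothetical scores, for illustration only.
candidates = [
    Candidate("logistic_regression", 0.9120, 1),
    Candidate("gradient_boosting", 0.9127, 2),
    Candidate("stacked_ensemble", 0.9129, 3),
]

print(pick_model(candidates).name)                 # simplest model within 0.001 of the best
print(pick_model(candidates, tolerance=0.0).name)  # leaderboard logic: best score only
```

With a tolerance of 0.001 the rule picks the simplest candidate. With a tolerance of zero it behaves like a leaderboard and picks the stacked ensemble. The right tolerance depends on the problem, and in some applications a 0.1% gain really does matter.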
2. Lack of skills development
Data science competitions do not teach you all the skills you need to thrive in your career. They teach only a subset of the skills that matter in the real world, and they risk creating misleading incentives: winning against someone instead of working together, expecting unrealistic data quality, or spending time on tasks that do little to solve the problem at hand. All of these can turn into toxic habits in an actual work environment.
3. No deployment in the real world
In addition, most of the solutions built in competitions cannot be replicated in production environments. Remember that in competitions you are chasing results on a leaderboard, whereas in the real world you are building entire solutions to solve problems for society and the planet.
4. Lack of AI team collaboration
Direct team communication is key. This means learning to:
- Build empathy and cultural awareness
- Understand how to ask for help the right way
- Split the work amongst each other most effectively
And much more that competitions won’t help you with.
In addition, not everyone has access to fellow data scientists to work with and learn from. Many people are located in remote parts of the world and may not have other people to collaborate with. Not everyone has access to a university or comes from bigger cities where they can meet other data scientists.
5. Lack of company collaboration
Organizations that want to put challenges on Kaggle have to make the dataset public, which does not fit all business models. In addition, many real-world problems require the company to work closely and iterate with the data scientists and engineers, which is not possible on Kaggle.
How to improve Kaggle and make it more inclusive
At Omdena, we thought it would be great to bring people from all over the world together in a collaborative environment to work on an interesting real-world problem. Teams of up to 50 engineers go from problem scoping to data collection, augmentation, AI modeling, and deployment, adding the necessary software engineering skills along the way.
The advantages for data scientists and ML engineers
- Getting your hands dirty through real-world experience. Rohith Paul from India says that while “I participated in some Kaggle competitions where the data was cleaned already, Omdena’s real-world exposure was a new experience for me and I loved it”.
- Working closely with domain experts. Most real-world problems are not just data science problems; they need domain experts to create value. We have seen that data scientists from diverse backgrounds, working alongside domain experts, help the company refine the problem and see it from a new perspective.
- Deploying the solutions in the real world by focusing on MLOps, a set of practices for deploying and maintaining machine learning models in production reliably and efficiently.
In the words of Aleksandr Laskorunskiy from Israel:
“Omdena can give such a big bonus for people through the opportunity to mention in their CV the real work experience that is required by any company now”.
The advantages for organizations
Omdena recruited a team of more than 50 data scientists around the globe who worked tirelessly over the entire project duration. I’ve rarely seen a team working so hard for a common goal and achieving such tangible results in a short period of time. The project resulted in several outcomes that are extremely promising, not just for Save the Children, but for the entire field of NGOs and other actors in the field.
— John Zoltner
Senior Advisor of Technology for Development and Innovation, Save the Children
- A selected group of 40–50 practitioners with different skill sets and experience levels.
- The data is shared internally.
- The community solving the problem includes people from the company (often data scientists), who are directly exposed to the project and can get more involved if necessary.
- The community members are intrinsically motivated and have often faced the problem themselves. They not only build a model but also help the company refine the problem and put the problem into a bigger context. This is something we see over and over again in all challenges.
After only two weeks of setting up the project team, working with such a diverse and motivated group of professionals was a truly unique experience, with insights beyond our expectations. We look forward to collaborating on additional projects!
— Arnon Houri-Yafin
CEO, Zzapp Malaria ($5M XPRIZE 2021 Winner)
And in the words of Saurav Suman from the UN World Food Program in Nepal:
The collaborative approach of Omdena is taking innovation to a whole new level with the idea of leveraging technology to bring in people with different capacities and work on a problem. The driving force behind this approach is accelerated learning through collaborative spirit, mentoring, and spot-on guidance.
Ready to test your skills?
If you’re interested in collaborating, apply to join an Omdena project at: https://www.omdena.com/projects