
Are There Too Many Data Science Competitions and Hackathons? Here Is Why the Market Is Saturated.

March 18, 2021



Data science competitions are not teaching you the skills you need to thrive in your career. Here are the skills data scientists really need to focus on in 2021 to stand out.

[Image: data science competitions. Credit: @ai_memes]

Competitions and hackathons play a role in teaching you AI modeling skills. However, as shown above, the problem is clear:

Competitions teach only a subset of the skills that matter in the real world, and they risk creating misleading incentives: winning against others instead of working together, expecting unrealistic data quality, or spending time on tasks that are not very useful for solving the actual problem. In a real work environment, all of these can turn into toxic traits.

In contrast, several skills matter on the job that competitions rarely exercise, on top of the fact that real-world data is almost always far from the “perfect” datasets most competitions provide.

To get a better understanding of what really matters in actual data science roles, here is how a Redditor describes his day-to-day work:

  1. Meet with the business to understand the problem
  2. Find the data and build data pipelines using SQL/Python
  3. Do analysis and build a baseline model in Python/Jupyter notebooks
  4. Once a workflow is established, I put everything in Python scripts and run automated hyperparameter and model-selection searches with standardized result output to find the best model. This also helps with reproducibility.
  5. Present and communicate results to the business
  6. Develop the final model package and data pipelines to deploy it on our production platforms (using OOP concepts like Python classes and software engineering practices such as linting with pylint, testing with pytest, and CI/CD pipelines)
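To make step 4 concrete, here is a minimal, standard-library-only sketch of a scripted model search. It assumes a made-up toy dataset and a deliberately simple one-feature threshold "model"; the names `make_data`, `evaluate`, and `run_search` are hypothetical, and in practice you would reach for a library such as scikit-learn. What it illustrates are the three ingredients from the list: an automated search over settings, a fixed random seed for reproducibility, and results emitted in one standardized format.

```python
import itertools
import json
import random

SEED = 42  # fixed seed so every run produces the same data and the same results


def make_data(n, seed=SEED):
    """Hypothetical toy dataset: the label is 1 when a noisy copy of x exceeds 5."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 1 if x + rng.gauss(0, 1) > 5 else 0
        rows.append((x, y))
    return rows


def evaluate(threshold, invert, rows):
    """Accuracy of a one-feature threshold classifier on the given rows."""
    correct = 0
    for x, y in rows:
        pred = 1 if x > threshold else 0
        if invert:
            pred = 1 - pred
        correct += int(pred == y)
    return correct / len(rows)


def run_search(train, valid):
    """Try every combination in the grid and return one standardized record per run."""
    grid = {"threshold": [3.0, 4.0, 5.0, 6.0, 7.0], "invert": [False, True]}
    results = []
    for threshold, invert in itertools.product(grid["threshold"], grid["invert"]):
        results.append({
            "params": {"threshold": threshold, "invert": invert},
            "train_accuracy": round(evaluate(threshold, invert, train), 4),
            "valid_accuracy": round(evaluate(threshold, invert, valid), 4),
        })
    # Rank by the validation score, never by the training score.
    results.sort(key=lambda r: r["valid_accuracy"], reverse=True)
    return results


if __name__ == "__main__":
    data = make_data(500)
    train, valid = data[:400], data[400:]
    results = run_search(train, valid)
    # Standardized output: the same JSON shape on every run, including the seed.
    print(json.dumps({"seed": SEED, "best": results[0], "results": results}, indent=2))
```

Because the output always has the same shape, runs can be diffed, archived, or compared automatically, which is where the reproducibility benefit comes from.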

Another Redditor comments on the above list as follows:

The list seems like a solid workflow, which pretty much guarantees job security.

Looking at these steps, most of the work is data engineering, reproducible analysis, and production engineering.

This ties in with a striking finding from a recent KDnuggets article, in which ML researcher Mihail Eric analyzed the data roles being hired for at every Y Combinator company since 2012. Here is the gist of what he found:

There are 70% more open roles at companies in data engineering as compared to data science. As we train the next generation of data and machine learning practitioners, let’s place more emphasis on engineering skills.

The “secret” to successful data science teams is data quality (and data engineering) combined with domain knowledge.

The Skills That Matter

In summary, the real value of a data scientist is not just the languages, models, and tools they use. Instead, organizations are looking for someone who knows how to solve a problem with data: from figuring out what data to collect in the first place all the way to turning it into a simple insight.

Now, let us talk about the skills that really get you ahead: landing a job, excelling at work, and, most importantly, enjoying what you do.

First, I want to focus on soft skills, since competitions and hackathons mostly overlook them. Technical skills will follow.

Soft Skills

The people skills.

[Image: Omdena projects. Source: Omdena Project]

According to the World Economic Forum (WEF), two of the most important skills for 2025 are analytical thinking and complex problem-solving. Let us apply this to the data science field and describe the most essential skills.

Complex Problem-Solving

There are two myths about how data scientists solve problems. The first is that the problem is already given, so the data scientist's only challenge is to pick an algorithm and put it into production. The second is that data scientists always reach for the most advanced algorithms, because a fancier model supposedly means a better solution.

In reality, each problem is unique and comes with its own constraints. The essential skill is figuring out the most effective, and often the most efficient, approach to solving it. Sometimes that takes a fancier model, but more often a simple approach yields better results. The skill is to analyze the problem deeply, understand it, and only then decide what solution to build. Problem first, technology second!

Critical Thinking & Analysis

The times of heavy top-down management are (mostly) over. While competitions follow a more top-down process, real-world projects work best with a collaborative approach and a flat hierarchy.

The best data scientists do not just follow orders; they learn to think independently. This helps them approach problems from new angles, improves team communication, and helps educate business leaders and the organization's leadership as a whole. All of this ties into the next skill set:

Communication & Collaboration

A data scientist has to be able to communicate results and automate analyses. Technically, this is typically done with Power BI, Tableau, or similar tools, but direct communication within the team is key. This means learning to:

  • Build empathy and cultural awareness
  • Ask for help the right way
  • Split the work among team members effectively
  • And much more that competitions won't teach you.

“Communicate unto the other person that which you would want him to communicate unto you if your positions were reversed.” — Plato

Active Learning & Learning Strategies

A no-brainer. Competitions lack the collaboration and interaction you need to learn the most, whether by teaching others or by listening to them.


Leadership & Social Influence

Every organization is looking for people who take responsibility and drive initiatives forward within a team. Taking responsibility is like a muscle that can be trained. You do not need to be senior to take on a leadership role. What matters is not the size of the project but a mindset of moving things forward wherever you can.

You don’t need a title to be a leader.

“The strength of the team is each individual member. The strength of each member is the team.” — Phil Jackson

Fun


Lastly, fun is an essential “soft skill” to have. As AI influencer Eriber Weber puts it:

“Do not only optimize for income but for work that makes you happy.”

Happiness is the ultimate productivity driver, and beyond that, (working) life is too short to spend on things you do not enjoy or that do not serve a bigger purpose.

Here is how Samir, one of our Omdena project participants and a software engineer at Google, describes the joy of collaborating with 50 engineers from around the world:

“A group of strangers from different corners of the Earth, who have never met each other; transcending geographical borders and time zones to work together and solve fascinating social problems; whilst learning from and inspiring each other every single day! This isn’t just a figment of my imagination. Such a world exists and I am extremely grateful that I am part of such an extraordinary journey.”

Hard Skills

Now that we have covered the key soft skills, I want to briefly talk about some overlooked hard skills. Apart from programming, ML, and EDA, there are some less obvious skills that can make or break your work.

Data Engineering

Recall the earlier example of the Redditor, a senior data scientist, describing his work: most of it was data engineering.

As a data scientist, you may join a company thinking you are there to build smart models and derive as much value from the data as possible. In reality, you often get held up: your first few months may go into building the infrastructure and pipelines needed just to get the data. Experience with messy datasets will help you kick-start your career.
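To give a flavor of that work, here is a minimal extract-clean-load sketch using only Python's standard library (`csv` and `sqlite3`). The raw rows, column names, and cleaning rules are hypothetical, made up to mimic typical real-world mess: stray whitespace, inconsistent date formats, a missing value, and a duplicate record. Note that a date like `01/03/2021` is ambiguous; the sketch assumes day-first, which in a real project is exactly the kind of question you settle with the data owner.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export: inconsistent dates, stray spaces, a missing amount, a duplicate.
RAW = """order_id,date,country,amount
1001,2021-03-01, DE ,19.99
1002,01/03/2021,de,5
1003,2021-03-02,FR,
1001,2021-03-01,DE,19.99
1004,2021/03/03,fr,7.50
"""

# Assumption: slash dates without a leading year are day-first (dd/mm/yyyy).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def parse_date(text):
    """Return an ISO date string, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    """Normalize fields, skip rows without an amount, and de-duplicate on order_id."""
    seen = set()
    for row in rows:
        if not row["amount"].strip():
            continue  # a real pipeline would log or quarantine this row
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        yield (
            order_id,
            parse_date(row["date"]),
            row["country"].strip().upper(),
            float(row["amount"]),
        )


def load(records, conn):
    """Write cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, date TEXT, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(clean(csv.DictReader(io.StringIO(RAW))), conn)
for row in conn.execute(
    "SELECT country, COUNT(*), SUM(amount) FROM orders GROUP BY country ORDER BY country"
):
    print(row)
```

Keeping extract, clean, and load as separate functions makes each step easy to test and to rerun when the source data changes.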

Visualization & Analytics

[Image: data visualization lego. Source: nfo-graphics.nl]

Beginners, and even experienced data scientists, too often neglect visualization.

Here is why visualization is so important. It helps you with:

  • Interpreting data better and making it more memorable
  • Getting your insights across to (non-technical) people
  • Noticing correlations
  • Spotting outliers
  • Finding possible cause-and-effect relationships
  • And more that you won't see until you visualize it 🙂
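Anscombe's quartet (F. J. Anscombe, 1973) is the classic demonstration of why plotting matters: four small datasets whose summary statistics are nearly identical. The sketch below computes those statistics with Python's standard library only. Plotting the four (x, y) pairs with any charting tool, for example matplotlib, then reveals what the numbers hide: a roughly linear cloud, a clear curve, a straight line with one outlier, and a set of identical x values where a single extreme point produces the whole correlation.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """The summary statistics that look (almost) the same for all four datasets."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "var_y": round(variance(ys), 2),
        "corr": round(pearson(xs, ys), 3),
    }


for name, (xs, ys) in QUARTET.items():
    print(name, summarize(xs, ys))
```

Every dataset reports a mean of x of 9, a mean of y of about 7.50, and a correlation of about 0.82, so a table of summary statistics alone would suggest four copies of the same relationship.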

Version Control

Version control is the practice of managing changes to the code and configuration of an application, and it is a priceless skill in a team of developers. It lets you track modifications to files, and when you check in your changes, you are alerted if another user has changed the same files, so that you can merge them.

Paying attention to version control will make teamwork much more effective.
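Tools such as Git handle this for you. Purely to illustrate the idea behind that alert, here is a toy three-way comparison, assuming three in-memory versions of one file: the common base, your copy, and a colleague's copy. The function `check_in` and its status labels are hypothetical. Real version control systems merge changes line by line, which this sketch does not attempt; it only decides whether a merge is needed and uses `difflib.unified_diff` to show what the colleague changed.

```python
import difflib


def check_in(base, mine, theirs):
    """Classify a check-in by comparing both edited versions against their common base.

    Toy illustration only: real version control merges changes line by line.
    """
    if mine == theirs:
        return "identical", mine       # both sides ended up with the same content
    if theirs == base:
        return "keep mine", mine       # only you changed the file
    if mine == base:
        return "take theirs", theirs   # only your colleague changed it
    return "merge needed", None        # both changed it, differently


base = ["threshold = 0.5\n", "model = 'logreg'\n"]
mine = ["threshold = 0.4\n", "model = 'logreg'\n"]
theirs = ["threshold = 0.5\n", "model = 'xgboost'\n"]

status, result = check_in(base, mine, theirs)
print(status)
if status == "merge needed":
    # Show what your colleague changed relative to the shared base.
    print("".join(difflib.unified_diff(base, theirs, fromfile="base", tofile="theirs")))
```

Here both people edited the file, so the toy reports that a merge is needed; a real tool would then notice that the edits touch different lines and combine them.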

APIs and the Command Line

You just can't skip this one: if you do, it is bound to come back to you. APIs are used almost everywhere, and you need them to build applications in data science domains such as cloud, IoT, and web applications. A good understanding of different storage services, security features, and automation tools will help you pick the right technology for the job.
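As a minimal sketch of what calling an API looks like in code, the snippet below starts a tiny JSON endpoint on localhost with Python's built-in `http.server` and then queries it with `urllib.request`, the same way you would call a real web API. The endpoint path `/v1/status` and its payload are hypothetical stand-ins; a real service would add authentication, error handling, and rate limits.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubAPI(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a remote service that answers with JSON."""

    def do_GET(self):
        if self.path == "/v1/status":
            body = json.dumps({"status": "ok", "models": ["baseline"]}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def get_json(url, timeout=5):
    """Call a JSON API over HTTP and decode the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


server = HTTPServer(("127.0.0.1", 0), StubAPI)  # port 0 lets the OS pick a free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    url = f"http://127.0.0.1:{server.server_address[1]}/v1/status"
    data = get_json(url)
    print(data)
finally:
    server.shutdown()
    server.server_close()
```

Swapping the local URL for a real service's endpoint is all it takes on the client side; everything else (headers, status codes, JSON decoding) stays the same.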

Deployment

Your model is not a Jupyter notebook!

Deploying models to the edge and/or the cloud is a must-have skill for any production application. In fact, running a model in production securely and maintaining it over time is one of the rarest and most sought-after skills in the field right now.

Competitions rarely result in deployment, and they do not teach you how to deploy real-world models.
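A first step away from the notebook is to package the model as a small, testable unit: parameters saved to a file, a class that loads them, input validation, and a test that a CI pipeline can run. The sketch below assumes a hypothetical `ChurnModel` with made-up logistic-regression weights and uses only the standard library; in practice a framework's own serialization and a web service in front of the model usually fill these roles.

```python
import json
import math
import os
import tempfile


class ChurnModel:
    """Hypothetical logistic-regression scorer packaged for production use."""

    FEATURES = ("tenure_months", "monthly_charges")

    def __init__(self, weights, bias):
        self.weights = dict(weights)
        self.bias = float(bias)

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"weights": self.weights, "bias": self.bias}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            params = json.load(f)
        return cls(params["weights"], params["bias"])

    def predict_proba(self, record):
        # Validate inputs instead of trusting whatever arrives in production.
        missing = [name for name in self.FEATURES if name not in record]
        if missing:
            raise ValueError(f"missing features: {missing}")
        z = self.bias + sum(self.weights[name] * float(record[name]) for name in self.FEATURES)
        return 1.0 / (1.0 + math.exp(-z))


def test_round_trip_and_validation():
    """pytest-style test: saving and loading must not change predictions."""
    model = ChurnModel({"tenure_months": -0.05, "monthly_charges": 0.03}, bias=-1.0)
    record = {"tenure_months": 12, "monthly_charges": 70.0}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        model.save(path)
        restored = ChurnModel.load(path)
    assert restored.predict_proba(record) == model.predict_proba(record)
    try:
        restored.predict_proba({"tenure_months": 12})
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for missing features")


if __name__ == "__main__":
    test_round_trip_and_validation()
    print("ok")
```

pytest collects any function whose name starts with `test_`, so the same file can run in a CI pipeline before every deployment.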

But what about AutoML?

AutoML tools don't solve every problem. You need stable, clean data for AutoML to be of any value. You also need someone who understands the problem well enough to select useful validation steps and metrics. Often, those metrics and validations also need to be explained so that people on the business side can understand the results.
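A small example of why metric choice needs someone who understands the problem: on imbalanced data, a model that never predicts the rare class can still score high accuracy. The sketch below assumes a hypothetical fraud-detection validation set with 5 fraud cases among 100 transactions, and compares a trivial "always negative" model with a hypothetical model that catches 4 of the 5 frauds at the cost of 6 false alarms.

```python
def confusion(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def report(y_true, y_pred):
    """Accuracy, recall, and precision for the positive (rare) class."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical validation labels: 5 fraud cases (1) among 100 transactions.
y_true = [1] * 5 + [0] * 95

always_negative = [0] * 100
# Hypothetical model: catches 4 of the 5 frauds and raises 6 false alarms.
flagging_model = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

print("always negative:", report(y_true, always_negative))
print("flagging model: ", report(y_true, flagging_model))
```

Ranked by accuracy, the useless model wins (0.95 against 0.93); ranked by recall, it catches no fraud at all (0.0 against 0.8). An automated search optimizing the wrong metric would happily pick it.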

In conclusion, AutoML will not threaten the data science role anytime soon; rather, it is another tool. Overall, the demand for diverse skills is increasing, and it is becoming harder for people to get into the field, which in turn creates a shortage in the market. Competitions are valuable but teach only a subset of skills; the full set can only be learned in real-world projects where teamwork is key.

Making it in data science requires Hard Work + Patience + Real-World Data + Teamwork + Fun.

And the funny thing is, there is so much to learn and so many interesting problems to solve. 🙂

Ready to test your skills?

If you’re interested in collaborating, apply to join an Omdena project at: https://www.omdena.com/projects
