How to Prepare for a Junior Data Scientist Job in the Real World

How we boosted our data science skills in a global AI challenge to detect wildfires.

Article written by Yash Mahesh Bangera

 

From nobodies to collaborating with global leaders: working on an actual problem meant overcoming data quality challenges, finding the best-fit machine learning model, and improving our teamwork and problem-solving skills.

In this article, Ashish Gupta and I describe our experiences in our first Omdena Challenge and how it prepared us for a junior data scientist job.

It was always one of our dreams to work in enterprises that take initiatives for social good. Being part of the IT sector, we were trying to make use of technology for the benefit of the community.

However, the deployment and applicability of these technologies in solving real-world problems seemed out of reach.

Omdena: Instagram

Fortunately, Omdena gave us the opportunity to apply AI and ML technologies by joining its #AIforEarth Challenge to detect wildfires in Brazil.

At Omdena, you belong to a team of people from across the globe who are working towards a common objective.

In such a scenario, keeping up with the senior and Lead AI enthusiasts within the team is super difficult, but ultimately rewarding!

 

Photo by Kevin Ku on Unsplash

 

Entering a Real-World Machine Learning Project

When we joined the challenge to detect forest fires from landscape images, we quickly ran into problems with our dataset.

Our dataset consists of images captured by cameras placed on communication towers in forest areas. The dataset is noisy. It ranges from images with camera glare, fog, and clouds to night-time images and images of smoke released from boilers.

For example, the image below contains camera glare, which makes it difficult for the model to interpret.

 

Images with camera glares

 

Such noisy images decrease the accuracy levels of a model. For instance, images that contain clouds are difficult to understand (even by us humans) because the smoke and clouds appear the same!

 

Original Image with clouds covering the forest area

 

Mask that our model recognized as smoke for the above image

Images without this problem posed other challenges. In the following image, it is difficult to tell whether we are looking at smoke or fog.

 

Image with the confusion of smoke and fog

And here is the mask our model predicted.

 

Model prediction

With these data challenges, the accuracy metrics for most of the models prepared by the team were plateauing around 93–94%.
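The article does not say which metric produced the 93–94% figure. As a rough illustration of why a plateau like this can hide remaining errors, the sketch below compares two common ways of scoring a predicted smoke mask against a ground-truth mask: pixel accuracy and intersection over union (IoU). The toy masks and the function names are assumptions made for this example, not the team's evaluation code.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels where the predicted mask matches the target mask."""
    total = 0
    correct = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += 1
            correct += int(p == t)
    return correct / total


def iou(pred, target):
    """Intersection over union of the pixels labelled as smoke (value 1)."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += int(p == 1 and t == 1)
            union += int(p == 1 or t == 1)
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return intersection / union if union else 1.0


# Hypothetical 10x10 masks: smoke covers only a small part of the frame.
target_mask = [[1 if r < 2 and c < 3 else 0 for c in range(10)] for r in range(10)]
pred_mask = [[1 if r < 2 and 2 <= c < 5 else 0 for c in range(10)] for r in range(10)]

print("pixel accuracy:", pixel_accuracy(pred_mask, target_mask))
print("IoU:", iou(pred_mask, target_mask))
```

Because most pixels in a forest image are background, pixel accuracy can stay high even when the predicted smoke region barely overlaps the real one, which is one reason to look at more than a single number.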

Something had to be done!

One way out of these stagnant waters was to implement a routine that improves the quality of the images fed into our model, so that the model could interpret them better.

Improving the image quality

We took initiative and began working on a task that could process images by making use of denoising, sharpening as well as upsampling techniques.

With everybody busy with their own work, we, as junior data scientists, had to take responsibility for the job.

We assumed the role of task managers and created a task for improving the quality of the images. This is when the team took notice! Our task involved using image processing methodologies such as image denoising, image sharpening, and image super-resolution to improve the quality of the dataset.
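To make two of these steps concrete, here is a minimal pure-Python sketch of sharpening (via an unsharp mask) and upsampling on a tiny grayscale image. It is only an illustration under simplifying assumptions: the function names are hypothetical, the image is a list of rows with values between 0 and 1, and nearest-neighbour upsampling is a simple stand-in, not the super-resolution approach the team experimented with.

```python
def box_blur(img):
    """3x3 mean filter; border pixels are handled by clamping coordinates."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += img[rr][cc]
            out[r][c] = total / 9.0
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between the image and its blur."""
    blurred = box_blur(img)
    return [
        [min(1.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, blurred_row)]
        for row, blurred_row in zip(img, blurred)
    ]


def upsample_nearest(img, factor=2):
    """Enlarge the image by repeating every pixel `factor` times in both directions."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Toy image with a soft vertical edge (dark on the left, bright on the right).
soft_edge = [[0.2, 0.2, 0.4, 0.6, 0.8, 0.8] for _ in range(4)]
sharpened = unsharp_mask(soft_edge)
enlarged = upsample_nearest(sharpened, factor=2)

print("contrast before:", soft_edge[0][4] - soft_edge[0][1])
print("contrast after: ", sharpened[0][4] - sharpened[0][1])
print("size after upsampling:", len(enlarged), "x", len(enlarged[0]))
```

The unsharp mask exaggerates the transition on either side of the edge, and the upsampling step only changes the resolution, not the amount of detail. That is why learned super-resolution methods are usually preferred when real detail needs to be recovered.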

We stumbled upon several algorithms to perform this and started to experiment.

First, we came across a method known as Non-Local Means Denoising, as implemented in OpenCV. Below are some results.

 

Original Image

 

After running it through the algorithm — Appears the same.

As you can see, the distorted and noisy original image showed no improvement, because the method only gives good results on images containing Gaussian noise, whereas our images contain random noise.
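For readers who want to see the idea behind non-local means, here is a simplified pure-Python sketch. It is not OpenCV's implementation (OpenCV exposes the method through functions such as `cv2.fastNlMeansDenoising` and `cv2.fastNlMeansDenoisingColored`, and the article does not show which call was used). The parameter names `patch_radius`, `search_radius`, and `h`, as well as the synthetic test image, are assumptions made for this illustration.

```python
import math
import random


def non_local_means(img, patch_radius=1, search_radius=3, h=0.1):
    """Simplified non-local means for a 2D grayscale image (list of rows)."""
    rows, cols = len(img), len(img[0])

    def pixel(r, c):
        # Clamp coordinates so patches at the border reuse the edge pixels.
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    def patch_distance(r1, c1, r2, c2):
        # Mean squared difference between the patches centred on two pixels.
        total, count = 0.0, 0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                diff = pixel(r1 + dr, c1 + dc) - pixel(r2 + dr, c2 + dc)
                total += diff * diff
                count += 1
        return total / count

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weight_sum, value_sum = 0.0, 0.0
            # Compare against every pixel in the search window around (r, c).
            for rr in range(max(0, r - search_radius), min(rows, r + search_radius + 1)):
                for cc in range(max(0, c - search_radius), min(cols, c + search_radius + 1)):
                    # Similar patches get weights near 1, dissimilar ones near 0.
                    w = math.exp(-patch_distance(r, c, rr, cc) / (h * h))
                    weight_sum += w
                    value_sum += w * img[rr][cc]
            out[r][c] = value_sum / weight_sum
    return out


def mean_squared_error(a, b):
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


# Toy image: dark left half, bright right half, plus synthetic Gaussian noise.
rng = random.Random(0)
clean = [[0.2 if c < 6 else 0.8 for c in range(12)] for _ in range(12)]
noisy = [[p + rng.gauss(0.0, 0.05) for p in row] for row in clean]
denoised = non_local_means(noisy)

print("MSE before:", mean_squared_error(noisy, clean))
print("MSE after: ", mean_squared_error(denoised, clean))
```

Each pixel is replaced by a weighted average of pixels whose surrounding patches look similar, so mild, evenly spread noise averages out while the edge between the two halves is kept. When the distortion is structured, like glare or fog, similar-looking patches carry the same distortion, which may be part of why the method did not help on our images.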

We then switched to another approach, taken from the popular Deep Image Prior code repository. Using the methods described in the accompanying paper, we were able to eliminate some noise from the image. However, this was not practical!

We ran the code on a sample image containing Gaussian noise for 3000 iterations.

Original Image containing Gaussian Noise

 

Image after denoising — 3000 iterations (Gaussian noise eliminated)

Because this showed promising results, we ran the code on images from our dataset. Even after a whopping 8000 iterations, we had the following results.

 

Original Image

 

Image after denoising — 8000 iterations (still a bit blurry as compared to Original image)

Shoot for the moon. Even if you miss, you’ll land among the stars.

In the end, the code was not integrated into our team’s final model, because it dealt only with Gaussian noise removal.

However, the experience of moving on from smaller contributions to being task managers of a challenge was exciting, inspiring and motivating.

 

Prepared for a Junior Data Scientist Job

 

Technical skills

As beginners in the field, we were not very experienced with libraries such as PyTorch. The same was true of Slack. Our time with the team has improved our ability to handle such tools. We are now not only familiar with their technical aspects but can also apply them in future challenges or for other teams in the development and deployment of new projects.

Communication and Collaboration

Prior to this challenge, we usually lacked confidence in putting forward our viewpoints in front of other people. This diffidence, however, was eradicated when the Omdena team’s seniors welcomed our viewpoints with open arms and we could communicate freely in meetings. This also created room for efficient collaboration on a global scale irrespective of the age or experience of the other team members.

Problem Solving Skills

Our team was in a fix, and we took responsibility for trying to find a way out. Since the dataset seemed to be the main reason our model was plateauing, we set out to tackle this problem to improve our team’s performance. We worked on denoising the images in our dataset to improve their quality, in the hope that better images would help the team’s model reach higher accuracy.

From being nobodies in a large group of global collaborators, we managed to earn praise from the team for our efforts in coding the denoising solutions. We also successfully presented these findings at the meetings held towards the end of the challenge.

We simply cannot express the satisfaction you feel when highly experienced people from the AI community appreciate your work and praise your efforts. It is not a problem at all if you land among the stars, you know!

 


 
 

About Omdena

Building AI solutions through global collaboration

Omdena is a global platform where a community of changemakers builds AI solutions to real-world problems via collaboration.

Learn more about us and Collaborative AI.

The Best Way to Build AI Solutions As a Company Is Through Collaboration. Here’s Why!

A company that wants to become AI-powered and build solutions for real world problems has to overcome several challenges along the way. 

One of the most common obstacles is getting access to AI experts and data scientists who can help translate a problem into a deployable AI solution or prototype. In addition, many organizations have little or no data to begin with. And even if data is plentiful, the question remains how to leverage the raw data to solve problems or gain valuable insights. Once an organization has taken the step of developing an AI system, the next wave of challenges is just around the corner.

Namely, AI systems built in the lab are often biased and lack real world approval. More than 50 percent of AI solutions fail due to a misalignment of stakeholders, no metrics, and no contextual AI strategy.

So how can these issues be solved and AI solutions built successfully?

At Omdena, we have worked with more than 800 AI engineers to find ways to overcome these problems. 

Our approach to developing AI technology is based on collaboration, where diversity of thought and inclusion are embedded in the process.

Omdena runs Collaborative AI Challenges with organizations that want to get started with AI, solve a real-world problem, or build deployable solutions within two months.

We have been collaborating with various organizations, including the United Nations, companies, startups, and NGOs around the world.

In our eight-week challenges, a selected community of 40 to 50 AI engineers delivers deployable solutions using real-world data. 

Our community has compiled many impactful examples of applying AI ranging from use cases in the environment to health, finance, safety, justice, and many more.

One of our recent partners, Wildfire detection company Sintecsys, describes Omdena’s AI community in the following way: 

An Omdena challenge brings together power, speed, and accuracy, through the dedication and impeccability from the Data Scientists involved and the leadership that emerged from the collaborative process. It not only got us into the AI game but also pointed us in the most suitable direction for our company. 

 

The following list gives you a head start on the advantages of Collaborative AI and helps you judge whether you’d qualify for a challenge with our global community.

I don’t have a clearly defined problem for an AI project

There are plenty of meaningful problems that could be solved with the right AI technology.

In a non-exhaustive report, McKinsey compiled a library of more than 160 AI use cases.

Even though the AI movement is accelerating, progress has so far been slow. One of the main reasons is that many problems first need to be translated into an AI-suitable format.

Running an Omdena challenge means working with AI and domain experts to transform a vague problem statement into a specific AI use case within a couple of days or weeks.

For example, in our AI challenge with Impacthub Istanbul, the initial statement was to “improve the aftermath management of an earthquake with AI”. Within two weeks, the community narrowed it down to “Predicting the safest route after an earthquake for people at special risk (schools, hospitals)”. 

I do not have data or I do not know if I have the right data

According to several studies and our experience working with more than 600 AI engineers, data challenges are the most common obstacle organizations face.

Building a cutting-edge AI-based solution comes with the necessity to feed the algorithm not only the right type of data but also high-quality data.

In an Omdena challenge, our collaborators take care of:

  • Data collection 
  • Data understanding 
  • Data preparation 

A special bonus to ensure high quality is our community-enabled peer-to-peer review process where several people check each other’s code throughout the entire project. 

Our challenge to build a chatbot for Post-Traumatic Stress Disorder (PTSD) treatment was even kicked off with no data set at all. Within a week, the community found creative ways to access various data sets.

I do not have the technical staff for an AI project

No AI team, no AI solution?

Obviously, the lack of AI talent is at the root of many of the obstacles associated with AI adoption.

At Omdena, you’ll get access to a selected community of AI practitioners who are best suited to tackle your problem. Our AI challenges bring together AI experts, data scientists, machine learning and data engineers, as well as domain experts.

Working with a fully staffed and diverse community comes with additional advantages, which brings us to the next point.

I want to know how to build my own AI team

A fatal mistake many organizations make is to just hire AI experts or even beginners and expect magic to happen. 

Without asking the right questions, building the AI team will end in an expensive disaster. 

Here are a few common questions that need to be answered properly before making hiring decisions. 

  • What specific capabilities do I need in my organizational context?
  • What team composition and level of experience are required?
  • How do I attract and communicate with data scientists and machine learning engineers?
  • What are my current data and infrastructure capabilities? 

According to an O’Reilly study, “hiring the required roles” is the third-biggest problem companies face.

When building your team, you want to be well-prepared to attract the sought-after data scientists to your enterprise, startup or NGO.

An eight-week deep dive into a full-scale project has significantly helped our previous partners to derive their own AI strategy.

Wildfire detection company Sintecsys, for example, worked with 42 Omdena collaborators in conjunction with its internal team to scale its business model. The resulting machine learning models are currently deployed to help avoid wildfires in the Amazon rainforest.

I want to build a fully functional and deployable AI solution

In most of our challenges, organizations request an implementable, cutting-edge solution within a short time frame.

The reason we can move faster than traditional development approaches is our Collaborative AI process:

Once a challenge is kicked off, tasks and responsibilities are quickly allocated to the right people.

In addition, we leverage available code, best-practice tools, and processes to simultaneously test and develop multiple machine learning models.

And all of this happens in a diverse and inclusive setting where perspectives are shared most effectively. 

For globally known NGO Safecity India, our community built an algorithm to predict safe routes for women at risk of sexual harassment. In Safecity’s words: 

Omdena is one of the world’s finest sets of data scientists building solutions for Good. In only two months, we accomplished what we tried for two years reaching out to some of the biggest corporations. 

I want to prototype and validate my hypotheses

Even if your goal is not to build a fully deployable solution yet, running a challenge will deliver valuable insights through data exploration and rapid prototyping.

Global AI expert Andrew Ng points out that in order to derive an AI strategy successfully, pilot projects are the best way to “get the flywheel turning” as soon as possible.

A project should generate a quick win such as insights on where to focus your capabilities. 

In a recent challenge, we helped the United Nations World Food Programme (WFP) leverage open-source data in Nepal to fight hunger. As a follow-up challenge, we are working with the WFP in Istanbul to improve food supply management in case of natural disasters.

I value building an ethical and trustworthy AI solution

According to findings by a New York University research center, the lack of diversity in the artificial intelligence field has reached “a moment of reckoning”. A survey of more than 150 studies and reports, published by the AI Now Institute, found that a “diversity disaster” has contributed to flawed systems that perpetuate gender and racial biases.

One of our key mission points is to ensure diversity and inclusion in our challenges. 

We are proud to say that 35 percent of our AI community collaborators are female, and we see diversity of thought as a necessary condition for developing trustworthy products for real-world adoption.

In the end, AI for Good solutions cannot be built in isolation from the people and social circumstances that make them necessary in the first place.

My organization would benefit from global exposure

Finally, we work with our partners not only to build cutting-edge AI solutions for tough problems but also to provide a global platform where your organization will be showcased to leading partners in the AI space.

A challenge collaboration includes joint promotional efforts pre-challenge, when we announce the call for applications and invite the AI community, during a challenge, and post-challenge in the form of social media coverage, case studies, and webinars.

For further questions, check out our FAQs or get to know the Omdena team in our LiveChat.

“Why compete if we can collaborate?”