How to Prepare for a Junior Data Scientist Job in the Real World
February 24, 2020
From nobodies to collaborating with global leaders. Working on an actual problem meant overcoming data quality challenges, finding the best-fit machine learning model, and improving our teamwork and problem-solving skills.
In this article, Ashish Gupta and I describe our experience in our first Omdena Challenge and how it prepared us for a Junior Data Scientist job.
AI for Good — The need for contribution
It had always been our dream to work for organizations that take initiatives for social good. As part of the IT sector, we were looking for ways to use technology for the benefit of the community.
However, actually deploying these technologies to solve real-world problems seemed out of reach.
Omdena Challenge to the rescue
Fortunately, Omdena gave us the opportunity to join its #AIforEarth Challenge and apply AI and ML technologies to detecting wildfires in Brazil.
At Omdena, you join a team of people from across the globe working toward a common objective.
In such a setting, keeping up with the senior members and lead AI enthusiasts on the team is super difficult, but ultimately rewarding!
Entering a Real-World Machine Learning Project
After joining the challenge of detecting forest fires from landscape images, we quickly ran into problems with our dataset.
Always prepare for the worst!
Our dataset consists of images captured by cameras mounted on communication towers in forest areas. The data is messy: it includes images with camera glare, fog, and clouds, night-time images, and images showing smoke released from boilers.
Problem 1: Camera glares
For example, the image below contains camera glare, which makes it difficult for the model to interpret.
Problem 2: Clouds
Such noisy images lower a model's accuracy. For instance, images that contain clouds are difficult to interpret (even for us humans) because smoke and clouds look the same!
Even images without this problem presented other challenges. For example, it is hard to tell whether the following image shows smoke or fog.
And here is the mask our model predicted for it.
With these data challenges, the accuracy of most of the team's models plateaued at around 93–94%.
Something had to be done!
One way out of these stagnant waters was to add a routine that improves the quality of the images fed into our model, so that the model could interpret them better.
Improving the image quality
We took the initiative and began working on a task to process the images using denoising, sharpening, and upsampling techniques.
With everybody else busy with their own work, we, as junior data scientists, took responsibility for the job.
Showing grace under pressure
We assumed the role of task managers and created a task for improving the quality of the images. This is when the team took notice! Our task involved using image processing techniques such as image denoising, image sharpening, and image super-resolution to improve the quality of the dataset.
We came across several algorithms for improving image quality and started experimenting.
First, we tried Non-Local Means Denoising, as implemented in OpenCV. Below are some results.
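As a rough illustration of the sharpening and upsampling steps, here is a small NumPy sketch. The specific techniques are assumptions chosen for simplicity: unsharp masking with a 3x3 box blur for sharpening, and plain bilinear interpolation for upsampling (a simple baseline, not a learned super-resolution model). The function names `box_blur`, `unsharp_mask`, `upsample_bilinear`, and `enhance`, and all parameter values, are hypothetical, and images are assumed to be 2-D grayscale floats in [0, 1].

```python
import numpy as np


def box_blur(img: np.ndarray, r: int = 1) -> np.ndarray:
    """Mean filter over a (2r+1) x (2r+1) window, with reflected borders."""
    H, W = img.shape
    k = 2 * r + 1
    p = np.pad(img, r, mode="reflect")
    out = np.zeros((H, W))
    for di in range(k):
        for dj in range(k):
            out += p[di:di + H, dj:dj + W]
    return out / (k * k)


def unsharp_mask(img: np.ndarray, amount: float = 1.0, r: int = 1) -> np.ndarray:
    """Sharpen by adding back the detail the blur removed: img + amount * (img - blur)."""
    detail = img - box_blur(img, r)
    return np.clip(img + amount * detail, 0.0, 1.0)


def upsample_bilinear(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Plain bilinear upsampling (interpolation, not learned super-resolution)."""
    H, W = img.shape
    ys = np.linspace(0, H - 1, H * factor)
    xs = np.linspace(0, W - 1, W * factor)
    # Interpolate along each row, then along each column of the result.
    rows = np.array([np.interp(xs, np.arange(W), row) for row in img])
    return np.array([np.interp(ys, np.arange(H), col) for col in rows.T]).T


def enhance(img: np.ndarray, amount: float = 1.0, factor: int = 2) -> np.ndarray:
    """Hypothetical preprocessing step: sharpen, then upsample."""
    return upsample_bilinear(unsharp_mask(img, amount), factor)


# Demo: a vertical edge between two grey levels.
img = np.full((8, 8), 0.4)
img[:, 4:] = 0.6
sharp = unsharp_mask(img)
big = enhance(img)
print(sharp[0, 3], sharp[0, 4], big.shape)
```

Unsharp masking pushes pixels on either side of an edge further apart, which increases perceived sharpness but also amplifies any noise, so in a pipeline like this one denoising would normally come before sharpening.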
As you can see, the distorted, noisy original image showed no improvement: the method works well mainly on images with Gaussian noise, whereas our images contain random noise.
We then tried another approach, taken from the popular Deep Image Prior code repository. Using the methods described in the accompanying paper, we were able to remove some noise from the image. However, this was not practical!
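OpenCV exposes Non-Local Means through functions such as `cv2.fastNlMeansDenoising` (grayscale) and `cv2.fastNlMeansDenoisingColored` (colour). To show what the method does without depending on OpenCV, here is a simplified NumPy sketch of the idea, assuming 2-D grayscale float images in [0, 1]: each pixel is replaced by a weighted average of the pixels in a search window, where the weights depend on how similar the small patches around them are. The function `nl_means` and its parameter values are illustrative only and do not reproduce OpenCV's implementation; the demo uses synthetic Gaussian noise, the case the method handles well.

```python
import numpy as np


def nl_means(img: np.ndarray, patch: int = 1, search: int = 5, h: float = 0.1) -> np.ndarray:
    """Simplified Non-Local Means for a 2-D float image in [0, 1].

    patch:  radius of the patches being compared (1 -> 3x3 patches)
    search: radius of the search window (5 -> 11x11 window)
    h:      filtering strength; larger values average more aggressively
    """
    H, W = img.shape
    pad = patch + search
    p = np.pad(img, pad, mode="reflect")
    k = 2 * patch + 1
    num = np.zeros((H, W))  # weighted sum of neighbour values
    den = np.zeros((H, W))  # sum of weights
    # Region of the padded image covering every pixel plus a patch-radius border.
    a = p[search:search + H + 2 * patch, search:search + W + 2 * patch]
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            # The same region shifted by (di, dj): the candidate neighbour of each pixel.
            b = p[search + di:search + di + H + 2 * patch,
                  search + dj:search + dj + W + 2 * patch]
            d2 = (a - b) ** 2
            # Mean squared difference between the patch around each pixel
            # and the patch around its shifted neighbour.
            dist = np.zeros((H, W))
            for pi in range(k):
                for pj in range(k):
                    dist += d2[pi:pi + H, pj:pj + W]
            dist /= k * k
            # Similar patches get weights near 1, dissimilar ones near 0.
            w = np.exp(-dist / (h * h))
            num += w * b[patch:patch + H, patch:patch + W]
            den += w
    return num / den


# Demo: a two-tone image corrupted by Gaussian noise (sigma = 0.1).
rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.2)
clean[:, 16:] = 0.8
noisy = clean + rng.normal(0.0, 0.1, clean.shape)
denoised = nl_means(noisy, patch=1, search=5, h=0.1)

noisy_mse = float(np.mean((noisy - clean) ** 2))
denoised_mse = float(np.mean((denoised - clean) ** 2))
print(f"MSE noisy: {noisy_mse:.4f}, MSE denoised: {denoised_mse:.4f}")
```

The averaging only cancels noise that is independent from pixel to pixel and centred on zero, as Gaussian noise is; noise that does not fit that model is a poor match for this kind of filter.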
We ran the code on a sample image containing Gaussian noise for 3000 iterations.
Because this showed promising results, we then ran the code on images from our dataset. Even after a whopping 8,000 iterations, we got the following results.
Shoot for the moon. Even if you miss, you’ll land among the stars.
In the end, the code was not integrated into our team's final model because the approach was effective only at removing Gaussian noise.
However, moving from smaller contributions to managing a task in the challenge was exciting, inspiring, and motivating.
Prepared for a Junior Data Scientist Job
Technical skills
As beginners in the field, we had little experience with libraries such as PyTorch, or even with tools such as Slack. Our time with the team has strengthened our skills in working with such software. We are now not only comfortable with the technical side but can also apply what we learned in future challenges, or in other teams developing and deploying new projects.
Communication and Collaboration
Before this challenge, we often lacked the confidence to share our views in front of other people. This hesitation disappeared when the Omdena team's senior members welcomed our ideas with open arms and we could speak freely in meetings. This also made room for efficient collaboration on a global scale, regardless of the age or experience of other team members.
Problem Solving Skills
Our team was in a fix, and we took responsibility for trying to find a solution. Since the dataset seemed to be the only thing holding back our plateauing model, we set out to solve that problem to improve our team's performance. We worked on denoising the images in our dataset to improve their quality, which could in turn help the model our team had developed reach higher accuracy.
From being nobodies in a large group of global collaborators, we went on to earn praise from the team for our work on coding the denoising solutions. We also presented these findings at the meetings held toward the end of the challenge.
We cannot fully express the satisfaction you feel when highly experienced members of the AI community appreciate and praise your work. And, you know, it is not a problem at all if you miss the moon and land among the stars!
This article was written by Yash Mahesh Bangera.