Balancing AI Courses and Real-World Projects, Mindset, and Securing an NVIDIA Internship


Data Scientist Kennedy K. Wangari shares his learnings as a community leader and incoming data science/AI intern at NVIDIA.

 

Can you describe yourself in 50 words or less?

Kennedy Wangari is a Data Scientist, based in Kenya. An AI Community Leader and Innovator, Kennedy is passionate about tech communities, and harnessing the power of data and innovative AI technology to make a better and easier tomorrow. He’s a firm believer in the power of open source: words, knowledge, and ideas should be accessible to everyone.

 

Why are you into AI and not something else?

I believe that we can apply AI to tackle and solve complex problems in numerous domains, providing fundamentally new approaches to problems in this data-driven world.

The AI-powered future looks promising, and I would love to be part of this radical transformation changing the face of humanity.

 

To improve your skills, how do you complement course work and real-world projects?

Read

Cultivate the habit of reading research papers, academic literature in your domain, well-known case studies, and articles. They will greatly help to deepen and solidify your craft and your understanding of the domain.

Reproduce

Next, attempt to reproduce and implement the research papers in projects.

Network

Listen, ask, talk, and build meaningful relationships and collaborations with people in your field of specialization (domain experts, customers) for feedback, mentorship, and guidance. This will improve your understanding of the domain and give you a feel for their challenges and potential data science/analytics use cases.

To take things to the next level, attend conferences, events, and hackathons, where you will find mentors and possible collaborators.

Join an online or in-person reading group or community in your domain, with colleagues or students, to study academic literature and help one another.

 

What contributed the most to getting your Data Science/AI internship offer at NVIDIA?

I would say that my background and skill set caught the interest of recruiters from top tech companies, including the Microsoft Applied ML Software Engineering team and NVIDIA AI, amongst others. That background came from taking on real-world AI projects at Omdena, winning hackathons, building and working with AI communities like deeplearning.ai, previous rigorous data science internships, and my current role.

Throughout the rigorous rounds of interviews, what probably contributed most to receiving the offer was my problem-solving mindset and analytical thinking, shown in how I approached the various questions: technical, non-technical, use-case scenarios, and projects.

Most importantly, I thoroughly researched, read, and inquired internally on ongoing data science projects, research resources, articles, and ongoing work related to the role I was interviewing for.

One of the recruiters was quite impressed that I had gone further, using some of their products and sharing valuable feedback from my experience.

All these activities improved my understanding of the ongoing AI internship projects at NVIDIA and how they operate and are used. They also gave me insights into potential data analytics use cases and AI challenges from the perspective of the teams and people involved.

This provided a great baseline for my discussions with the recruiters. I gained insight into the data the teams work with and how to use different data science technologies to solve related business problems, which later came up during the interview rounds.

Of course, it’s important to write good-quality code, be technically competent, fully understand what you are doing, and communicate your work and results effectively. But there is more to it than that, and that’s what helped me bag the offer.

 

What are your most important mindset tips (as mental conditions like impostor syndrome are becoming a problem)?

The learning mindset: become a lifelong learner to stay relevant, adapt to changes in the AI field, tap into new opportunities, and future-proof your career. Keep building up your knowledge and expertise.

The focused mindset: remain disciplined and focused as you build your craft in the AI space. Make real progress in ML, and don’t be swayed by the exciting ideas and projects springing up daily or by the latest trends. Stay motivated and focused, and keep building your skills.

The self-trust mindset: you’ve got to trust yourself, be courageous enough to follow your passion aggressively, and believe in your capabilities. The AI space is vast, with lots of information, difficult concepts to master, a sea of learners, and its own challenges. Don’t give up on the things you believe in and want to achieve.

When it comes to imposter syndrome, the voice in your head can get very loud about how big a fraud you are and how little you know. It will focus on your shortcomings and ignore your successes. Shift it by getting louder about your achievements. For every new skill you gain, celebrate in grand style. Always remember this quote by Albert Einstein: “The moment you stop learning, you start dying”. That feeling will always be there, so get better at dealing with it and use it as motivation to improve yourself.

Often value comes from flipping a perceived weakness on its head and figuring out its opposite. Use imposter syndrome as your greatest strength, act on it in the right way, and improve.

In this fast-evolving field, with a sea of learners and professionals transitioning into it, what will separate you from the rest is your problem-solving skills. Work on your creativity, concrete problem solving, and critical analytical thinking through practice. Know how to find answers and how to develop solutions yourself.

 

A book or course that you most recommend?

Ultralearning: Master Hard Skills, Outsmart the Competition, and Accelerate Your Career by Scott Young:

In a highly dynamic, evolving field, you have to be a lifelong learner. This great read will help you become more pragmatic, stay relevant, reinvent yourself, maximize your competitive advantage, adapt to changes in the AI space, and future-proof your career in the data world.

Practical Natural Language Processing (O’Reilly):

A straightforward book that does a great job of bridging the gap between Natural Language Processing (NLP) research and practical applications. Covering e-commerce, healthcare, finance, and other sought-after domains where NLP is used, it’s a great manual for machine learning practitioners, data scientists, or anyone interested in NLP.

The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists:

A great read for aspiring and current data scientists to learn from the best. It’s a reference book packed full of strategies, suggestions, and recipes to launch and grow your own data science career.

How to Transform Your Data Science Skills and Build a Meaningful Career


By Tanya Dixit
 
 

Data Science is a hot field and will remain so as new roles evolve. More and more companies are realizing that to stay relevant in the future they need to become data-driven organizations.

Great breakthroughs have happened with machine learning models helping to detect fake news, spotting diseases such as malaria, building inclusive financial systems, and many more.

But data science is also a very diverse field — not only in terms of knowledge but also in terms of people, domains, and careers. This very diversity of ideas and individuals can make it a huge success. However, there are several obstacles along the way.

To overcome problems such as algorithmic bias, safety and trust issues, and silo-driven development practices, we need to embrace all that humanity has to offer. To learn relevant skills, we at Omdena are happy to turn to online education, which offers plenty of great resources for learning to code and diving into machine learning, as well as hackathons and competitions that help improve your modeling and analysis skills.

But what is missing are the skills that go beyond technical prowess in areas like machine learning, data engineering, visualization, programming, and domain expertise.

Even the most technically skilled data scientist needs the following soft skills to thrive today, all of which you can only learn through hands-on practice.

Collaboration, Cultural empathy, Creative thinking/problem solving, Initiative, Presentation skills

This is what a meaningful career means. This is what being a next-level data science engineer or data scientist means: a career where one is part of something bigger than oneself. The world’s biggest challenges can only be solved when people from diverse backgrounds, with various skills and perspectives, come together to bring out the best.

“Do not optimize for income only but for passion, for where you really want to make a difference.” — influencer Eric Weber in one of our webinars.

Our mission is to democratize AI not just by talking but by the means of purpose-driven action and collective intelligence.

The Japanese call this Ikigai (生き甲斐) — “a reason for being. The thing that gets you up in the morning”.

 

The people who are doing it

Below, six individuals from various backgrounds (career movers, a Professor at ESADE, a Postdoc in Physics, a data engineer, a researcher, and a student) share their experiences working on Omdena’s real-world projects.

 

Shifting careers by 90 degrees

 

 

Professor Xavier has long been involved in business analytics and consulting. He took a step further in his career when he earned a Master’s degree in Business Intelligence and Data Science. He joined Omdena’s disaster management project with UN WFP.

According to him, “(1) Doing AI for Good (2) on a global scale (3) and with a global team” were the 3-ingredient salad that made him want to participate in the project. He significantly improved his data engineering skills and his command of tools for feature importance analysis, hyperparameter tuning, and cross-validation, and he learned a lot about neural networks. He also improved his skills in Python and in using libraries such as Scikit-Learn and visualization libraries such as Seaborn.

“I also learned that it doesn’t matter if you try something that maybe later it’s not part of the pipeline of the project. It’s part of the learning process.“  –  Professor Xavier

When asked about soft skills, Professor Xavier states that he improved his team management skills, mainly because he was part of a very diverse team with different cultures and backgrounds at Omdena. A very crucial aspect that he learned was to swim in a world of chaos at the beginning of a project and be willing to be patient for things to come into place later, with clear tasks and responsibilities.

“There is a lot of smart people out there willing to help and to make good using AI. Being part of such a community like that makes me proud of it. Makes me want to share it with my relatives and friends.”  –  Xavier Torres Fatsini

 

 

From Omdena to a full-time offer at Microsoft

 

 

Kritika Rupauliha is a CS undergrad, currently in the 6th semester of her degree. She has worked at organizations like Leading India AI, Reflex Solutions LLP, Omdena, IIIT Allahabad, and Microsoft as an intern.

At Omdena, she worked her way up from Junior ML Engineer to the Lead ML Engineer of a project. This was her first time managing a task with a large number of globally diverse participants. According to Kritika, by participating in an Omdena project, she learned to balance between empowering people to take their own initiative and get things done, while at the same time setting goals to keep the overall task on track. She also gained valuable skills in clear and regular communication, especially while working remotely. She learned to manage work for Omdena, her day job, and family commitments, and also to manage notifications appearing round the clock because of the global nature of the collaboration.

“Before joining Omdena, I had been involved in some research work and college projects under my professors. But I had never been exposed to such a big community of similar-minded individuals. I found out that I thrived in such a community, learning with my peers, and exploring the horizons of AI. Omdena is also the reason why I got selected for a software engineering intern at Microsoft.” – Kritika Rupauliha

For Kritika, being around experienced professionals and learning from them was the best thing, and they went on to become close friends who will always mentor her in her future endeavors. She learned valuable communication skills for which she credits her projects at Omdena.

We congratulate Kritika for bagging a full-time offer at Microsoft!!

 

Leading teams at Omdena

 

 

Rosana de Oliveira Gomes is an inspiring Astrophysicist who is now a Lead Machine Learning Engineer at Omdena.

Rosana improved her leadership and “communication to non-scientist” skills and increased her multicultural experience by working with collaborators from three continents for the first time.

“I feel more connected with the world after speaking every day with people from all over the world in order to build something together. I also feel more confident and valued (before the experience at Omdena I have suffered from bullying at the workplace and this really helped me to rebuild my confidence in my skills).” – Rosana de Oliveira Gomes

 

 

No need to have perfect knowledge

 

 

Marek Cichy earned his Master’s degree in Portuguese language and culture and spent 10 years working professionally as a Polish-Portuguese/Spanish translator and interpreter. He wanted to shift to being an NLP specialist. He tells us that before joining Omdena, he felt he had a lot of hurdles to jump over and impostor syndrome was haunting him. He initially felt quite overwhelmed by all the maths and programming he had to learn.

But after joining Omdena, he realized a few things that boosted him immensely. In his own words:

“I don’t need to have perfect knowledge in all the above-mentioned areas to contribute in a valuable way to a project like this”

“Compared with other participants, even ones with a “better” curriculum, I didn’t feel I was lagging behind, and definitely the atmosphere in Omdena projects is really positive and non-toxic”

“I realized my domain knowledge (e.g. speaking Polish in the Omdena+SexEdPL project) is an equally valuable resource as ML knowledge is”

“I always thought I’m not good at managing other people, but I discovered that in a positive environment I’m able to do it.”

Marek also says that he met such a diverse group of people who greatly expanded his network, and got to work on a real-world problem that can now help him in job interviews.

 

Data engineering & finding like-minded people

 

 

Raghuveer is an IT professional working as a Platform Engineer, providing an intelligent platform for investigating fraudulent activities. He has been interested in data science, machine learning, and AI since his college days. Before Omdena, he worked on small pet projects but couldn’t apply his skills on a large scale. He also hadn’t had much chance to interact with the data science community. According to Raghuveer, working with Omdena helped him in several ways and transformed his data science career:

  • Working on a large scale project (end to end) for the first time.
  • Developing data munging/collection skills via scraping
  • Building “people” skills as he got to interact with more than 25 people

 

Working through passion

 

 

Juber Rahman is a researcher in the field of Electrical and Computer Engineering turned Data Scientist who believes in the power of passion.

“I believe a person performs the best when he or she is passionate to solve a problem. Very few organizations (e.g. Omdena) can ignite the passion in you. Most organizations make it feel an obligation to do something rather than creating a drive to solve a meaningful problem.”

Juber started learning advanced model development combining multiple models as part of his project at Omdena. He says the most important thing he learned was to accept and tolerate disagreement with fellow workers.

 

Our mission is not only to help solve real-world problems but also to transform data science careers in the process. We are happy about each success story that comes out of our community-driven work. The way we solve problems is changing, and we are all witnessing the dawn of a new era, one where collaboration matters more than competition.

 

 

 

How to Learn Data Science Most Effectively in 2020 | Eric Weber


How to learn Data Science most effectively in 2020, what goes wrong in the field, and why income is not the only relevant career metric.

In this one-hour fireside chat with Eric Weber, Data Science Influencer (40k followers) and former Senior Data Scientist at LinkedIn, we discussed strategies and tactics to learn skills, finding a balance between theory and practice, developing a mindset of learning through failure, and why everyone needs to find his or her own way.

While it is hard to wrap up all insights in a post, here are 6 learnings from the webinar. For more, you can watch the entire recording at the end of the page.

 

Figure 1: Data Science Skills

 

 

Finally, don’t listen too much to what others say (this applies to this post as well) on your career journey. Well-known YouTuber and MIT Research Scientist Lex Fridman recently shared his own struggles with imposter syndrome and with comparing himself to others. His suggestion?

“Pave your own path”

 

Overcoming an Imbalanced Dataset using Oversampling. Case study: Sexual Abuse


The problem: Overcoming an imbalanced data set

From a data science perspective, detecting sexual harassment is an imbalanced data problem, meaning there are few (known) instances of harassment in the entire dataset.

An imbalanced dataset is one whose class counts are highly disproportionate. Oversampling is one way to combat this by creating synthetic minority samples.

 

The solution: The power of oversampling

SMOTE (Synthetic Minority Over-sampling Technique) is a common oversampling method widely used in machine learning on imbalanced datasets. To increase the number of minority instances, SMOTE takes a minority class sample, finds its nearest minority neighbors with the popular K nearest neighbor algorithm, and generates a new example at a random position on the line joining the sample to one of those neighbors.

 

In other words, a line is drawn between a minority point and one of its K nearest minority neighbors, and a new point is generated somewhere along that line. The technique has been widely experimented with, and nowadays one can find many different versions of SMOTE that build upon the classic formula. Let’s visualize how oversampling affects the data in general.
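
To make the interpolation idea concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions rather than the code used in this case study: the function name `smote_sketch`, the toy points, and the parameters `n_new`, `k`, and `seed` are all made up for the example, and real projects would normally rely on an established oversampling library.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values)
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies.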

 

Visual representation of data without oversampling (feature 0 vs. feature 1)

Visual representation of data with oversampling

 

 

For visualization’s sake, two features are picked. From their distribution, it is clear that after oversampling the minority sample count matches the majority sample count.

 

Impact on the predictions

Let’s compare the predictive power of oversampling vs. not oversampling. Random Forest is used as the predictor in both cases. The ProWSyn version of oversampling is selected as the highest performing oversampling method after all the methods are compared using this Python package.

Let’s check the performance of models pre and post oversampling.

 

ROCAUC without oversampling

ROCAUC with oversampling

 

 

With ProWSyn oversampling implemented, the ROCAUC score (the Area Under the Receiver Operating Characteristic curve) rose by 13 percentage points, from 84% to 97%. I was also able to decrease the Brier score, a metric for the accuracy of probability predictions where lower is better, by 5%.
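
For readers unfamiliar with the two metrics, the sketch below shows how each can be computed from true labels and predicted probabilities. The labels and probabilities are made-up toy values, not results from this project, and the helper names `roc_auc` and `brier_score` are chosen for the example only.

```python
def roc_auc(y_true, y_prob):
    """Chance that a random positive is scored above a random negative.

    Ties count as half a win. Equivalent to the area under the ROC curve.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up imbalanced example: four negatives, two positives
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.6, 0.3, 0.8, 0.4]

auc = roc_auc(y_true, y_prob)        # 7 of 8 positive/negative pairs ranked correctly
brier = brier_score(y_true, y_prob)  # lower is better
```

ROCAUC only looks at how well positives are ranked above negatives, while the Brier score also penalizes probabilities that are far from the actual outcome, which is why it is useful to report both.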

As you can see from the results, oversampling can significantly boost your model performance when you have to deal with imbalanced datasets. In my case, the ProWSyn variant performed the best, but this always depends on the data, and you should try different versions to see which one works best for you.

 

What is ProWSyn and why does it work so well?

Most oversampling methods lack a proper process for assigning weights to minority samples, in this case the sexual harassment cases. This results in a poor distribution of the generated synthetic samples. The Proximity Weighted Synthetic Oversampling Technique (ProWSyn) generates weight values for the minority samples based on each sample’s proximity information, i.e., its distance from the class boundary, which results in a better distribution of synthetic samples across the minority data set.
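
The proximity-weighting idea can be illustrated with a deliberately simplified sketch. This is not a reimplementation of ProWSyn, and the published method is more involved. Here, as an assumption made for the example, "distance from the boundary" is approximated by the distance to the nearest majority sample, and weights decay exponentially with each sample's proximity rank. The function name `proximity_weights`, the parameter `theta`, and the toy points are all hypothetical.

```python
import math


def proximity_weights(minority, majority, theta=1.0):
    """Give minority points nearer to the majority class a larger weight.

    Simplified illustration only; theta controls how quickly weights decay.
    """
    # Distance from each minority point to its nearest majority point
    dist_to_boundary = [
        min(math.dist(m, q) for q in majority) for m in minority
    ]
    # Rank 0 is the minority point closest to the majority class
    order = sorted(range(len(minority)), key=lambda i: dist_to_boundary[i])
    raw = [0.0] * len(minority)
    for rank, i in enumerate(order):
        raw[i] = math.exp(-theta * rank)
    total = sum(raw)
    # Normalise so the weights sum to 1 and can act as sampling proportions
    return [w / total for w in raw]


# Toy data (made-up values)
minority_points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
majority_points = [(0.0, 0.0), (0.5, 0.0)]
weights = proximity_weights(minority_points, majority_points)
```

Weights like these could then decide how many synthetic samples are generated around each minority point, so that more of them land near the region where the two classes meet.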

 

What is the output?

 

Graph of probability of sexual harassment instance

x: number of instances; y: probability

 

 

After the prediction, the histogram of predicted probabilities looks like the image above. The distribution turned out to be the way I imagined. The model has learned from the many features, and there turns out to be structure in the feature space that creates a distinct separation between classes 0 and 1. In simpler terms, the features of each class follow a recognizable pattern.

Probabilities very close to 1 (100% probability) need more care. From the histogram plot above, we can see that the number of points near 100% probability is quite high. Dismissing someone as a non-predator is a routine outcome, but an accusation is far more serious, so that number should be lower.

 

More About Omdena

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

5 Reasons Why AI Hackathons Won’t Build Real-World Solutions


To create meaningful innovation and build real solutions, we need to move beyond (AI) hackathons.

 

By Omdena Founder Rudradeb Mitra 


 
 

 

Hackathons are perceived as a fast track to innovation. Creative minds come together and solve problems. This all sounds good in theory but let us look at the facts.

Currently, there are dozens of hackathons in response to COVID-19, and thousands of people are giving their time to build solutions. I even mentored one of the largest, where over 1000 engineers participated. The organizers put days, if not weeks, of work into it. So I truly commend their efforts and their goodwill. However, without taking anything away from them, I question the effectiveness of hackathons.

 

Why AI hackathons won’t build real-world solutions

 

Reason 1: Lack of domain expertise

Social problems like COVID-19 cannot be solved only by engineers. To build real-world solutions we need to involve policymakers, domain experts, users — something that teams participating in hackathons often don’t have access to.

 

Reason 2: Absence of diverse backgrounds leads to bias

Hackathons are often formed by teams who know each other and thus lack fresh ideas. We have seen that systems developed only by engineers end up being biased. A “diversity disaster” has resulted in flawed systems that amplify gender and racial biases according to a survey, published by the AI Now Institute, encompassing more than 150 studies and reports.

The report summary says an overwhelmingly white and male field has reached ‘a moment of reckoning’ over discriminatory systems.

In addition, from my experience mentoring hackathons, I can say that out of 100+ ideas, all of them could be summarized in 10–15 ideas. Most were similar because of the similar backgrounds of the participants.

 

Reason 3: Too little development time

Most hackathons run only for a few days or weeks, which is not enough to build or test solutions replicable in production environments.

In a hackathon often you are motivated to impress judges but in a real-world project, you are building entire solutions. Building a prototype is easy, but there is customer research, product marketing, design, and sales, which will determine if the prototype is actually implemented and creates value.

 

Reason 4: Competition vs. Collaboration

Why do we need hundreds of teams competing to solve the same problems when we could have several teams collaborate to solve multiple problems?

In one of my previous articles, I argue that competition-based models like Kaggle are not the best approach to build real-world solutions. Among many reasons, a key issue is that people are incentivized to win instead of working together to find the best solution to a problem.

For example, in a recent competition, a Kaggle 1st place winner cheated using a fake dataset to get $10,000 prize money and gain exposure.

 

The alternative? A collaborative approach

To solve the aforementioned problems and create an environment for building real-world solutions to problems like COVID-19, we founded Omdena.

Omdena is an innovation platform for building AI solutions to real-world problems through the power of bottom-up collaboration.

Omdena’s Covid19 initiative displayed at Times Square NYC


 

Contrary to AI hackathons we embed the following mechanisms to build inclusive and deployable solutions.

 

1. Fostering global collaboration

Most of today’s challenges are global in nature. No single country or group of people can solve them alone.

To find solutions we need a model where global communities can come together to solve problems, share their data and build solutions. In the world of Big data, AI and Machine Learning, data is key. It is not a sophisticated algorithm or a better team, but it’s the team with better (and more) data that wins.

 

2. Following a bottom-up model to enable creativity

I firmly believe the future of innovation is bottom-up, where communities come together to collaborate and solve their problems. Communities will drive the future of AI, not Governments or Corporations. I argue that communities have both intrinsic and extrinsic motivation to solve the problem, which is essential in building solutions.

And the evidence is clear. At Omdena our global community in more than 75 countries develops innovative solutions every month. Up to 50 engineers and domain experts collaborate for two months where ideas are shared openly to find the best-fit solution.

With a bottom-up approach, those who are most involved with the specifics of their field are included in the ideation and brainstorming process, resulting in a more harmonized and inclusive development system.

 

3. Working closely with domain experts

Most real-world problems are not limited to just a data science problem but require domain experts to create value. In our projects, domain experts work in a constant exchange with the AI teams to help the company refine the problem, prepare the data, and build a solution that is applicable in their context.

 

4. Involving people who face the problem

Something which goes beyond domain expertise is to incorporate people who faced the problem. This brings empathy and can help to reduce bias.

On Omdena’s innovation platform almost 1000 AI engineers from 79 countries have collaborated to build real-world solutions within a time frame of two months. The benefits of Collaborative AI are enabled by the people who are part of the project. We believe rigid top-down management principles or winner-takes-all approaches create the wrong incentives for a world where solidarity and teamwork are more important than ever.

 

Human first, technology second

In conclusion, I am arguing:

All of us interested in building an inclusive future need to think more holistically, create an environment that looks beyond gender, race, and cultural background, and focus on how we can collaborate as humans.

Digitization and AI have enormous potential for doing good in all aspects of life and in all sectors of the economy. However, it is the combination of people with technology that truly enables progress and higher productivity. We have to emphasize community and purpose. That is the key to creating meaningful innovation and products.

 
