How to Switch into a Data Science Career


This article was written by Rosana de Oliveira Gomes, Lead Machine Learning Engineer at Omdena, and Joseph Itopa A, Junior Machine Learning Engineer at Omdena.

 

Switching into a data science career can feel like boarding a plane that’s already taking off. The data science profession is relatively new, which means that many data scientists and machine learning engineers didn’t start their careers on this path. They switched from other fields, as we did, and as many of you reading this may be thinking of doing.

So let’s talk about what tools and skills you need for switching into data science—we’ll highlight possible challenges and give you practical advice on how to overcome them. 

 

Overcoming technical challenges 

 

 

1. Brush up on math and programming

Data science doesn’t require advanced math knowledge beyond what’s required for any science degree. But every artificial intelligence algorithm is based on some mathematical structure that you will need to understand. This usually involves linear algebra and some concepts from calculus. To interpret results, you will need to conduct statistical analysis, which also requires knowledge of probability and statistics. 
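As a small illustration of how these math concepts show up in practice, here is a minimal sketch that fits a straight line to a few points by ordinary least squares, using only means, variances, and covariances. The data and the function name fit_line are made up for this example; real projects would typically rely on an established library instead.

```python
# A minimal sketch: fit y = slope * x + intercept by ordinary least squares.
# The data points and the helper name fit_line are invented for illustration.
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    # The slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # these points lie exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the toy points lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1; with noisy data the same formulas give the best-fitting line in the least-squares sense.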

While math provides the concepts, programming languages are the tools that make those concepts tangible. This means that you have to choose a programming language to learn, which in this field is usually either Python or R, likely combined with SQL and Bash. A KDnuggets poll on programming software used by data scientists reveals that Python has surpassed R as the tool of choice.

But the choice of a programming language essentially boils down to the task at hand and style preferences. Python is easy to pick up for someone who has experience with programming and is widely used across industries and specialties such as data science and machine learning. R is a good choice too if you have a statistics background and will be working mostly on analysis. It also has built-in tools and libraries to communicate results through reports. Our advice is to stick with one language and start building something after you are done with the basics.

Speaking from experience, we’ve learned that while you are acquiring the necessary data science skills, you should choose only one learning provider at a time and stick with it. The worst thing you can do when switching into data science is to keep learning the same things over and over.

 

2. Become a problem solver

You can view the science of data science as the ability to solve problems with creative and logical thinking. This requires knowledge of programming and an understanding of algorithms gained through practice. 

After acquiring some basic knowledge of programming, you can begin solving real-world problems by practicing via courses or platforms. GeeksforGeeks provides hands-on projects for competitive coding, Python, Java, and SQL. Solving some Kaggle competition problems can also boost your problem-solving skills, as you can easily find real-world data to practice with and a lot of help in the community. DataCamp’s unguided projects are a great way to find your own solutions to open-ended projects.

To sustain a data science career after switching, it’s important that you find enjoyment in these accomplishments. In a recent Omdena webinar, data science influencer Eric Weber said, “Don’t optimize for income only but for what brings you joy; otherwise, you may burn out quickly.”

 

3. Join collaborative projects

After working with some algorithms and practicing on a few projects, you will be ready for more advanced projects. This is where collaborative platforms come in. Collaborative data science programs rely on communities to develop projects in a diverse and productive way. 

The ruggedness and fun in street coding are best found in collaborative projects. You will learn from others as they learn from you. You will make friends while struggling with unstructured and messy data—there’s nothing like it.

Inspiring collaborative initiatives include DataKind, Science to Data Science, and Data Science for Social Good. These options are often location-specific and may be costly or competitive due to limited availability and an extensive application process.

An alternative to these initiatives is Omdena, which launches several projects per month and applies a principle of volunteering to solve real-world problems through online collaboration. Learners work with domain experts who will help them stay motivated with webinars, courses, books, and blog posts.

 

Overcoming non-technical challenges 

 

 

1. Organization and time management skills

Changing careers is a project. It requires a strategic plan, a timeline, and specific (and realistic) milestones. 

Ask yourself these questions:

  • Why do I want to be a data scientist and what subjects am I passionate about?
  • Will I quit my job and take the time to learn the skills I need or will I make the transition in parallel with my current work?
  • What am I good at? What are my weaknesses?
  • How much time and money am I willing to spend on changing careers?
  • What are the new skills and qualifications that I need to acquire for the new career path?
  • What is my learning style?

Once you answer these truthfully, you need a plan. In the end, you have to find what works for you. Here are some things you can do:

  • Print a weekly plan and stick it on your wall. 
  • Organize your daily schedule on a Google calendar. 
  • Identify what works and what needs to be readjusted in your plan. Maybe you underestimated how much time you needed for a course or the learning resource you chose is not quite as you imagined. 
  • Don’t be afraid to change your strategy. Go to online forums and search for a better resource. Or read some articles on time management and read on how to set SMART goals.
  • Keep checking tasks off your to-do list. This momentum will push you out of that initial frustration period of learning something new.

 

2. Become a better communicator

Another must-have skill for data science is communication. A data scientist translates tons of data into actionable insights for decision-makers and stakeholders. However, not all the people that you will need to communicate with are data scientists or have a background in STEM. If you’re an introvert, you may not be super excited about public speaking or constant communication. Communication in data science is not only about being a good speaker but also about building these habits and skills:

  • Provide good explanations of complex concepts to non-experts.
  • Document code and analysis so other team members can learn from your work in the future (or even your future self!). 
  • Become a good writer by making a habit of writing about everything you do in your career path. 

Documentation is power, so write and keep writing. In the very near future, you will need to demonstrate the skills that you put on your resume, and you can back them up with well-documented code repositories (for example, on GitHub), blog posts, and webinars or talks about your work. The earlier you start, the faster you’ll improve.

 

3. Be prepared to fight imposter syndrome

One common challenge experienced by many career movers is imposter syndrome. It’s never easy having an established career and suddenly becoming a newbie. In this case, it’s all a matter of mindset: keep yourself motivated and excited about all the new things you are about to learn! Omdena has hosted a webinar on how to overcome imposter syndrome as a data scientist, which includes knowing the skills gap that you need to fill and identifying the skills you already learned from your previous career. 

 

4. Build your network

Networking on your job hunt doesn’t have to be awkward. In fact, statistically, the majority of job opportunities come from an individual’s network, not from applications. When you network, you’re simply connecting with people whose interests are similar to yours, getting their take on data science topics, and getting a peek into their careers. These people may become your future colleagues. The absolute worst thing that can happen is not getting any response, so what do you have to lose?

One of the easiest ways to network is through collaborative projects, where you have the opportunity to share knowledge, work with experienced practitioners, gain insights from people in different roles, and find leads for jobs.

Another good way to network is by following authorities in data science.

Contact any interesting people you encounter—a speaker from a podcast you liked, a teacher from an online course, or a blogger whose posts you enjoy. The golden rule is: Put yourself out there and ask questions. That’s the best way to get feedback.

A Magical Journey: From Omdena Collaborator to a Software Engineer at Google


Samir Sheriff shares his story of how 6 years in software engineering, dozens of online courses, and collaborative real-world experience with Omdena can make the difference

How did your journey start?

Nearly a decade ago, back in 2011, when I had just completed the 4th semester of my Computer Science & Engineering Degree course, I found that even though I had fared well in all my exams, my practical knowledge in this field was (in the words of Lord Kelvin, the famous mathematical physicist) “of a meager and unsatisfactory kind”.

This was due to the fact that, aside from a handful of really great courses, the majority of my coursework relied on rote-learning and the competitive pursuit of grades instead of practical knowledge.

I had joined the field of Computer Science to satisfy my childhood dream of working with computers, but I found I was still far from my dream of understanding and creating software with my computer. In this dismal state, I spent the beginning of my semester break searching for motivation in the online universe. After Googling for a short while, I stumbled upon Mehran Sahami’s CS106A video lectures on Stanford’s YouTube channel, and thus began my tryst with online education. Little did I know that life would never be the same again. Although it wasn’t a live course, I was able to grasp important concepts, thanks to the amazing video lectures and challenging assignments.

This experience inspired me to put my free time to good use by taking up even more online courses like Nick Parlante’s Python course, Julie Zelenski’s Programming Abstractions course, and a whole bunch of other wonderful courses. A few months down the line, Coursera and Udacity were founded, and this opened up new avenues for students like me from all around the world, who didn’t get the opportunity or couldn’t afford to enroll in prestigious institutes situated in other parts of the world. Machine Learning was a pretty new topic to me at that time, and I was first introduced to it by Andrew Ng’s famous Machine Learning course as well as Sebastian Thrun’s and Peter Norvig’s AI course.

 

How did you move into real-world projects?

Fast forward to early 2019. I had spent almost 6 years as a Software Engineer and had completed a plethora of online courses such as Udacity’s Self-Driving Car Deep Learning Course and Jeremy Howard’s fast.ai Deep Learning Courses. Although I truly enjoyed the projects I worked on as part of these courses, I suddenly found myself with an inexplicable yearning to work on a real-world Machine Learning project with real people and real impact. This phase of my journey began with me participating in a few hackathons, Kaggle competitions, and a venture challenge; following other Machine Learning enthusiasts online (especially on Medium) to learn from their expertise; and joining the Towards Data Science publication on Medium as an author, to share my learnings with others.

However, I knew I still hadn’t completely found what I was looking for, but I couldn’t explain why.

One fine day in May 2019, I stumbled upon two articles (Why Community and Collaboration is the Key for Building Ethical AI and Join the Global AI Community and Solve Big Social Problems) in my Medium newsfeed. Why these particular articles were recommended out of millions of Machine Learning articles, I will leave Medium’s recommendation systems to answer. However, as I perused these articles from beginning to end, I realized that Omdena was the answer to that inexplicable yearning to work on a real-world project with real people and real impact. I was really awestruck by the ideas of “collaboration instead of competition” and “the need to stop focusing on individuals or teams but start shifting our attention towards communities of people solving a problem”, which compelled me to muster up enough courage to apply for participation.

A few days later, I received news that I had been selected to be part of a 50-member team that would work together to detect anomalies on the surface of Mars over the next two months and my joy knew no bounds. Following the successful completion of this project, I also got a chance to work with a 50-member team on another project to prevent gang and gun violence via social media analysis. Through these two projects, I got the opportunity to work and learn as I had never done before.

 

What skills did you pick up?

 

1. Working in a 100% online environment

“It always seems impossible until it’s done” — Nelson Mandela

I was used to working in an office and believed that face-to-face interactions were always necessary for project success. We even had concepts like co-location, face-to-face 1:1 meetings, etc.

Omdena taught me that working in a 100% online environment is indeed possible and:

  • A global project brings with it diverse minds with diverse experience, perspectives, knowledge, and opinions which, when put together, will make human knowledge and humanity leap forward into a progressive, bright future.
  • Tools such as Slack, Zoom, GitHub, Google Colaboratory, Drive, Documents, and Slides come in handy when working on online projects.
  • Ironically, most corporate work is done online these days and this skill is priceless now!

 

2. Solving problems collaboratively

“In the long history of humankind (and animal kind, too) those who learned to collaborate and improvise most effectively have prevailed” — Charles Darwin

 

 

[Animation: a neural network being gradually trained to distinguish between orange and blue points. Image source: https://www.doc.ic.ac.uk/~nuric/teaching/imperial-college-machine-learning-neural-networks.html]

 

 

Working collaboratively on Omdena projects was akin to training a neural network. Things were chaotic at the beginning of the project, with random strangers joining a newly created collaborative environment (similar to how weights and biases in a neural network are initialized randomly during creation). As time went by, we gradually got to know and learn from each other, refining our understanding and becoming less chaotic by constantly iterating (similar to how a neural network gradually adjusts its weights over multiple epochs to minimize a cost function). Eventually we achieved our goal or came close to it (similar to how a neural network eventually produces expected results).
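For readers who have not seen this training loop before, the analogy can be made concrete with a minimal sketch: a single artificial neuron that starts from random weights and is adjusted over many epochs to lower a cost function. The toy points, the learning rate, and the number of epochs are assumptions chosen for illustration, not details from the Omdena projects.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy data (invented): label 1 for "orange" points, 0 for "blue" points.
points = [(-2.0, -1.0), (-1.5, -2.0), (-1.0, -1.5),
          (1.0, 1.5), (1.5, 2.0), (2.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, point):
    """Probability that the point belongs to class 1."""
    return sigmoid(w[0] * point[0] + w[1] * point[1] + b)


def cost(w, b):
    """Mean cross-entropy between predictions and labels."""
    total = 0.0
    for point, y in zip(points, labels):
        p = min(max(predict(w, b, point), 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(points)


# Step 1: weights and bias start out random (the "chaotic" beginning).
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = random.uniform(-1, 1)
initial_cost = cost(w, b)

# Step 2: over many epochs, nudge the parameters against the gradient.
learning_rate = 0.5
n = len(points)
for epoch in range(200):
    grad_w0 = grad_w1 = grad_b = 0.0
    for point, y in zip(points, labels):
        error = predict(w, b, point) - y  # gradient of the loss w.r.t. the pre-activation
        grad_w0 += error * point[0]
        grad_w1 += error * point[1]
        grad_b += error
    w[0] -= learning_rate * grad_w0 / n
    w[1] -= learning_rate * grad_w1 / n
    b -= learning_rate * grad_b / n

# Step 3: the trained neuron now separates the two groups.
final_cost = cost(w, b)
predictions = [1 if predict(w, b, p) >= 0.5 else 0 for p in points]
print(initial_cost, final_cost, predictions)
```

A real neural network stacks many such units and usually relies on a framework to compute gradients, but the rhythm is the same: random start, repeated small corrections, gradually better results.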

My two key takeaways from solving problems collaboratively on Omdena’s projects are:

  • Investigate several approaches to a problem to arrive at the most viable solution in the shortest possible time — Every Omdena project involved breaking down an ambiguous problem into smaller parts, each of which could be investigated simultaneously by different members of the team, to eventually implement the most viable solution, all within the short span of 2 months. It was also magical to witness how certain investigations, which had originally led to dead-ends in the initial stages of a project, could be combined with certain solutions during later stages of that project to unravel better solutions.

Thanks to the different perspectives that the team brought to the table, I was able to quickly learn a lot more than I would have had I worked alone.

  • Share knowledge — Documenting the progress made during all our investigations helped in knowledge-sharing within the team, especially as some team members belonged to different time zones. Furthermore, I am grateful that Omdena provided me with the opportunity to contribute articles, code to the open-source community, as well as presentations/demos (live and recorded), through which I could share our learnings with the rest of the world.

 

3. Stepping out of my introvert zone

“In a gentle way, you can shake the world” — Mahatma Gandhi

I am an introvert by nature but Omdena succeeded in bringing me out of my shell. Omdena’s projects were extremely engaging and the passion with which the team discussed approaches to problems got me involved to such an extent that I started suggesting new ideas without any fear of drawing flak, and learned from ideas suggested by others — in no time, I had stepped out of my introvert as well as comfort zones.

 

4. Applying Machine Learning to real-world problems

“Nothing ever becomes real till it is experienced” — John Keats

These projects made me realize that real-world machine-learning problems are not as well-structured as all the assignments and course/hackathon projects I was so used to, especially because data is complex and usually never available in the relevant form. There are so many unknown variables to consider and so many trade-offs to make in order to come up with a practical solution.

 

How did you land an engineering job at Google?

Having worked in a corporate setup for 6 years, I had already developed knowledge in the field of software engineering, which acted as a prerequisite for the interview at Google. Omdena’s unconventional approach and methodology added to my skill set and experience, thereby unlocking my hidden potential. Here is how it helped me and how you can increase your chances:

 

1. Building an eye-catching résumé

Since Google receives millions of software engineer job applications every year, it is important to make your résumé stand out from the rest in order to improve your chances of being selected for the interview process. One way to do this is to mention projects that you’ve worked on or managed, in order to demonstrate relevant skills and knowledge.

The open-source contributions as well as the articles/presentations that I created through the Omdena projects enhanced my résumé.

 

2. Building up confidence for the interview

Apart from assessing problem-solving skills, the interviews aim at evaluating other attributes of a candidate, such as the ability to communicate and collaborate with others. I can’t stress enough how important it is to augment your skills by working on real-world projects.

Online courses and assignments introduced me to important foundational knowledge to tackle problems, but unless this knowledge was put to good use regularly in real-world projects, I’m sure it would have slipped out of my memory and I would have had to spend more time re-learning a lot of concepts again.

The communication, collaboration, and problem-solving skills that I honed over the course of Omdena’s online projects helped me build my confidence and played a significant role in my software engineer job interviews at Google.

 

3. Knowing Google’s tools

This is a good-to-have attribute that could boost your confidence besides your résumé.

Coincidentally, to accomplish some of the tasks in the Omdena projects, we had used Google products like Colaboratory, Drive, Documents, Slides, etc., and I felt so proud of this fact when I walked in for the software engineer interviews at Google.

 

4. Expertise in remote working

This would probably be a must-have skill for current times. Omdena’s projects will make you an expert in remote work.

Now more than ever, with the work-from-home environment that the ongoing pandemic has forced most of the world to accept as the “new normal”, the invaluable experience I gained from working online with amazing people from different parts of the world on Omdena’s projects helps me immensely. I work online with yet another set of amazing Google software engineers from the world over, on software engineering projects at Google!

 

Any concluding words for our readers?

I believe that learning is a lifelong process and there is still so much more for me to learn, but I am glad that we live in an age where we can improve our knowledge on different subjects, with the aid of online courses and the opportunities provided by extraordinary platforms like Omdena, by collaborating instead of competing with each other.

In a world being plagued by greed, hatred, and intolerance, Omdena comes as a breath of fresh air to do away with national as well as man-made barriers and brings together a group of strangers from different corners of the Earth, transcending geographical borders and time zones to work together and solve fascinating social problems; whilst learning from and inspiring each other every single day. This is not just a pipe dream, thanks to online education, collaborative tools, and Omdena!

Balancing AI Courses and Real-World Projects, Mindset, and Securing an NVIDIA Internship


Data Scientist Kennedy K. Wangari shares his learnings as a community leader and incoming data science/AI intern at NVIDIA.

 

Can you describe yourself in 50 words or less?

Kennedy Wangari is a Data Scientist, based in Kenya. An AI Community Leader and Innovator, Kennedy is passionate about tech communities, and harnessing the power of data and innovative AI technology to make a better and easier tomorrow. He’s a firm believer in the power of open source: words, knowledge, and ideas should be accessible to everyone.

 

Why are you into AI and not something else?

I believe that we can apply AI to tackle and solve complex problems in numerous domains, providing fundamentally new approaches to every problem and situation in this data-driven world.

The AI-powered future looks promising, and I would love to be part of this radical transformation changing the face of humanity.

 

To improve your skills, how do you complement course work and real-world projects?

Read

Cultivate the habit of reading research papers, academic literature in your domain, famous case studies, and articles. They will greatly help to deepen and solidify your craft and domain understanding.

Reproduce

Next, attempt to reproduce and implement the research papers in projects.

Network

Listen, ask, talk, and build meaningful relationships and collaborations with people involved in your field of specialization (domain experts, customers) for feedback, mentorship, and guidance. This will improve your understanding of the domain and give you a feel for their challenges and potential data science/analytics use cases.

To take things to the next level, attend conferences, events, and hackathons, where you will find mentors and possible collaborators to partner with.

Get involved in an online or in-person domain-related reading group or community with colleagues or students, to study academic literature and help one another.

 

What contributed the most to get your Data Science/AI internship offer at NVIDIA?

I would say that the data science and machine learning experience and expertise I gained from taking up real-world AI projects at Omdena, winning hackathons, building and working with AI communities like deeplearning.ai, previous rigorous data science internships, and my current role provided an impressive background and skill set. That background interested recruiters from top tech companies, including the Microsoft Applied ML Software Engineering team and NVIDIA AI, among others.

Throughout the rigorous rounds of interviews, I would say that what contributed most to receiving the offer was my unique problem-solving mindset and analytical thinking, as shown in how I tackled the various questions: technical, non-technical, use-case scenarios, and projects.

Most importantly, I thoroughly researched, read, and inquired internally on ongoing data science projects, research resources, articles, and ongoing work related to the role I was interviewing for.

One of the recruiters was quite impressed that I had gone a step further: I had interacted extensively with some of their products and shared valuable feedback from my experience.

All these activities improved my understanding of the ongoing AI internship projects at NVIDIA and how they operate and are used. They also gave me insights into potential data analytics use cases and AI challenges from the perspective of the teams and people involved.

This provided a great baseline for my discussions with the recruiters. I gained insight into the context of the data the teams work with, and into how to use different data science technologies to solve related business problems that would later come up during the rounds of interviews.

Of course, it’s important to write good-quality code, be technically competent, fully understand what you are doing, and communicate your work and results effectively. But there is more to it than that, and that’s what made me bag the offer.

 

What are your most important mindset tips (as mental conditions like impostor syndrome are becoming a problem)?

The learning mindset: become a lifelong learner to stay relevant, adapt to changes in the AI field, remain agile enough to tap into opportunities and prospects, and future-proof your career. Constantly build up your knowledge and expertise.

The focused mindset: remain disciplined and focused as you build your craft in the AI space. Make real progress in ML; don’t be swayed by the exciting ideas and projects springing up daily or by the latest development trends. Stay motivated, stay focused, and build up your skills.

The self-trust mindset: you’ve got to trust yourself, be courageous enough to follow your passion aggressively, and believe in your capabilities. The AI space is vast, with lots of information, difficult concepts to master, a sea pool of learners, and its challenges. Don’t give up on the things you believe in and want to achieve.

When it comes to imposter syndrome, the voice in your head can get very loud about how big a fraud you are and how little you know. It will focus on your shortcomings and ignore your successes. Shift it by getting louder about your achievements. For every new skill you gain, celebrate in grand style. Always remember this quote attributed to Albert Einstein: “The moment you stop learning, you start dying”. That feeling will always be there, so get better at dealing with and overcoming it, and use it as motivation to improve yourself.

Often value comes from flipping a perceived weakness on its head and figuring out its opposite. Use imposter syndrome as your greatest strength, act on it in the right way, and improve.

In this highly dynamic, evolving field, with a sea of learners and career people transitioning into it, what will separate you from the rest is your problem-solving skills. Work towards improving your creativity, concrete problem solving, and critical analytical thinking through practice. Know how to find answers and how to develop solutions yourself.

 

A book or course, that you most recommend?

Ultralearning: Master Hard Skills, Outsmart the Competition, and Accelerate Your Career by Scott Young:

In a highly dynamic, evolving field, you have to be a lifelong learner. This is a great read that will help us become more pragmatic and relevant, reinvent ourselves, maximize our competitive advantage, adapt to changes in the AI space, and future-proof our careers in the data world.

Practical Natural Language Processing (O’Reilly):

A straightforward book that does a great job of bridging the gap between Natural Language Processing (NLP) research and practical applications. Covering e-commerce, healthcare, finance, and other sought-after domains where NLP is put to use, it’s a great manual for machine learning practitioners, data scientists, or anyone interested in the NLP field.

The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists:

A great read for aspiring and current data scientists to learn from the best. It’s a reference book packed full of strategies, suggestions, and recipes to launch and grow your own data science career.

How to Become a Data Engineer: The AI Plumber?


By Natu Lauchande
 
 
21st Century roadmap to becoming a Data Engineer
 

What is a data engineer?

In broad strokes, a data engineer is responsible for engineering systems and tools that allow companies to collect raw data, arriving from a variety of sources and at varying volume and velocity, and turn it into a format consumable by the broader organization. The most common downstream consumers of data engineering products are the AI/Machine Learning and Analytics functions of a company.

The best way to start discussing this new and loosely defined role is the Data Science hierarchy of needs, brilliantly depicted by Monica Rogati in the pyramid below.

 

 

 

 

 

 

                                         Source: The Medium post “The AI Hierarchy of Needs”

 

 

A data engineer is the lead player on the first three foundational rows of the pyramid: Collect, Move/Store, and Explore/Transform. A plethora of roles, including Data Analysts, Data Scientists, and Machine Learning Engineers, take over as lead players in the higher phases of unlocking the value chain.

A Data Engineer is part of the function that provides the base for the highly critical job of the Data Scientists, hiding all the complexities involved in the management, storage, and processing of the company’s data assets. He or she is a master of data ingestion, enrichment, and operations.

 

 

Source: O’Reilly

 

 

With the deluge of data available within public and private companies, the ability to unlock this value is the critical factor in providing cheaper and better services to stakeholders and customers.

 

Skills of the trade

Data Engineers do come in different flavors and types. The core skills of the trade can be summarized below in order from essential to important:

  • Software Engineering: Data Engineering is, in essence, a discipline of Software Engineering, where the same rhythms and methodologies of work are applied to execute the task at hand. The use of version control, unit testing, and agile techniques to ensure business alignment and quick delivery is paramount for success.
  • Relational Database/Data Warehouse Systems: Most of the data access in the data engineering space is democratized through ad-hoc querying in a relational database environment. This allows expert users with basic knowledge of SQL to retrieve the data that they need in order to respond to a business query or decision.
  • Scalable Data Systems/Big Data: It’s central to the modern data engineer to understand data systems architectures. A good grasp of how distributed and parallel processing work is needed. Knowing the different types of indexing available in your environment, so that the data at your disposal can be processed properly and efficiently, is a great skill to have.
  • Operating Systems / Command Line: Familiarity with your local development environment (OS/*NiX/MIN) is essential, particularly the command line, where a lot of ad-hoc wrangling can happen.
  • Data Visualisation: A fundamental skill to effectively expose data products to a more general audience and quickly unlock data value through clear infographics, charts, and interactive analytics. Familiarity with a tool like Tableau, Superset, or Power BI is a must.
  • Data Science (Basics): An increasingly important user and stakeholder of a Data Engineering organization is the data science team. Understanding how data is used in the context of exploratory data analysis, machine learning, and predictive analytics ensures a virtuous cycle between critical data functions.
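To make the relational database skill listed above more concrete, here is a minimal sketch of the kind of ad-hoc SQL query a business user might run. It uses Python's built-in sqlite3 module with an in-memory database. The `orders` table, its columns, and the sample rows are invented for illustration and do not come from any real system.

```python
import sqlite3

def revenue_by_country(rows):
    """Load hypothetical order rows into an in-memory database and
    answer an ad-hoc business question with plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT country, SUM(amount) AS revenue
        FROM orders
        GROUP BY country
        ORDER BY revenue DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Invented sample data: (order_id, country, amount)
sample_orders = [
    (1, "KE", 120.0),
    (2, "DE", 80.0),
    (3, "KE", 30.0),
    (4, "IN", 200.0),
]

for country, revenue in revenue_by_country(sample_orders):
    print(country, revenue)
```

The same GROUP BY pattern carries over to most data warehouses, although each system has its own SQL dialect and connection library.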

Data Engineers don’t need to be experts in all of the areas above. Having core expertise in two of them and a good understanding of the others goes a long way toward delivering value to a project.

A Data Engineer can come in different shapes and forms, so being very specific about your role is very important. As a nascent profession, it lacks standards and consistent job descriptions.

Successful transitions into data engineering typically come from the following industry backgrounds:

Software Developer/Engineer, Data Scientist, Database Administrator, Business Intelligence Developer, and Data Analyst.

 

The path to mastery

To master data engineering I would start with the prerequisite of getting deep experience and expertise in two or more of the following areas.

  • Distributed Systems / Big Data
  • Database Systems / Data Warehousing
  • Software Development
  • Data Visualization

The most traditional path to mastery is a degree in a discipline with high computing exposure (CS, EE, Info Sys., Applied Maths/Phys, Actuarial Science/Q) or a quantitative degree, followed by a couple of years in Software Development or Data Science with practical exposure to backend services and production systems. That said, the data engineering field is also full of rockstar engineers from non-traditional backgrounds (high school dropouts, literature majors, etc.).

A couple of the top online courses and specializations on the major learning platforms (Coursera, Udacity, Udemy, etc.) covering Big Data / Data Engineering tooling can give aspiring Data Engineers a good foundation. The ones with the best reviews on your preferred learning platform will help you build a skill set for the role.

After these initial foundations, I would recommend the following books for architecture fundamentals:

  • Designing Data-Intensive Applications, by Martin Kleppmann
  • The Data Engineering Cookbook, by Andreas Kretz
  • Foundations for Architecting Data Solutions, by Malaska et al.
  • Streaming Systems, by Akidau et al.
  • The Data Warehouse Toolkit, by Ralph Kimball

Nothing is more valuable at this stage than getting practical exposure in a real-world data engineer role. Keep practicing and growing the craft for the rest of your career.

Omdena, as an organization that promotes AI challenges with volunteers across the world, is the ideal place for anyone to sharpen their data engineering skills. In many Omdena challenges, data engineering is among the most important skills needed: preparing data, setting up data pipelines, and operationalizing them.

 

Typical tools of the trade

With all the excitement in the field, a plethora of tools are popping up in the market, and knowing which one to use becomes a problem because many of them overlap in what they do. A typical data engineering product or service does not differ much in complexity from any other software system.

A typical data engineering pipeline will require expertise in at least one tool per function/category:

 

1. Function: Pipeline Creation / Management

Apache Airflow

  • End to end workflow authoring and management tool.
  • Provides a computing environment where your processes can run.

Alternatives: Azkaban, Luigi, AWS SWF
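The core idea behind workflow tools like these is that a pipeline is a set of tasks plus the dependencies between them, and each task runs only after its upstream tasks have finished. The sketch below illustrates that idea with Python's standard-library `graphlib` module (Python 3.9+). It is not Airflow's API; `run_pipeline` and the three task names are hypothetical, and real orchestrators add scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_pipeline(tasks, dependencies):
    """Run each task after all of its upstream tasks have finished.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of tasks it waits for.
    Returns the order in which the tasks were executed.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order

log = []  # records what each hypothetical task did

tasks = {
    "extract": lambda: log.append("extracted"),
    "transform": lambda: log.append("transformed"),
    "load": lambda: log.append("loaded"),
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

print(run_pipeline(tasks, dependencies))
```

Declaring dependencies instead of hard-coding the call order is what lets an orchestrator decide what can run in parallel and what must be re-run after a failure.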

 

2. Function: Data Processing

Apache Spark

  • A fundamental tool to process data in many formats at high scalability.
  • Allows easy enrichment and processing in SQL, Scala, and Python.

Alternatives: Apache Flink, Apache Beam, Faust
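One way to picture what a scalable processing engine does is this: the data is split into partitions, each partition is processed independently, and the partial results are merged. The sketch below shows that shape using only Python's standard library, with threads on one machine standing in for cluster workers. The function names are hypothetical and this is not Spark's API. It illustrates the structure of the computation rather than any real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_partition(lines):
    """Process one partition on its own: count the words it contains."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def word_count(partitions, workers=2):
    """Process partitions in parallel, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_partition, partitions):
            total.update(partial)
    return total

# Invented sample data, already split into two partitions.
partitions = [
    ["data moves fast", "data is stored"],
    ["stored data is explored"],
]

counts = word_count(partitions)
print(counts["data"], counts["stored"])
```

Because each partition is counted without looking at the others, the same logic keeps working when the partitions live on different machines, which is the property distributed engines rely on.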

 

3. Function: Distributed Log/Queueing Systems

Apache Kafka — Scalable distributed queuing system that allows data to be processed and moved at a very high speed and large volumes.
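The core idea behind a distributed log can be sketched in a few lines: producers append records to the end, and each consumer keeps track of its own read position (its offset). The `AppendOnlyLog` class below is a hypothetical, single-process toy, not Kafka's API. It leaves out partitioning, replication, and persistence entirely.

```python
class AppendOnlyLog:
    """Toy in-memory log: records are only appended, never modified."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> index of the next record to read

    def append(self, record):
        """Add a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

events = AppendOnlyLog()
for event in ["signup", "login", "purchase"]:
    events.append(event)

print(events.poll("billing", max_records=2))  # first two records
print(events.poll("billing"))                 # continues where "billing" left off
print(events.poll("analytics"))               # a new consumer starts from the beginning
```

Because reading never removes records, several consumers can process the same stream independently and at their own pace.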

 

4. Function: Stream Processing

Alternatives: Apache Flink

 

5. Function: Data/File Format

Apache Parquet — Very efficient data format geared for analytics and aggregations at scale on cloud or on-premises.

Alternatives: Arrow, CSV, etc.
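Parquet stores data column by column rather than row by row, which is a large part of why it suits analytics. The sketch below illustrates only that layout idea with plain Python lists. The sample records and the `to_columns` helper are invented for illustration, and the real format adds encoding, compression, and metadata that are not shown here.

```python
# Invented sample records in a row-oriented layout.
rows = [
    {"user": "a", "country": "KE", "amount": 120.0},
    {"user": "b", "country": "DE", "amount": 80.0},
    {"user": "c", "country": "KE", "amount": 30.0},
]

def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every full record is visited to sum a single field.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

For an aggregation over one field, the columnar layout touches only that field's values, which matters when a table has many columns and billions of rows.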

 

6. Function: Data Warehousing / Querying

BigQuery

  • A cloud-based data warehouse system for structured and relational data storage and analytics.

Alternatives: AWS Redshift, Apache Hive, etc.

 

Keep in mind that tools come and go over the years. Focusing on the big picture and the functional areas will keep you up to date and ready to learn the next fancy tool.

Starting or joining an open-source project that uses any data engineering tool is a good move, both for growth and for longer-term mentorship by captains of the industry.

 

The future

In order to fulfill the promise of unlocking the value of data, more investment in the Data Engineering space is expected. There’ll be increasingly intelligent tooling available to handle the current and future challenges around data governance, privacy, and security.

I can see AI and ML techniques increasingly blended directly into the Data Engineering toolchain, both for operations and for data quality assurance. A good example of such a tool is Deequ from AWS Labs, which applies machine learning to data profiling. Also central to modern Data Engineering are areas like synthetic data generation, which alleviates data privacy issues when the cost of data acquisition and compliance is too high. Tools to watch in the synthetic data space: Snorkel, and the use of generative adversarial networks to generate everyday tabular data.
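To give a feel for what automated data quality assurance checks look like, here is a minimal, rule-based sketch of two common metrics: completeness (how often a field is filled in) and uniqueness. The function names and the `customers` sample are hypothetical. This is not Deequ's API, and it involves no machine learning; it only shows the kind of check such tooling automates at scale.

```python
def completeness(records, column):
    """Fraction of records in which `column` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(column) is not None)
    return filled / len(records)

def is_unique(records, column):
    """True if no non-null value of `column` appears more than once."""
    values = [r[column] for r in records if r.get(column) is not None]
    return len(values) == len(set(values))

# Invented sample data with one missing email and one duplicated id.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

print("email completeness:", completeness(customers, "email"))
print("id is unique:", is_unique(customers, "id"))
```

Checks like these are typically run on every new batch of data, so that a broken upstream source is caught before it reaches analysts or models.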

With the rise of AutoML for prediction and data analytics, a central role will be given to the data infrastructure engineering that underpins the datasets driving enterprise strategy. From here, we can only see an outlook of increasing relevance and opportunities to contribute positively to society.

I would like to acknowledge Laisha Wadhwa, James Wanderi, and Michael Burkhardt for their input and suggestions on the article.

I Struggled with PTSD, Now I Help to Address It Through AI


 
Read the brave story of Anam from Pakistan, who struggled with post-traumatic stress disorder (PTSD) after her dad fell into a critical health condition. She had to prepare for entrance exams while taking care of her siblings for several months.

 

 

It is truly amazing how many inspiring individuals have applied to our Collaborative AI Projects. We are very honored to share the story of Anam today, from whom we learned a lot just by speaking and listening to her. Anam has been part of our AI challenge on building a machine learning model for PTSD assessment.

 

Anam’s Story

I am a Computer Science student and before that, I was actually a pre-medical student. I switched a lot of majors. The thing in Pakistan is, after studying biology, you can either become a doctor or a dentist. I wanted to do research but there weren’t many options. That is why I decided to switch to computer science and for that, we have to study mathematics before college. I took a gap year to study math so that I was eligible to apply to an engineering university.

It was very hard to convince my parents to let me study maths at first because they were convinced that being a doctor would be a better choice for me. They finally agreed, and I started studying maths. I had to complete two years of the syllabus, but I had only one year. Right after I started studying, my dad got appendicitis. We went to get his appendix removed, but it ended up being more than that.

His intestines stopped working, and he was in the hospital for a few months after that. We were just hoping his intestines would start working so we could go home. Then the surgical wound, where they had opened him up, developed an infection. In order to get support, we had to move to Lahore, where the rest of my relatives live. When we moved to Lahore, they cleaned his wound, and it got infected again. He was on bed rest for about two months. His movements were minimal, which led to a pulmonary embolism (blood clots had lodged in his lung). One day, he was going to the bathroom when all of a sudden he passed out, and nobody knew what was happening.

Everybody was at the hospital and nobody could figure out what happened. The doctors thought maybe he had a heart attack. He was taken to the ICU. The doctors started giving him CPR. I think he was gone for a minute or two, but the doctors were successful at bringing him back. They put him on life support for a couple of days and that’s when we really lost all hope.

A couple of days later he finally woke up and we found out that he had had a pulmonary embolism.

I know a lot of people go through a lot of things and this is nothing compared to most of them. When my parents were in the hospital I was looking after my siblings. I had to tell them what was happening.

At the same time, I also had to focus on my studies. Even though there were a lot of people who did support us, at these times you really find out who is actually there for you and who isn’t. And a lot of people backed out. My friends would be telling me, you should be with your dad instead of even worrying about your studies. To avoid talks like these, I would hide while I studied. So after four months of being in the hospital and staying at my relatives’, we could finally come home.

We were ecstatic.

I remember my mom telling me that she wanted to go out in the streets and shout for joy.

 

We came back home and it was all fine. I took my math exams after covering two years’ worth of syllabus in about four months, and under extreme stress at that.

I came back after my last exam, ready to prepare for my college entrance tests, and something odd happened. I fell sick out of the blue. I had nausea 24/7. I couldn’t eat or drink. I would vomit if I tried, and I started to lose weight.

My parents took me to multiple doctors thinking that my stomach was upset. Months went by but we couldn’t figure out what was wrong. Later, I was diagnosed with severe anxiety which stemmed from the incident with my dad.

I remember my mom telling me, “When your dad was in the ICU, I would sit outside it all night and every day there was a new body being taken out of the ward. So, every time I saw the doors open, I hoped it wasn’t your dad’s body.” I could understand her feelings because that was exactly how I felt every time my mom called me from the hospital. I felt my heart drop every time my phone rang.

The fear had gotten stronger and now I had severe anxiety accompanied by recurring panic attacks. The fear that I might lose my parents kept me up all night. Whenever one of them left the house I would call them repeatedly to check up on them. I never turned my phone on silent while in class, because I was always fearing a call with bad news.

I started medication, and my anxiety slowly started getting better. Throughout the recovery, my mother was always by my side. She distracted me when I had terrible thoughts. I felt safe only in her company.

My entrance test results finally came. I was accepted into one of the top CS universities in Pakistan.

My recovery still continues but today I feel great because I have never been able to share my story publicly before. I have always been told to keep it quiet as if talking about mental health problems is some sort of taboo. From my personal experience, I have realized that talking about it is what helps us get better. I hope I encourage people to speak up and share their stories.

 

How to Transform Your Data Science Skills and Build a Meaningful Career


By Tanya Dixit
 
 

Data Science is a hot field and will remain so as new roles evolve. More and more companies are realizing that, in order to stay relevant in the future, they need to become data science-oriented organizations.

Great breakthroughs have happened with machine learning models helping to detect fake news, spotting diseases like malaria, building inclusive financial systems, and many more.

But data science is also a very diverse field — not only in terms of knowledge but also in terms of people, domains, and careers. This very diversity of ideas and individuals can make it a huge success. However, there are several obstacles along the way.

To overcome problems such as algorithmic bias, safety and trust issues, and silo-driven development practices, we need to embrace all that humanity has to offer. To learn skills that are relevant in an inclusive workplace, we at Omdena are happy to see online education offering plenty of great resources for learning to code and diving into machine learning, along with hackathons and competitions that help improve your modeling and analysis skills.

But what is missing are skills that go beyond technical prowess in areas like Machine Learning, Data Engineering, Visualization, Programming, and Domain expertise.

Even the most technically skilled data scientist needs the following soft skills to thrive today, all of which you can only learn through hands-on practice:

Collaboration, Cultural empathy, Creative thinking/problem solving, Initiative, Presentation skills

This is what a meaningful career means. This is what being a next-level data science engineer or data scientist means — A career where one is part of something that is bigger than oneself. The world’s biggest challenges can only be solved when people from diverse backgrounds with various skills and perspectives come together to bring out the best.

“Do not optimize for income only but for passion, for where you really want to make a difference.” — influencer Eric Weber in one of our webinars.

Our mission is to democratize AI not just by talking but by means of purpose-driven action and collective intelligence.

The Japanese call this Ikigai (生き甲斐) — “a reason for being. The thing that gets you up in the morning”.

 

The people who are doing it

In the following, six individuals from various backgrounds (career movers, a Professor at ESADE, a Postdoc in Physics, a data engineer, a researcher, and a student) share their experiences of working on Omdena’s real-world projects.

 

Adapting careers 

 

 

Professor Xavier has long been involved in business analytics and consulting. He took a step further in his career when he earned a Master’s degree in Business Intelligence and Data Science. He joined Omdena’s disaster management project with UN WFP.

According to him, “(1) Doing AI for Good (2) on a global scale (3) and with a global team” were the 3-ingredient salad that made him want to participate in the project. He significantly improved his data engineering skills and his command of tools for feature importance analysis, hyperparameter tuning, and cross-validation, and he learned a lot about neural networks. He also improved his skills in Python and in using libraries such as Scikit-Learn and visualization libraries such as Seaborn.

“I also learned that it doesn’t matter if you try something that maybe later it’s not part of the pipeline of the project. It’s part of the learning process.“  –  Professor Xavier

When asked about soft skills, Professor Xavier states that he improved his team management skills, mainly because he was part of a very diverse team with different cultures and backgrounds at Omdena. A crucial lesson was learning to swim in a world of chaos at the beginning of a project and to be patient while things fall into place later, with clear tasks and responsibilities.

“There is a lot of smart people out there willing to help and to make good using AI. Being part of such a community like that makes me proud of it. Makes me want to share it with my relatives and friends.”  –  Xavier Torres Fatsini

 

 

From Omdena to a full-time offer at Microsoft

 

 

Kritika Rupauliha is a CS undergrad, currently in the 6th semester of her degree. She has worked at organizations like Leading India AI, Reflex Solutions LLP, Omdena, IIIT Allahabad, and Microsoft as an intern.

At Omdena, she worked her way up from Junior ML Engineer to the Lead ML Engineer of a project. This was her first time managing a task with a large number of globally diverse participants. According to Kritika, by participating in an Omdena project, she learned to balance between empowering people to take their own initiative and get things done, while at the same time setting goals to keep the overall task on track. She also gained valuable skills in clear and regular communication, especially while working remotely. She learned to manage work for Omdena, her day job, and family commitments, and also to manage notifications appearing round the clock because of the global nature of the collaboration.

“Before joining Omdena, I had been involved in some research work and college projects under my professors. But I had never been exposed to such a big community of similar-minded individuals. I found out that I thrived in such a community, learning with my peers, and exploring the horizons of AI. Omdena is also the reason why I got selected for a software engineering intern at Microsoft.” – Kritika Rupauliha

For Kritika, being around experienced professionals and learning from them was the best thing, and they went on to become close friends who will always mentor her in her future endeavors. She learned valuable communication skills for which she credits her projects at Omdena.

We congratulate Kritika on landing a full-time offer at Microsoft!

 

Leading teams at Omdena

 

 

Rosana de Oliveira Gomes is an inspiring Astrophysicist who is now a Lead Machine Learning Engineer at Omdena.

Rosana improved her leadership and “communication to non-scientist” skills and increased her multicultural experience by working with collaborators from three continents for the first time.

“I feel more connected with the world after speaking every day with people from all over the world in order to build something together. I also feel more confident and valued (before the experience at Omdena I have suffered from bullying at the workplace and this really helped me to rebuild my confidence in my skills).” – Rosana de Oliveira Gomes

 

 

No need to have perfect knowledge

 

 

Marek Cichy got his Master’s degree in Portuguese language and culture and spent 10 years working professionally as a Polish-Portuguese/Spanish translator and interpreter. He wanted to shift to being an NLP specialist. He tells us that before joining Omdena, he felt he had a lot of hurdles to jump over, and impostor syndrome was haunting him. He initially felt quite overwhelmed by all the maths and programming he had to learn.

But after joining Omdena, he realized a few things that boosted his confidence immensely. Let’s hear from him:

  • “I don’t need to have perfect knowledge in all the above-mentioned areas to contribute in a valuable way to a project like this”
  • “Compared with other participants, even ones with a “better” curriculum, I didn’t feel I was lagging behind, and definitely the atmosphere in Omdena projects is really positive and non-toxic”
  • “I realized my domain knowledge (e.g. speaking Polish in the Omdena+SexEdPL project) is an equally valuable resource as ML knowledge is”
  • “I always thought I’m not good at managing other people, but I discovered that in a positive environment I’m able to do it.”

Marek also says that he met a diverse group of people who greatly expanded his network, and that he got to work on a real-world problem that can now help him in job interviews.

 

Data engineering & finding like-minded people

 

 

Raghuveer is an IT professional working as a Platform Engineer to provide an intelligent platform for investigating fraudulent activities. He has been interested in data science, machine learning, and AI since his college days. Before Omdena, he was working on small pet projects but couldn’t apply his skills at a large scale. He also hadn’t interacted much with the data science community. According to Raghuveer, working with Omdena helped him in several ways and transformed his data science career:

  • Working on a large-scale, end-to-end project for the first time
  • Developing data munging and collection skills via scraping
  • Building “people” skills as he got to interact with 25+ people

 

Working through passion

 

 

Juber Rahman is a researcher in the field of Electrical and Computer Engineering turned Data Scientist who believes in the power of passion.

“I believe a person performs the best when he or she is passionate to solve a problem. Very few organizations (e.g. Omdena) can ignite the passion in you. Most organizations make it feel an obligation to do something rather than creating a drive to solve a meaningful problem.”

Juber started learning advanced model development combining multiple models as part of his project at Omdena. He says the most important thing he learned was to accept and tolerate disagreement with fellow workers.

 

Our mission is not only to help solve real-world problems but also to transform data science careers in the process. We are happy about each success story as part of our community-driven work. The way we solve problems is changing, and we are all witnessing the dawn of a new era, one where collaboration matters more than competition.

 

 

 
