**A step-by-step guide with 50+ resources to make 2021 your year of meaningful Data Science (by Rohith Paul).**

“Listening to the data is important… but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?” - Steve Lohr

**Step 1: Math & Stats for Data Science**

This is one of the most important steps: Data Science is a quantitative domain, and core mathematical foundations will serve as the base for everything you learn afterwards.

**Probability**

Probability is the measure of the likelihood that an event will occur. A lot of data science is based on attempting to measure the likelihood of events, everything from the odds of an advertisement getting clicked on, to the probability of failure for a part on an assembly line.
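To make the ad-click example concrete, here is a minimal Python sketch that estimates a click probability from observed counts and then checks it against a simulation. The campaign numbers and the function name `estimate_click_probability` are made up for illustration.

```python
import random

def estimate_click_probability(clicks: int, impressions: int) -> float:
    """Empirical probability: favourable outcomes divided by total outcomes."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions

# Hypothetical campaign data (illustrative numbers only).
p_click = estimate_click_probability(clicks=37, impressions=1000)
print(f"Estimated click probability: {p_click:.3f}")

# Simulate many impressions with that probability. By the law of large
# numbers, the simulated frequency should land close to p_click.
rng = random.Random(42)  # fixed seed so the run is reproducible
n_trials = 100_000
simulated_clicks = sum(rng.random() < p_click for _ in range(n_trials))
simulated_rate = simulated_clicks / n_trials
print(f"Simulated click rate: {simulated_rate:.3f}")
```

The same pattern (count favourable outcomes, divide by trials) carries over to the assembly-line example, with "part failed" in place of "ad clicked".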

**Online Courses**

**Books**

- Introduction to Probability
- Think Stats: Probability and Statistics for Python programmers
- The Probability and Statistics Cookbook

**Statistics**

Once you have a firm grasp on probability theory you can move on to learning about statistics, which is the general branch of mathematics that deals with analyzing and interpreting data.
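As a small taste of what "analyzing and interpreting data" means in practice, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The sample values are invented for illustration.

```python
import statistics

# Hypothetical sample: daily website visits over ten days (illustrative only).
# Note the single unusually busy day (300 visits).
visits = [120, 135, 128, 150, 142, 138, 300, 131, 127, 145]

mean_visits = statistics.mean(visits)      # arithmetic average
median_visits = statistics.median(visits)  # middle value of the sorted sample
stdev_visits = statistics.stdev(visits)    # sample standard deviation (n - 1)

print(f"mean={mean_visits:.1f}, median={median_visits:.1f}, stdev={stdev_visits:.1f}")
```

The outlier pulls the mean above the median, which is exactly the kind of interpretation step that statistics courses train you to notice.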

**Online Courses**

- Intro to Descriptive Statistics by Udacity
- Intro to Inferential Statistics by Udacity
- Statistics courses by DataCamp

**Books**

**Multivariable Calculus & Linear Algebra**

Linear algebra is the study of vector spaces and the linear mappings between them. It is used heavily in machine learning, and if you really want to understand how these algorithms work, you will need a basic understanding of linear algebra.
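A linear mapping can be written as a matrix, and applying it is a matrix-vector product. The following dependency-free sketch (the helper names `apply`, `add`, and `scale` are illustrative, not from any library) checks the defining property of linearity on one example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def apply(matrix: Matrix, vector: Vector) -> Vector:
    """Apply the linear map represented by `matrix` to `vector`."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def add(u: Vector, v: Vector) -> Vector:
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def scale(c: float, v: Vector) -> Vector:
    """Multiply every component of `v` by the scalar `c`."""
    return [c * a for a in v]

# A 2x2 matrix that rotates vectors by 90 degrees counter-clockwise.
rotate_90: Matrix = [[0.0, -1.0],
                     [1.0, 0.0]]

u = [1.0, 2.0]
v = [3.0, -1.0]

# Linearity: A(2u + v) must equal 2*A(u) + A(v).
left = apply(rotate_90, add(scale(2.0, u), v))
right = add(scale(2.0, apply(rotate_90, u)), apply(rotate_90, v))
print(left, right)
```

In real projects you would typically use a library such as NumPy, where the same product is written `A @ x`; the hand-written version only serves to show what happens underneath.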

**Online courses and videos**

- Mathematics for Machine Learning: Multivariate Calculus
- Mathematics for Machine Learning: Linear Algebra
- 3Blue1Brown Essence of Calculus
- 3Blue1Brown Essence of Linear Algebra
- MIT Linear Algebra
- MIT Multivariable Calculus
- Computational Linear Algebra by fast.ai

**Books**

**Step 2: Learn to Code**

**Python**

Python is an interpreted, high-level programming language. It lets programmers use different programming styles to create simple or complex programs, get results quickly, and write code that reads almost like a human language. First released in 1991 and named after the comedy troupe Monty Python, it is one of the official languages at Google.
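To illustrate what "different programming styles" can look like, here is a short sketch that solves the same small task three ways. The data and variable names are hypothetical.

```python
# Hypothetical data: page-load times in milliseconds.
load_times_ms = [120, 340, 95, 410, 230]

# Procedural style: an explicit loop that builds the result step by step.
slow_pages = []
for t in load_times_ms:
    if t > 200:
        slow_pages.append(t / 1000)  # convert to seconds

# Functional style: the same logic expressed with filter and map.
slow_pages_functional = list(
    map(lambda t: t / 1000, filter(lambda t: t > 200, load_times_ms))
)

# List comprehension: reads close to "seconds for each time over 200 ms".
slow_pages_comprehension = [t / 1000 for t in load_times_ms if t > 200]

print(slow_pages, slow_pages_functional, slow_pages_comprehension)
```

All three produce the same list; which one reads best is largely a matter of taste and team convention.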

**Resources to learn Python**

- Learn Python
- Python for Data Science by DataCamp
- Python for Data Science by IBM
- Python by Codecademy

**R programming**

R is one of the best programming languages for analysis and visualization. Its expansive community, interactive visualization tools, and packages like ggplot2 make it one of the most used languages in analysis and Data Science.

**Resources to learn R programming**

**Step 3: Machine Learning and Algorithms**

**Online courses**

- Machine Learning by Stanford University
- Machine Learning Specialization by University of Washington
- Introduction to Machine Learning for Coders by fast.ai
- Advanced Machine Learning Specialization
- Machine Learning by Georgia Tech

**University online courses**

**Books**

**The importance of data preprocessing**

Before you train a machine learning model, your data needs to be cleaned and prepared for modeling. Data preprocessing is often neglected, but it is one of the most important skills. Here are some resources that will help you with data preprocessing:

- Preprocessing For Machine Learning in Python
- Pandas Tutorial
- NumPy Tutorial
- Preprocessing data using scikit-learn
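As a rough idea of what preprocessing involves, the sketch below uses pandas to handle three common tasks: filling a missing value, encoding a categorical column, and standardizing numeric columns. The dataset and column names are invented for illustration, and the specific choices (median imputation, one-hot encoding, z-scores) are just one reasonable option among many.

```python
import pandas as pd

# Hypothetical raw data with a missing value and a categorical column.
raw = pd.DataFrame({
    "age": [25.0, 32.0, None, 41.0],
    "city": ["Berlin", "Paris", "Berlin", "Rome"],
    "income": [40_000.0, 52_000.0, 48_000.0, 61_000.0],
})

clean = raw.copy()

# 1. Impute the missing age with the column median.
clean["age"] = clean["age"].fillna(clean["age"].median())

# 2. One-hot encode the categorical column (one indicator column per city).
clean = pd.get_dummies(clean, columns=["city"])

# 3. Standardize numeric columns to zero mean and unit variance.
for col in ["age", "income"]:
    clean[col] = (clean[col] - clean[col].mean()) / clean[col].std(ddof=0)

print(clean.round(2))
```

scikit-learn offers equivalent building blocks (`SimpleImputer`, `OneHotEncoder`, `StandardScaler`) that can be fitted on training data and reused on new data, which matters once you split into train and test sets.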

**Visualizing the data**

To better understand your data, it is important to visualize it, for example to see how different variables correlate. Here are some resources to get you started with data visualization:

- Data Visualisation using Seaborn tutorial
- Data Visualisation using Seaborn DataCamp tutorial
- Data Visualisation using Matplotlib tutorial
- Data Visualisation using Matplotlib DataCamp tutorial
- Data Visualisation using Tableau
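To show how a plot can reveal correlations between variables, here is a minimal sketch that computes a correlation matrix with pandas and draws it as a heatmap with Matplotlib. The data is synthetic and the column names and output file name are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: exam_score depends on hours_studied, shoe_size does not.
rng = np.random.default_rng(0)
hours_studied = rng.uniform(0, 10, size=200)
df = pd.DataFrame({
    "hours_studied": hours_studied,
    "exam_score": 50 + 4 * hours_studied + rng.normal(0, 5, size=200),
    "shoe_size": rng.normal(42, 2, size=200),
})

corr = df.corr()  # pairwise Pearson correlations

# Draw the matrix as a heatmap with a fixed colour scale from -1 to 1.
fig, ax = plt.subplots(figsize=(5, 4))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```

Seaborn wraps the plotting part in a single call (`sns.heatmap(corr, annot=True)`), which is often more convenient once you are comfortable with the basics.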

**Cloud Computing**

Cloud computing is another essential domain for machine learning projects, because machine learning workloads often run better on cloud servers. The reasons are low operating costs, scalability, and the processing power needed to analyze huge amounts of data. Combining the two benefits both technologies. If you want to get started with cloud computing, here are some resources:

- Machine Learning with TensorFlow on Google Cloud Platform Specialisation
- Machine Learning with Amazon Web Services
- Machine Learning with Microsoft Azure Platform
- Spell.run

**Step 4: Deep Learning, Natural Language Processing, Computer Vision and Reinforcement Learning**

**Online courses**

- Deep Learning Specialization by deeplearning.ai
- Practical Deep Learning for Coders by fast.ai
- Part 2: Deep Learning from the Foundations by fast.ai
- Code-First Introduction to Natural Language Processing by fast.ai
- Deep Learning Explained
- TensorFlow in Practice Specialization by deeplearning.ai
- Udacity’s Deep Learning Nanodegree
- Reinforcement Learning

**University videos**

- MIT 6.S191 Introduction to Deep Learning
- MIT 6.S094 Deep Learning for Self-Driving Cars
- MIT 6.S099 Artificial General Intelligence
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition — YouTube playlist from Winter 2016, taught by Li, Karpathy, & Johnson
- Stanford CS224n — Natural Language Processing with Deep Learning (Winter 2017)
- Oxford Deep Natural Language Processing
- Natural Language Processing (NLP)
- Berkeley CS294–112: Deep Reinforcement Learning — YouTube playlist
- Berkeley Deep RL Bootcamp
- Reinforcement Learning Explained

**Books**

- Neural Networks and Deep Learning
- Deep Learning Book
- Deep Learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville
- Reinforcement Learning: An Introduction
- Reinforcement Learning
- Speech and Language Processing (3rd ed. draft)
- Natural Language Processing with Python
- An Introduction to Information Retrieval

**Step 5: Connect, Learn & Grow with the Community**

**1. Join collaborative challenges**

Work with collaborators all over the world to solve real-world problems such as hunger, sexual harassment, forest fires, and PTSD, while boosting your skills in teams of 40 to 50 collaborators per AI Challenge.

**2. Join competitions**

Test and broaden your existing skills by competing with other (aspiring) data scientists.

**3. Go to (online) meetups and connect with fellows**

**4. Join communities like PyData and PyCon**

**PyData** provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other.

**5. Attend (online) conferences**

One of the best ways to learn about the latest developments is to attend conferences in the space. Besides helping professionals gain knowledge through hands-on workshops, these events also provide a platform to network with industry peers.

Here are some amazing conferences, which you can attend online or hopefully offline soon.

- Deep Learning 2.0 Summit
- MarTech Summit
- Machine Learning Prague
- AI & Big Data Expo Global
- AI in Finance Summit
- World Data Summit
- PAW Machine Learning Week
- The Responsible AI Forum
- ISC High Performance 2021
- SciPy Conferences

**Step 6: Operating at the Scale of Big Data**

Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.

**The Spark framework**

Understand the advantages of Spark's in-memory cluster computing framework.

- Scalable Machine Learning on Big Data using Apache Spark by IBM

**Additional resources**

If you’re interested in getting a little closer to the hardware used in deep learning, there are some good courses that introduce programming for specific architectures. All require proficiency in C and are relatively advanced:

- Fundamentals of Parallelism on Intel Architecture — covers vectorization, OpenMP, MPI, etc. on Intel Xeon Phi (Knights Landing with AVX-512), from Intel/Colfax Research/Coursera
- Performance Optimization on Intel Architecture — covers optimization techniques for Intel Xeon Phi architecture, from Intel/Colfax Research/Coursera
- Introduction to Parallel Programming — CUDA programming on Nvidia GPUs from Nvidia/Udacity, covering things like parallel scatter/gather and manually implemented kernels

And if you want to build your own deep learning server from scratch,

**Step 7: Stay up to date**

The following websites will make sure you don’t miss any important updates.

**arXiv.org subject classes:**

**Semantic Scholar searches:**

**I wish you all the best for this amazing journey and hope that you will bring a positive change in society using AI!!**