In this guide, we lay out a tested data science road map for getting the hang of practical data science skills, from learning Python fundamentals to building experience through real problems and projects. Along the way, we share 9 practical tips and a suggested timeline for each stage.
Author: Rehab Emam
1. Learn the fundamentals
You don’t need a Ph.D. to do data science
1.1. Prepare your workspace
Most learning platforms have integrated code exercises, so you don’t need to install anything locally. But to learn properly, you should have an IDE installed on your own machine. There are many options on the market, and they differ only slightly from one another.
Tip 1: Just use one and stick to it
- Anaconda: A toolkit that covers everything you need to write and run code, from the PowerShell prompt to Jupyter Notebook and PyCharm, and even RStudio (if you’re interested in trying R).
- Atom: A hackable text editor that can be extended with packages into a more advanced Python environment; many experienced developers recommend it.
- Google Colab: Like a Jupyter Notebook, but in the cloud. You don’t need to install anything locally, and the important libraries, such as NumPy, pandas, Matplotlib, and scikit-learn, are already installed.
- PyCharm: Another excellent IDE that integrates with libraries such as NumPy and Matplotlib, so you can inspect arrays in a viewer and work with interactive plots.
- Thonny: An IDE designed for teaching and learning programming. Thonny comes with a built-in debugger, supports code completion, and highlights syntax errors.
1.2. Best courses
1.2.1. Beginner level – Duration: 1-2 months, 3 hours/day:
Tip 2: Focus on one course, learn the fundamentals
Learn variables, strings, data structures, and so on, and apply each concept in code.
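A minimal sketch of these fundamentals working together: variables, strings, a list, a dictionary, a set, and a tuple used to count words in a sentence. The sentence and the word-count task are our own illustration, not an exercise taken from any particular course.

```python
# Variables and strings
sentence = "data science is learned by doing data science"
words = sentence.split()  # list of strings

# Dictionary: count how often each word appears
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Set: the distinct words, sorted alphabetically
unique_words = sorted(set(words))

# Tuple: the most frequent word paired with its count
# (max returns the first word it meets among equally frequent ones)
most_common = max(counts.items(), key=lambda item: item[1])

print(f"{len(words)} words, {len(unique_words)} distinct")
print("Most common:", most_common)
```

Once each line makes sense, try changing the sentence and predicting the output before you run it.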
Tip 3: Don’t chase certifications
The best introductory course for Python fundamentals, from variables to data structures, is Udacity – Intro to Python Programming.
Go through the lessons and code along. The course gives you just the programming essentials you need to start a Python data science journey.
1.2.1.1. To practice beyond the lessons’ exercises, here are some of the top programming practice sites, updated in 2022:
- Project Euler
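To give you a taste of Project Euler, here is a sketch for its first problem: sum all the multiples of 3 or 5 below a limit. It contrasts a brute-force loop with a closed-form solution based on the arithmetic-series formula and inclusion-exclusion. Both versions are our own illustration. Checking them only against the small example from the problem statement leaves the real answer for you to find.

```python
def brute_force(limit: int) -> int:
    """Sum every number below `limit` divisible by 3 or 5 by checking each one."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def sum_of_multiples(k: int, limit: int) -> int:
    """Sum k, 2k, 3k, ... below `limit` using the arithmetic-series formula."""
    m = (limit - 1) // k  # how many multiples of k lie below limit
    return k * m * (m + 1) // 2


def closed_form(limit: int) -> int:
    # Inclusion-exclusion: multiples of 15 are counted once for 3 and once for 5
    return (sum_of_multiples(3, limit) + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))


print(brute_force(10), closed_form(10))  # 3 + 5 + 6 + 9 = 23 for both
```

The closed-form version runs in constant time no matter how large the limit is. Many Project Euler problems are designed to push you toward this kind of insight.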
The best way to learn data science is by doing data science
Tip 4: Don’t spend too much time on theory fundamentals
1.2.2. Intermediate level – Duration: 6-8 months, 3 hours/day:
Coursera – Applied Data Science with Python Specialization, offered by the University of Michigan
Tip 5: You can apply for financial aid to take the specialization and earn a certificate. But what if you are not accepted, or you are not interested in certificates (which, as you’ll find later, are not important in your data science roadmap)? Here is how to take the courses for free.
Go to the specialization page, scroll down to the first course, open the course’s page, and click Enroll for Free. A popup will appear.
At the very bottom of the popup, click “Audit the course”. You can start any course this way: you get the knowledge behind it and access to the whole curriculum, just without graded assignments or a certificate. You can do this for every course in the specialization, and for any specialization on Coursera. How cool is that? (Thank you, Prof. Andrew Ng.)
2. Take your skills to advanced levels
Duration: 3 months, 3 hours/day
After completing this specialization, you will have a good grasp of data exploration, data analysis, and data visualization, plus an introduction to NLP and a good course on machine learning.
Believe it or not, you don’t need more than this specialization to start a machine learning project. It covers the Python packages most used in data science: NumPy, pandas, scikit-learn, NLTK, and others.
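To make “starting a machine learning project” concrete, here is a minimal, dependency-free sketch of the workflow that scikit-learn streamlines. It splits labelled data into training and test sets, fits a model, predicts, and measures accuracy. The nearest-centroid classifier and the toy points are our own illustration, not material from the specialization. In practice you would use scikit-learn’s `train_test_split` and an estimator such as `NearestCentroid` with its `fit`/`predict` methods.

```python
import random
from math import dist
from statistics import fmean

# Toy labelled data: (feature_1, feature_2) -> class label (made up for illustration)
points = [((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.1, 0.9), "a"), ((1.3, 1.1), "a"),
          ((4.0, 4.2), "b"), ((4.3, 3.9), "b"), ((3.8, 4.1), "b"), ((4.1, 4.4), "b")]

# 1. Shuffle and split into training (75%) and test (25%) sets
random.seed(0)
random.shuffle(points)
split = int(len(points) * 0.75)
train, test = points[:split], points[split:]

# 2. "Fit": compute the mean point (centroid) of each class in the training set
centroids = {}
for label in {lbl for _, lbl in train}:
    members = [x for x, lbl in train if lbl == label]
    centroids[label] = tuple(fmean(coord) for coord in zip(*members))


# 3. Predict: assign a point to the class whose centroid is nearest
def predict(x):
    return min(centroids, key=lambda label: dist(x, centroids[label]))


# 4. Evaluate accuracy on the held-out test set
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The split, fit, predict, evaluate sequence stays the same when you swap in a real dataset and a real model. Only the model step changes.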
Besides this specialization, if you are the reading type, we recommend “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems” (2nd Edition). Be careful to order the 2nd edition 😉
Working through this book will take your Python programming to a higher level and teach you machine learning in depth, along with the essentials of deep learning. It covers models all the way up to neural networks, using the most widely used libraries: scikit-learn (in depth), TensorFlow, and Keras.
We can also recommend another 3-5 books, such as:
- Author: Andriy Burkov; latest edition: first; publisher: Andriy Burkov; format: ebook (Leanpub), hardcover, paperback
- Author: Tom M. Mitchell; latest edition: first; publisher: McGraw Hill Education; format: paperback
- Author: Christopher M. Bishop; latest edition: second; publisher: Springer; format: hardcover, Kindle, paperback
By finishing the specialization and any of these books, you take your knowledge from the fundamentals, through machine learning, to advanced deep learning.
Now you need to decide where you want to go. Apply for jobs directly? You can.
Let’s look at the kinds of jobs out there and the level of proficiency each one demands:
- Data Analysts — Easy to Medium
- ML Engineers — Medium
- Data Engineers — Medium to Hard
- Research/Data Scientists — Hard
- AI Engineers/Deep Learning Practitioners — Very Hard
At this point, we recommend you start building a project portfolio.
3. Build out a project portfolio
Duration: 1-2 months, 3 hours/day
To see how, just type “How to build a data science portfolio” into Google search.
Read the articles you find. Did anyone say “Kaggle”? In 2022, the answer is NO.
A few years ago, many newcomers headed to Kaggle for datasets and experience, but the reality is totally different now. You should start working on messy datasets and, best of all, “no-datasets”. What?!
Yes: “no-datasets” is the real-world data science experience, where you have to collect the data and build the datasets yourself.
Hence we recommend you start practicing and building real experience.
As a warm-up, these interesting storytelling projects will whet your appetite.
One of them is an interesting visualization of NBA player movements; its code is provided in the linked project.
Had some fun?
3.1. Apply what you learn
3.1.1. Work with real-world datasets
- Google Dataset Search: a search tool from Google for finding datasets available online, including data related to governments, finance, retail, e-commerce, and more.
- You can even download the IMDB dataset and start exploring which movie has the highest revenue of all time!
- Google Cloud Public Datasets: publicly available datasets you can make use of.
- Explore the data for insights, define questions that have never been asked before, dig into journals and research papers for related material, and then uncover hidden patterns using statistical models.
- Papers with Code: one of the recent platforms that provides research papers along with the datasets you can use to apply the methodologies in those papers.
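A first exploration of the IMDB idea might look like the sketch below. It assumes a CSV with `title`, `year`, and `revenue_millions` columns. Those column names and the sample rows are hypothetical placeholders, since real IMDB-derived datasets vary in layout. The data is read from an in-memory string so the example runs on its own.

```python
import csv
import io

# Hypothetical CSV layout with placeholder rows; for a real file use
# open("movies.csv", newline="") instead of io.StringIO(raw).
raw = """title,year,revenue_millions
Movie A,2009,120.5
Movie B,2019,310.0
Movie C,1997,
Movie D,2015,95.25
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Real-world data is messy: skip rows with a missing revenue value
with_revenue = [r for r in rows if r["revenue_millions"].strip()]

top = max(with_revenue, key=lambda r: float(r["revenue_millions"]))
print(f"Highest revenue: {top['title']} ({top['year']}), ${top['revenue_millions']}M")
```

Even this toy version has to decide what to do with a missing value. That kind of decision comes up constantly with real datasets.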
Tip 6: Apply research paper findings in your code
One of the most valuable skills you can develop is applying research paper concepts and algorithms in your own code and to your own problems.
- Connected Papers: get a visual overview of a new academic field.
- Enter a typical paper, and the tool builds a graph of similar papers in the field. Explore and build more graphs for interesting papers you find; soon you’ll have a real, visual understanding of the trends, popular works, and dynamics of the field you’re interested in.
Going further, acquire this pro skill:
3.1.2. Collect your data and build your datasets
Duration: 2-3 months, 4 hours/day
- Omdena specializes in building your career and experience while making a global impact. In 8-week challenges, you join global teams of data scientists and use your data science skills to build environmental and social-impact solutions. A new challenge starts every week, targeting areas like infrastructure planning, agricultural development, climate change, and clean energy. In these challenges, you start by collecting your data and building datasets, then clean, process, and explore the data before building machine learning models. Whatever your level of experience, there is a place for you in a team of 50, so don’t hesitate to apply.
- Collect data from a website or API (open for public consumption) of your choice, then transform the data from the different sources and store it in an aggregated file or table. Example APIs include TMDB, Quandl, the Twitter API, and so on.
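The collect, transform, and store loop described above can be sketched with only the standard library. `fetch_json` is a generic helper for any public JSON endpoint. No specific URL is assumed, so pass whichever API you choose. The two in-memory “sources” and their `id`, `title`, and `revenue` fields are hypothetical stand-ins for, say, a movie-details API and a box-office table. They are joined on a shared key and serialized into one aggregated CSV table.

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_json(url: str):
    """Download and decode a JSON document from a public API endpoint."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def aggregate(details: list[dict], revenues: list[dict]) -> list[dict]:
    """Join two sources on a shared 'id' key into one flat table."""
    revenue_by_id = {r["id"]: r["revenue"] for r in revenues}
    return [
        {"id": d["id"], "title": d["title"], "revenue": revenue_by_id.get(d["id"])}
        for d in details
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize the aggregated rows as CSV text (write it to a file in practice)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "revenue"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records standing in for two API responses
details = [{"id": 1, "title": "Movie A"}, {"id": 2, "title": "Movie B"}]
revenues = [{"id": 1, "revenue": 120.5}]

table = aggregate(details, revenues)
print(to_csv(table))
```

Keeping fetching, joining, and writing in separate functions lets you test the transformation without hitting the network. It also lets you re-run it when an API changes.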
Side dish: optional courses
These courses are not mandatory to start practicing data science, but they are very important for understanding the concepts behind the code you build.
Data Science is not only Data Analysis or Machine Learning
It’s a bundle of skills you develop through practice. You will need a deeper understanding of math and statistics.
Specific programming topics to know include:
- Common data structures (data types, lists, dictionaries, sets, tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries
- SQL scripting: Querying databases using joins, aggregations, and subqueries
- Comfort using the Terminal, version control in Git, and using GitHub
- Cloud computing using one of AWS, Azure, or Google Cloud
- Big data
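For SQL scripting practice, Python’s built-in `sqlite3` module gives you a database with no setup. The sketch below shows a join, an aggregation, and a subquery. The `customers` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

# In-memory database with two small, made-up tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'EG'), (2, 'Ben', 'US'), (3, 'Chen', 'US');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0), (4, 3, 90.0);
""")

# Join + aggregation: total amount spent per customer
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY total DESC
""").fetchall()

# Subquery: customers with at least one order above the average order amount
big_spenders = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > (SELECT AVG(amount) FROM orders)
    ORDER BY c.name
""").fetchall()

print(totals)        # [('Chen', 90.0), ('Ana', 80.0), ('Ben', 20.0)]
print(big_spenders)  # [('Ana',), ('Chen',)]
conn.close()
```

The same queries run largely unchanged on PostgreSQL or MySQL. The skills carry over from this toy database to production systems.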
4. Master one Data Science field
Duration: 3 months, 4 hours/day
Tip 7: Go deep into one domain
To stand out, we recommend you master one of these fields. They are very popular in the job market right now.
4.1. Remote Sensing is the use of satellite- or aircraft-based sensor technologies to detect and classify objects on Earth. Work with open-source satellite images using packages like Rasterio (for reading and writing raster data) and Folium (for interactive maps), and extract meaningful, insightful data from every pixel of a satellite image.
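The Normalized Difference Vegetation Index (NDVI) is a classic example of extracting information from every pixel. It is computed per pixel as NDVI = (NIR − Red) / (NIR + Red) and ranges from −1 to 1, with higher values indicating denser green vegetation. The sketch below assumes the red and near-infrared bands have already been read into equal-sized 2D grids, for instance with Rasterio and then converted to lists. The tiny reflectance grids are made up.

```python
def ndvi(red: list[list[float]], nir: list[list[float]]) -> list[list[float]]:
    """Compute NDVI = (NIR - Red) / (NIR + Red) pixel by pixel."""
    result = []
    for red_row, nir_row in zip(red, nir):
        row = []
        for r, n in zip(red_row, nir_row):
            total = n + r
            # Guard against division by zero on empty/no-data pixels (0.0 is a placeholder)
            row.append((n - r) / total if total else 0.0)
        result.append(row)
    return result


# Made-up 2x2 reflectance values: left column vegetated,
# right column with low or negative NDVI (bare soil, water)
red = [[0.05, 0.30], [0.04, 0.25]]
nir = [[0.45, 0.32], [0.50, 0.10]]

for row in ndvi(red, nir):
    print([round(v, 2) for v in row])
```

With real imagery you would do the same arithmetic on NumPy arrays returned by Rasterio, and typically mark no-data pixels as NaN rather than 0.0.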
4.2. Natural Language Processing (NLP) is about teaching a computer to “understand” the contents of documents, including the contextual nuances of the language within them. Interesting areas to focus on include sentiment analysis and topic modeling. To start, follow this tutorial:
Learn more in the course: Solving Business Problems with NLP
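Here is a deliberately tiny lexicon-based sentiment scorer to give you a feel for sentiment analysis before you move to real NLP libraries. The word lists and the one-word negation rule are our own assumptions for illustration. Production tools, for example NLTK’s VADER analyzer, use much larger lexicons and many more rules.

```python
import re

# Tiny hand-made lexicon (illustrative only)
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}
NEGATIONS = {"not", "never", "no"}


def sentiment(text: str) -> str:
    """Label text positive/negative/neutral by summing word polarities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1, or 0
        # Flip polarity when the previous word is a negation ("not good")
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["I love this course, it is great!",
               "The audio was not good.",
               "It is a course."]:
    print(sentiment(review), "-", review)
```

Try to break it with sarcasm or with “not very good”. The cases it gets wrong show why statistical and deep learning models took over.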
4.3. Computer Vision includes methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world. The first steps are to learn basic image processing with tools like OpenCV and to practice object detection with pre-trained models like YOLO.
Learn more in the course: Mastering Computer Vision to Make a Positive Impact
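When you practice with object detectors such as YOLO, you will keep meeting Intersection over Union (IoU). It is used both to score predicted boxes against ground truth and to prune overlapping detections. IoU is the area where two boxes overlap divided by the area they cover together. Here is a minimal sketch, assuming boxes are given as `(x_min, y_min, x_max, y_max)` corner coordinates:

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (it may be empty)
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    intersection = max(0.0, x_right - x_left) * max(0.0, y_bottom - y_top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7, so 1/7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # no overlap, so 0.0
```

A common convention counts a detection as correct when its IoU with a ground-truth box is at least 0.5, though benchmarks differ in the thresholds they use.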
4.4. Anomaly Detection, also known as outlier detection, is the identification of rare items, events, or observations that raise suspicion by differing significantly from the majority of the data. Applications include healthcare and pathology detection. An interesting application is detecting anomalies on the surface of Mars from landing images.
Learn more in the course: Deep Learning Course on Anomaly Detection (Mars Version)
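Before reaching for deep learning, it helps to know one of the simplest anomaly detectors. It flags points that lie far from the mean, measured in standard deviations (the z-score). The sketch below uses a threshold of 2, a common but arbitrary choice, and made-up sensor readings. Keep in mind that an extreme outlier also inflates the standard deviation itself. This is why median-based, more robust variants are often preferred.

```python
from statistics import mean, stdev


def zscore_outliers(values: list[float], threshold: float = 2.0) -> list[int]:
    """Return indices of values whose |z-score| exceeds `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (needs at least 2 values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up sensor readings with one suspicious spike at index 6
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
print(zscore_outliers(readings))
```

With only 8 readings, even a huge spike scores about 2.5. A threshold of 3 would miss it entirely, which is a good illustration of the inflation problem mentioned above.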
5. Best practices to maintain along the way
5.1. Get engaged in Data Science communities
- Reddit: data science subreddits
- Quora: a big community of data science enthusiasts where you can ask any question and find a variety of answers, from beginners to experts
- Data Tau: data science news and announcements of the best tools, the CNN of Data Science 🙂
5.2. Identify yourself – Narrow down your expertise
You can’t learn everything and do everything. Nobody does.
5.3. Communication and Presentation skills
If we say this is the most important skill you have to acquire and develop, we are not exaggerating. Believe it or not, jobs are often won by the people with the best communication and presentation skills.
Stay engaged in communities and help others. Contribute to open-source collaborations, such as those on GitHub and Omdena. Build a community of data science enthusiasts around you.
5.4. Show yourself – Blogging
Blogging is a critical piece of a data science portfolio, as it reflects a good portion of real-world data science work. It also shows that you understand concepts and how things work at a deep level, not just at the syntax level. This deep understanding is important for justifying your choices and walking others through your work.
To write an explanatory technical article, pick a data science topic to explain. Then write a blog post that takes the reader from the ground level all the way up to a working example of the concept.
Many platforms host technical articles that meet their quality standards:
- Medium (drawbacks: it is blocked in some large countries, and competition is high)
- Towards Data Science
- Data Science Central
To wrap up: this journey never ends, and it’s better that you never stop it. Wake up with a purpose to learn something new every day, and apply it. Your journey carries on, and your data science mastery is built through consistent steps. Of course, we can’t cover every topic in one article, but this data science road map is enough to start a career.
Tip 8: You don’t have to know everything before applying to jobs
Tip 9: Target the job you want and tailor your experience around it
Attach yourself to a mission, a calling, a purpose ONLY. That’s how you maintain your inner power and your peace.
Finally, welcome to the Data Science family 🙂.