MediaPipe Python Tutorial [How to Install + Real-Time Hand Tracking Example]
December 30, 2021
In this tutorial, we will walk you through installing MediaPipe for Python step by step and then build an example real-time hand tracking project.
MediaPipe Python is a powerful tool for developers looking to incorporate computer vision and machine learning into projects. It provides a high-level API for building real-time ML solutions for mobile, edge, cloud, and web.
In this tutorial, we’ll cover the following topics:
- Understanding the MediaPipe API
- Installing MediaPipe Python
- Building a real-time hand-tracking application
Let’s start! 🙂
What is MediaPipe Python?
MediaPipe is Google’s open-source framework for media processing. It is cross-platform, which means the same framework runs everywhere: on Android, iOS, the web, and YouTube servers.
What do you think all the images below have in common? Think for a while and take a guess!
If you guessed MediaPipe, you are absolutely correct. The MediaPipe module is what all these images have in common.
What are the uses of MediaPipe?
Every YouTube video we watch is processed with machine learning models using MediaPipe. Google has not hired thousands of employees to watch every video people upload, because even thousands of people could not check the amount of data Google receives daily. Machine learning and deep learning models take over tasks like this that are too large for us to complete by hand. They finish the work in less time and save the cost of hiring reviewers.
Google uses these machine learning and deep learning models to check whether videos comply with its policies and whether the content has copyright issues.
Basically, MediaPipe is a framework for computer vision and deep learning that builds perception pipelines. For now, you just need to know that a perception pipeline takes in audio, video, or time-series data and passes it through a sequence of processing steps that turn the raw input into useful output, such as detected landmarks.
Why does Google use MediaPipe?
Google has been using MediaPipe for a long time, mainly for two tasks.
1. Dataset preparation for Machine learning training
Pose Estimation
Pose estimation means finding the key points of a person or an object. A person’s key points are the elbows, knees, wrists, and so on. MediaPipe can be used to prepare data for training an ML model to learn these key points, and the model can then use that knowledge for specific tasks such as action recognition.
2. ML inference pipelines
Live Data
ML inference is the process of running live data through a trained model to get predictions.
Example: We have all used Snapchat and Instagram filters while recording videos. Applying the filter to the live camera feed is what ML inference means.
What is possible with MediaPipe?
MediaPipe can be used to solve a number of AI problems. Here are some of them:
- Object Tracking
- Box Tracking
- Face Mesh
- Hair Segmentation
- Live Hand Tracking and many more.
MediaPipe Hands: Real-Time Hand Tracking Project
Here I have developed the Live Hand Tracking project using MediaPipe.
Hand Tracking uses two modules on the backend
1. Palm detection
Palm detection works on the complete image, locates the hand, and crops the image so that the next module only has to work on that region.
2. Hand Landmarks
From the cropped image, the landmark module finds 21 different landmarks on the hand.
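To make the 21 landmarks concrete, here is a small sketch that lists them by index. The names follow the naming of the HandLandmark enum in MediaPipe's hands solution as I understand it. The helper build_landmark_names is my own and is not part of MediaPipe.

```python
# Joint names from the base of each finger to its tip.
# The thumb uses CMC/MCP/IP, the other four fingers use MCP/PIP/DIP.
THUMB_JOINTS = ("CMC", "MCP", "IP", "TIP")
FINGER_JOINTS = ("MCP", "PIP", "DIP", "TIP")
FINGERS = ("INDEX_FINGER", "MIDDLE_FINGER", "RING_FINGER", "PINKY")


def build_landmark_names():
    """Return the 21 landmark names in index order (0 is the wrist)."""
    names = ["WRIST"]
    names += [f"THUMB_{joint}" for joint in THUMB_JOINTS]
    for finger in FINGERS:
        names += [f"{finger}_{joint}" for joint in FINGER_JOINTS]
    return names


LANDMARK_NAMES = build_landmark_names()

# Print every landmark id next to its name.
for index, name in enumerate(LANDMARK_NAMES):
    print(index, name)
```

Every fourth index starting at 4 is a fingertip, which becomes useful later when highlighting fingertips.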
How to Install MediaPipe in Python?
For this specific task, we require three modules: cv2, mediapipe, and time.
In Jupyter Notebook, I install pyforest to get the common Python modules/libraries in one step.
Once the modules are installed, running the same command again reports that the requirements are already satisfied.
If MediaPipe is still not installed or does not work, install it separately. MediaPipe is a fairly new module, so it may not be included in pyforest yet. I first tried to work directly in a Kaggle notebook but found that MediaPipe was not working there, so I installed it and worked in Jupyter Notebook instead. A plus point is that Jupyter Notebook does not require an internet connection.
This is how to install MediaPipe in Jupyter notebook.
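As a rough sketch of the installation step, the snippet below checks which of the required modules can be imported and prints the pip command for the missing ones. The pip package names opencv-python and mediapipe are the usual ones for cv2 and mediapipe. The helper names missing_packages and install_command are my own.

```python
import importlib.util

# Import name -> pip package name. "time" ships with Python, so it is not listed.
REQUIRED = {"cv2": "opencv-python", "mediapipe": "mediapipe"}


def missing_packages(required=REQUIRED):
    """Return the pip names of the modules that cannot be imported."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


def install_command(packages):
    """Build the pip command for a list of package names ('' if the list is empty)."""
    return "pip install " + " ".join(packages) if packages else ""


missing = missing_packages()
if missing:
    # In a Jupyter cell, prefix the command with "!" to run it from the notebook.
    print("Run in a notebook cell: !" + install_command(missing))
else:
    print("cv2 and mediapipe are already installed")
```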
How to Import Modules in Jupyter?
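A minimal sketch of the import cell, assuming the three modules named earlier (cv2, mediapipe, and time). The try/except and the LIBS_AVAILABLE flag are my additions so that a missing library gives a readable message instead of a traceback.

```python
import time  # standard library, used later for the frame-rate counter

try:
    import cv2              # OpenCV: camera capture and drawing
    import mediapipe as mp  # MediaPipe: hand detection and landmarks
    LIBS_AVAILABLE = True
except ImportError as err:
    LIBS_AVAILABLE = False
    print(f"Could not import {err.name}. Install it before continuing.")
```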
How to Create a Camera Object in Python?
In the code below, I have created a camera object just to check that the camera is working properly.
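Here is a sketch of such a camera check, assuming the default webcam sits at index 0. The function names show_camera and should_quit, the window title, and the choice of the q key for quitting are my own assumptions.

```python
def should_quit(key_code, quit_key="q"):
    """True when the code returned by cv2.waitKey matches the quit key."""
    # waitKey returns -1 when no key was pressed during the wait.
    return key_code != -1 and (key_code & 0xFF) == ord(quit_key)


def show_camera(camera_index=0):
    """Open the camera and show live frames until 'q' is pressed."""
    import cv2  # imported here so should_quit can be tried without OpenCV

    cap = cv2.VideoCapture(camera_index)   # the camera object
    if not cap.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")
    try:
        while True:
            success, img = cap.read()      # img is a BGR frame
            if not success:
                break
            cv2.imshow("Image", img)
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Call show_camera() in a cell to open the preview window. Nothing runs until you call it.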
Here is the output.
How to Create an Object from the Hands Class?
I created a hands object from the Hands class. The BGR image from the camera is converted to RGB first, because the hands object only accepts RGB images.
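The step above could look roughly like this, assuming the mp.solutions.hands API. The option values shown are the usual defaults of Hands as far as I know. The wrapper functions create_hands and process_frame are my own names.

```python
# Keyword arguments for mp.solutions.hands.Hands.
HANDS_OPTIONS = {
    "static_image_mode": False,        # treat the input as a video stream
    "max_num_hands": 2,
    "min_detection_confidence": 0.5,
    "min_tracking_confidence": 0.5,
}


def create_hands(**overrides):
    """Create the hands object; keyword overrides replace the options above."""
    import mediapipe as mp  # imported here so the options can be read without it

    mp_hand = mp.solutions.hands
    return mp_hand.Hands(**{**HANDS_OPTIONS, **overrides})


def process_frame(img, hands):
    """Convert a BGR frame from OpenCV to RGB and run hand detection on it."""
    import cv2

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return hands.process(img_rgb)
```

Inside the camera loop, results = process_frame(img, hands) gives the results object used in the next steps.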
Extracting Information from the object results
Before extracting further details about the hands, make sure there is something in the results object. A simple way is to print the object and see what it holds. On its own, the printed object only shows a generic MediaPipe solution output and nothing else, even when a hand is shown.
How to Check Whether a Hand Is Being Detected?
Update the print statement to print results.multi_hand_landmarks and see whether the camera is detecting hands.
With the updated print statement, the output I get is “None” because no hand is shown.
Let’s see what information is extracted when one or more hands are shown.
As you can see, when the camera detects a hand, it gives some values.
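The check can be sketched without a camera by using stand-in objects. The SimpleNamespace objects below only imitate the multi_hand_landmarks attribute of a real results object, and describe_results is a helper name of my own.

```python
from types import SimpleNamespace


def describe_results(results):
    """Summarise what the results of hands.process() hold."""
    hand_list = results.multi_hand_landmarks
    if hand_list is None:      # this is the "None" printed when no hand is shown
        return "None (no hand detected)"
    return f"{len(hand_list)} hand(s) detected"


# Stand-ins for real results so the check can be tried without a camera.
no_hand = SimpleNamespace(multi_hand_landmarks=None)
two_hands = SimpleNamespace(multi_hand_landmarks=["first hand", "second hand"])

print(describe_results(no_hand))
print(describe_results(two_hands))
```

In the real loop, print(results.multi_hand_landmarks) shows either None or one block of landmark values per detected hand.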
How to Detect Landmarks and Draw Points on the Hand?
In the code below, the drawing object (mp_draw) is created. The if statement checks whether landmarks were detected. If they were, the for loop runs and draws a point wherever a landmark is detected.
Interesting, right?
How to Draw Connections Between Landmarks?
Connections are drawn by passing the connection list of the hands module (mp_hand.HAND_CONNECTIONS) to the drawing call.
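Drawing the points and the connections can be sketched in one helper, assuming mp.solutions.drawing_utils.draw_landmarks and mp.solutions.hands.HAND_CONNECTIONS. The function name draw_hands and its with_connections switch are my own.

```python
from types import SimpleNamespace


def draw_hands(img, results, with_connections=True):
    """Draw every detected hand on img in place and return how many were drawn."""
    if not results.multi_hand_landmarks:   # None when no hand is detected
        return 0

    import mediapipe as mp  # only needed once there is something to draw

    mp_hand = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    # Without connections only the landmark points are drawn.
    connections = mp_hand.HAND_CONNECTIONS if with_connections else None
    for hand_lms in results.multi_hand_landmarks:
        mp_draw.draw_landmarks(img, hand_lms, connections)
    return len(results.multi_hand_landmarks)


# With no hand in the frame nothing is drawn and the count is 0.
empty_results = SimpleNamespace(multi_hand_landmarks=None)
print(draw_hands(None, empty_results))
```

Passing with_connections=False reproduces the points-only step, and the default adds the lines between landmarks.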
Frame Rate
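To compute the frame rate (fps), two variables are declared: p_time and c_time (previous and current time). The frame rate is 1 divided by the time that passed between two consecutive frames.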
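A small sketch of the frame-rate calculation with p_time and c_time. The helper names compute_fps and draw_fps, the text position, and the colour are my own choices. The sleep call only stands in for the time spent on one frame.

```python
import time


def compute_fps(p_time, c_time):
    """Frames per second from the previous and current frame timestamps."""
    elapsed = c_time - p_time
    return 1.0 / elapsed if elapsed > 0 else 0.0   # avoid dividing by zero


def draw_fps(img, fps):
    """Write the frame rate in the top-left corner of the frame."""
    import cv2

    cv2.putText(img, str(int(fps)), (10, 70),
                cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)


# Timing one simulated frame. In the camera loop, set p_time = c_time
# after every frame so the next pass measures the next interval.
p_time = time.time()
time.sleep(0.05)
c_time = time.time()
print(f"fps: {compute_fps(p_time, c_time):.1f}")
```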
Extracting value of each landmark
This is useful in case a specific point needs to be tracked for any purpose.
As we know, there are 21 landmarks in a hand (0 to 20). The landmark information gives the x, y, and z coordinates for each id, listed in order. We can use the x and y coordinates to find the location of a landmark on the hand.
Here I first get the height, width, and channels (h, w, c) of the image. The previous code gave decimal values, which are positions relative to the image size. To get exact pixel positions, I multiply x by the width and y by the height and convert the results (cx, cy) to integers.
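The conversion from decimal values to pixel positions can be sketched as follows. The fake hand built with SimpleNamespace only imitates the landmark list of a real detected hand, and the helper names landmark_to_pixel and hand_to_pixels are my own.

```python
from types import SimpleNamespace


def landmark_to_pixel(lm, w, h):
    """Scale a landmark with x and y between 0 and 1 to integer pixel coordinates."""
    cx, cy = int(lm.x * w), int(lm.y * h)
    return cx, cy


def hand_to_pixels(hand_lms, w, h):
    """Return a list of (id, cx, cy) for every landmark of one hand."""
    return [(idx, *landmark_to_pixel(lm, w, h))
            for idx, lm in enumerate(hand_lms.landmark)]


# A fake hand with two landmarks stands in for results.multi_hand_landmarks[0].
fake_hand = SimpleNamespace(landmark=[
    SimpleNamespace(x=0.5, y=0.25, z=0.0),
    SimpleNamespace(x=0.25, y=0.75, z=0.0),
])
h, w, c = 480, 640, 3      # with a real frame: h, w, c = img.shape
print(hand_to_pixels(fake_hand, w, h))
```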
Drawing circle on a specific landmark
For drawing, I use the drawing object (mp_draw) created earlier. I have also added an if condition for point 0, because I wanted a filled circle at landmark 0.
Highlighting fingertips
The fingertip landmarks are 4, 8, 12, 16, and 20.
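Picking out landmark 0 or the fingertips comes down to filtering the landmark ids before drawing. Below is a sketch that assumes pixel positions are already available as (id, cx, cy) tuples. The helper names, the radius, and the colour are my own choices, and the positions are made up for illustration.

```python
WRIST_ID = 0
FINGERTIP_IDS = (4, 8, 12, 16, 20)


def points_to_highlight(points, ids):
    """Keep the (cx, cy) position of every entry whose id is in ids."""
    wanted = set(ids)
    return [(cx, cy) for idx, cx, cy in points if idx in wanted]


def draw_filled_circles(img, centers, radius=15, color=(255, 0, 255)):
    """Draw a filled circle at every (cx, cy) centre on the frame."""
    import cv2

    for center in centers:
        cv2.circle(img, center, radius, color, cv2.FILLED)


# Made-up pixel positions: landmark i sits at (10 * i, 5 * i).
points = [(i, 10 * i, 5 * i) for i in range(21)]
print(points_to_highlight(points, [WRIST_ID]))
print(points_to_highlight(points, FINGERTIP_IDS))
```

In the camera loop, draw_filled_circles(img, points_to_highlight(points, FINGERTIP_IDS)) would mark the five fingertips on each frame.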
This is how we can use these landmarks for different tasks. I am ending the article here, but this is not the end of the study. There is still a lot to explore.
How to contact the Omdena Pakistan Chapter?
If you face any issues with an AI or ML project, want details about workshops, or want to be part of an AI project and don’t know where to start, you can reach us for assistance. Our social media team is always active in helping engineers and posting about upcoming workshops and ongoing projects. You can follow us on the pages below to stay updated.
Facebook: Omdena Pakistan Chapter
Github: Qasim Hassan
Medium: Iqra Anwar