Computer Vision

MediaPipe Python Tutorial [How to Install + Real-Time Hand Tracking Example]

December 30, 2021 | 7 min read | Updated October 31, 2025

Omdena Pakistan Chapter

This tutorial shows how to install MediaPipe for Python step by step and then builds an example project: real-time hand tracking.

MediaPipe Python is a powerful tool for developers looking to incorporate computer vision and machine learning into projects. It provides a high-level API for building real-time ML solutions for mobile, edge, cloud, and web.

In this tutorial, we’ll cover the following topics:

Understanding the MediaPipe API
Installing MediaPipe Python
Building a real-time hand-tracking application

Let’s start! 🙂

What is MediaPipe Python?

MediaPipe is Google's open-source framework for media processing. It is cross-platform, meaning it is designed to run everywhere: on Android, iOS, the web, and YouTube's servers.

What do you think all the pictures below have in common? Take a moment to guess.

If you guessed MediaPipe, you are right: it is what all of these images have in common.

What are the uses of MediaPipe?


YouTube videos are processed with machine learning models built on MediaPipe. Google could not hire enough people to watch every upload: the volume of data arriving each day is far more than humans can review. Machine learning and deep learning models take over tasks like this, completing them in less time and at lower cost than manual review.

Google uses machine learning and deep learning models to check whether videos comply with its policies and whether their content raises copyright issues.

In short, MediaPipe is a framework for computer vision and deep learning that is used to build perception pipelines. For now, it is enough to know that a perception pipeline takes audio, video, or time-series data as input and passes it through a sequence of processing stages.

Why does Google use MediaPipe?

Google has been using MediaPipe for a long time, mainly for two tasks.

1. Dataset preparation for Machine learning training

Pose Estimation

Pose estimation means finding the key points of a person or an object. For a person, the key points are joints such as the elbow, knee, and wrist. MediaPipe can be used to train an ML model to learn these key points, and that knowledge can then be applied to specific tasks such as action recognition.

2. ML inference pipelines

Live Data

ML inference is the process of running live data points through a trained model to get predictions.

Example: most of us have used Snapchat or Instagram filters while recording a video. Applying those filters to the live camera feed is ML inference.

What is possible with MediaPipe?

MediaPipe can be applied to a number of AI problems. Some of them are:

Object Tracking
Box Tracking
Face Mesh
Hair Segmentation
Live Hand Tracking and many more.

MediaPipe Hands: Real-Time Hand Tracking Project

Here I build a live hand-tracking project using MediaPipe.

Hand tracking uses two modules on the backend:

1. Palm detection

Works on the complete image and crops it to the hand region, so that later processing only has to look at that region.

2. Hand Landmarks

From the cropped image, the landmark module finds 21 different landmarks on the hand.
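For reference, the 21 landmarks are indexed 0 to 20. The list below is a small lookup sketch; the names follow MediaPipe's HandLandmark enumeration, while the variable name `HAND_LANDMARK_NAMES` and the helper `landmark_name` are illustrative assumptions of mine.

```python
# Index -> name for the 21 hand landmarks (0 is the wrist, 20 is the pinky tip).
HAND_LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def landmark_name(index):
    """Return the name of a landmark index, or raise if it is out of range."""
    if not 0 <= index < len(HAND_LANDMARK_NAMES):
        raise ValueError(f"landmark index must be 0 to 20, got {index}")
    return HAND_LANDMARK_NAMES[index]
```

A lookup like `landmark_name(8)` is handy later when printing landmark ids next to their coordinates.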

How to Install MediaPipe in Python?

For this project we need three modules: cv2 (OpenCV), mediapipe, and time. The time module is part of Python's standard library, so it needs no installation.

Many commonly used Python libraries can be made available at once by installing pyforest in the Jupyter Notebook.

If the modules are already installed, running the install command again simply reports that the requirements are already satisfied. See below in the image.

If MediaPipe still does not work, install it separately with pip install mediapipe; as a newer module, it may not be covered by pyforest. I first tried working directly in a Kaggle notebook but found that MediaPipe was not working there, so I installed it and worked in Jupyter Notebook instead. A plus point is that a local Jupyter Notebook does not need an internet connection to run.

This is how to install MediaPipe in Jupyter notebook.
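As a quick check that the installation worked, the sketch below reports which of the three modules cannot be imported yet. The helper name `missing_modules` and the `REQUIRED` mapping are illustrative assumptions, not part of the original notebook; `opencv-python` is the usual pip package that provides `cv2`.

```python
import importlib.util

# Modules used in this tutorial, mapped to the pip package that provides them.
# "time" ships with Python, so it has no pip package (None).
REQUIRED = {"cv2": "opencv-python", "mediapipe": "mediapipe", "time": None}


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


for name in missing_modules(REQUIRED):
    print(f"{name} is missing, try: pip install {REQUIRED[name]}")
```

If nothing is printed, all three modules are available. Inside a notebook cell, the install command is typically written as `%pip install mediapipe`.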

How to Import Modules in Jupyter?

Import the three modules at the top of the notebook: cv2 (OpenCV), mediapipe (aliased as mp), and time.

How to Create a Camera Object in Python?

In the code below, I create a camera object just to check that the camera is working properly.
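A minimal sketch of this step, assuming the default webcam sits at index 0 and that pressing q should close the window. The function names `should_quit` and `show_camera` are hypothetical; the OpenCV calls (`VideoCapture`, `read`, `imshow`, `waitKey`) are the standard ones.

```python
def should_quit(key_code, quit_key="q"):
    """Return True if the key code from cv2.waitKey matches the quit key."""
    # waitKey returns -1 when no key was pressed; masking keeps the low byte.
    return key_code != -1 and (key_code & 0xFF) == ord(quit_key)


def show_camera(camera_index=0):
    """Open the webcam and show frames until the quit key is pressed."""
    import cv2  # imported here so the helper above works without OpenCV

    cap = cv2.VideoCapture(camera_index)
    try:
        while cap.isOpened():
            success, img = cap.read()
            if not success:
                break  # camera unplugged or no frame available
            cv2.imshow("Camera", img)
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        # Always free the camera and close the preview window.
        cap.release()
        cv2.destroyAllWindows()
```

Calling `show_camera()` should open a preview window if the camera is working.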

Create a camera object

Here is the output.

How to Create an Object from the Hands Class?

Next, I create a hands object from the Hands class. Each BGR frame from the camera is converted to RGB before it is passed in, because the hands object only accepts RGB images.
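The sketch below shows one way this step could look, assuming the legacy `mp.solutions.hands` API is available in the installed MediaPipe version. The function names are hypothetical, and `bgr_to_rgb_pixel` exists only to illustrate what the color conversion does to a single pixel.

```python
def bgr_to_rgb_pixel(pixel):
    """Reverse one (B, G, R) pixel into (R, G, B), as cvtColor does per pixel."""
    b, g, r = pixel
    return (r, g, b)


def make_hands():
    """Create a Hands object with default settings (legacy mp.solutions API)."""
    import mediapipe as mp  # imported lazily so the pixel helper has no dependencies

    return mp.solutions.hands.Hands()


def detect_hands(img_bgr, hands):
    """Convert a BGR camera frame to RGB and run it through the Hands object."""
    import cv2

    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    return hands.process(img_rgb)
```

Create the object once with `hands = make_hands()` and then call `results = detect_hands(img, hands)` for every frame inside the camera loop.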

Extracting Information from the object results

Before extracting further details about the hands, make sure the results object actually contains something. A simple way is to print it and see what it holds. Printed on its own, it only shows the type of the MediaPipe solution output object and nothing else, even when a hand is in view.

Object Result

How to Check Whether a Hand is Being Detected?

Update the print statement to print results.multi_hand_landmarks and see whether the camera is detecting hands.


With the updated print statement, the output is “None” because no hand is shown.

Let’s see what information is extracted when one or more hands are shown.

Hand is detected by the camera

As you can see, when the camera detects a hand, the output contains landmark values.
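Because the value is None when no hand is visible, it helps to guard against that case before using it. A small sketch, with hypothetical helper names, that turns the value into a count and a readable message:

```python
def count_hands(multi_hand_landmarks):
    """Return how many hands were detected; None means no hand was found."""
    if multi_hand_landmarks is None:
        return 0
    return len(multi_hand_landmarks)


def describe_detection(multi_hand_landmarks):
    """Build a short status message for printing inside the camera loop."""
    n = count_hands(multi_hand_landmarks)
    return "no hand detected" if n == 0 else f"{n} hand(s) detected"
```

Inside the loop this could be used as `print(describe_detection(results.multi_hand_landmarks))`.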

How to Detect Landmarks and Draw Points on the Hand?

In the code below, a drawing object (mp_draw) is created. The if statement checks whether any landmarks were detected; if so, the for loop runs and draws a point at each detected landmark.

Interesting, right? See the image.

Landmarks are detected and points are drawn

How to Draw Connections Between Landmarks?

Connections are drawn by passing mp_hand.HAND_CONNECTIONS to the drawing call.
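A sketch of the drawing step, covering both points and connections, assuming the legacy `mp.solutions` API and the names `mp_hand` and `mp_draw` used in this article. The function `draw_hands` and its `with_connections` flag are my own illustrative additions.

```python
def draw_hands(img, multi_hand_landmarks, with_connections=True):
    """Draw every detected hand on img and return the number of hands drawn."""
    if not multi_hand_landmarks:
        return 0  # None or empty: nothing detected, leave the frame untouched

    import mediapipe as mp  # legacy mp.solutions API, imported only when needed

    mp_hand = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    connections = mp_hand.HAND_CONNECTIONS if with_connections else None

    for hand_lms in multi_hand_landmarks:
        # Points are always drawn; lines are added when connections is given.
        mp_draw.draw_landmarks(img, hand_lms, connections)
    return len(multi_hand_landmarks)
```

Call `draw_hands(img, results.multi_hand_landmarks)` before showing the frame; pass `with_connections=False` to draw only the points.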

Frame Rate

To compute the frame rate (fps), two variables are declared: p_time and c_time (previous and current time).
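The frame rate is the reciprocal of the time that passed between two frames. A minimal sketch, where `compute_fps` is a hypothetical helper and the `time.sleep` call merely stands in for the work done on one frame:

```python
import time


def compute_fps(p_time, c_time):
    """Frames per second from the previous and current frame timestamps."""
    elapsed = c_time - p_time
    return 1.0 / elapsed if elapsed > 0 else 0.0


p_time = time.time()
time.sleep(0.01)   # stands in for capturing and processing one frame
c_time = time.time()
fps = compute_fps(p_time, c_time)
p_time = c_time    # the current time becomes "previous" for the next frame
print(f"fps: {int(fps)}")
```

In the camera loop, the value can be drawn on the frame with something like `cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)`; the position, font, and color here are arbitrary choices.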


Extracting value of each landmark

This is useful when a specific point needs to be tracked for some purpose.

As we know, a hand has 21 landmarks (0 to 20). The landmark information gives the x, y, and z coordinates along with an id, listed in order. We can use the x and y coordinates to find the location of a landmark on the hand.

Here I first get the height, width, and channels (h, w, c) of the image. The previous code produced decimal values, and I wanted integer pixel positions, so I multiply the x and y values by the width and height and convert the resulting center coordinates (cx, cy) to integers.
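A sketch of that conversion, assuming each landmark exposes normalized `x` and `y` attributes and that one hand's landmarks are reachable through `hand_lms.landmark`, as in the legacy API. Both function names are hypothetical.

```python
def landmark_to_pixel(x, y, w, h):
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    return int(x * w), int(y * h)


def landmark_pixels(hand_lms, img_shape):
    """Map each landmark id (0 to 20) of one hand to its (cx, cy) pixel position."""
    h, w = img_shape[:2]  # img.shape is (height, width, channels)
    return {
        idx: landmark_to_pixel(lm.x, lm.y, w, h)
        for idx, lm in enumerate(hand_lms.landmark)
    }
```

For example, `points = landmark_pixels(hand_lms, img.shape)` gives a dictionary that can be printed to see each id with its coordinates.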

id and coordinates

Drawing circle on a specific landmark

For drawing, I use the drawing object (mp_draw) and add an if condition for point 0, because I wanted a filled circle at landmark 0.
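One way to sketch the filled circle, assuming a hypothetical `points` dictionary that maps landmark ids to (cx, cy) pixel positions. The function name, radius, and color are illustrative choices; `cv2.circle` with `cv2.FILLED` is the standard OpenCV call.

```python
def mark_landmark(img, points, landmark_id=0, radius=15, color=(255, 0, 255)):
    """Draw a filled circle at one landmark; return its pixel position or None."""
    center = points.get(landmark_id)
    if center is None:
        return None  # that landmark is not available for this frame

    import cv2  # imported only when there is something to draw

    cv2.circle(img, center, radius, color, cv2.FILLED)
    return center
```

Calling `mark_landmark(img, points)` highlights landmark 0; pass another `landmark_id` to track a different point.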


Highlighting fingertips

The fingertip landmarks are 4, 8, 12, 16, and 20. See the code in the image below.
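A sketch of fingertip highlighting under the same assumption as before: a hypothetical `points` dictionary mapping landmark ids to (cx, cy) pixel positions. The names `FINGERTIP_IDS`, `fingertip_points`, and `highlight_fingertips` are mine.

```python
FINGERTIP_IDS = (4, 8, 12, 16, 20)  # thumb, index, middle, ring, pinky tips


def fingertip_points(points):
    """Keep only the fingertip entries from a {landmark id: (cx, cy)} dict."""
    return {idx: points[idx] for idx in FINGERTIP_IDS if idx in points}


def highlight_fingertips(img, points, radius=10, color=(0, 255, 0)):
    """Draw a filled circle on every fingertip and return how many were drawn."""
    tips = fingertip_points(points)
    if not tips:
        return 0

    import cv2  # imported only when there is something to draw

    for center in tips.values():
        cv2.circle(img, center, radius, color, cv2.FILLED)
    return len(tips)
```

Filtering by id first keeps the drawing code identical for one fingertip or all five.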


This is how we can use these landmarks for different tasks. I am ending the article here, but this is not the end of the study: there is still a lot to explore.

How to contact the Omdena Pakistan Chapter?

If you face any issues with an AI or ML project, want details about workshops, or want to be part of an AI project and don't know where to start, you can reach us for assistance. Our social media team is active in helping engineers and posts about upcoming workshops and ongoing projects. Follow the pages below to stay updated.

Facebook: Omdena Pakistan Chapter

Github: Qasim Hassan

Medium: Iqra Anwar


FAQ

What is MediaPipe Python?

MediaPipe Python is Google’s open-source framework for building real-time computer vision and machine learning pipelines across platforms like Android, iOS, web, and desktop.

What can MediaPipe be used for?

It's used for AI tasks such as hand tracking, face mesh, pose estimation, object tracking, and gesture recognition, which makes it well suited to vision-based applications.

Why does Google use MediaPipe?

Google uses MediaPipe for video content analysis, dataset preparation, and machine learning inference pipelines like those powering YouTube and AR filters.

How do I install MediaPipe in Python?

Run pip install mediapipe in your terminal or Jupyter Notebook. Make sure OpenCV (cv2) is installed as well for real-time processing; the time module is part of Python's standard library and needs no installation.

What libraries are needed for hand tracking?

You’ll need cv2, mediapipe, and time. Together, they enable real-time video capture, landmark detection, and tracking visualization.
