Applying commonly available Machine Learning-driven object detection frameworks to equirectangular projection images
Author: Parit Mehta
What is a Panorama?
Imagine taking a ferry to a picturesque island and, on a sunny day, enjoying a beautiful view of the ocean from the top of a hill. If you are like me, you refuse to be a click-happy tourist and have always questioned whether selfie sticks should exist, but you make an exception for the breathtaking ocean view in front of you. After all, it might be useful for all the computer vision experiments you’re working on!
In fact, you want to capture the full landscape visible to you. So you whip out your smartphone, switch your camera to Panorama mode, and pan it across with the ocean in the backdrop. And lo, the whole wide landscape is now in your pocket! In this short imaginary escapade, you successfully captured “an unbroken view of the whole region surrounding an observer”, or a panorama. Let’s get into the weeds of this very familiar concept.
Technically, a panorama has an aspect ratio of 2:1 or higher, meaning it is at least twice as wide as it is tall. Its angular extent ranges from beyond the typical human binocular field of view of 120° up to a full 360°.
Tourism, however, is far from the only field where panoramas are relevant. Virtual reality, architecture, road infrastructure and urban planning, satellite images of the Earth and even Marscapes all make use of this wide image format. As panoramic photography improves in resolution and becomes increasingly accessible on consumer devices, the time is right to adapt Machine Learning-based methods to such images.
One such use case arose during Omdena’s challenge ‘Rating Road Safety Through Machine Learning to Prevent Road Accidents’, run in partnership with iRAP, in which a team of 31 AI experts and learners collaborated to build an automated rating system to make the world’s roads safer. An important task of this project was to leverage the larger perspective offered by panoramas to accurately assess road usage and infrastructure. It is therefore worth examining how object detection in panoramas differs from object detection in normal 2D images and how it can be achieved with commonly available Machine Learning frameworks.
Line-bending: working with panoramic projections
A panorama can contain up to the full 360° field of view in all directions around an observer, allowing full or partial 3D scenes to be projected onto 2D surfaces. When capturing a panorama, we typically rotate the camera while staying in the same position. The camera can thus be thought of as looking at the inside of a sphere surrounding it, and this spherical image is then projected onto a flat picture.
Note: If you are interested in detecting vehicles in 2D images, a full tutorial with tools and examples is here.
There are many ways to project spherical images onto a plane. For panoramic images, we use the equirectangular projection, which maps longitude and latitude linearly onto the horizontal and vertical image axes and covers the full view around the observer, i.e. 360° horizontally and 180° vertically. Fig. 1 shows a few examples of equirectangular panoramas.
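Because the equirectangular projection maps angles linearly to pixels, converting between a pixel position and its viewing direction is simple arithmetic. The minimal sketch below assumes one common convention (column 0 at longitude -180°, row 0 at latitude +90°, i.e. the top of the image); other tools may place the origin differently, and the function names are hypothetical.

```python
def pixel_to_angles(u, v, width, height):
    """Map pixel coordinates (u, v) of an equirectangular image to
    (longitude, latitude) in degrees.

    Assumed convention: column 0 is longitude -180°, row 0 is latitude +90°
    (top of the image), and u = width corresponds to +180°.
    """
    longitude = u / width * 360.0 - 180.0
    latitude = 90.0 - v / height * 180.0
    return longitude, latitude


def angles_to_pixel(longitude, latitude, width, height):
    """Inverse mapping: (longitude, latitude) in degrees -> pixel coordinates."""
    u = (longitude + 180.0) / 360.0 * width
    v = (90.0 - latitude) / 180.0 * height
    return u, v


# In a 2:1 panorama each pixel spans the same angle horizontally and vertically
width, height = 4000, 2000
print(360.0 / width, 180.0 / height)               # 0.09 0.09 (degrees per pixel)
print(pixel_to_angles(2000, 1000, width, height))  # image center -> (0.0, 0.0)
```

The equal angular step per pixel on both axes is why a full equirectangular panorama has exactly a 2:1 aspect ratio.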
As you can see in Fig. 1, objects and people in equirectangular panoramas can appear distorted to the human eye. In fact, one of the challenges of object detection in panoramic images is that pre-existing neural networks such as the YOLO detector are trained on undistorted 2D images. Since training the detector on custom annotated data is not always an option, what if we could re-project equirectangular panoramas so that objects are no longer distorted? This is exactly the approach taken by the authors of the academic paper ‘Object Detection in Equirectangular Panorama’. We can then use a pre-trained YOLO detector, which substantially simplifies the task!
In the paper, the authors re-project equirectangular panoramas in two different ways. We will dive into a bit of geometry to understand the mathematical relation between the projections, which will in turn be used in the Python code. Fig. 2 illustrates the problem.
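The point P(x_p, y_p, 1) on the viewing plane is the projection of a point p(θ, φ) on the unit sphere. As in the code below, the projection is applied separately to the horizontal angle φ and the vertical angle θ, and the two points are related by the following equations:

$$x_p = \frac{(d+1)\sin\varphi}{d+\cos\varphi}, \qquad y_p = \frac{(d+1)\sin\theta}{d+\cos\theta} \qquad (1)$$

where d is the distance of the projection center from the center of the sphere.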
The point at distance d from the center of the sphere, on the side opposite the viewing plane, is called the projection center. If the projection center coincides with the center of the sphere (d = 0), we obtain a perspective projection on the plane. If instead d = 1, the projection center lies on the surface of the sphere and the projection on the plane is stereographic.
To summarize in one fell swoop: viewing the sphere from the point on its surface opposite the plane yields a stereographic projection, whereas viewing it from its center yields a perspective projection. Fig. 3 shows the same sample image in both projection styles.
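The two special cases can be checked numerically. The sketch below implements the x-equation of Eqs. (1), x_p = (d + 1) sin φ / (d + cos φ), as a hypothetical helper `plane_coordinate`, and compares it with the textbook forms of the perspective projection, x_p = tan φ, and of the stereographic projection onto the tangent plane, x_p = 2 tan(φ/2).

```python
import math


def plane_coordinate(angle, d):
    """Project a point at `angle` (radians) on the unit sphere onto the viewing
    plane z = 1, with the projection center at distance d behind the sphere
    center: x_p = (d + 1) sin(angle) / (d + cos(angle))."""
    return (d + 1) * math.sin(angle) / (d + math.cos(angle))


phi = math.radians(60)
# d = 0: perspective projection, expected x_p = tan(phi)
print(plane_coordinate(phi, 0), math.tan(phi))
# d = 1: stereographic projection, expected x_p = 2 tan(phi / 2)
print(plane_coordinate(phi, 1), 2 * math.tan(phi / 2))
# An angle of 90° lands at x_p = (1 + d) / d, the x_max used in the code below
print(plane_coordinate(math.pi / 2, 2))
```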
In the stereographic projection, objects keep their shapes better, whereas straight lines appear more distorted than in the perspective projection. Since we are more interested in detecting objects, we will convert equirectangular panoramas to the stereographic projection (some of the code below is sourced from GitHub).
We start by importing the necessary libraries:
```python
import sys
import numpy as np
import cv2
from math import pi, atan, cos, sin, acos, sqrt, tan
from scipy.interpolate import RectBivariateSpline
```
Let us first create a function ‘projection_angle’ with arguments x (a plane coordinate) and d (the distance to the projection center) that calculates the projection angle θ or φ in Eqs. (1). Since both equations have the same form, the same function serves for the pair (x_p, φ) and the pair (y_p, θ). The maximum value of x in Eqs. (1), stored in the variable x_max below, is (1 + d)/d and corresponds to an angle of 90°. For smaller angles, we solve Eqs. (1) for the angle: squaring x(d + cos φ) = (d + 1) sin φ gives a quadratic equation in cos φ, whose root with cos φ ≥ 0 (valid for |φ| ≤ 90°) is

$$\cos\varphi = \frac{-2dx^2 + 2(d+1)\sqrt{(1-d^2)x^2 + (d+1)^2}}{2\left(x^2 + (d+1)^2\right)}$$

The variables numerator and denominator below hold the numerator and denominator of this fraction, and the sign of the resulting projection angle, project_angle, follows the sign of x.
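```python
def projection_angle(x, d):
    """Invert Eqs. (1): return the angle (in radians) that projects onto the
    plane coordinate x, for a projection center at distance d > 0."""
    x_max = (1 + d) / d  # plane coordinate corresponding to an angle of 90°
    if abs(x) > x_max:
        raise ValueError('invalid input args: |x| must not exceed (1 + d) / d')
    if abs(x) == x_max:
        return pi / 2. if x > 0 else -pi / 2.
    numerator = -2 * d * x ** 2 + 2 * (d + 1) * sqrt((1 - d ** 2) * x ** 2 + (d + 1) ** 2)
    denominator = 2 * (x ** 2 + (d + 1) ** 2)
    # Clamp to [-1, 1] so that floating-point round-off cannot make acos fail
    cos_angle = max(-1.0, min(1.0, numerator / denominator))
    # The angle takes the sign of x (x = 0 maps to an angle of 0)
    project_angle = acos(cos_angle) if x >= 0 else -acos(cos_angle)
    return project_angle
```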
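projection_angle handles one scalar at a time, and panotostereo calls it once per output row and column. If that loop becomes a bottleneck, a vectorized NumPy variant is one option. The sketch below (`projection_angle_vec` is a hypothetical name, not part of the original code) applies the same closed-form inversion to whole arrays and checks it with a round trip through Eqs. (1), x_p = (d + 1) sin φ / (d + cos φ).

```python
import numpy as np


def projection_angle_vec(x, d):
    """Vectorized inversion of x_p = (d + 1) sin(a) / (d + cos(a)) for every
    element of x (requires d > 0 and |x| <= (1 + d) / d)."""
    x = np.asarray(x, dtype=float)
    x_max = (1 + d) / d
    if np.any(np.abs(x) > x_max):
        raise ValueError('|x| must not exceed (1 + d) / d')
    numerator = -2 * d * x ** 2 + 2 * (d + 1) * np.sqrt((1 - d ** 2) * x ** 2 + (d + 1) ** 2)
    denominator = 2 * (x ** 2 + (d + 1) ** 2)
    # np.clip guards arccos against round-off; the sign of the angle follows x
    return np.sign(x) * np.arccos(np.clip(numerator / denominator, -1.0, 1.0))


# Round trip: project a few angles with Eqs. (1), then recover them
d = 2.0
angles = np.radians([-89.0, -45.0, 0.0, 30.0, 89.0])
xp = (d + 1) * np.sin(angles) / (d + np.cos(angles))
print(np.degrees(projection_angle_vec(xp, d)))
```

Applied to a whole coordinate axis at once, e.g. `projection_angle_vec(xp_domain, d)`, it would replace the per-element Python loop.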
Now we define another function, panotostereo, to split the panoramic image into multiple stereographic projections.
```python
def panotostereo(panorama, distance):
    frames = []
    input_img = panorama
    height, width, _ = input_img.shape
    d = distance
    # Largest plane coordinates, corresponding to projection angles of ±90°
    xp_max = (1 + d) / d
    yp_max = (1 + d) / d
    # Sample the viewing plane at the centers of a height x height pixel grid
    xp_domain = xp_max * np.linspace(-1. + 1. / height, 1. - 1. / height, height)
    yp_domain = yp_max * np.linspace(-1. + 1. / height, 1. - 1. / height, height)
    # Angle (in radians) covered by one panorama pixel: the width spans 360°;
    # for a 2:1 panorama the same step also applies vertically (180° over the height)
    delta_rad = 2 * pi / width

    for face in range(4):
        print('generating stereo image', face)
        output_img = np.zeros((height, height, 3))
        # Use SciPy's spline interpolation on each color channel of the (rotated) panorama
        interpolate_0 = RectBivariateSpline(np.arange(height), np.arange(width), input_img[:, :, 0])
        interpolate_1 = RectBivariateSpline(np.arange(height), np.arange(width), input_img[:, :, 1])
        interpolate_2 = RectBivariateSpline(np.arange(height), np.arange(width), input_img[:, :, 2])
        # For every output column/row, find the panorama column/row it samples from
        pano_x = np.zeros(height)
        pano_y = np.zeros(height)
        for j, xp in enumerate(xp_domain):
            phi = projection_angle(xp, d)
            pano_x[j] = width / 2.0 + phi / delta_rad
        for i, yp in enumerate(yp_domain):
            theta = projection_angle(yp, d)
            pano_y[i] = height / 2.0 + theta / delta_rad
        output_img[:, :, 0] = interpolate_0(pano_y, pano_x)
        output_img[:, :, 1] = interpolate_1(pano_y, pano_x)
        output_img[:, :, 2] = interpolate_2(pano_y, pano_x)
        # Clip spline overshoot and convert to 8-bit before saving
        output_img = np.clip(output_img, 0, 255).astype(np.uint8)
        cv2.imwrite('split_' + str(face) + '_' + str(d) + '.jpg', output_img)
        frames.append(output_img)
        # Rotate the panorama by 90° (a quarter of its width) for the next face
        input_img = np.concatenate(
            (input_img[:, int(width / 4):, :], input_img[:, :int(width / 4), :]), axis=1)
    return frames
```
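Each output image samples the central half of the panorama (columns width/4 to 3·width/4, i.e. 180°), so between faces panotostereo turns the panorama by a quarter of its width. The concatenation it uses is equivalent to `np.roll`, as this toy example with a small made-up array shows; `np.roll(input_img, -(width // 4), axis=1)` could therefore serve as a drop-in replacement.

```python
import numpy as np

# A tiny stand-in "panorama": 2 rows x 8 columns x 3 channels
height, width = 2, 8
pano = np.arange(height * width * 3).reshape(height, width, 3)

# Rotation step from panotostereo: move the first quarter of the columns to the end
rotated = np.concatenate((pano[:, int(width / 4):, :], pano[:, :int(width / 4), :]), axis=1)

# The same 90° turn expressed with np.roll (shift the columns left by width // 4)
rolled = np.roll(pano, -(width // 4), axis=1)
print(np.array_equal(rotated, rolled))  # True
```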
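Let’s now test the code by applying the function to a sample panorama, named ‘sample.png’ for convenience. This returns four images (also saved to disk as split_<face>_<distance>.jpg). Each covers a horizontal field of view of 180°, and consecutive images are rotated by 90° relative to each other, so neighboring images overlap. Note that the call below uses a distance of 2; strictly speaking, d = 1 gives the stereographic projection, while larger values move the projection center further behind the sphere.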
```python
im = cv2.imread('sample.png')
projection = panotostereo(im, 2)
```
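panotostereo silently assumes a full 2:1 equirectangular input, because it uses the same angular step, 2π/width, for rows and columns. A small guard such as the hypothetical `check_equirectangular` below (the 1% tolerance is an arbitrary assumption) can catch cropped panoramas or failed reads before they produce distorted output.

```python
import numpy as np


def check_equirectangular(image, tolerance=0.01):
    """Hypothetical guard: verify that an image array of shape (H, W, C) has the
    2:1 aspect ratio of a full 360° x 180° equirectangular panorama.
    Returns (height, width) or raises ValueError."""
    if image is None:
        # cv2.imread returns None instead of raising when a file cannot be read
        raise ValueError('image could not be loaded')
    height, width = image.shape[:2]
    ratio = width / height
    if abs(ratio - 2.0) > tolerance * 2.0:
        raise ValueError(f'expected a 2:1 panorama, got {width}x{height} (ratio {ratio:.3f})')
    return height, width


# Usage with a dummy array standing in for cv2.imread('sample.png')
print(check_equirectangular(np.zeros((1000, 2000, 3), dtype=np.uint8)))  # (1000, 2000)
```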
Fig. 4 shows the sample image (top) followed by the stereographic projections computed by the code above.
Bravo! Now we can move on to detecting objects in the converted stereographic images.
Detecting objects with ImageAI
How does one select the best object detection framework? Some factors that influence this decision would be:
- Non-max suppression (NMS): does the framework remove duplicate, overlapping detections of the same object?
- Can it handle both image and video data? If so, how easy is it to integrate GPU acceleration?
- Does it enable custom training of pre-trained models?
- Is it compatible with the format of annotated training data?
Slightly lower, but still high, priority:
- Modularity (separating the functionality of a program into independent, interchangeable modules)
- Ease of further customization of code for API integration etc.
- Model interchangeability (e.g. swapping YOLOv3 with other models without affecting too much of the code)
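Non-max suppression is the step that removes duplicate boxes when a detector fires several times on the same object. The sketch below is a generic greedy NMS in NumPy for illustration, with boxes given as [x1, y1, x2, y2] and an assumed IoU threshold of 0.5; it is not ImageAI’s internal implementation, which performs this step for you.

```python
import numpy as np


def iou(box, boxes):
    """Intersection over union between one box and an array of boxes,
    all given as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return intersection / (area + areas - intersection)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop the remaining boxes that
    overlap it by more than iou_threshold, and repeat. Returns kept indices."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep


# Two heavily overlapping boxes around one object plus a separate object
boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```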
ImageAI is one Python library that covers most of the above aspects. It enables building a robust object detection system with a few lines of code. According to the ImageAI documentation, the ObjectDetection class performs object detection on any image or set of images using models pre-trained on the COCO dataset, which recognize 80 different kinds of common everyday objects. The supported models are RetinaNet, YOLOv3 and TinyYOLOv3.
We first install and import the necessary packages as usual.
```python
!pip3 install tensorflow==2.4.0 keras==2.4.3 numpy==1.19.3 pillow==7.0.0 scipy==1.4.1 h5py==2.10.0 matplotlib==3.3.2
!pip3 install opencv-python keras-resnet==0.2.0 tensorflow-gpu==2.4.0 wget
!pip3 install imageai --upgrade

import glob
import os
import itertools
import cv2
import wget
from imageai.Detection import ObjectDetection
import numpy as np
import pandas as pd
```
Assuming the working directory is the same one in which the output images from the previous section were saved, we simply collect their filenames with the ‘glob’ function.
```python
filenames_input_image = glob.glob('split_*.jpg')
# Check the imported filenames
filenames_input_image
```
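glob.glob returns filenames in an order that depends on the file system, so the four faces may not come back in rotation order. The hypothetical helper below sorts names of the form split_<face>_<distance>.jpg, as written by panotostereo, by distance and then by face index.

```python
import re


def sort_split_files(filenames):
    """Sort names like 'split_<face>_<distance>.jpg' by distance, then face index."""
    pattern = re.compile(r'split_(\d+)_(.+)\.jpg$')

    def key(name):
        match = pattern.search(name)
        if match is None:
            raise ValueError(f'unexpected filename: {name}')
        face, distance = match.groups()
        return float(distance), int(face)

    return sorted(filenames, key=key)


print(sort_split_files(['split_2_2.jpg', 'split_0_2.jpg', 'split_3_2.jpg', 'split_1_2.jpg']))
# ['split_0_2.jpg', 'split_1_2.jpg', 'split_2_2.jpg', 'split_3_2.jpg']
```

For example, `filenames_input_image = sort_split_files(glob.glob('split_*.jpg'))` keeps the detections in rotation order.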
To use the pre-trained neural network models, the corresponding model files (yolo.h5 for YOLOv3 and resnet50_coco_best_v2.1.0.h5 for RetinaNet, which uses a ResNet-50 backbone) must be downloaded. If a download link below is outdated, the current one can be found in the ImageAI documentation (https://imageai.readthedocs.io/en/latest/detection/index.html).

```python
# The if statements make sure the detector files are only downloaded
# if they do not already exist in the notebook's working directory
if not os.path.isfile("yolo.h5"):
    # Pre-trained YOLOv3 model
    wget.download('https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/yolo.h5')
if not os.path.isfile("resnet50_coco_best_v2.1.0.h5"):
    # Pre-trained RetinaNet model (ResNet-50 backbone)
    wget.download('https://github.com/OlafenwaMoses/ImageAI/releases/download/essentials-v5/resnet50_coco_best_v2.1.0.h5/')
```
Let us now define a function ‘imagedetectionyolo()’ that wraps the YOLOv3-based detection model into a working detector that we can apply to the stereographic images created by ‘panotostereo’.
```python
def imagedetectionyolo():
    # Initialize the YOLOv3 model with the downloaded weights
    detector = ObjectDetection()
    detector.setModelTypeAsYOLOv3()
    detector.setModelPath("yolo.h5")
    detector.loadModel()

    detected_object_list = []
    list_counts = []
    # To save time, ask whether a full list of all detected objects should be printed
    question = input("Do you want an image-by-image list of all objects? (y/n)")

    # Loop over the input images, limited to 10 for simplicity (the number can be changed)
    for image in itertools.islice(filenames_input_image, 0, 10):
        # 'minimum_percentage_probability' is a confidence threshold: lowering it shows
        # more (and more ambiguous) detections, raising it keeps only the detections the
        # model is most confident about. For example, if the detector is only 10% sure
        # that a yacht is a car, a threshold of 30 filters that detection out.
        detection_dict = detector.detectObjectsFromImage(
            input_image=image,
            # Prefix the output name so the annotated image does not overwrite the input
            output_image_path='detected_' + os.path.basename(image),
            minimum_percentage_probability=30)

        # Use pandas to collect the detections (one row per detected object) and to
        # count them per unique object for every image
        detections = pd.DataFrame(detection_dict)
        if not detections.empty:
            detections.index = [os.path.basename(image)] * len(detections)
            df1 = detections['name'].value_counts().rename_axis('object').reset_index(name='counts')
            df1.index = [os.path.basename(image)] * len(df1)
            list_counts.append(df1)
            detected_object_list.append(detections)

    if not detected_object_list:
        print("No objects detected.")
        return

    print("\n\nCounts per unique object for every image\n")
    display(pd.concat(list_counts))

    detected_object_list = pd.concat(detected_object_list)
    print("\n\nTotal counts per unique object for all images combined\n")
    print(detected_object_list['name'].value_counts())

    if question == 'y':
        print("\n\nAll detected objects")
        display(detected_object_list)
```
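The counting logic can also be tried out without loading a model. The sketch below assumes each detection is a dict with 'name' and 'percentage_probability' keys, the shape ImageAI’s detectObjectsFromImage returns in the versions installed above; `count_objects` is a hypothetical helper, the detections are made up, and the threshold of 30 mirrors minimum_percentage_probability. It uses a single groupby instead of the per-image value_counts loop, producing the same counts in one table.

```python
import pandas as pd


def count_objects(detections_per_image, min_probability=30):
    """Count detected objects per image and in total.

    detections_per_image maps an image name to a list of detection dicts
    (assumed keys: 'name', 'percentage_probability'). Detections below
    min_probability are dropped, like ImageAI's confidence threshold.
    """
    rows = [
        {'image': image, 'object': det['name']}
        for image, detections in detections_per_image.items()
        for det in detections
        if det['percentage_probability'] >= min_probability
    ]
    df = pd.DataFrame(rows, columns=['image', 'object'])
    per_image = df.groupby(['image', 'object']).size().rename('counts').reset_index()
    total = df['object'].value_counts()
    return per_image, total


# Made-up detections for illustration only
fake = {
    'split_0_2.jpg': [{'name': 'car', 'percentage_probability': 91.0},
                      {'name': 'car', 'percentage_probability': 64.5},
                      {'name': 'car', 'percentage_probability': 10.0}],
    'split_1_2.jpg': [{'name': 'person', 'percentage_probability': 55.2}],
}
per_image, total = count_objects(fake)
print(per_image)
print(total)
```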
Calling the detection function, imagedetectionyolo(), triggers a y/n prompt. Enter the desired value to proceed.
The resulting images with object detections are shown in Fig. 5. The lists of object counts are shown in Figs. 6 and 7.
Fig. 5: Object detection on converted stereographic images – Source: Omdena
Data quality deserves attention before detection algorithms are applied. In this blog post, we developed a simple detection system by re-projecting panoramic image data into a form that pre-trained detectors like YOLO are known to handle well, which improves the chances of accurate detections. As a next step, one could compare these results with object detections performed directly on the panoramic images, without changing the projection.
In summary, in this tutorial we successfully converted 360° panoramic images to stereographic images, allowing control over distortions in panoramic images and the use of powerful pre-existing and pre-trained computer vision networks to detect objects of interest.
Note: ImageAI is set to switch to a PyTorch backend later this year. Readers are therefore encouraged to check the ImageAI documentation regularly and adapt the code in this post as needed.