Applying commonly available Machine Learning-driven object detection frameworks to equirectangular projection images

Author: Parit Mehta

 

What is a Panorama?

Imagine taking a ferry to a picturesque island and, on a sunny day, enjoying a beautiful view of the ocean from the top of a hill. If you are like me, you refuse to be a click-happy tourist and have always questioned whether selfie sticks should exist, but you make an exception for the breathtaking ocean view in front of you. After all, it might be useful for all the computer vision experiments you’re working on!

In fact, you want to capture the full landscape in front of you. So you whip out your smartphone, switch the camera to Panorama mode, and pan it across the scene with the ocean in the background. And lo, the whole wide landscape is now in your pocket! In this short imaginary escapade, you have captured “an unbroken view of the whole region surrounding an observer”, or a panorama. Let’s get into the weeds of this very familiar concept.

Technically, a panorama has an aspect ratio of 2:1 or higher, meaning it is at least twice as wide as it is tall. Its angular extent goes beyond the typical human binocular field of view of about 120°, up to a full 360°.

Tourism, however, is far from the only domain where panoramas are relevant. Virtual reality, architecture, road infrastructure and urban planning, satellite imagery of the Earth, and even Martian landscapes all make use of this wide image format. As panoramic photography improves in resolution and becomes more accessible through consumer devices, the time is right to adapt Machine Learning-based methods to such images.

One such use case arose during Omdena’s challenge ‘Rating Road Safety Through Machine Learning to Prevent Road Accidents’ in partnership with iRAP, where a team of 31 AI experts and learners collaborated to build an automated rating system to make the world’s roads safer. An important task of this project was to leverage the wider perspective offered by panoramas to accurately assess road usage and infrastructure. It is therefore worth looking at how object detection in panoramas differs from object detection in ordinary 2D images, and how it can be achieved with commonly available Machine Learning frameworks.

Line-bending: working with panoramic projections

A panorama can cover up to the full 360° field of view in all directions around an observer, so that full or partial 3D scenes are projected onto a 2D surface. When we capture a panoramic picture, we usually rotate the camera while staying in the same position. The camera’s viewing direction can therefore be thought of as sweeping over a sphere centered on the observer, and the resulting spherical image is projected onto a flat picture.

Note: If you are interested in detecting vehicles in 2D images, a full tutorial with tools and examples is here.

 

There are many ways to project a spherical image onto a plane. For panoramic images we use the equirectangular projection, which covers the full view around the observer, i.e. 360° horizontally and 180° vertically, by mapping longitude and latitude linearly onto the horizontal and vertical image axes. Fig. 1 shows a few examples of equirectangular panoramas.
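Because the equirectangular projection maps longitude and latitude linearly onto the pixel grid, converting between a pixel and its viewing direction takes one line in each direction. The sketch below assumes one particular convention (longitude −180° at the left edge, latitude +90° at the top row, pixel indices measured from the top-left corner); the helper names are hypothetical and not part of any library.

```python
def pixel_to_angles(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to (longitude, latitude) in degrees.

    Assumed convention: u runs left to right over longitudes -180..180,
    v runs top to bottom over latitudes 90..-90.
    """
    longitude = u / width * 360.0 - 180.0
    latitude = 90.0 - v / height * 180.0
    return longitude, latitude


def angles_to_pixel(longitude, latitude, width, height):
    """Inverse of pixel_to_angles under the same convention."""
    u = (longitude + 180.0) / 360.0 * width
    v = (90.0 - latitude) / 180.0 * height
    return u, v


# A full 360 x 180 degree view gives the 2:1 aspect ratio of these panoramas:
# a 4000 x 2000 image has 0.09 degrees per pixel on both axes.
width, height = 4000, 2000
print(pixel_to_angles(width / 2, height / 2, width, height))  # (0.0, 0.0), the image center
print(angles_to_pixel(90.0, 45.0, width, height))             # (3000.0, 500.0)
```

The linear mapping is also why objects near the top and bottom of an equirectangular image look stretched: every row spans the full 360°, even though rows near the poles correspond to ever smaller circles on the sphere.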

Fig.1: Equirectangular panoramas from videos captured and uploaded by users on YouTube. Source: Object Detection in Equirectangular Panorama

As you can see in Fig. 1, objects and people in equirectangular panoramas can appear distorted to the human eye. In fact, one of the challenges of object detection in panoramic images is that pre-trained neural networks such as the YOLO detector were trained on ordinary, undistorted 2D images. Since training the detector on custom annotated data is not always an option, what if we could re-project equirectangular panoramas so that objects look much less distorted? This is exactly the approach taken by the authors of the academic paper ‘Object Detection in Equirectangular Panorama’. With such a re-projection, we can use a pre-trained YOLO detector directly, which substantially simplifies the task!

In the paper, the authors re-project equirectangular panoramas in two different ways. We will dive into a bit of geometry to understand the mathematical relation between the projections, which will in turn be used in the Python code. An illustration of the problem is shown in Fig. 2.

Fig. 2:  Illustration showing a general projection from a sphere to a plane surface. On the left, the plane marked by a dashed boundary is placed tangent to a sphere. The axes x, y, and z have their usual meanings with positive values to the right of the origin. The radius r of the sphere is set to 1. The point P(xp, yp, 1) on the plane is a projection of a point p(θ, φ) on the sphere where θ and φ are projection angles. On the right is a 2D representation of the same projection. Source: Object Detection in Equirectangular Panorama     

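The point P(xp, yp, 1) on the viewing plane is the projection of a point p(θ, φ) on the sphere. In the form implemented by the code below, the two are related by the following equations, where d is the distance of the projection center from the center of the sphere:

$$x_p = \frac{(d + 1)\sin\phi}{d + \cos\phi}, \qquad y_p = \frac{(d + 1)\sin\theta}{d + \cos\theta} \tag{1}$$

Let’s call this equation 1.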

The projection center is the point at distance d from the center of the sphere, on the side opposite the plane. If the projection center coincides with the center of the sphere (d = 0), we obtain a perspective projection on the plane. If instead d = 1, the projection center lies on the sphere itself, at the point opposite the tangent plane, and the projection is stereographic.
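To make the horizontal part of equation 1 concrete, the sketch below evaluates x_p = (d + 1) sin φ / (d + cos φ), the relation inverted by the projection_angle code further down, and checks the two special cases named above: for d = 0 it reduces to tan φ (perspective), and for d = 1 to 2 tan(φ/2) (stereographic). The function name project_coordinate is our own.

```python
from math import sin, cos, tan, radians, isclose


def project_coordinate(phi, d):
    """Project a sphere angle phi (radians) onto the tangent plane,
    x_p = (d + 1) * sin(phi) / (d + cos(phi)), projection center at distance d."""
    return (d + 1) * sin(phi) / (d + cos(phi))


for degrees in (15, 45, 75):
    phi = radians(degrees)
    perspective = project_coordinate(phi, d=0)
    stereographic = project_coordinate(phi, d=1)
    # d = 0 reproduces tan(phi); d = 1 reproduces 2 * tan(phi / 2)
    assert isclose(perspective, tan(phi))
    assert isclose(stereographic, 2 * tan(phi / 2))
    print(f"{degrees:2d} deg: perspective {perspective:.3f}, stereographic {stereographic:.3f}")

# For d > 0 the largest plane coordinate, (1 + d) / d, is reached at phi = 90 degrees
print(project_coordinate(radians(90), d=2))  # 1.5
```

For d = 0 the coordinate grows without bound as φ approaches 90°, which is why a single perspective image cannot show a full half-sphere, whereas any d > 0 keeps it finite.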

 

To summarize: projecting from the far side of the imaginary sphere (the point opposite the plane) yields a stereographic projection, whereas projecting from the center of the sphere yields a perspective projection. Fig. 3 shows the same sample image in both projection styles.

Fig. 3: Perspective (left) and stereographic (right) projections of the same spherical image. Source: Object Detection in Equirectangular Panorama

The stereographic projection appears to preserve the shape of the objects in the picture better, whereas straight lines look more bent than in the perspective projection. Since we are mainly interested in detecting objects, we will move ahead with converting equirectangular panoramas to the stereographic projection (some of the code below is sourced from GitHub).
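One way to quantify this trade-off is the local horizontal magnification dx_p/dφ along the horizon, i.e. how strongly a small object at angle φ from the image center is stretched. The sketch below estimates it with a central finite difference of the same projection formula and compares the result with the closed forms 1/cos²φ (perspective, d = 0) and 1/cos²(φ/2) (stereographic, d = 1). The step size h and the helper names are illustrative choices.

```python
from math import sin, cos, radians, isclose


def project_coordinate(phi, d):
    # Horizontal projection of a sphere angle phi onto the tangent plane
    return (d + 1) * sin(phi) / (d + cos(phi))


def magnification(phi, d, h=1e-6):
    # Central finite-difference estimate of d(x_p)/d(phi)
    return (project_coordinate(phi + h, d) - project_coordinate(phi - h, d)) / (2 * h)


for degrees in (0, 30, 60, 80):
    phi = radians(degrees)
    m_persp = magnification(phi, d=0)
    m_stereo = magnification(phi, d=1)
    # Compare with the closed forms 1/cos^2(phi) and 1/cos^2(phi/2)
    assert isclose(m_persp, 1 / cos(phi) ** 2, rel_tol=1e-6)
    assert isclose(m_stereo, 1 / cos(phi / 2) ** 2, rel_tol=1e-6)
    print(f"{degrees:2d} deg: perspective x{m_persp:.2f}, stereographic x{m_stereo:.2f}")
```

At 60° off-center, for example, the perspective projection stretches a small object four times along the horizontal direction, against a factor of 4/3 for the stereographic projection. The stereographic projection is also conformal (it preserves angles locally), so small objects keep their shape, which is one way to see why it suits a detector trained on ordinary photos.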

   
Start by importing the necessary libraries:

import numpy as np
import cv2
from math import pi, acos, sqrt
from scipy.interpolate import RectBivariateSpline


Let us first create a function ‘projection_angle’ with arguments x (coordinate on the plane) and d (distance to the projection center) that calculates the projection angle θ or φ from Eq. (1). Because both parts of Eq. (1) have the same form, the same function serves for the pairs (xp, φ) and (yp, θ). The maximum value of x, stored in the variable x_max below, corresponds to an angle of 90°. For smaller angles, Eq. (1) is solved for the angle: rearranging it gives a quadratic equation in cos φ, whose relevant root we write with the variables numerator and denominator. The projection angle, project_angle below, is then the arccosine of their ratio, with the sign of x restored. Note that d must be greater than 0, since x_max = (1 + d) / d.

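def projection_angle(x, d):
    # Largest plane coordinate, reached at an angle of 90°; requires d > 0
    x_max = (1 + d) / d
    if abs(x) > x_max:
        raise ValueError('invalid input args: |x| must not exceed (1 + d) / d')

    # Solving Eq. (1) for the angle gives a quadratic in cos(angle); this is its root
    numerator = -2 * d * x ** 2 + 2 * (d + 1) * sqrt((1 - d ** 2) * x ** 2 + (d + 1) ** 2)
    denominator = 2 * (x ** 2 + (d + 1) ** 2)
    # Clip to [-1, 1] so floating-point round-off cannot push acos out of its domain
    cos_angle = min(1.0, max(-1.0, numerator / denominator))

    if 0 <= x < x_max:
        project_angle = acos(cos_angle)
    elif x < 0:
        # acos() only returns non-negative angles, so restore the sign of x
        project_angle = -acos(cos_angle)
    else:  # x == x_max
        project_angle = pi / 2.

    return project_angle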
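A quick way to convince yourself that the numerator/denominator expression really inverts the projection is to project an angle forward and invert it again. The sketch below re-implements only the closed-form inverse (as inverse_angle, a hypothetical name mirroring projection_angle for |x| < x_max) so that it runs on its own, and checks it against the forward formula and, for d = 1, against the known stereographic inverse φ = 2 atan(x/2).

```python
from math import sin, cos, acos, atan, sqrt, radians, isclose


def project_coordinate(phi, d):
    # Forward projection: x_p = (d + 1) sin(phi) / (d + cos(phi))
    return (d + 1) * sin(phi) / (d + cos(phi))


def inverse_angle(x, d):
    # Closed-form root of the quadratic in cos(phi), as in projection_angle
    numerator = -2 * d * x ** 2 + 2 * (d + 1) * sqrt((1 - d ** 2) * x ** 2 + (d + 1) ** 2)
    denominator = 2 * (x ** 2 + (d + 1) ** 2)
    angle = acos(min(1.0, max(-1.0, numerator / denominator)))
    return angle if x >= 0 else -angle


# Round trip: angle -> plane coordinate -> angle, for several projection distances
for d in (0.5, 1, 2):
    for degrees in (-80, -30, 0, 45, 89):
        phi = radians(degrees)
        recovered = inverse_angle(project_coordinate(phi, d), d)
        assert isclose(recovered, phi, abs_tol=1e-9), (d, degrees)

# For d = 1 the inverse must agree with the stereographic formula phi = 2 * atan(x / 2)
for x in (0.1, 0.7, 1.5):
    assert isclose(inverse_angle(x, 1), 2 * atan(x / 2))
print("round trip OK")
```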

Now we define another function, panotostereo, which splits the panoramic image into multiple stereographic projections.

def panotostereo(panorama, distance):
    # Split an equirectangular panorama into four square projections whose viewing
    # directions are 90° apart, save them as JPEGs and return them as a list
    frames = []
    input_img = panorama
    height, width, _ = input_img.shape
    d = distance
    xp_max = (1 + d) / d
    yp_max = (1 + d) / d
    # Sample the projection plane at `height` evenly spaced pixel centers in (-max, max)
    xp_domain = xp_max * np.linspace(-1 + 1.0 / height, 1 - 1.0 / height, height)
    yp_domain = yp_max * np.linspace(-1 + 1.0 / height, 1 - 1.0 / height, height)
    # Angle covered by one panorama pixel, in radians (assumes a 2:1 panorama,
    # so the same step also applies vertically)
    delta_rad = 2 * pi / width

    # Map every plane coordinate to a (fractional) pixel position in the panorama;
    # the mapping is identical for all four faces, so compute it once
    pano_x = np.array([width / 2.0 + projection_angle(xp, d) / delta_rad for xp in xp_domain])
    pano_y = np.array([height / 2.0 + projection_angle(yp, d) / delta_rad for yp in yp_domain])
    rows, cols = np.arange(height), np.arange(width)

    for face in range(4):
        print('generating stereo image', face)
        output_img = np.zeros((height, height, 3))

        # Interpolate each color channel with scipy's bivariate spline
        for channel in range(3):
            spline = RectBivariateSpline(rows, cols, input_img[:, :, channel])
            output_img[:, :, channel] = spline(pano_y, pano_x)

        # Clip spline overshoot and convert to 8-bit before saving
        output_img = np.clip(output_img, 0, 255).astype(np.uint8)
        cv2.imwrite('split_' + str(face) + '_' + str(d) + '.jpg', output_img)
        frames.append(output_img)
        # Rotate the panorama by a quarter of its width (90°) for the next face
        input_img = np.roll(input_img, -(width // 4), axis=1)
    return frames
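The bookkeeping behind the four faces can be made explicit. The loop above rolls the panorama left by a quarter of its width before each new face, so the faces look in directions 90° apart, and each face reaches up to ±90° around its own center. The sketch below assumes that column 0 of the panorama is longitude −180° (the code above does not fix a convention) and uses hypothetical names; it computes the face centers and the resulting overlap between neighboring faces.

```python
def face_centers(num_faces=4):
    # Face k samples the middle column of the panorama rolled left by
    # k * width / num_faces, i.e. original longitude 0 + k * 360 / num_faces,
    # wrapped to the range [-180, 180)
    return [(k * 360.0 / num_faces + 180.0) % 360.0 - 180.0 for k in range(num_faces)]


HALF_SPAN = 90.0  # each face reaches +/-90 degrees around its center in panotostereo
centers = face_centers()
spacing = 360.0 / len(centers)
overlap = 2 * HALF_SPAN - spacing  # angular overlap between neighboring faces

print("face centers (deg):", centers)  # [0.0, 90.0, -180.0, -90.0]
print("horizontal coverage per face (deg):", 2 * HALF_SPAN)
print("overlap between neighbors (deg):", overlap)  # 90.0
```

Because neighboring faces overlap, an object near the edge of one face usually also appears closer to the center of the next, so the same object may be detected twice across faces.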

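Let’s now test the code on a sample panorama, named ‘sample.png’ for convenience. The function returns four square images whose viewing directions are 90° apart horizontally; since each image covers up to ±90° around its own viewing direction, neighboring images overlap. Note that we pass d = 2 here, which places the projection center outside the sphere; d = 1 gives the strictly stereographic case described above.

im = cv2.imread('sample.png')
if im is None:  # cv2.imread returns None instead of raising when a file cannot be read
    raise FileNotFoundError('sample.png could not be read')
projection = panotostereo(im, 2)
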

The sample image and the resulting projections are shown in Fig. 4 (the sample image on top, followed by the four converted images).

Fig. 4: Sample panorama (top image) converted to stereographic images (bottom four). Source: Omdena

When choosing an object detection framework to apply to these images, the following aspects are worth considering:

  • Support for non-max suppression
  • Can both image and video data be handled, and if so, how easy is it to integrate GPU functionality?
  • Does it enable custom training of pre-trained models?
  • Is it compatible with the format of the annotated training data?
  • Ease of further customization of the code, e.g. for API integration
  • Model interchangeability (e.g. swapping YOLOv3 for other models without changing much of the code)

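Here we use the ImageAI library. Install it together with its dependencies (the commands below are meant for a Jupyter or Colab notebook) and import the required modules:

!pip3 install tensorflow==2.4.0 keras==2.4.3 numpy==1.19.3 pillow==7.0.0 scipy==1.4.1 h5py==2.10.0 matplotlib==3.3.2
!pip3 install opencv-python keras-resnet==0.2.0 tensorflow-gpu==2.4.0 wget pandas
# ImageAI 3.x moved to PyTorch, so pin the TensorFlow-based release matching the versions above
!pip3 install imageai==2.1.6
import glob
import os
import itertools
import cv2
import wget
import numpy as np
import pandas as pd
from imageai.Detection import ObjectDetection
from IPython.display import display  # renders DataFrames nicely in the notebook
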
Assuming the working directory is the one where the previous section saved its output images, we collect their filenames with the ‘glob’ function.

filenames_input_image = sorted(glob.glob('split_*.jpg'))

# Check the collected filenames
filenames_input_image

To use the pre-trained neural network models, the corresponding YOLOv3 and ResNet-50 model files must be downloaded. If a download link below is outdated, take the current one from the ImageAI documentation (https://imageai.readthedocs.io/en/latest/detection/index.html).

# Only download the model files if they are not already in the notebook's working directory
if not os.path.isfile("yolo.h5"):
    # Pre-trained YOLOv3 model
    wget.download('https://github.com/OlafenwaMoses/ImageAI/releases/download/1.0/yolo.h5', out="yolo.h5")
if not os.path.isfile("resnet50_coco_best_v2.1.0.h5"):
    # Pre-trained ResNet-50-based model (not used by the YOLOv3 detector below)
    wget.download('https://github.com/OlafenwaMoses/ImageAI/releases/download/essentials-v5/resnet50_coco_best_v2.1.0.h5',
                  out="resnet50_coco_best_v2.1.0.h5")


Let us now define a function ‘imagedetectionyolo()’ that wraps the YOLOv3-based detection model into a working detector and applies it to the stereographic images created by ‘panotostereo’.

def imagedetectionyolo():
    # Load the pre-trained YOLOv3 model through ImageAI
    detector = ObjectDetection()
    detector.setModelTypeAsYOLOv3()
    detector.setModelPath("yolo.h5")
    detector.loadModel()

    detected_object_list = []
    list_counts = []
    # Printing every detection can be long, so it is only shown on request
    question = input("Do you want an image-by-image list of all objects? (y/n)")

    # Loop over the input images, limited to the first 10 for simplicity (change as needed)
    for image in itertools.islice(filenames_input_image, 0, 10):
        # The annotated copy gets a new name so the input image is not overwritten.
        # 'minimum_percentage_probability' is a confidence threshold: detections below it
        # are discarded. Lowering it shows more (possibly ambiguous) objects, raising it
        # keeps only confident ones. For example, if the detector is only 10% sure that
        # a yacht is a car, a threshold of 30 filters that detection out.
        detections_found = detector.detectObjectsFromImage(
            input_image=image,
            output_image_path='detected_' + os.path.basename(image),
            minimum_percentage_probability=30)

        # detectObjectsFromImage returns a list of dicts (name, percentage_probability,
        # box_points); pandas turns it into one row per detected object
        detections = pd.DataFrame(detections_found)

        if not detections.empty:
            detections.index = [os.path.basename(image)] * len(detections)
            # Count each unique object class in this image
            df1 = detections['name'].value_counts().rename_axis('object').reset_index(name='counts')
            df1.index = [os.path.basename(image)] * len(df1)
            list_counts.append(df1)
            detected_object_list.append(detections)

    # pd.concat fails on an empty list, so stop early if nothing was detected
    if not detected_object_list:
        print("No objects detected above the probability threshold.")
        return

    print("\n\nCounts per unique object for every image\n")
    display(pd.concat(list_counts))

    detected_object_list = pd.concat(detected_object_list)
    print("\n\nTotal counts per unique object for all images combined\n")
    print(detected_object_list['name'].value_counts())

    if question == 'y':
        print("\n\nAll detected objects\n")
        display(detected_object_list)


imagedetectionyolo()
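To see what the counting step produces without downloading any model weights, the sketch below runs the same pandas calls on a small hand-made detection list. The list is hypothetical: it only mimics the shape of ImageAI’s output (one dictionary per object with name, percentage_probability and box_points), and the last step shows the effect of applying a stricter confidence threshold after the fact.

```python
import pandas as pd

# Hypothetical detections for one image, in the shape returned by ImageAI
detections_found = [
    {"name": "car", "percentage_probability": 91.2, "box_points": [10, 40, 120, 110]},
    {"name": "car", "percentage_probability": 45.0, "box_points": [300, 60, 380, 105]},
    {"name": "person", "percentage_probability": 78.5, "box_points": [200, 30, 230, 120]},
    {"name": "truck", "percentage_probability": 33.1, "box_points": [400, 20, 520, 130]},
]

detections = pd.DataFrame(detections_found)
detections.index = ["split_0_2.jpg"] * len(detections)

# Counts per unique object, as built inside imagedetectionyolo()
counts = detections["name"].value_counts().rename_axis("object").reset_index(name="counts")
print(counts)

# Raising the threshold from 30 to 50 after the fact drops the less certain detections
confident = detections[detections["percentage_probability"] >= 50]
print(confident["name"].value_counts())
```

Filtering a saved DataFrame like this lets you try several thresholds without re-running the detector on every image.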

The resulting images with object detections are shown in Fig. 5. The object counts and the list of all detections are shown in Figs. 6 and 7.

Fig. 5: Object detection on converted stereographic images - Source: Omdena

Fig. 6: Lists of object detection - Source: Omdena

Fig. 7: Truncated list of all detected objects with probability of detection and pixel coordinates of the corresponding bounding boxes

Task completed successfully!

 
