This article walks through deploying a model with Flask, packaging it in a Docker container so that it can be easily deployed in the cloud, and building an offline pathology mobile app that can be used in places without an internet connection, such as remote areas of Africa. The mobile app is shown in the results below.


Authors: Gerald Okioma, Gowthami Wudaru, Shashi Gharti



Problem Statement

We participated in the Omdena challenge Detecting Pathologies Through Computer Vision in Ultrasound to build an ultrasound solution that detects the type and location of different pathologies. The solution works with 2D images and can also process a video stream.

The goal is to identify the presence of a specific pathology in an ultrasound image and to report its location as bounding box coordinates and a mask. Ultrasound is a relatively inexpensive and portable modality for diagnosing life-threatening diseases at the point of care, which helps deliver impactful and feasible medical solutions to countries with significant resource constraints.


Inference Pipeline


We deploy the model in a Docker container that exposes a REST service. The service receives an image, preprocesses it if needed, runs the model, and sends the prediction back as bytes or JSON.

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly.

The model we are deploying is a Mask R-CNN (Region-Based Convolutional Neural Network) model that returns the mask, the bounding box coordinates, the type of pathology, and their score (i.e., probability).


Source: Omdena Inference Pipeline



Import the required libraries. Flask provides the API, Flasgger integrates it with Swagger documentation, NumPy handles array processing, and PIL handles image processing. ONNX Runtime is a cross-platform inference and training accelerator for machine learning models. It is compatible with deep learning frameworks such as PyTorch and TensorFlow/Keras, as well as classical machine learning libraries such as scikit-learn.

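from flask import Flask, Response, request, jsonify
from flasgger import Swagger
import io  # in-memory buffer for the PNG response
import cv2  # OpenCV (opencv-python), used to normalize the mask
import numpy as np
from PIL import Image
import onnxruntime as rt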

We initialize the Flask app and write a template for Swagger. We also define a function that checks the file name and allows only png, jpg, and jpeg files.

app = Flask(__name__)
swagger = Swagger(app, template={
  "swagger": "2.0",
  "info": {
    "title": "Inference",
    "version": "1.0.0"
  }
})

ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}
labels = ["Normal","Benign","Malignant"]

def allowed_file(filename):
  # accept the file only if it has an extension and that extension is allowed
  return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
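As a quick check of the extension filter, the sketch below repeats the allowed_file helper so that it runs on its own and calls it with a few made-up file names. Note that the helper only inspects the file name, not the file contents.

```python
# Same helper as in the listing above, repeated so this snippet runs standalone.
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

# The file names below are made up for illustration.
names = ['scan.PNG', 'scan.final.jpeg', 'scan.gif', 'scan']
checks = {name: allowed_file(name) for name in names}
print(checks)
```

Only the text after the last dot is compared, and the comparison is case-insensitive, so scan.PNG and scan.final.jpeg pass while scan.gif and a name without any extension are rejected.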

Load the model with ONNX Runtime, which reads a model saved in ONNX format. Its main class, InferenceSession, wraps model loading and execution in a single place, and its run method computes the prediction. The model is loaded only once, when the Docker container starts. We can use different models and different inference runtimes (such as ONNX Runtime or TensorFlow). When adding models, weigh the increased memory consumption and inference time against the resources available for deployment.

model = rt.InferenceSession('model.onnx')  # path is relative
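The keys of the dictionary passed to run must match the input names stored in the ONNX graph, so it can help to list them once after loading. A minimal sketch: get_inputs() and get_outputs() are methods of the ONNX Runtime session, while the helper name describe_io is hypothetical. The helper is duck-typed, so it accepts any object that exposes those two methods.

```python
def describe_io(session):
    """Return the input and output tensor names of an ONNX Runtime session.

    describe_io is a hypothetical helper. get_inputs() and get_outputs() are
    methods of onnxruntime.InferenceSession, and each returned item has a
    .name attribute.
    """
    return {
        'inputs': [node.name for node in session.get_inputs()],
        'outputs': [node.name for node in session.get_outputs()],
    }

# Usage with the session loaded above (needs onnxruntime and model.onnx):
# print(describe_io(model))
```

If the printed input name differs from the key used in the call to run, the call fails, so this is a cheap check to make when swapping models.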

The route decorator in Flask binds a URL to a function. We add a route with its HTTP method and define the function that handles it. The text inside the triple quotes (""") is used by Swagger to show the description, parameters, and sample responses. We then check the type of the image file and return 400 if the input is not supported. Otherwise, we load the image and predict the outputs. We send the mask as a PNG image, and the label and bounding box as headers.

Flasgger comes with embedded Swagger UI so you can access http://localhost:5001/apidocs and visualize and interact with your API resources. It also provides validation of the incoming data. We can add what type of responses can be expected too. A response is defined by its HTTP status code and the data returned in the response body and/or headers.

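# NOTE: the route decorator was lost from the original listing; the '/predict' path and POST method are assumed placeholders.
# Also needs: import io, import cv2, and Response from flask.
@app.route('/predict', methods=['POST'])
def predict():
  """
  Upload Image and get mask, bounding box coordinates and label
  ---
  parameters:
    - in: formData
      name: image
      type: file
      required: true
  responses:
    200:
      description: gets output
    400:
      description: input not supported
  """
  image_file = request.files['image']
  if image_file and allowed_file(image_file.filename):
    pil_img = Image.open(image_file)
    img = np.array(pil_img)
    img = img[np.newaxis, ...]  # add the batch dimension
    pred = model.run(None, {'input': img.astype(np.float32)})
    # pred is assumed to hold [mask, label index, bounding box], as in the original code
    data = pred[0]
    data = cv2.normalize(data, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
    data = data.transpose((1,2,0))  # channels-first to channels-last
    mask = Image.fromarray(data,'RGB')
    output = io.BytesIO()
    mask.save(output, format="PNG")
    response = Response(output.getvalue(), mimetype = 'image/png')
    # header values must be strings
    response.headers['Label'] = labels[int(np.squeeze(pred[1]))]
    response.headers['Bounded_box-Coordinates'] = str(np.asarray(pred[2]).tolist())
    return response, 200
  return 'only png, jpg, jpeg file types are allowed', 400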
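The post-processing in predict rescales the raw mask to the 0 to 255 range and moves the channel axis last before the PNG is built. The sketch below reproduces that arithmetic with NumPy only, as a rough stand-in for the cv2.normalize call rather than an exact replacement. The helper name to_uint8_image and the tiny input array are made up, and mapping a constant input to zeros is an assumption.

```python
import numpy as np

def to_uint8_image(chw):
    """Min-max scale a (channels, height, width) array to uint8, returned as (height, width, channels)."""
    chw = np.asarray(chw, dtype=np.float32)
    lo, hi = float(chw.min()), float(chw.max())
    if hi == lo:
        # Assumption: a constant mask carries no information, so map it to zeros.
        scaled = np.zeros_like(chw)
    else:
        scaled = (chw - lo) * (255.0 / (hi - lo))
    # Round before casting so values are not truncated, then move channels last.
    return np.round(scaled).astype(np.uint8).transpose((1, 2, 0))

# Made-up 3-channel, 2x2 input standing in for the model's mask output.
raw = np.arange(12, dtype=np.float32).reshape(3, 2, 2)
img = to_uint8_image(raw)
print(img.shape, img.dtype, img.min(), img.max())
```

The channels-last layout matters because PIL's Image.fromarray in RGB mode expects an array of shape (height, width, 3) with 8-bit values.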

We can also send the output as JSON by converting the mask image to a base64 string, which requires importing base64. The advantage of base64 strings is that a single response can carry multiple images: we can add heat maps or a mask outline as outputs and send them alongside the mask image.

    encoded_mask = base64.b64encode(output.getvalue())
    response = jsonify({'mask':encoded_mask.decode('utf-8')})
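To illustrate how base64 lets several images share one JSON response, here is a sketch that uses only the standard library. It packs two byte payloads into one JSON document and recovers them on the client side. The byte strings are placeholders rather than real PNG data, and the key names mask and heatmap as well as both helper names are assumptions.

```python
import base64
import json

def encode_images(images):
    """Map each name to a base64 text string so raw bytes can travel inside JSON."""
    return json.dumps(
        {name: base64.b64encode(data).decode('utf-8') for name, data in images.items()}
    )

def decode_images(payload):
    """Reverse encode_images on the client side."""
    return {name: base64.b64decode(text) for name, text in json.loads(payload).items()}

# Placeholder bytes standing in for PNG files.
original = {'mask': b'\x89PNG-mask-bytes', 'heatmap': b'\x89PNG-heatmap-bytes'}
payload = encode_images(original)
restored = decode_images(payload)
print(restored == original)
```

The trade-off is size: base64 emits four characters for every three input bytes, so each image grows by roughly a third compared with sending raw bytes.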

The Flask app, port, and environment are specified in the .flaskenv file.

And then we run the app:

if __name__ == '__main__':
  app.run(debug = True)

Swagger URL: http://localhost:5001/apidocs/


Source: Omdena Swagger Documentation for Inference Pipeline - Pathology Mobile App




Parameters specify the input. The incoming request should carry the image file in binary format as multipart form data. We can use the requests library to send the request and open to read the file in binary mode.

import requests
# The endpoint URL was omitted in the original listing; fill in your own.
response = requests.post('', files = {'image': open('file_path/file_name.png','rb')})

The input and output are shown below.
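On the client side, the bytes-style response carries the mask in the body and the other outputs in the headers. The sketch below shows one way to unpack it. The helper name unpack_prediction is hypothetical, the header names are the ones set in predict above, and the sample values are made up. With the requests library, the two arguments would be response.headers and response.content.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # the first eight bytes of every PNG file

def unpack_prediction(headers, body):
    """Collect the label, the bounding box header, and the mask bytes from a response."""
    if not body.startswith(PNG_SIGNATURE):
        raise ValueError('response body is not a PNG image')
    return {
        'label': headers.get('Label'),
        'box': headers.get('Bounded_box-Coordinates'),  # left as the raw header string
        'mask_png': body,
    }

# Made-up values standing in for response.headers and response.content.
sample_headers = {'Label': 'Benign', 'Bounded_box-Coordinates': '[10, 20, 110, 140]'}
sample_body = PNG_SIGNATURE + b'placeholder-image-data'
result = unpack_prediction(sample_headers, sample_body)
print(result['label'], result['box'])
```

Checking the PNG signature first guards against treating an error message, such as the 400 response text, as image data.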


Source: Omdena Inference Pipeline Inputs and Outputs



Docker provides the ability to package and run an application in a loosely isolated environment called a container. The isolation and security allow you to run many containers simultaneously on a given host. Containers are lightweight and contain everything needed to run the application, so you do not need to rely on what is currently installed on the host.

Docker uses a client-server architecture. The Docker client talks to the Docker daemon, which does the heavy lifting of building, running, and distributing your Docker containers. A Docker registry stores Docker images. Docker Hub is a public registry that anyone can use. An image is a read-only template with instructions for creating a Docker container. A container is a runnable instance of an image.


Source: Docker Architecture



We list all libraries and their versions in a requirements.txt file and use a Dockerfile to create the Docker image.

The requirements.txt


The Dockerfile

FROM python:3.8-slim-buster
WORKDIR /usr/src/app

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD [ "flask", "run" ]

Build the image. A docker_id is only required if you want to push the image.

export DOCKERID=docker_id
docker image build --tag $DOCKERID/project_name:project_version .

Push the image after building it

docker login
docker push $DOCKERID/project_name:project_version

Run the image (Docker pulls it first if it is not available locally)

docker run -p 5001:5001 --network host --rm docker_id/project_name:project_version




Pathology Mobile App

A YOLOv5s model was trained with PyTorch and later converted to TorchScript for deployment on Android. The app was developed in Android Studio, and the PyTorch Android API was used to load and run the model and to get the predictions within the Android environment. The deployment workflow followed PyTorch’s mobile development guides.


PyTorch’s workflow for Android development and deployment



Steps to deploy a trained model to a pathology mobile app (Android)

Step 1: As shown in the figure above, the first step is to convert the trained PyTorch model to TorchScript, which turns the PyTorch code into a serializable and optimizable model, as follows:

ts_module = torch.jit.trace(model, data)
ts_module.save("")  # the output file name was omitted in the original listing

Step 2: In the Android project, add the PyTorch Android dependencies in Gradle. pytorch_android is the main dependency, and pytorch_android_torchvision provides utilities to convert Image and Bitmap objects to tensors.

dependencies {
   implementation 'org.pytorch:pytorch_android:1.7.0'
   implementation 'org.pytorch:pytorch_android_torchvision:1.7.0'
}

final Tensor inputTensor = TensorImageUtils.bitmapToFloat32Tensor(bmap, PrePostProcessor.NO_MEAN_RGB, PrePostProcessor.NO_STD_RGB);

Step 3: Load the model and run the inference

mdl = PyTorchAndroid.loadModuleFromAsset(getAssets(), "");
IValue[] outputTuple = mdl.forward(IValue.from(inputTensor)).toTuple();



The benefit of deploying the model inside the pathology mobile app, without any API, is that it works offline. The aim was to empower health workers in remote areas with a tool that works without an internet connection. A mobile phone is the best option for this because it is easy to carry and transport.


Source: Omdena EndPoint and Pathology Mobile App




This article showed the steps for deploying the model with Flask, packaging it in a Docker container so that it can be easily deployed in the cloud, and building an offline pathology mobile app that can be used in places without an internet connection, such as remote areas of Africa.
