Deploying a machine learning model means integrating it into an existing production environment, where it accepts the same kind of inputs it was trained on and returns its predictions to the end user. However, some of us may find model deployment a bit tedious, given all the technicalities of the web interface involved.
We have divided this article into the following sections:
1. Model Building
2. Model Deployment on Mia
Let’s get started!
For Omdena’s Katapault challenge, 50 collaborators came together to build an MVP that predicts a startup’s success rate, helping to reduce bias when selecting startups for funding rounds. The problem statement read:
Research on early-stage startups is sparse and these startups often do not have financial information readily available, which is commonly used by investors to influence decision-making regarding investments. Numerous impact startups could lose out on funding due to lack of data points commonly used as benchmarks in the industry.
Therefore, the primary objective is to create a Minimum Viable Product (MVP) tool that can predict an early-stage startup’s success rate based on a list of quantifiable success factors as input. Through this tool, investors are able to reduce bias in their decision-making, which ensures fairness, objectivity, and equality in measuring the success rates of different early-stage startups.
Based on proper research, we curated a well-defined list of factors that define a startup’s success.
We built several apps using different models for this challenge, and all of them were deployed on Mia. A baseline model was developed using a subset of these success factors as the startup’s features. To read more about data collection and the whole project pipeline, check here.
The first app (XGBoost)
The features that were selected to train the baseline model are:
1. Number of Employees
2. Age of the startup
3. Business stage
For the target variable, we used the ‘Total Funding’ feature from the available dataset and created a new column, ‘raised’, in which funding of 2 million or more is encoded as ‘1’ and anything less as ‘0’. We chose this target based on this article, which says that an early-stage startup that raises 2 million or more has better chances of moving on to the next funding round.
Once the data was preprocessed, we trained the following models on it:
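The target construction described above can be sketched with pandas. This is a minimal sketch, assuming the data is loaded into a DataFrame with a numeric `Total Funding` column in the same unit as the 2 million threshold; the sample values are invented for illustration.

```python
import pandas as pd

# Invented sample rows; the project's real dataset is not reproduced here.
df = pd.DataFrame({"Total Funding": [500_000, 2_000_000, 7_500_000, 1_999_999]})

THRESHOLD = 2_000_000  # funding of 2 million or more counts as "raised"

# 1 if the startup raised at least the threshold, 0 otherwise.
df["raised"] = (df["Total Funding"] >= THRESHOLD).astype(int)

print(df["raised"].tolist())  # [0, 1, 1, 0]
```

Using `>=` keeps startups that raised exactly 2 million in the positive class, matching the "equal to or more than" rule.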
1. XGBoost (testing accuracy: 95.02%)
2. Artificial Neural Network (testing accuracy: 92.097%)
Model Deployment on Mia
Once you have your machine learning model ready, you are just a few easy steps away from deploying it on Mia.
Let us now see how to deploy a simple machine learning model on Mia. Before you start, make sure that your model meets the following requirements:
- Regression or classification model with one or multiple numerical or categorical inputs and a single output.
- Text classification model that takes a single string of text as an input. Any text transformation steps must be built directly into the model itself (e.g. using a scikit-learn pipeline).
- Built on one of the following versions of TensorFlow, scikit-learn or XGBoost with Python 3.7 or 3.5:
- Model files less than 32 MB exported into one of the following formats:
- One-hot encoding or label encoding for categorical inputs.
- Does not require normalized inputs.
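For the text-classification case, one way to build the text transformation into the model itself is a scikit-learn `Pipeline`. The sketch below chains a `TfidfVectorizer` with a `LogisticRegression`; the toy corpus and labels are hypothetical, and TF-IDF plus logistic regression is only one possible choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = promising pitch, 0 = weak pitch.
texts = [
    "strong team and growing revenue",
    "experienced founders with loyal customers",
    "no product and no customers yet",
    "team left and revenue is shrinking",
]
labels = [1, 1, 0, 0]

# The vectorizer lives inside the model, so the saved object
# accepts a raw string and performs the transformation itself.
text_model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
text_model.fit(texts, labels)

# A single string of text in, a single prediction out.
prediction = text_model.predict(["growing revenue and loyal customers"])[0]
print(prediction)
```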
For our challenge, we developed an XGBoost model, which was exported as follows:
After saving your model, you can log in to Mia (or sign up). Logging in brings you to the homepage, where you can choose to upload your model.
Once you select ‘Upload a model’, you’ll be asked to fill in some details about it. You can give your model any name you like and, optionally, add a description. Giving your model a version name (e.g. version 0.0.1), along with an optional version description, will help you track its progress whenever you update it.
In the next step, you’ll be asked to specify the name and version of the framework you used to build your model. In our case, we used XGBoost version 0.90.
After filling in all the necessary details, you just have to upload your model; in this case, a ‘model.bst’ file. Then select ‘UPLOAD MODEL’, and the model is ready to power an app for end users. Ta-da! Wasn’t the entire process super easy?
Now that you have deployed your model on Mia, let us see how to use it to create an interactive web app without a single line of code.
Once you are done uploading your model, select the ‘Create an App’ option. This takes you to a page with a form that describes your app.
As the image below shows, you are asked to enter your app’s name and description, choose whether to keep it private, make it public or share it with specific people, and decide whether to add the app to a page. In brief, Mia lets you create pages, much like a portfolio, where you can keep all your apps in one place. If you want to add your app to a page that you created or that has been shared with you, select it from the dropdown; otherwise, leave the default choice, ‘None’.
Next, you’ll select the model and its version from the models you uploaded to Mia. You can optionally state the model’s accuracy or add links to the dataset you used, the code, or an article. There’s a lot you can add here: Mia gives you the freedom to customize your app and make it easy for users to understand.
Now you’ll be prompted to define your web app’s inputs. You can add as many inputs as your model requires to produce its output/prediction. Mia lets you choose from the following input types:
1. Slider (number)
4. Multiple Choice
For our app, we used several input types: Slider for the Age of the Company and Number of Employees, Dropdown for Industry, Business Type and Location, and Number for Annual Revenue.
Whenever you use a dropdown input, make sure the corresponding feature was encoded with either one-hot encoding or label encoding, as these are the two encodings Mia supports.
Once your inputs are set, you’ll be asked to choose the format of your prediction, i.e. whether the result is a regression or a classification output.
Our model uses classification to tell whether a startup will be able to raise 2 million dollars or more, so we selected ‘Classification’ as the prediction format. For a classification problem, one might expect the output to be 1 or 0, or ‘yes’ or ‘no’. However, our challenge required the prediction to be a probability that tells how likely a startup is to raise that amount. Mia supports this: you simply choose whether the output is shown as a probability or as a label.
You can then set a threshold for the classification result and provide a custom statement to be displayed alongside it.
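The two supported encodings can be illustrated with pandas. This is a sketch only: the `Industry` values are hypothetical, and whichever encoding you apply at training time is the one the app’s dropdown input has to match.

```python
import pandas as pd

df = pd.DataFrame({"Industry": ["Fintech", "Health", "Fintech", "Education"]})

# Label encoding: each category becomes an integer in a single column.
categories = sorted(df["Industry"].unique())  # fixed, reproducible order
label_map = {name: code for code, name in enumerate(categories)}
df["Industry_label"] = df["Industry"].map(label_map)

# One-hot encoding: each category becomes its own 0/1 column.
one_hot = pd.get_dummies(df["Industry"], prefix="Industry", dtype=int)

print(label_map)  # {'Education': 0, 'Fintech': 1, 'Health': 2}
print(one_hot.columns.tolist())
```

Sorting the categories before assigning codes keeps the label mapping stable between training and deployment.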
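A probability output combined with a threshold roughly corresponds to the logic below. This is a sketch of the idea, not Mia’s implementation: it uses scikit-learn’s `predict_proba` on a toy `LogisticRegression`, and the 0.5 threshold and the two messages are placeholder choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: one numeric feature and a binary target.
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

THRESHOLD = 0.5  # placeholder cut-off between the two messages

def describe(sample, threshold=THRESHOLD):
    """Return the class-1 probability and a message chosen by the threshold."""
    proba = clf.predict_proba([sample])[0, 1]  # column 1 = probability of class 1
    if proba >= threshold:
        message = "Likely to raise the target amount"
    else:
        message = "Unlikely to raise the target amount"
    return proba, message

print(describe([8.5]))
print(describe([1.5]))
```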
A newly added Mia feature lets you add a scatter plot next to your model. We displayed a visualization of the distribution of total funding across startups from around the globe (the details are not shown here, as the data is confidential to its owner). Once you have provided all the details Mia needs to build your app, you are just one click away from creating your very own app that uses your machine learning model to give results. Easy-peasy, ain’t it?
That covers deploying XGBoost models. The steps are similar for models built with scikit-learn or TensorFlow; you just need to make sure your model is saved in the format Mia requires for that framework.
For the same app, we also deployed a model built with TensorFlow.
Below is an image of our app deployed on Mia, using the TensorFlow framework.
The second app (Scikit-learn)
What is Scikit-learn or Sklearn?
Scikit-learn is one of the most widely used libraries for machine learning in Python. It contains many efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction.
Please note that sklearn is meant for building machine learning models, not for reading, manipulating, or summarizing data; libraries such as NumPy and pandas are better suited to that.
You can learn more about it here.
How do I install Sklearn?
Using pip:
pip install scikit-learn
To install a specific version of scikit-learn (replace the version number with the one you need):
pip install scikit-learn==0.23.1
Using conda:
conda install scikit-learn
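The package is installed as `scikit-learn` but imported as `sklearn`. A quick check that the import works, and of which version is installed (useful when picking the framework version on Mia), is:

```python
import sklearn

# Print the installed version so it can be matched against Mia's supported versions.
print(sklearn.__version__)
```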
We collected our data from this site.
Data Cleaning and Preprocessing
We kept integer numerical variables rather than floating-point ones. The target is predicted with logistic regression: a result above 50% is reported as true, and a result below 50% as false.
Our data comes from the Worldwide Governance Indicators (WGI) project. The WGI cover over 200 countries and territories, measuring six dimensions of governance starting in 1996: Voice and Accountability, Political Stability and Absence of Violence/Terrorism, Government Effectiveness, Regulatory Quality, Rule of Law, and Control of Corruption. The aggregate indicators are based on several hundred individual underlying variables, taken from a wide variety of existing data sources. The data reflect the views on governance of survey respondents and public, private, and NGO sector experts worldwide.
More than 15 research studies, listed here, confirm the importance of these indicators for startups’ performance.
- Our dataset covers 214 countries and 6 success factors (27,644 rows and 5 columns).
Data Exploration (EDA) and Feature Engineering
Feature Selection Method
The features that were selected to train the baseline model are:
We added a High_Rank column to our dataset so that the percentile rank can be classified using logistic regression.
Our target variable is High_Rank: a percentile rank equal to or greater than 50 is represented as ‘1’, and anything lower as ‘0’.
Percentile ranks are often expressed as a number between 1 and 99, with 50 being the average.
Year ranges from 1996 to 2019.
Country covers 214 countries.
Indicator identifies which of the 6 success factors a value refers to (Control_of_Corruption, Rule_of_Law, Political_Stability, Government_Effectivness, Regulatory_Quality, Political Stability and Absence of Violence/Terrorism).
This is a classification predictive modeling problem with numerical input variables.
- ANOVA F-test (linear).
- A synthetic test problem for logistic regression is prepared using the make_classification() function.
- Numerical Input, Categorical Output
- Numerical Variables: discrete integer variables.
- Categorical Output: Boolean variables (dichotomous).
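For numerical inputs and a categorical output, scikit-learn exposes the ANOVA F-test as `f_classif`, which can be combined with `SelectKBest`. A minimal sketch on a synthetic dataset from `make_classification()` follows; the sample size, feature counts and `k` are arbitrary choices, not the project’s settings.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic binary problem: 6 numerical features, 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=1
)

# Score every feature with the ANOVA F-test and keep the 3 best.
selector = SelectKBest(score_func=f_classif, k=3)
X_selected = selector.fit_transform(X, y)

for idx, score in enumerate(selector.scores_):
    print(f"feature {idx}: F = {score:.2f}")
print("kept:", selector.get_support(indices=True))
```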
Algorithm Selection and Training
Once the data was preprocessed, we trained the following model on it:
1. Logistic regression with Scikit-Learn (testing accuracy: 70.3%)
Model Deployment on Mia
- Built on one of the following versions of scikit-learn with Python 3.7 or 3.5:
For our app, we developed a scikit-learn model, which was exported as follows:
Importing the libraries needed to train our model:
We used Scikit-learn version 0.23.1:
Now you just have to upload your model; in this case, a ‘model.joblib’ file.
Note: Mia currently supports scikit-learn models exported as model.joblib or model.pkl. Your model’s filename must exactly match one of these options. You can read about how to export models for prediction here.
How we exported our model as model.joblib:
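The original export code is not shown. A minimal sketch, assuming a logistic regression trained on four integer columns and synthetic data in place of the WGI dataset, ends with `joblib.dump`, which writes the file name Mia expects:

```python
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the preprocessed training data (integer features).
rng = np.random.default_rng(42)
X_train = rng.integers(0, 100, size=(300, 4))
y_train = (X_train.mean(axis=1) >= 50).astype(int)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# The file name must be exactly model.joblib (or model.pkl) for Mia.
joblib.dump(model, "model.joblib")

# Reload to check that the saved file round-trips.
restored = joblib.load("model.joblib")
print(restored.predict(X_train[:3]))
```

Pickled scikit-learn models are not guaranteed to load under a different scikit-learn version, which is why the version selected on Mia should match the one used for training.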
Once your model is trained and saved, you can upload the model.joblib file to Mia, as long as it was built with one of the scikit-learn versions Mia supports.
For our app, we used a single input type, Slider, for Percentile_Rank, Control of Corruption, Political Stability and Absence of Violence/Terrorism, and Rule of Law.
Note: At least one input must be provided. These inputs must appear in the same order as was used to train the model. Currently, only one-hot and label encoding are supported for categorical features.
Note: The number of inputs (variables) in the model must equal the number of inputs on your app.
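Because the app’s inputs must match the training columns in both number and order, it can help to keep the training column order in one place and build every prediction row from it. The helper below is hypothetical (Mia handles this mapping once the inputs are configured), and the column names are assumed from the app’s sliders.

```python
# Hypothetical column order used at training time.
FEATURE_ORDER = [
    "Percentile_Rank",
    "Control_of_Corruption",
    "Political_Stability",
    "Rule_of_Law",
]

def to_model_row(inputs: dict) -> list:
    """Arrange named inputs into the exact order the model was trained on."""
    missing = [name for name in FEATURE_ORDER if name not in inputs]
    extra = [name for name in inputs if name not in FEATURE_ORDER]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [inputs[name] for name in FEATURE_ORDER]

row = to_model_row({
    "Rule_of_Law": 61,
    "Percentile_Rank": 55,
    "Political_Stability": 40,
    "Control_of_Corruption": 70,
})
print(row)  # [55, 70, 40, 61]
```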
Next, choose the type of prediction that matches your model.
We selected the ‘Classification’ prediction format. Our output is either 1 or 0, so we added two messages: “Your startup will be successful” when the result is true (above 50%), and “Your startup won’t be successful” when it is false (below 50%).
You can test this app here.
Deploying a machine learning model has never been easy, especially if you are a novice overwhelmed by so many technicalities! Mia lets you focus on building an effective, accurate model by taking care of the interface: a few clicks and you have a working app that uses your machine learning model. Do you have a model waiting to be deployed? Go ahead, visit Mia and play around with the amazing features it has to offer.