Deploying Azure AutoML AI Models on Bare Metal, On-Premises, or Custom Servers: A Tutorial with Flask, Gunicorn, Conda & Nginx

A Cost-Efficient Hybrid Approach for Custom Environments: Decoding Azure’s Abstract Layer

Ta-seen Junaid
17 min read · May 3, 2024

Deploying AI models efficiently requires striking a strategic balance between utilizing current infrastructure, such as on-premises servers or specialized hardware, and exploiting the advanced capabilities of cloud platforms such as Azure AutoML. A hybrid strategy allows enterprises to reduce costs, seamlessly integrate AI into current applications, and adapt their infrastructure to meet specific performance, security, and compliance requirements. This strategy combines the best of both worlds, allowing enterprises to benefit from cloud innovation while maintaining control over vital infrastructure and data assets. Identifying the many abstract layers inside Azure AI models can be like solving a puzzle, making it a difficult but necessary task for efficient deployment in a hybrid infrastructure configuration.

Prerequisite

  1. Microsoft Azure Machine Learning
  2. Artificial Intelligence
  3. Software deployment

We suggest the Microsoft Certified: Azure Data Scientist Associate course and the Study guide for Exam DP-100 for readers who are not yet familiar with Microsoft Azure Machine Learning.

Why Azure AutoML

Azure AutoML is a sophisticated tool from Microsoft Azure that simplifies the process of creating and deploying machine learning models. It is intended to make machine learning more approachable to developers, data scientists, and business analysts by abstracting away much of the complexity associated with the machine learning process. Below is an overview of Azure AutoML’s core features and capabilities:

  1. Automated Model Selection: Azure AutoML chooses an appropriate machine learning model for your dataset from a range of algorithms and architectures, such as decision trees, random forests, gradient boosting, and neural networks. This saves you the time and effort of experimenting with various methods manually.
  2. Hyperparameter Optimization: It tunes the hyperparameters of the chosen models to improve their performance. Hyperparameters are settings that govern the learning process, such as the learning rate, the number of layers in a neural network, and the tree depth in decision trees. Azure AutoML uses techniques such as grid search, random search, and Bayesian optimization to efficiently search the hyperparameter space and identify the best configuration.
  3. Feature Engineering: Azure AutoML automates feature engineering, which is the process of choosing, manipulating, and producing features from raw data to improve model performance. It automates operations such as missing value imputation, categorical encoding, feature scaling, and feature selection, allowing you to concentrate on the problem domain rather than the complexities of data preprocessing.
  4. Model Interpretability: The service offers insights into how the model creates predictions, allowing you to better understand the factors that drive its judgments. This is critical for regulatory compliance, model debugging, and increasing confidence in the model’s predictions.
  5. Scalability and Performance: Azure AutoML is built on Microsoft Azure’s resilient infrastructure, allowing it to scale smoothly to handle huge datasets and compute-intensive workloads. It uses distributed computing resources to efficiently train models and make high-performance predictions.
  6. Integration with Azure Ecosystem: Azure AutoML integrates seamlessly with other Azure services, including Azure Machine Learning, Azure Databricks, Azure Synapse Analytics, and Azure DevOps, allowing you to harness the full power of the Azure ecosystem to create end-to-end machine learning solutions.
  7. Deployment Options: After training a model using Azure AutoML, you can simply deploy it to a variety of deployment destinations, such as Azure Kubernetes Service (AKS), Azure Container Instances (ACI), Azure IoT Edge, Azure Functions, or a web service endpoint. This allows you to grow your models and link them with your existing apps and workflows.

Overall, Azure AutoML simplifies the machine learning workflow, from data preparation and model training to deployment and monitoring, allowing enterprises to harness the power of AI in their products and services.

Why Hybrid Approach

  1. Cost-Efficient:

Utilizing Existing Infrastructure: By eliminating the need to invest in additional cloud resources, leveraging existing on-premises or custom servers can drastically lower the expenses involved with deploying AI models.
Optimized Resource Usage: By consolidating numerous deployments on a single server instance, enterprises can improve utilization and reduce idle capacity, resulting in increased cost efficiency.

2. Integration with Existing Applications:

Seamless Integration: A hybrid approach enables the seamless integration of AI models into current systems without requiring large architectural modifications or reliance on external cloud services.
Minimized Disruption: Integration with existing apps minimizes disturbance to workflows and processes, allowing for the smooth adoption of AI capabilities within the existing ecosystem.

3. Flexibility in Infrastructure:

Custom Infrastructure: Organizations may have specialized infrastructure needs, such as bare-metal servers or on-premises data centers, that are better suited to certain workloads or compliance requirements.
Tailored Solutions: With a hybrid approach, enterprises may tailor their infrastructure to meet specific performance, security, and regulatory needs while still benefiting from Azure AutoML’s advanced features.

4. Best of Two Worlds:

Azure AutoML Integration: Organizations may use Azure AutoML to train and develop models, taking advantage of its advanced capabilities.
On-Premises Deployment: By deploying models on-premises or on custom servers, enterprises can maintain control over their infrastructure while also leveraging Azure AutoML’s model development and optimization capabilities.

5. Data sovereignty and compliance:

Compliance Requirements: Certain businesses or locations have strict data sovereignty and compliance rules that require the deployment of AI models within certain geographical boundaries or on-premises infrastructure.
Mitigated concerns: By keeping sensitive data within their own infrastructure while employing Azure AutoML for model building and management, enterprises can reduce data privacy and compliance concerns.

6. Performance and latency considerations:

Low Latency Requirements: For applications that require low latency, putting AI models on-premises or closer to the data source can help reduce latency and improve overall performance.
High Throughput: By putting models on custom servers with high-throughput capabilities, companies may efficiently manage massive amounts of data while maintaining optimal performance for real-time inference.

7. Hybrid Cloud Strategy:

Strategic Flexibility: A hybrid cloud strategy enables enterprises to balance the benefits of cloud scalability and innovation with the management and security of on-premises infrastructure, resulting in greater flexibility and agility when deploying AI solutions.

Architecture

Before diving into a hybrid architecture, it's critical to understand the fundamental structure of Microsoft Azure Machine Learning. The compute instance is at the heart of Azure Machine Learning, serving as a central hub for tasks like executing code, scripts, and instructions to manage data, compute clusters, registries, and deployments.

(Image: Architecture for Microsoft Azure Machine Learning)

Data is initially saved in a persistent store. We can pull data from this store using a variety of techniques, including cloud flows, SQL, manual processes, and URLs, producing datasets. These datasets add an extra layer of abstraction and metadata on top of the raw data, allowing machines to understand connection setups, data types, source formats, headers, and more. Complex, heavy-lifting operations like data preprocessing and training are carried out within the compute cluster, utilizing its powerful GPU/CPU resources. Given the vast number of techniques, experiments, models, algorithms, artifacts, and versions, it is critical to keep a versioned repository. As a result, a model registry is used to store important models along with their code, scripts, artifacts, and versions.

To provide platform independence and consistent shipping across several Azure services, the AI model, Azure-managed code, scripts, versions, and runtime environment are all encapsulated in a container registry. This allows for smoother deployment operations. Finally, AI models are deployed to Azure's managed services, such as Azure Kubernetes Service (AKS), Azure Container Instances (ACI), Azure IoT Edge, Azure Functions, and web service endpoints. Applications use these endpoints to retrieve results.
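To make the model registry idea concrete, interactions with it can look roughly like the sketch below using the v1 azureml-core SDK. This is illustrative only: the model name and paths are assumptions, and the notebooks in the repository use their own code for these steps.

from azureml.core import Workspace, Model

# Connect to the workspace described by a local config.json
ws = Workspace.from_config()

# Register a trained model file into the workspace model registry
registered = Model.register(workspace=ws,
                            model_path="outputs/model.pkl",
                            model_name="diabetes-automl")  # illustrative name

# Later, fetch the registered model and download its artifacts locally
model = Model(ws, name="diabetes-automl")
model.download(target_dir="artifact_downloads", exist_ok=True)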

Hybrid Architecture

Alternatively, our hybrid technique replicates Azure's managed deployment using Flask, Gunicorn, and Nginx. The major distinction is that we recreate the runtime environment ourselves so it can handle the many abstract layers included in the Azure AutoML model, and we run our own serving code on top of those layers.

To deploy a machine learning model with Flask, Gunicorn, and Nginx, we first create a Flask application to serve our model predictions. Flask is a lightweight Python web framework that is well suited to creating web applications. We create routes in our Flask application to handle incoming requests and return predictions made by our machine learning model. To serve our Flask application, we use Gunicorn, a Python WSGI HTTP server. Gunicorn efficiently manages several concurrent requests, ensuring that the application's traffic flows smoothly.
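As a rough illustration, a minimal app.py for this setup might look like the sketch below. It is not the exact code from the repository: the route name /prediction matches the curl calls later in this tutorial, while the model loading and scoring details depend on the downloaded AutoML artifacts and their conda environment.

from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)

# Load the downloaded AutoML model once at startup (path is illustrative)
model = joblib.load("artifact_downloads/outputs/model.pkl")

@app.route("/prediction", methods=["POST"])
def prediction():
    # Turn the incoming JSON payload into a single-row DataFrame and score it
    data = pd.DataFrame([request.get_json()])
    result = model.predict(data)
    return jsonify({"prediction": result.tolist()})

if __name__ == "__main__":
    # Development server only; Gunicorn serves the app in production
    app.run(host="0.0.0.0", port=5000)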

Finally, Nginx, a high-performance web server and reverse proxy, can be used to direct incoming traffic to Gunicorn. Nginx also provides load balancing, caching, and security enhancements, making it a good option for building a production-grade server. We can build a robust and scalable deployment strategy for our machine learning models by combining Flask for application logic, Gunicorn for processing Python web requests, and Nginx for efficient routing and scalability.

To keep the main topic simple and easy to understand, we have opted not to incorporate pipelines, automation, or other advanced features in this tutorial. Additionally, we provide minimal runnable code snippets to facilitate quick implementation and understanding. Our server runs Linux (Ubuntu 20.04), and we generally prefer that code ships together with its runtime environment, as in containerization.

Experimental Lab Setup

We assume you have already set up Microsoft Azure Machine Learning Studio. We suggest the Microsoft Certified: Azure Data Scientist Associate course and the Study guide for Exam DP-100 for readers who have not yet set up Microsoft Azure Machine Learning Studio.

To clone the repository, run this command from your compute instance command line.

git clone https://github.com/Ta-SeenJunaid/Azure-AutoML-Custom-Deployment.git 
(Image: Cloning the code repository)

Refresh your directory view to see the change.

Data Preprocessing & Training a Classification Model with Automated Machine Learning

Go to the Azure-AutoML-Custom-Deployment folder and run ‘Classification with Automated Machine Learning.ipynb’. The code, explanation, and instructions for Azure AutoML are in that file. Running the code follows the same standard process as running any Jupyter (formerly IPython) notebook: an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media.

Download the Best Model and Related Artifacts

Go to the Azure-AutoML-Custom-Deployment folder and run ‘Download the Best Model and Related Artifacts.ipynb’. The code, explanation, and instructions are given in that file. After running this file, go to the directory Azure-AutoML-Custom-Deployment/artifact_downloads/outputs. Make sure you reload the directory. You’ll find our AI model stored as a pickle file named model.pkl. You’ll also find the scoring script and the Conda environment file. The scoring script is important since it shows how to use the model within code, whereas the conda environment file is required to set up the runtime environment.
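For orientation, AutoML-generated scoring scripts typically follow an init()/run() pattern along the lines of the simplified outline below. The real file in outputs/ is more elaborate and resolves the model path through the Azure ML runtime, so treat this only as a sketch of the structure.

import json
import joblib
import pandas as pd

model = None

def init():
    # Load the trained model once when the service starts
    global model
    model = joblib.load("model.pkl")

def run(raw_data):
    # raw_data is a JSON string containing the input records under "data"
    data = pd.DataFrame(json.loads(raw_data)["data"])
    result = model.predict(data)
    return result.tolist()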

Transfer the Code into Your Server

To transfer the local folder from the cloud workspace to a remote server over SSH, you can use the scp (secure copy) command. We recommend transferring the whole Azure-AutoML-Custom-Deployment directory to the remote server.

Here’s the command using SSH key authentication:

scp -i /path/to/your/key.pem -r /path/to/local/folder username@remote_host:/path/to/destination/folder

And here’s an alternative command using username and password authentication:

scp -r /path/to/local/folder username@remote_host:/path/to/destination/folder

Substitute the values according to your setup.

For me, the command is:

scp -r /home/azureuser/cloudfiles/code/Users/taseen.junaid/Azure-AutoML-Custom-Deployment tj@20.79.154.74:/home/tj

From this point on, we have no further interaction with Azure Machine Learning.

SSH to your server

SSH, also known as Secure Shell or Secure Socket Shell, is a network protocol that gives users, particularly system administrators, a secure way to access a computer over an unsecured network. Here’s the command using SSH key authentication:

ssh -i /path/to/private_key.pem username@remote_host

Here’s an alternative command using username and password authentication:

ssh username@remote_host

Substitute the values according to your setup.

For us the command is:

ssh tj@20.79.154.74

After that, we can find our project folder on the remote server with the ls command.

(Image: Remote server through SSH)

When “tj@tj” appears in the terminal, it signifies that the command is being executed on our remote server.

Setting up the Runtime Environment

We are using a conda environment because the Azure AutoML model was created with conda. To install Miniconda on your server, we recommend following the official installation guide.

Here are the commands we use:

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh

After installing, initialize your newly installed Miniconda. The following command initializes it for bash.

~/miniconda3/bin/conda init bash

See ‘conda init --help’ for more information and options.

Reload bashrc with the following command, “source ~/.bashrc”, if necessary.

Go to the project directory.

cd Azure-AutoML-Custom-Deployment/

After that, create the conda environment from the conda.yaml file.

conda env create -f conda.yaml
(Image: Conda environment creation)

Finally, activate the conda environment named ‘conda-env’ with the following command:

conda activate conda-env

Run the app.py code locally to verify that it works properly with the environment.

python3 app.py  

Invoke the local endpoint with curl to verify that it is working properly.

curl -X POST http://localhost:5000/prediction -H 'Content-Type: application/json' -d '{"PatientID": 1020531, "Pregnancies": 3, "PlasmaGlucose": 125, "DiastolicBloodPressure": 82, "TricepsThickness": 23, "SerumInsulin": 112, "BMI": 34.95472243, "DiabetesPedigree": 0.204847272, "Age": 46}'
(Image: Activating the conda environment, running app.py, and testing it with curl)
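If you prefer testing from Python instead of curl, an equivalent client using the requests library might look like the sketch below, with the same illustrative payload as the curl call above.

import requests

payload = {
    "PatientID": 1020531,
    "Pregnancies": 3,
    "PlasmaGlucose": 125,
    "DiastolicBloodPressure": 82,
    "TricepsThickness": 23,
    "SerumInsulin": 112,
    "BMI": 34.95472243,
    "DiabetesPedigree": 0.204847272,
    "Age": 46,
}

# Send the record to the local Flask endpoint and print the result
response = requests.post("http://localhost:5000/prediction", json=payload)
print(response.status_code, response.json())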

Now close the local server by pressing Ctrl+C in the terminal.

Gunicorn WSGI (Web Server Gateway Interface) Setup

Gunicorn, or “Green Unicorn,” is a popular WSGI HTTP server for Python web applications. It is commonly used to deploy Python web applications built with frameworks such as Flask, Django, and Pyramid. Gunicorn works well behind a reverse proxy such as Nginx thanks to its simplicity, reliability, and performance. Its rapid handling of multiple concurrent requests makes it a popular choice for deploying production-grade web applications.

We now instruct the Gunicorn server on how to connect with the application via the wsgi.py file, which imports the Flask instance from our application. Before proceeding, ensure that Gunicorn can serve the application correctly. We can accomplish this by specifying the name of the application’s entry point, made up of the module name (without the .py extension) and the name of the application’s callable; in this case, it’s wsgi:app. Specify the interface and port to bind to with the 0.0.0.0:5000 option so that we can test the application.
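A typical wsgi.py is only a few lines; a minimal sketch, assuming the Flask instance in app.py is named app, looks like this.

# wsgi.py -- exposes the Flask instance so Gunicorn can find it as wsgi:app
from app import app

if __name__ == "__main__":
    app.run()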

gunicorn --bind 0.0.0.0:5000 wsgi:app

Invoke the local endpoint with curl to verify that it is working properly.

curl -X POST http://0.0.0.0:5000/prediction -H 'Content-Type: application/json' -d '{"PatientID": 1020531, "Pregnancies": 3, "PlasmaGlucose": 125, "DiastolicBloodPressure": 82, "TricepsThickness": 23, "SerumInsulin": 112, "BMI": 34.95472243, "DiabetesPedigree": 0.204847272, "Age": 46}'
(Image: Gunicorn running properly and tested with curl)

Now close the local server by pressing Ctrl+C in the terminal. Deactivate the conda environment with the following command:

conda deactivate

Next, create a systemd service unit file. Creating a systemd unit file enables Ubuntu’s init system to automatically restart Gunicorn and serve the Flask application upon server boot.
Create a unit file ending in .service within the /etc/systemd/system directory:

sudo nano /etc/systemd/system/automl_deployment.service

Put similar content inside this file, changing it according to your setup:

[Unit]
Description=Gunicorn instance to serve automl_deployment
After=network.target

[Service]
User=tj
Group=www-data
WorkingDirectory=/home/tj/Azure-AutoML-Custom-Deployment
Environment="PATH=/home/tj/miniconda3/envs/conda-env/bin"
ExecStart=/home/tj/miniconda3/envs/conda-env/bin/gunicorn --workers 3 --bind unix:app_sock.sock -m 007 wsgi:app

[Install]
WantedBy=multi-user.target

Let’s break down each section:

[Unit]

  • Description: Provides a description of the service, in this case, it’s a Gunicorn instance to serve an Azure AutoML deployment.
  • After: Specifies that this service should start after the network is up.

[Service]

  • User: Specifies the user under which the service will run, in this case, “tj”.
  • Group: Specifies the group under which the service will run, typically used for permission settings, in this case, “www-data”.
  • WorkingDirectory: Sets the working directory for the service, where it will execute commands.
  • Environment: Defines environment variables for the service, setting the PATH variable to include the location of a specific Conda environment.
  • ExecStart: Specifies the command to start the service. Here, it’s starting Gunicorn with specific configurations:
  • --workers 3: Specifies the number of worker processes Gunicorn should use.
  • --bind unix:app_sock.sock: Binds Gunicorn to a Unix socket file named "app_sock.sock".
  • -m 007: Sets the umask for the socket file to 007, so the socket is accessible by the owner and group but not by other users.
  • wsgi:app: Specifies the entry point for the WSGI application, which typically resides in a file named "wsgi.py".

[Install]

  • WantedBy: Specifies the target multi-user environment where this service should be enabled.

After that run the following commands and check the status.

sudo systemctl daemon-reload
sudo systemctl start automl_deployment
sudo systemctl enable automl_deployment
sudo systemctl status automl_deployment
(Image: Service is running properly)

Run this command, adjusting the path according to your setup.

sudo chmod 775 /home/tj/Azure-AutoML-Custom-Deployment

This command gives the owner and the group associated with the directory /home/tj/Azure-AutoML-Custom-Deployment full permissions (read, write, and execute), while granting read and execute permissions to all other users.

If you want to access your Gunicorn server directly from outside without using Nginx, you can modify your systemd service file accordingly. You only need to change the ExecStart line to the following: ExecStart=/home/tj/miniconda3/envs/conda-env/bin/gunicorn --workers 3 --bind 0.0.0.0:5000 -m 007 wsgi:app. If your firewall is open for port 5000, you can call the API directly with a curl command like: curl -X POST http://your_ip:5000/prediction -H 'Content-Type: application/json' -d '{"PatientID": 1020531, "Pregnancies": 3, "PlasmaGlucose": 125, "DiastolicBloodPressure": 82, "TricepsThickness": 23, "SerumInsulin": 112, "BMI": 34.95472243, "DiabetesPedigree": 0.204847272, "Age": 46}'

Nginx Reverse Proxy Server Setup

Nginx is a high-performance, open-source web server. It excels at managing concurrent connections and delivering static content quickly. Nginx is commonly used as a reverse proxy, load balancer, and HTTP cache, and it is widely deployed in production environments to enhance web application performance, reliability, and security. Its lightweight design, scalability, and flexibility make it a popular choice for hosting websites and web applications.

First, we have to install Nginx with the following commands:

sudo apt update
sudo apt install nginx

After that, we have to adjust the firewall. To see the application profiles that ufw knows about, run this command; you will see Nginx listed.

sudo ufw app list

‘Nginx Full’ will open both port 80 (normal, unencrypted web traffic) and port 443 (TLS/SSL encrypted traffic) for communication. Run the following command to enable this:

sudo ufw allow 'Nginx Full'
sudo ufw allow ssh

If your Firewall is not enabled, you can enable this with:

sudo ufw enable

Verify the ufw status with the following command:

sudo ufw status

To check that the service related to Nginx is running properly, run the following command:

systemctl status nginx
(Image: Firewall setup and Nginx status)

If you visit http://your_server_ip (like http://20.79.154.74), you will see the Nginx default landing page.

(Image: Nginx default page)

Now we have to create a server block for our application alongside the default one. Navigate to the Nginx configuration directory, usually located at /etc/nginx/sites-available/. Create a new configuration file for your API; we named ours automl_deployment.

sudo nano /etc/nginx/sites-available/automl_deployment

Put in the following content, editing it according to your configuration:

server {
    listen 80;
    server_name taseenjunaid.com www.taseenjunaid.com;

    location /prediction {
        include proxy_params;
        proxy_pass http://unix:/home/tj/Azure-AutoML-Custom-Deployment/app_sock.sock;
    }
}

Explanation of the configuration:

  • server: This block defines a virtual server in Nginx.
  • listen 80;: Specifies that Nginx should listen for incoming connections on port 80, the default port for HTTP traffic.
  • server_name taseenjunaid.com www.taseenjunaid.com;: Specifies the domain names for which this server block will be used. Requests coming to taseenjunaid.com or www.taseenjunaid.com will be handled by this server block.
  • location /prediction { ... }: This block defines how Nginx should handle requests that match the specified location pattern, in this case, requests with a URI starting with /prediction, which we implemented in the app.py code.
  • include proxy_params;: Includes additional configuration parameters for the proxy. This is typically used to ensure that certain headers and settings are passed along correctly.
  • proxy_pass http://unix:/home/tj/Azure-AutoML-Custom-Deployment/app_sock.sock;: Specifies that incoming requests should be proxied to the Unix socket located at /home/tj/Azure-AutoML-Custom-Deployment/app_sock.sock. This directive tells Nginx to forward requests to the specified upstream server. Here, it’s using a Unix socket (unix:) instead of a traditional IP address and port combination. Unix sockets are a method of inter-process communication (IPC) used for communication between processes on the same host. In this case, it’s pointing to the Unix socket file /home/tj/Azure-AutoML-Custom-Deployment/app_sock.sock, where your Gunicorn server is listening for requests.

So, when a request comes in for a URI starting with /prediction, Nginx will pass that request to the upstream server defined by proxy_pass, which in turn will forward it to your Gunicorn server via the specified Unix socket file.

Next, we have to enable this server block by creating a symbolic link to it in the sites-enabled directory:

sudo ln -s /etc/nginx/sites-available/automl_deployment /etc/nginx/sites-enabled/

Before you restart Nginx, it’s a good idea to test your configuration file for syntax errors:

sudo nginx -t

If the test is successful, restart Nginx to apply the changes:

sudo systemctl restart nginx

Now your reverse proxy server is up and running, and you can test it with the following curl command.

curl -X POST http://taseenjunaid.com/prediction -H 'Content-Type: application/json' -d '{"PatientID": 1020531, "Pregnancies": 3, "PlasmaGlucose": 125, "DiastolicBloodPressure": 82, "TricepsThickness": 23, "SerumInsulin": 112, "BMI": 34.95472243, "DiabetesPedigree": 0.204847272, "Age": 46}'
(Image: Invoking the reverse proxy server without SSL)

If you face any issue, you can check the error log.

sudo tail /var/log/nginx/error.log

If you get a ‘502 Bad Gateway’ error, it is most likely a permission-denied issue when Nginx tries to access the socket file. You can fix it with the following command, changing it according to your settings:

sudo chmod 775 /home/tj/

Now, if you want, you can secure your API with SSL. Because it encrypts data transferred between users’ browsers and web servers, SSL (Secure Sockets Layer) is essential for safeguarding online communication, preventing unauthorized access and tampering. It guarantees privacy, authenticity, and data integrity, all of which are essential for protecting private data like credit card numbers and login credentials. By confirming the legitimacy of websites, SSL certificates from trusted authorities promote user confidence and counter phishing attempts. SSL also improves search engine rankings and helps an organization comply with PCI DSS and other regulatory standards. All things considered, SSL is essential for preserving compliance, building trust, and protecting data in the digital sphere.

If you do not have SSL certificate, you can use OpenSSL to generate a self-signed SSL certificate and key:

sudo mkdir -p /etc/nginx/ssl/
sudo openssl req -x509 -nodes -newkey rsa:4096 -keyout /etc/nginx/ssl/self.key -out /etc/nginx/ssl/self.crt -days 365

Ensure that the SSL certificate and key files are readable by the Nginx user (www-data in this case). Run:

sudo chown root:www-data /etc/nginx/ssl/self.crt
sudo chown root:www-data /etc/nginx/ssl/self.key
sudo chmod 775 /etc/nginx/ssl/self.crt
sudo chmod 775 /etc/nginx/ssl/self.key

Edit /etc/nginx/sites-available/automl_deployment according to your configuration.

sudo nano /etc/nginx/sites-available/automl_deployment

Put in the following content according to your configuration.

server {
    listen 80;
    server_name taseenjunaid.com www.taseenjunaid.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl;
    server_name taseenjunaid.com www.taseenjunaid.com;

    ssl_certificate /etc/nginx/ssl/self.crt;
    ssl_certificate_key /etc/nginx/ssl/self.key;

    location /prediction {
        include proxy_params;
        proxy_pass http://unix:/home/tj/Azure-AutoML-Custom-Deployment/app_sock.sock;
    }
}

In this configuration:

  • The first server block listens on port 80 and handles HTTP requests. It redirects all HTTP traffic to the HTTPS version of your site using a 301 redirect.
  • The second server block listens on port 443 and handles HTTPS requests. It serves your site over HTTPS using the SSL certificate and key specified.

Before you restart Nginx, it’s a good idea to test your configuration file for syntax errors:

sudo nginx -t

If the test is successful, restart Nginx to apply the changes:

sudo systemctl restart nginx

Now your reverse proxy server is up and running with SSL, and you can test it with the following curl command. The -k option is included in the curl command to allow insecure SSL connections, which may be necessary for systems with self-signed certificates, certificates without root trust, or other SSL issues.

curl -X POST https://taseenjunaid.com/prediction -H 'Content-Type: application/json' -d '{"PatientID": 1020531, "Pregnancies": 3, "PlasmaGlucose": 125, "DiastolicBloodPressure": 82, "TricepsThickness": 23, "SerumInsulin": 112, "BMI": 34.95472243, "DiabetesPedigree": 0.204847272, "Age": 46}' -k
(Image: Invoking the reverse proxy server with SSL)

Now it’s time for you to add authentication, pipelines, automation, containerization, or other advanced features relevant to your setup.

We hope this tutorial is helpful for you, and we look forward to your valuable feedback. We appreciate you taking the time to read our writing.
