This tutorial demonstrates how to deploy a model as a web service on Azure Kubernetes Service (AKS). AKS is well suited to high-scale production deployments.
You will learn to set up your environment, register a model, provision an AKS cluster, deploy the model to AKS, and test the deployed service.
If you don’t have access to an Azure ML workspace, follow the setup tutorial to configure and create a workspace.
Start by setting up your environment. This includes importing the azuremlsdk package and connecting to your workspace.
Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a config.json file if you previously wrote one out with write_workspace_config().
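A minimal sketch of this setup step, assuming the azuremlsdk package is installed and a config.json file is discoverable from the working directory. The variable name ws matches the one used by the cluster and deployment code later in the tutorial.

```r
# Load the SDK and connect to the workspace described in config.json.
library(azuremlsdk)

ws <- load_workspace_from_config()
```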
Or, you can retrieve a workspace by directly specifying your workspace details:
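A minimal sketch using get_workspace(); the three values are placeholders to replace with your own, and the arguments are named so that the call does not depend on argument order.

```r
library(azuremlsdk)

# Placeholders: substitute your own workspace name, subscription ID and resource group.
ws <- get_workspace(name = "<your workspace name>",
                    subscription_id = "<your subscription ID>",
                    resource_group = "<your resource group>")
```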
In this tutorial we will deploy a model that was trained in one of the samples. The model was trained with the Iris dataset and can be used to determine if a flower is one of three Iris flower species (setosa, versicolor, virginica). We have provided the model file (model.rds) for the tutorial; it is located in the deploy-to-aks subfolder of this vignette.
First, register the model to your workspace with register_model(). A registered model can be any collection of files, but in this case the R model file is sufficient. Azure ML will use the registered model for deployment.
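The registration step might look like the sketch below. It assumes ws is the workspace object from the setup step and that the working directory contains the deploy-to-aks subfolder; the model name and description are illustrative choices, not values required by the SDK.

```r
library(azuremlsdk)

# Register the provided model file; "iris_model" is a hypothetical name.
model <- register_model(ws,
                        model_path = file.path("deploy-to-aks", "model.rds"),
                        model_name = "iris_model",
                        description = "Predict an Iris flower type")
```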
When deploying a web service to AKS, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace: provision a new AKS cluster from the SDK, or attach an existing AKS cluster with the attach_aks_compute() method.
Creating or attaching an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy.
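For the attach route, a sketch along the following lines may help. It assumes ws is your workspace object and that attach_aks_compute() takes the resource group and name of the existing cluster; both values are placeholders, and attached_target is a hypothetical variable name.

```r
library(azuremlsdk)

# Placeholders: the resource group and name of an AKS cluster that already exists.
attached_target <- attach_aks_compute(ws,
                                      resource_group = "<existing resource group>",
                                      cluster_name = "<existing cluster name>")
wait_for_provisioning_completion(attached_target, show_output = TRUE)
```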
In this tutorial, we will go with the first method of provisioning a new cluster. See the create_aks_compute() reference for the full set of configurable parameters. If you pick custom values for the agent_count and vm_size parameters, make sure that agent_count multiplied by the number of virtual CPUs provided by the chosen vm_size is greater than or equal to 12 virtual CPUs.
aks_target <- create_aks_compute(ws, cluster_name = 'myakscluster')
wait_for_provisioning_completion(aks_target, show_output = TRUE)
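The 12 virtual CPU requirement for custom agent_count and vm_size values can be checked before provisioning. The sketch below is plain R; the lookup table and the helper meets_vcpu_minimum() are illustrative and not part of the SDK, and the vCPU counts should be verified against the Azure VM size documentation.

```r
# Illustrative vCPU counts for a few Azure VM sizes (verify before relying on them).
vcpus_per_node <- c(Standard_DS2_v2 = 2, Standard_D3_v2 = 4, Standard_D4_v2 = 8)

# Hypothetical helper: TRUE when the cluster would have at least `minimum` vCPUs.
meets_vcpu_minimum <- function(agent_count, vm_size, minimum = 12) {
  if (!vm_size %in% names(vcpus_per_node)) {
    stop("Unknown VM size: ", vm_size)
  }
  agent_count * vcpus_per_node[[vm_size]] >= minimum
}

meets_vcpu_minimum(3, "Standard_D3_v2")  # 3 nodes * 4 vCPUs = 12
meets_vcpu_minimum(2, "Standard_D3_v2")  # 2 nodes * 4 vCPUs = 8
```

If the check passes, the same values can be passed to create_aks_compute() through its agent_count and vm_size parameters.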
The Azure ML SDK does not provide support for scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure portal. You can only change the node count, not the VM size of the cluster.
To deploy a model, you need an inference configuration, which describes the environment needed to host the model and web service. To create an inference config, you will first need a scoring script and an Azure ML environment.
The scoring script (entry_script) is an R script that will take as input variable values (in JSON format) and output a prediction from your model. For this tutorial, use the provided scoring file score.R. The scoring script must contain an init() method that loads your model and returns a function that uses the model to make a prediction based on the input data. See the documentation for more details.
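To make the shape of such a script concrete, here is a sketch of what a scoring file could look like. It is not necessarily identical to the provided score.R. It assumes the deployed container exposes the registered model's directory through the AZUREML_MODEL_DIR environment variable, that the file is named model.rds, and that the model object works with predict().

```r
library(jsonlite)

init <- function() {
  # Assumption: the directory holding the registered model files is given
  # by the AZUREML_MODEL_DIR environment variable.
  model_path <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_path, "model.rds"))
  message("model is loaded")

  # The returned function is called once per request with the JSON payload.
  function(data) {
    plant <- as.data.frame(fromJSON(data))
    prediction <- predict(model, plant)
    toJSON(as.character(prediction))
  }
}
```

The closure keeps the model in memory, so it is read from disk once in init() rather than on every request.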
Next, define an Azure ML environment for your script’s package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior.
By default, Azure ML builds a Docker image that includes R, the Azure ML SDK, and additional required dependencies for deployment. See the documentation for the full list of dependencies that will be installed in the default container. Using the other parameters of r_environment(), you can also specify additional packages to be installed at runtime, or a custom Docker image to use instead of the base image that would otherwise be built.
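For this tutorial the default image is likely sufficient, so a minimal environment definition could be as short as the sketch below. The environment name is an illustrative choice.

```r
library(azuremlsdk)

# "deploy_env" is a hypothetical name; no extra packages are requested.
r_env <- r_environment(name = "deploy_env")
```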
Now you have everything you need to create an inference config for encapsulating your scoring script and environment dependencies.
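A sketch of this step, assuming r_env holds the environment created with r_environment() and that score.R sits in the deploy-to-aks subfolder next to the model file. The result is stored under the name inference_config because the deployment call later in the tutorial refers to it by that name.

```r
library(azuremlsdk)

# entry_script is resolved relative to source_directory.
inference_config <- inference_config(entry_script = "score.R",
                                     source_directory = "deploy-to-aks",
                                     environment = r_env)
```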
Now, define the deployment configuration that describes the compute resources needed, for example, the number of cores and memory. See the aks_webservice_deployment_config() reference for the full set of configurable parameters.
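As an illustration, the following sketch requests one CPU core and 1 GB of memory per replica. These values are assumptions chosen for a small model, not requirements; the variable name aks_config matches the one used in the deployment call.

```r
library(azuremlsdk)

# Illustrative resource request; tune the values for your own model.
aks_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
```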
Now, deploy your model as a web service to the AKS cluster you created earlier.
aks_service <- deploy_model(ws,
                            'my-new-aksservice',
                            models = list(model),
                            inference_config = inference_config,
                            deployment_config = aks_config,
                            deployment_target = aks_target)

wait_for_deployment(aks_service, show_output = TRUE)
To inspect the logs from the deployment:
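One way to do this, assuming the service object from the deployment step is still in scope as aks_service, is the SDK's get_webservice_logs() function:

```r
library(azuremlsdk)

# Print the container logs collected for the deployed service.
cat(get_webservice_logs(aks_service))
```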
If you encounter any issue in deploying the web service, please visit the troubleshooting guide.
Now that your model is deployed as a service, you can test the service from R using invoke_webservice(). Provide a new set of data to predict from, convert it to JSON, and send it to the service.
library(jsonlite)

# versicolor
plant <- data.frame(Sepal.Length = 6.4,
                    Sepal.Width = 2.8,
                    Petal.Length = 4.6,
                    Petal.Width = 1.8)

# setosa
# plant <- data.frame(Sepal.Length = 5.1,
#                     Sepal.Width = 3.5,
#                     Petal.Length = 1.4,
#                     Petal.Width = 0.2)

# virginica
# plant <- data.frame(Sepal.Length = 6.7,
#                     Sepal.Width = 3.3,
#                     Petal.Length = 5.2,
#                     Petal.Width = 2.3)

predicted_val <- invoke_webservice(aks_service, toJSON(plant))
message(predicted_val)
You can also get the web service’s HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application.
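The sketch below shows one way a REST client written in R might call the endpoint. It assumes the endpoint is available as the scoring_uri field of the service object, that the httr package is installed, and that the service expects the key or token in an "Authorization: Bearer" header. build_headers() and score_over_rest() are hypothetical helpers, not SDK functions.

```r
# Hypothetical helper: request headers for a key- or token-authenticated call.
build_headers <- function(credential) {
  c(Authorization = paste("Bearer", credential),
    `Content-Type` = "application/json")
}

# Hypothetical helper: POST a JSON string to the endpoint and return the
# response body as text. Requires the httr package when called.
score_over_rest <- function(scoring_uri, credential, payload_json) {
  response <- httr::POST(scoring_uri,
                         httr::add_headers(.headers = build_headers(credential)),
                         body = payload_json)
  httr::content(response, as = "text", encoding = "UTF-8")
}

# Usage, assuming a deployed service object and a valid key or token:
# score_over_rest(aks_service$scoring_uri, "<key or token>", jsonlite::toJSON(plant))
```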
When deploying to AKS, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service.
To disable key-based auth, set auth_enabled = FALSE when creating the deployment configuration with aks_webservice_deployment_config(). To enable token-based auth, set token_auth_enabled = TRUE when creating the deployment config.
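A deployment configuration that switches from key-based to token-based authentication might look like the sketch below. It turns key auth off while turning token auth on, on the assumption that a service uses one of the two modes at a time; the resource values and the variable name are illustrative.

```r
library(azuremlsdk)

# Illustrative config: key auth disabled, token auth enabled.
aks_config_token_auth <- aks_webservice_deployment_config(cpu_cores = 1,
                                                          memory_gb = 1,
                                                          auth_enabled = FALSE,
                                                          token_auth_enabled = TRUE)
```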
If key authentication is enabled, you can use the get_webservice_keys() method to retrieve a primary and secondary authentication key. To generate a new key, use generate_new_webservice_key().
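A sketch of both calls, assuming aks_service is the deployed service with key authentication enabled. It assumes the keys come back as a two-element list with the primary key first, and that generate_new_webservice_key() accepts a key_type argument; check both against the SDK reference.

```r
library(azuremlsdk)

# Assumption: the first element is the primary key, the second the secondary key.
keys <- get_webservice_keys(aks_service)
primary_key <- keys[[1]]
secondary_key <- keys[[2]]

# Assumption: key_type selects which of the two keys to regenerate.
generate_new_webservice_key(aks_service, key_type = "Primary")
```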
If token authentication is enabled, you can use the get_webservice_token() method to retrieve a JWT and that token’s expiration time. Make sure to request a new token once the current one has expired.
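Client code usually needs a small check to decide when to request a new token. The helper below is a plain R sketch and not part of the SDK. Because the text does not describe the structure of the object returned by get_webservice_token(), the expiration time is passed in explicitly as a POSIXct value, and the 60 second safety margin is an arbitrary choice.

```r
# Hypothetical helper: TRUE when the token has expired or will expire
# within `margin_secs` seconds.
needs_refresh <- function(expires_at, now = Sys.time(), margin_secs = 60) {
  as.numeric(difftime(expires_at, now, units = "secs")) <= margin_secs
}

# Usage, assuming `expires_at` was taken from the retrieved token:
# if (needs_refresh(expires_at)) token <- get_webservice_token(aks_service)
```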