Securing an AKS deployment with Traefik

The vignette “Deploying a prediction service with Plumber” showed how to deploy a simple prediction service to a Kubernetes cluster in Azure. That deployment had two main drawbacks: it was open to the public, and it used plain HTTP, which left it vulnerable to man-in-the-middle (MITM) attacks. Here, we’ll show how to address these issues with basic authentication and HTTPS. To run the code in this vignette, you’ll need the helm binary installed on your machine, in addition to docker and kubectl.

Model object and image

We’ll reuse the image from the Plumber vignette. The code and artifacts to build this image are reproduced below for convenience.

Save the model object:

data(Boston, package="MASS")
library(randomForest)

# train a model for median house price as a function of the other variables
bos_rf <- randomForest(medv ~ ., data=Boston, ntree=100)

# save the model
saveRDS(bos_rf, "bos_rf.rds")

Scoring script (save this as bos_rf_score.R, the filename the Dockerfile below expects):

bos_rf <- readRDS("bos_rf.rds")
library(randomForest)

#* @param df data frame of variables
#* @post /score
function(req, df)
{
    df <- as.data.frame(df)
    predict(bos_rf, df)
}

Dockerfile:

FROM trestletech/plumber

# install the randomForest package
RUN R -e 'install.packages(c("randomForest"))'

# copy model and scoring script
RUN mkdir /data
COPY bos_rf.rds /data
COPY bos_rf_score.R /data
WORKDIR /data

# plumb and run server
EXPOSE 8000
ENTRYPOINT ["R", "-e", \
    "pr <- plumber::plumb('/data/bos_rf_score.R'); pr$run(host='0.0.0.0', port=8000)"]

Build and upload the image to ACR:

library(AzureContainers)

# create a resource group for our deployments
deployresgrp <- AzureRMR::get_azure_login()$
    get_subscription("sub_id")$
    create_resource_group("deployresgrp", location="australiaeast")

# create container registry
deployreg_svc <- deployresgrp$create_acr("deployreg")

# build image 'bos_rf'
call_docker("build -t bos_rf .")

# upload the image to Azure
deployreg <- deployreg_svc$get_docker_registry()
deployreg$push("bos_rf")

Deploy to AKS

Create a new AKS resource for this deployment:

# create a Kubernetes cluster
deployclus_svc <- deployresgrp$create_aks("secureclus", agent_pools=agent_pool("pool1", 2))

# grant the cluster pull access to the registry
deployreg_svc$add_role_assignment(deployclus_svc, "AcrPull")

Install traefik

The middleware we’ll use to enable HTTPS and authentication is Traefik. It features built-in integration with Let’s Encrypt, a popular source of the TLS certificates that HTTPS requires. An alternative to Traefik is Nginx.

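Deploying Traefik and enabling HTTPS involves the following steps, on top of deploying the model service itself:

1. Install Traefik on the cluster.
2. Assign a domain name to the cluster’s public IP address.
3. Generate an authentication file.
4. Deploy the service together with an ingress route.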

The following yaml contains the configuration parameters for traefik. Insert your email address where indicated (it will be used to obtain a certificate from Let’s Encrypt), and save this as traefik_values.yaml.

# traefik_values.yaml
fullnameOverride: traefik
replicas: 1
resources:
  limits:
    cpu: 500m
    memory: 500Mi
  requests:
    cpu: 100m
    memory: 200Mi
externalTrafficPolicy: Local
kubernetes:
  ingressClass: traefik
  ingressEndpoint:
    useDefaultPublishedService: true
dashboard:
  enabled: false
debug:
  enabled: false
accessLogs:
  enabled: true
  fields:
    defaultMode: keep
    headers:
      defaultMode: keep
      names:
        Authorization: redact
rbac:
  enabled: true
acme:
  enabled: true
  email: "email.addr@example.com"  # insert your email address here
  staging: false
  challengeType: tls-alpn-01
ssl:
  enabled: true
  enforced: true
  permanentRedirect: true
  tlsMinVersion: VersionTLS12
  cipherSuites:
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_RSA_WITH_AES_128_GCM_SHA256
    - TLS_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
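
If you would rather not edit the values file by hand, the email address can be filled in from R. The following is a minimal sketch using base R only; `set_acme_email` is a hypothetical helper introduced here for illustration, not part of AzureContainers, and it assumes the file contains exactly one indented `email:` key, as in the listing above.

```r
# Replace the value of the (single) indented `email:` key in a values file.
# `set_acme_email` is a hypothetical helper, not part of AzureContainers.
set_acme_email <- function(lines, email)
{
    # very loose sanity check: something@something, no whitespace
    if(!grepl("^[^@[:space:]]+@[^@[:space:]]+$", email))
        stop("Not a plausible email address: ", email)

    idx <- grep("^\\s+email:", lines)
    if(length(idx) != 1)
        stop("Expected exactly one 'email:' line, found ", length(idx))

    # keep the original indentation so the YAML nesting is unchanged
    indent <- sub("^(\\s+).*$", "\\1", lines[idx])
    lines[idx] <- paste0(indent, "email: \"", email, "\"")
    lines
}

# a shortened copy of the acme section, used here as sample input
template <- c(
    "acme:",
    "  enabled: true",
    "  email: \"email.addr@example.com\"  # insert your email address here",
    "  staging: false"
)
values <- set_acme_email(template, "me@mydomain.example")
```

With the real file, the same helper would be applied as `writeLines(set_acme_email(readLines("traefik_values.yaml"), "me@mydomain.example"), "traefik_values.yaml")`.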

We can now install traefik:

# get the cluster endpoint
deployclus <- deployclus_svc$get_cluster()

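# the original googleapis.com location of the stable repo has been retired;
# the archived charts are served from charts.helm.sh
deployclus$helm("repo add stable https://charts.helm.sh/stable")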
deployclus$helm("repo update")
deployclus$helm("install traefik-ingress stable/traefik --namespace kube-system --values traefik_values.yaml")

Assign domain name

Installing an ingress controller will create a public IP resource, which is the visible address for the cluster. It takes a minute or two for this resource to be created; once it appears, we assign a domain name to it.

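# poll until the traefik service has been assigned an external IP
ip <- "<pending>"
for(i in seq_len(100))
{
    res <- read.table(text=deployclus$get("service", "--all-namespaces")$stdout, header=TRUE, stringsAsFactors=FALSE)
    found_ip <- res$EXTERNAL.IP[res$NAME == "traefik"]
    # the service may not be listed yet, in which case found_ip is empty
    if(length(found_ip) == 1 && found_ip != "<pending>")
    {
        ip <- found_ip
        break
    }
    Sys.sleep(10)
}
if(ip == "<pending>") stop("Ingress public IP not assigned")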

cluster_resources <- deployclus_svc$list_cluster_resources()
found <- FALSE
for(res in cluster_resources)
{
    if(res$type != "Microsoft.Network/publicIPAddresses")
        next
    res$sync_fields()
    if(res$properties$ipAddress == ip)
    {
        found <- TRUE
        break
    }
}
if(!found) stop("Ingress public IP resource not found")

# assign domain name to IP address
# 'res' is the public IP resource found above; 'ip' is only the address string
res$do_operation(
    body=list(
        location=res$location,
        properties=list(
            dnsSettings=list(domainNameLabel="secureclus"),
            publicIPAllocationMethod=res$properties$publicIPAllocationMethod
        )
    ),
    http_verb="PUT"
)
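
The step that reads the external IP out of the `kubectl get service` listing can be isolated and tried offline. The sketch below wraps it in a hypothetical helper, `get_external_ip`, and runs it on made-up sample output (the addresses, ports, and ages are invented for illustration). It returns `NA` both when the address is still pending and when the service is not listed at all.

```r
# Extract the external IP of a named service from `kubectl get service` text.
# Returns NA if the service is absent or its address is not yet assigned.
# `get_external_ip` is a hypothetical helper for illustration.
get_external_ip <- function(stdout, service="traefik")
{
    # read.table turns the EXTERNAL-IP header into the column name EXTERNAL.IP
    tab <- read.table(text=stdout, header=TRUE, stringsAsFactors=FALSE)
    ip <- tab$EXTERNAL.IP[tab$NAME == service]
    if(length(ip) != 1 || ip %in% c("<pending>", "<none>"))
        return(NA_character_)
    ip
}

# invented sample listings: before and after the address is assigned
pending <- paste(
    "NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE",
    "default kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 10m",
    "kube-system traefik LoadBalancer 10.0.120.45 <pending> 80:31380/TCP,443:30530/TCP 30s",
    sep="\n")
ready <- sub("<pending>", "20.40.60.80", pending, fixed=TRUE)

get_external_ip(pending)
get_external_ip(ready)
```

Inside the polling loop, the helper would be called as `get_external_ip(deployclus$get("service", "--all-namespaces")$stdout)`, with the loop continuing while the result is `NA`.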

Generate an auth file

We’ll use Traefik to handle authentication details. While some packages like RestRserve can also authenticate users, a key benefit of assigning this to the ingress controller is that it frees our container to focus purely on the task of returning predicted values. For example, if we want to switch to a different kind of model, we can do so without having to copy over any authentication code.

Using basic auth with Traefik requires an authentication file, in Apache .htpasswd format. Here is R code to create such a file. This is essentially the same code that is used in the “Deploying an ACI service with HTTPS and authentication” vignette.

library(bcrypt)

user_list <- list(
    c("user1", "password1")
)
user_str <- sapply(user_list, function(x) paste(x[1], hashpw(x[2]), sep=":"))
writeLines(user_str, "auth")
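
Each line of the file has the form `user:hash`, so a username containing a colon would corrupt it. As a sketch of how the code above could be extended to several users with some basic checks, the hypothetical helper `make_htpasswd` below takes a named character vector of passwords and a hashing function. The `fake_hash` function is a stand-in so that the example runs without the bcrypt package; for a real auth file, pass `bcrypt::hashpw` instead.

```r
# Build htpasswd lines of the form "user:hash" from a named vector of passwords.
# `hash_fn` is any function mapping one password to its hash, e.g. bcrypt::hashpw.
# `make_htpasswd` is a hypothetical helper for illustration.
make_htpasswd <- function(users, hash_fn)
{
    stopifnot(is.character(users), !is.null(names(users)), all(nzchar(names(users))))
    if(any(grepl(":", names(users), fixed=TRUE)))
        stop("Usernames must not contain ':'")
    if(anyDuplicated(names(users)))
        stop("Duplicate usernames")

    hashes <- vapply(users, hash_fn, character(1), USE.NAMES=FALSE)
    paste(names(users), hashes, sep=":")
}

# stand-in hash for demonstration ONLY; it is not a real password hash
fake_hash <- function(pw) paste0("HASH(", nchar(pw), ")")

auth_lines <- make_htpasswd(
    c(user1="password1", user2="a longer passphrase"),
    fake_hash
)
```

The real file would then be written with `writeLines(make_htpasswd(users, bcrypt::hashpw), "auth")`.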

Deploy the service

The yaml for the deployment and service is the same as in the original Plumber vignette. The main difference is that we now create a separate bos-rf namespace for our objects, which is the recommended practice: it allows us to manage the service independently of any others that may exist on the cluster.

Save the following as secure_deploy.yaml:

# secure_deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bos-rf
  namespace: bos-rf
spec:
  selector:
    matchLabels:
      app: bos-rf
  replicas: 1
  template:
    metadata:
      labels:
        app: bos-rf
    spec:
      containers:
      - name: bos-rf
        image: deployreg.azurecr.io/bos_rf # insert URI for your container registry/image
        ports:
        - containerPort: 8000
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 500m
---
apiVersion: v1
kind: Service
metadata:
  name: bos-rf-svc
  namespace: bos-rf
spec:
  selector:
    app: bos-rf
  ports:
  - protocol: TCP
    port: 8000

The yaml to create the ingress route is below. Save this as secure_ingress.yaml.

# secure_ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: bos-rf-ingress
  namespace: bos-rf
  annotations:
    kubernetes.io/ingress.class: traefik
    ingress.kubernetes.io/auth-type: basic
    ingress.kubernetes.io/auth-secret: bos-rf-secret
spec:
  rules:
  - host: secureclus.australiaeast.cloudapp.azure.com
    http:
      paths:
      - path: /
        backend:
          serviceName: bos-rf-svc
          servicePort: 8000

We can now create the deployment, service, and ingress route:

deployclus$kubectl("create namespace bos-rf")
deployclus$kubectl("create secret generic bos-rf-secret --from-file=auth --namespace bos-rf")
deployclus$create("secure_deploy.yaml")
deployclus$apply("secure_ingress.yaml")

Call the service

Once the deployment is complete, we can obtain predicted values from it. Notice that we use the default HTTPS port for the request (port 443). While port 8000 is nominally exposed by the service, this is visible only within the cluster; it is connected to the external port 443 by Traefik.

response <- httr::POST("https://secureclus.australiaeast.cloudapp.azure.com/score",
    httr::authenticate("user1", "password1"),
    body=list(df=MASS::Boston[1:10,]), encode="json")
httr::content(response, simplifyVector=TRUE)
#> [1] 25.9269 22.0636 34.1876 33.7737 34.8081 27.6394 21.8007 22.3577 16.7812 18.9785
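
If the call fails, the HTTP status code usually indicates which layer to look at. The following sketch maps a few common codes to likely causes, based on general HTTP semantics rather than anything specific to this deployment; `check_status` is a hypothetical helper, and the suggested causes are starting points, not a diagnosis.

```r
# Stop with a hint if the status code does not indicate success.
# `check_status` is a hypothetical helper for illustration.
check_status <- function(code)
{
    if(code >= 200 && code < 300)
        return(invisible(TRUE))

    msg <- switch(as.character(code),
        "401"="credentials rejected (check the username and password against the auth file)",
        "404"="no route matched (check the host and path in the ingress)",
        "502"=, "503"=, "504"="backend not reachable (check that the pod and service are running)",
        "unexpected response")
    stop("HTTP ", code, ": ", msg, call.=FALSE)
}

check_status(200)
```

With the request above, this would be called as `check_status(httr::status_code(response))` before parsing the content.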

Further comments

In this vignette, we’ve secured the predictive service with a single username and password. While we could add more users to the authentication file, a more flexible and scalable solution is to use a frontend service, such as Azure API Management, or a directory service like Azure Active Directory. The specifics will typically be determined by your organisation’s IT infrastructure, and are beyond the scope of this vignette.