Azure reference architecture

This document details best practices and a reference architecture for Tailscale deployments on Microsoft Azure. The guidance applies to all Tailscale modes of operation, including devices, exit nodes, and subnet routers.

Terminology

  • Tailscale device: in this document, a Tailscale device can be a Tailscale node, exit node, subnet router, or similar.

See Terminology and concepts for additional terms.

High-level architecture

Potential deployments of Tailscale to access Azure resources

Ways to deploy Tailscale to connect to and from Azure resources

Tailscale provides several options for connecting to resources within Azure. At a high level, they are:

Agent-to-Agent connectivity

We recommend installing the Tailscale agent wherever possible—for example, setting up servers on Azure VMs. This generally provides the best and most scalable connectivity while enabling Tailscale agent-based functionality such as Tailscale SSH.

See Access Azure Linux VMs privately using Tailscale and Access Azure Windows VMs privately using Tailscale for general guides.

IP-based connectivity with subnet router

For managed resources where you cannot install the Tailscale agent, such as Azure SQL, Azure Cosmos DB, etc., you can run a subnet router within your virtual network to access these resources from Tailscale.
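As a rough illustration of the subnet router setup described above, the following sketch builds a `tailscale up` command line that advertises a set of routes. The `--advertise-routes` flag is part of the Tailscale CLI; the helper name `advertise_routes_command` and the address ranges are hypothetical and only stand in for your own virtual network's subnets.

```python
import ipaddress


def advertise_routes_command(cidrs):
    """Return an argv list for `tailscale up` that advertises the given routes."""
    # ip_network() raises ValueError for malformed CIDRs or CIDRs with host
    # bits set, so typos are caught before the command is ever run.
    routes = [str(ipaddress.ip_network(c)) for c in cidrs]
    if not routes:
        raise ValueError("at least one route is required")
    return ["tailscale", "up", "--advertise-routes=" + ",".join(routes)]


# Hypothetical address ranges for the subnets that host managed resources.
print(" ".join(advertise_routes_command(["10.0.1.0/24", "10.0.2.0/24"])))
```

Advertised routes still need to be approved for the tailnet, and the VM must have IP forwarding enabled, before other devices can use them.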

DNS-based routing with an app connector

App connectors allow you to route traffic bound for SaaS applications or managed services by proxying DNS for the target domains and advertising the subnet routes for the observed DNS results. This is useful for cases where the application has an allowlist of IP addresses which can connect to it; the IP addresses of the nodes running an app connector can be added to the allowlist, and all nodes on the tailnet will use that IP address for their traffic egress.

Kubernetes services and API server proxy with Tailscale Kubernetes operator

The Tailscale Kubernetes operator allows you to expose services in your Kubernetes cluster to your Tailscale network, and use an API server proxy for secure connectivity to the Kubernetes control plane.

Azure Functions, Azure App Service, and other container solutions

Tailscale supports userspace networking where processes in the container can connect to other resources on your Tailscale network via a SOCKS5 or HTTP proxy. This allows Azure Functions, Azure App Service, and other container-based solutions to connect to the Tailscale network with minimal configuration needed.

See Using Tailscale on Azure App Service for a general guide.
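The userspace networking setup described in this section can be sketched as two small helpers: one that builds the `tailscaled` arguments and one that builds the proxy environment variables for the application process. The `tailscaled` flags shown are existing flags; the helper names and the listen address `localhost:1055` are assumptions chosen for illustration, and whether your application honors `ALL_PROXY` or `HTTP_PROXY` depends on its HTTP client.

```python
def userspace_tailscaled_args(listen="localhost:1055"):
    """Arguments to run tailscaled without a TUN device, exposing local proxies."""
    return [
        "tailscaled",
        "--tun=userspace-networking",
        f"--socks5-server={listen}",
        f"--outbound-http-proxy-listen={listen}",
    ]


def proxy_environment(listen="localhost:1055"):
    """Environment variables that point a proxy-aware application at tailscaled."""
    return {
        "ALL_PROXY": f"socks5://{listen}/",
        "HTTP_PROXY": f"http://{listen}/",
        "http_proxy": f"http://{listen}/",
    }


print(" ".join(userspace_tailscaled_args()))
for name, value in proxy_environment().items():
    print(f"{name}={value}")
```

Both the SOCKS5 and HTTP proxies can share one listen address, which keeps the container configuration to a single port.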

Production best practices

Below are general recommendations and best practices for running Tailscale in production environments. Much of what is listed below is explained in greater detail throughout this document:

  • When possible, deploy subnet routers, exit nodes, app connectors, etc., to public subnets with public IP addresses to ensure direct connections and optimal performance.
  • Run subnet routers, exit nodes, app connectors, etc., separately from the systems you are administering with Tailscale—for example, run your subnet routers outside of your Azure AKS clusters.
  • Deploy dynamically scaled resources (for example, containers, serverless functions, etc.) as ephemeral nodes to automatically clean up devices after they shut down.

High availability and regional routing

Performance best practices

See Performance best practices for general recommendations.

In-region load balancing

Deploy multiple overlapping connectors within a DERP region to take advantage of in-region load balancing to evenly spread load across the connectors and provide in-region redundancy.

Normal usage

When installing Tailscale on an Azure VM as a “normal” Tailscale device (that is, not a subnet router, exit node, etc.), you have likely already sized that VM for its workload, and running Tailscale on it will likely add negligible resource usage.

Subnet routers, exit nodes, and app connectors

Many variables affect performance, and workloads vary widely, so we do not have specific size recommendations. We do have general guidance for selecting a VM size for an Azure VM running as a subnet router, exit node, or app connector:

  • In general, higher CPU clock speed is more important than more cores.
  • In general, VMs with Ampere Altra Arm-based processors are quite cost-effective for packet forwarding.
  • Use a non-burstable VM type to achieve consistent CPU performance.
    • Per Azure documentation, burstable performance machine sizes (such as B-series VMs) use a CPU credit mechanism which can result in variable performance.
  • Tailscale will generally perform better on Linux than other operating systems due to additional optimizations that have been implemented.

Using Tailscale with Azure

Network security groups

Tailscale uses various NAT traversal techniques to safely connect to other Tailscale nodes without manual intervention. Nearly all of the time, you do not need to open any firewall ports for Tailscale. However, if your virtual network and network security groups are overly restrictive about internet-bound egress traffic, refer to What firewall ports should I open to use Tailscale.

Public vs private subnets

Tailscale devices deployed to a public subnet with a public IP address will benefit from direct connections between nodes for the best performance.

Azure NAT Gateway

Tailscale uses both direct and relayed connections, opting for direct connections where possible. Azure NAT Gateway is known to impede direct connections, causing traffic to fall back to Tailscale DERP relay servers. This does not cause connectivity issues, but can lead to lower throughput and performance than direct connections.

If you must deploy Tailscale such that internet-bound connections go through an Azure NAT Gateway (for example, to reuse existing IP addresses that are allow-listed to third parties), contact your Tailscale account team to discuss more advanced deployment options that utilize public and private subnet routing on a single Azure VM.
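To see whether a device behind a NAT gateway is getting direct or relayed connections, one option is to inspect the output of `tailscale status --json`. The sketch below classifies peers from an already-parsed status document. The field names (`Peer`, `HostName`, `Active`, `CurAddr`, `Relay`) reflect that output as understood at the time of writing and should be checked against your Tailscale version; the sample data is made up.

```python
def classify_peers(status):
    """Map each peer's host name to 'direct', 'relayed via <region>', or 'idle'."""
    result = {}
    for peer in (status.get("Peer") or {}).values():
        name = peer.get("HostName", "unknown")
        if not peer.get("Active"):
            result[name] = "idle"
        elif peer.get("CurAddr"):
            # A current address means traffic flows directly to that endpoint.
            result[name] = "direct"
        elif peer.get("Relay"):
            result[name] = "relayed via " + peer["Relay"]
        else:
            result[name] = "unknown"
    return result


# Made-up sample shaped like the parsed JSON status output.
sample = {
    "Peer": {
        "key1": {"HostName": "vm-a", "Active": True, "CurAddr": "203.0.113.7:41641", "Relay": "nyc"},
        "key2": {"HostName": "vm-b", "Active": True, "CurAddr": "", "Relay": "nyc"},
        "key3": {"HostName": "vm-c", "Active": False, "CurAddr": "", "Relay": "sfo"},
    }
}
print(classify_peers(sample))
```

In practice the `status` dictionary would come from `json.loads()` applied to the command's output on the device being checked.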

Virtual network DNS resolution

To allow non-Azure devices on your tailnet to query Azure DNS private zones, create an Azure DNS private resolver for your virtual network and configure split DNS to forward queries for your virtual network’s internal domain name suffix to the IP address of the DNS private resolver’s inbound endpoint.
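The split DNS arrangement above amounts to a mapping from domain suffixes to the resolver addresses that should answer for them. The sketch below builds such a mapping and validates the addresses. The helper name, the domain suffix, and the inbound endpoint address are all hypothetical, and the exact shape expected by the admin console or API should be confirmed against the Tailscale documentation.

```python
import ipaddress
import json


def split_dns_config(zone_suffixes, resolver_ips):
    """Map each domain suffix to the list of resolver IPs that should serve it."""
    # ip_address() raises ValueError on anything that is not a valid IP.
    resolvers = [str(ipaddress.ip_address(ip)) for ip in resolver_ips]
    if not resolvers:
        raise ValueError("at least one resolver IP is required")
    # Normalize suffixes: no leading or trailing dots, lowercase.
    return {suffix.strip(".").lower(): list(resolvers) for suffix in zone_suffixes}


# Hypothetical private zone suffix and inbound endpoint address.
config = split_dns_config(["internal.contoso.example."], ["10.0.0.4"])
print(json.dumps(config, indent=2))
```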

Virtual network peering

Virtual network peering is a common strategy for connecting multiple virtual networks together. You can deploy a subnet router (or a set for high availability) within a virtual network to allow access to multiple virtual networks.

If you have virtual networks or subnets with overlapping IPv4 addresses, use 4via6 subnet routers to access resources with unique IPv6 addresses for each overlapping subnet.
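To make the 4via6 idea concrete, the sketch below computes the IPv6 route that corresponds to an IPv4 subnet and a site ID. It assumes the 4via6 layout of the prefix `fd7a:115c:a1e0:b1a::/64`, followed by a 32-bit site ID and the 32-bit IPv4 address, and a site ID range of 0 to 65535. The function name is hypothetical, and the result should be verified against the Tailscale 4via6 documentation or CLI before use.

```python
import ipaddress

# Assumed 4via6 prefix; the low 64 bits carry the site ID and the IPv4 address.
VIA_PREFIX = ipaddress.IPv6Network("fd7a:115c:a1e0:b1a::/64")


def four_via_six(site_id, ipv4_cidr):
    """Return the IPv6 network that maps ipv4_cidr for the given site ID."""
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site ID must be between 0 and 65535")
    v4 = ipaddress.IPv4Network(ipv4_cidr)  # raises ValueError if host bits are set
    addr = int(VIA_PREFIX.network_address) | (site_id << 32) | int(v4.network_address)
    # The first 96 bits are fixed by prefix and site ID; the rest mirror the IPv4 prefix.
    return ipaddress.IPv6Network((addr, 96 + v4.prefixlen))


# Two sites that both use 10.1.1.0/24 get distinct IPv6 routes.
print(four_via_six(7, "10.1.1.0/24"))  # fd7a:115c:a1e0:b1a:0:7:a01:100/120
print(four_via_six(8, "10.1.1.0/24"))  # fd7a:115c:a1e0:b1a:0:8:a01:100/120
```

Each subnet router would advertise the IPv6 route for its own site ID, so the overlapping IPv4 ranges never collide on the tailnet.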

Subnet routers

Operating a subnet router within Azure AKS

Organizations often use Tailscale to connect to and administer their Azure AKS clusters. While Tailscale can run within a container and be deployed to AKS, we recommend running your subnet routers externally to these clusters to ensure connectivity is available in the event your cluster is having issues. In other words, run your subnet router on dedicated VMs or on an AKS cluster separate from the cluster you’re administering.

Tailscale SSH session recording

Deploy multiple session recorder instances across multiple availability zones to improve resiliency against zone failures. If your organization operates across multiple regions, consider deploying SSH session recording nodes in each region you operate in, and configure SSH access rules so that nodes send recordings to the recorder in their local region.