Azure reference architecture
This document details best practices and a reference architecture for Tailscale deployments on Microsoft Azure. The following guidance applies to all Tailscale modes of operation, including devices, exit nodes, and subnet routers.
- Tailscale device: for the purposes of this document, a Tailscale device can refer to a Tailscale node, exit node, subnet router, and so on.
See Terminology and concepts for additional terms.
Tailscale provides a few options for connecting to resources within Azure. At a high level, they are:
- Agent-to-Agent connectivity—e.g. connecting to “static” resources such as virtual machines (VMs). This is recommended where you can install and run Tailscale directly on the resource you wish to connect to.
- IP-based connectivity with a Tailscale subnet router—e.g. connecting to managed Azure resources such as Azure SQL, Azure Cosmos DB, etc. This is recommended where you cannot run Tailscale on the resource you are connecting to, or want to expose an existing subnet or services in a virtual network to your tailnet.
- Kubernetes services and auth proxy with Tailscale Kubernetes operator—expose services in your Azure Kubernetes Service (AKS) cluster and your AKS cluster control plane directly to your Tailscale network. This is recommended where you are connecting to resources running in a Kubernetes cluster, or to a Kubernetes cluster’s control plane.
- Azure Functions, Azure App Service, and other container solutions—access resources on your tailnet from Azure Functions, Azure App Service, and other container solutions.
We recommend installing the Tailscale agent wherever possible—e.g. setting up servers on Azure VMs. This generally provides the best and most scalable connectivity while enabling Tailscale agent-based functionality such as Tailscale SSH.
See Access Azure Linux VMs privately using Tailscale and Access Azure Windows VMs privately using Tailscale for general guides.
For managed resources where you cannot install the Tailscale agent, such as Azure SQL, Azure Cosmos DB, etc., you can run a subnet router within your virtual network to access these resources from Tailscale.
The Tailscale Kubernetes operator allows you to expose services in your Kubernetes cluster to your Tailscale network, and use an API server proxy for secure connectivity to the Kubernetes control plane.
Tailscale supports userspace networking where processes in the container can connect to other resources on your Tailscale network via a SOCKS5 or HTTP proxy. This allows Azure Functions, Azure App Service, and other container-based solutions to connect to the Tailscale network with minimal configuration needed.
See Using Tailscale on Azure App Service for a general guide.
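As a minimal sketch of the proxy approach described above, the following shows how application code in a container might send outbound requests through a local Tailscale HTTP proxy. It assumes `tailscaled` was started in userspace networking mode with its outbound HTTP proxy listening on `localhost:1055` (for example, with `--tun=userspace-networking --outbound-http-proxy-listen=localhost:1055`). The port and the tailnet hostname in the comment are assumptions for illustration.

```python
import urllib.request


def build_tailnet_opener(proxy="http://localhost:1055"):
    """Return a urllib opener that sends HTTP and HTTPS requests via the proxy.

    The default proxy address is an assumption; use whatever address
    tailscaled's outbound HTTP proxy is actually listening on.
    """
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


opener = build_tailnet_opener()

# Example call (hypothetical tailnet hostname, not executed here):
# response = opener.open("http://my-service.example.ts.net/health", timeout=5)
```

Keeping the proxy address in one place makes it easy to change if the container platform requires a different port.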
Below are general recommendations and best practices for running Tailscale in production environments. Much of what is listed below is explained in greater detail throughout this document:
- Run subnet routers, exit nodes, etc., separately from the systems you are administering with Tailscale. For example, run your subnet routers outside of your AKS clusters.
- Run multiple subnet routers across multiple Azure availability zones to improve resiliency against zone failures with subnet router failover.
- Run multiple Tailscale SSH session recorder nodes across multiple Azure availability zones to improve resiliency against zone failures with recorder node failover.
- Deploy dynamically scaled resources (e.g. containers, serverless functions, etc.) as ephemeral nodes to automatically clean up devices after they shut down.
- When possible, deploy subnet routers, exit nodes, etc., to public subnets with public IP addresses to ensure direct connections and optimal performance.
When installing Tailscale on an Azure VM as a “normal” Tailscale device (e.g. not a subnet router or exit node), you have likely already sized that VM for its workload, and running Tailscale on it should add negligible resource usage.
Many variables affect performance and workloads vary widely, so we do not have specific size recommendations. We do have general guidance for selecting a VM size for an Azure VM running as a subnet router or exit node:
- In general, higher CPU clock speed is more important than more cores.
- In general, VMs with Ampere Altra Arm-based processors are quite cost effective for packet forwarding.
- Use a non-burstable VM type to achieve consistent CPU performance.
- Per Azure documentation, burstable performance machine sizes (such as B-series VMs) use a CPU credit mechanism which can result in variable performance.
- Tailscale will generally perform better on Linux than other operating systems due to additional optimizations that have been implemented.
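The non-burstable guidance above could be enforced with a small pre-deployment check. This is only a sketch: it assumes the Azure naming convention in which burstable sizes begin with `Standard_B` followed by a vCPU count (for example `Standard_B2s`), and the function name is hypothetical.

```python
import re

# Assumption: burstable (B-series) sizes are named "Standard_B<vCPUs>...".
_BURSTABLE_SIZE = re.compile(r"^Standard_B\d", re.IGNORECASE)


def is_burstable(vm_size: str) -> bool:
    """Return True if the VM size name looks like a burstable B-series size."""
    return bool(_BURSTABLE_SIZE.match(vm_size))


for size in ("Standard_B2s", "Standard_D2ps_v5", "Standard_F4s_v2"):
    verdict = "burstable, avoid for routers" if is_burstable(size) else "non-burstable"
    print(f"{size}: {verdict}")
```

A check like this only inspects the size name; confirm the actual series characteristics against the Azure VM size documentation before relying on it.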
Tailscale uses various NAT traversal techniques to safely connect to other Tailscale nodes without manual intervention. Nearly all of the time, you do not need to open any firewall ports for Tailscale. However, if your virtual network and network security groups are overly restrictive about internet-bound egress traffic, refer to What firewall ports should I open to use Tailscale.
Tailscale devices deployed to a public subnet with a public IP address will benefit from direct connections between nodes for the best performance.
Tailscale uses both direct and relayed connections, opting for direct connections where possible. Azure NAT Gateway is known to impede direct connections, which causes traffic to fall back to Tailscale DERP relay servers. This does not cause connectivity issues, but can lead to lower throughput and performance than direct connections.
If you must deploy Tailscale such that internet-bound connections go through an Azure NAT Gateway (e.g. to reuse existing IP addresses that are allow-listed to third parties), contact your Tailscale account team to discuss more advanced deployment options that utilize public and private subnet routing on a single Azure VM.
To allow non-Azure devices on your tailnet to query Azure DNS private zones, create an Azure DNS private resolver for your virtual network and configure split DNS to forward queries for your virtual network’s internal domain name suffix to the IP address of the DNS private resolver’s inbound endpoint.
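To illustrate what split DNS does conceptually, here is a small sketch of suffix-based resolver selection. It is not Tailscale's implementation. The domain suffix, the inbound endpoint IP (`10.0.0.4`), and the default nameserver are hypothetical values chosen for the example.

```python
# Hypothetical split DNS table: domain suffix -> resolver IP.
SPLIT_DNS = {
    # Assumed virtual network suffix and private resolver inbound endpoint IP.
    "internal.cloudapp.net": "10.0.0.4",
}
DEFAULT_RESOLVER = "1.1.1.1"  # assumed global nameserver


def pick_resolver(name, split_dns=SPLIT_DNS, default=DEFAULT_RESOLVER):
    """Return the resolver for a DNS name, matching suffixes on label boundaries."""
    labels = name.rstrip(".").lower().split(".")
    # Walk from the full name down to shorter suffixes, so the longest match wins.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in split_dns:
            return split_dns[suffix]
    return default


print(pick_resolver("vm1.abc123.internal.cloudapp.net"))
print(pick_resolver("example.com"))
```

Queries for names under the virtual network suffix go to the private resolver's inbound endpoint, while everything else uses the default nameserver.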
Virtual network peering is a common strategy for connecting multiple virtual networks together. You can deploy a subnet router (or a set for subnet router failover) within a virtual network to allow access to multiple virtual networks.
If you have virtual networks or subnets with overlapping IPv4 addresses, use 4via6 subnet routers to access resources with unique IPv6 addresses for each overlapping subnet.
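The sketch below shows how a 4via6 route can be derived, so that two virtual networks with the same IPv4 range map to distinct IPv6 routes. It assumes the address layout described in Tailscale's 4via6 documentation: the `fd7a:115c:a1e0:b1a::/64` prefix, followed by the site ID, followed by the 32-bit IPv4 address. The function name is hypothetical, and in practice you would generate routes with the Tailscale CLI rather than by hand.

```python
import ipaddress

# Assumption: 4via6 addresses live under this /64 prefix.
VIA_PREFIX = ipaddress.IPv6Network("fd7a:115c:a1e0:b1a::/64")


def via_route(site_id: int, ipv4_cidr: str) -> str:
    """Return the 4via6 IPv6 route for an IPv4 subnet behind the given site ID."""
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site ID must be in the range 0-65535")
    v4 = ipaddress.IPv4Network(ipv4_cidr)
    # Layout: 64-bit prefix | 32-bit site ID | 32-bit IPv4 address.
    addr = int(VIA_PREFIX.network_address) | (site_id << 32) | int(v4.network_address)
    # The IPv4 prefix length is offset by the 96 bits that precede it.
    return f"{ipaddress.IPv6Address(addr)}/{96 + v4.prefixlen}"


# Two virtual networks that both use 10.1.1.0/24, distinguished by site ID.
print(via_route(7, "10.1.1.0/24"))  # fd7a:115c:a1e0:b1a:0:7:a01:100/120
print(via_route(8, "10.1.1.0/24"))  # fd7a:115c:a1e0:b1a:0:8:a01:100/120
```

Each overlapping subnet gets its own site ID, so the resulting IPv6 routes never collide even though the IPv4 ranges are identical.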
Multiple subnet routers can be deployed and configured to advertise the same routes to achieve subnet router failover. This ensures users of your network can continue to access resources if one routing device goes offline. Deploy two or more subnet routers across multiple availability zones to have better isolation and protection against zone failures.
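As a rough way to audit the failover guidance above, the following sketch reports any route that is not advertised by routers in at least two availability zones. The inventory structure (`name`, `zone`, `routes`) is hypothetical; you would populate it from your own deployment data.

```python
from collections import defaultdict


def routes_lacking_zone_redundancy(routers, min_zones=2):
    """Return routes advertised from fewer than min_zones availability zones."""
    zones_by_route = defaultdict(set)
    for router in routers:
        for route in router["routes"]:
            zones_by_route[route].add(router["zone"])
    return sorted(
        route for route, zones in zones_by_route.items() if len(zones) < min_zones
    )


# Hypothetical inventory of subnet routers.
routers = [
    {"name": "router-a", "zone": "1", "routes": ["10.0.0.0/16", "10.1.0.0/16"]},
    {"name": "router-b", "zone": "2", "routes": ["10.0.0.0/16"]},
]

print(routes_lacking_zone_redundancy(routers))  # ['10.1.0.0/16']
```

Here `10.1.0.0/16` is flagged because only one router, in a single zone, advertises it.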
Organizations often use Tailscale to connect to and administer their AKS clusters. While Tailscale can run within a container and be deployed to AKS, we recommend running your subnet routers externally to these clusters to ensure connectivity is available in the event your cluster is having issues. In other words, run your subnet router on dedicated VMs or on an AKS cluster separate from the cluster you’re administering.
Deploy multiple session recorder instances across multiple availability zones to improve resiliency against zone failures. If your organization operates across multiple regions, consider deploying SSH session recording nodes in each region you operate in and configuring SSH access rules so that nodes send recording information to recorders in their local region.