Azure reference architecture

This document details best practices and a reference architecture for Tailscale deployments on Microsoft Azure. The guidance applies to all Tailscale modes of operation, including devices, exit nodes, and subnet routers.

Terminology

  • Tailscale device: for the purposes of this document, a Tailscale device can refer to a Tailscale node, exit node, subnet router, etc.

See Terminology and concepts for additional terms.

High-level architecture

[Figure: Potential deployments of Tailscale to access Azure resources]

Ways to deploy Tailscale to connect to and from Azure resources

Tailscale provides a few options for connecting to resources within Azure. At a high level, they are:

Agent-to-Agent connectivity

We recommend installing the Tailscale agent wherever possible, for example on servers running on Azure VMs. This generally provides the best and most scalable connectivity while enabling Tailscale agent-based functionality such as Tailscale SSH.

See Access Azure Linux VMs privately using Tailscale and Access Azure Windows VMs privately using Tailscale for general guides.

IP-based connectivity with subnet router

For managed resources where you cannot install the Tailscale agent, such as Azure SQL, Azure Cosmos DB, etc., you can run a subnet router within your virtual network to access these resources from Tailscale.
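As a minimal sketch of what a subnet router advertises, the helper below validates a list of subnets and assembles the corresponding `tailscale up --advertise-routes=...` command line. The function name and the address ranges are hypothetical and chosen only for illustration. The routes still need to be approved for your tailnet before other devices can use them.

```python
import ipaddress
import shlex


def advertise_routes_command(cidrs):
    """Build a `tailscale up` command that advertises the given subnets."""
    # strict=True rejects CIDRs with host bits set (for example 10.0.1.5/24).
    networks = [ipaddress.ip_network(cidr, strict=True) for cidr in cidrs]
    if not networks:
        raise ValueError("at least one route is required")
    routes = ",".join(str(network) for network in networks)
    return ["tailscale", "up", f"--advertise-routes={routes}"]


# Hypothetical ranges: one subnet for databases, one for private endpoints.
cmd = advertise_routes_command(["10.0.1.0/24", "10.0.2.0/24"])
print(shlex.join(cmd))
```

Building the argument list in code, rather than by hand, makes it easier to keep several routers advertising exactly the same set of routes.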

Kubernetes services and API server proxy with Tailscale Kubernetes operator

The Tailscale Kubernetes operator allows you to expose services in your Kubernetes cluster to your Tailscale network, and use an API server proxy for secure connectivity to the Kubernetes control plane.

Azure Functions, Azure App Service, and other container solutions

Tailscale supports userspace networking where processes in the container can connect to other resources on your Tailscale network via a SOCKS5 or HTTP proxy. This allows Azure Functions, Azure App Service, and other container-based solutions to connect to the Tailscale network with minimal configuration needed.

See Using Tailscale on Azure App Service for a general guide.
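The following sketch shows how application code in such a container might send requests through the local proxy. It assumes `tailscaled` was started with `--tun=userspace-networking` and `--outbound-http-proxy-listen=localhost:1055`; the port, the helper name, and the tailnet hostname in the comment are illustrative assumptions, not values from this document.

```python
import urllib.request

# Assumption: tailscaled exposes an outbound HTTP proxy on this address.
TAILNET_PROXY = "http://localhost:1055"


def build_tailnet_opener(proxy_url=TAILNET_PROXY):
    """Return a urllib opener that sends HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


opener = build_tailnet_opener()

# Example call against a hypothetical tailnet host. It is commented out
# because it needs a running tailscaled proxy to succeed:
# with opener.open("http://internal-api.example.ts.net/health", timeout=5) as resp:
#     print(resp.status)
```

Scoping the proxy to a dedicated opener, instead of setting process-wide proxy environment variables, keeps traffic that is not destined for the tailnet on its normal path.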

Production best practices

Below are general recommendations and best practices for running Tailscale in production environments. Much of what is listed below is explained in greater detail throughout this document:

  • Run subnet routers, exit nodes, etc., separately from the systems you are administering with Tailscale—e.g. run your subnet routers outside of your Azure AKS clusters.
  • Run multiple subnet routers across multiple Azure availability zones to improve resiliency against zone failures with subnet router failover.
  • Run multiple Tailscale SSH session recorder nodes across multiple Azure availability zones to improve resiliency against zone failures with recorder node failover.
  • Deploy dynamically scaled resources (e.g. containers, serverless functions, etc.) as ephemeral nodes to automatically clean up devices after they shut down.
  • When possible, deploy subnet routers, exit nodes, etc., to public subnets with public IP addresses to ensure direct connections and optimal performance.

Normal usage

When installing Tailscale on an Azure VM as a “normal” Tailscale device (e.g. not a subnet router, exit node, etc.), you likely have already sized that VM to a suitable type for its workload and running Tailscale on it will likely add negligible resource usage.

Subnet router and exit node

Many variables affect performance, and workloads vary widely, so we do not have specific size recommendations. We do have general guidance for selecting a VM size for an Azure VM running as a subnet router or exit node:

  • In general, higher CPU clock speed is more important than more cores.
  • In general, VMs with Ampere Altra Arm–based processors are quite cost effective for packet forwarding.
  • Use a non-burstable VM type to achieve consistent CPU performance.
    • Per Azure documentation, burstable performance machine sizes (such as B-series VMs) use a CPU credit mechanism which can result in variable performance.
  • Tailscale will generally perform better on Linux than other operating systems due to additional optimizations that have been implemented.

Using Tailscale with Azure

Network security groups

Tailscale uses various NAT traversal techniques to safely connect to other Tailscale nodes without manual intervention. Nearly all of the time, you do not need to open any firewall ports for Tailscale. However, if your virtual network and network security groups are overly restrictive about internet-bound egress traffic, refer to What firewall ports should I open to use Tailscale.

Public vs private subnets

Tailscale devices deployed to a public subnet with a public IP address will benefit from direct connections between nodes for the best performance.

Azure NAT Gateway

Tailscale uses both direct and relayed connections, opting for direct connections where possible. Azure NAT Gateway is known to impede direct connections, causing connections to fall back to Tailscale DERP relay servers. This does not cause connectivity issues, but it can lead to lower throughput and performance than direct connections.

If you must deploy Tailscale such that internet-bound connections go through an Azure NAT Gateway (e.g. to reuse existing IP addresses that are allow-listed to third parties), contact your Tailscale account team to discuss more advanced deployment options that utilize public and private subnet routing on a single Azure VM.

Virtual network DNS resolution

To allow non-Azure devices on your tailnet to query Azure DNS private zones, create an Azure DNS private resolver for your virtual network and configure split DNS to forward queries for your virtual network’s internal domain name suffix to the IP address of the DNS private resolver’s inbound endpoint.
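To illustrate the idea behind split DNS, the sketch below routes a query to a resolver based on the longest matching domain suffix. It is a conceptual model only, not Tailscale's implementation. The suffixes, the inbound endpoint address `10.0.0.4`, and the default resolver are hypothetical values.

```python
# Hypothetical split DNS map: domain suffix -> resolver addresses.
# 10.0.0.4 stands in for the DNS private resolver's inbound endpoint.
SPLIT_DNS = {
    "internal.cloudapp.net": ["10.0.0.4"],
    "privatelink.database.windows.net": ["10.0.0.4"],
}
DEFAULT_RESOLVERS = ["1.1.1.1"]  # hypothetical public resolver


def resolvers_for(name, split_dns, default_resolvers):
    """Pick resolvers for a DNS name using longest-suffix matching."""
    name = name.rstrip(".").lower()
    best_suffix, best_resolvers = "", default_resolvers
    for suffix, resolvers in split_dns.items():
        suffix = suffix.rstrip(".").lower()
        # Match the suffix itself or any name beneath it, on a label boundary.
        matches = name == suffix or name.endswith("." + suffix)
        if matches and len(suffix) > len(best_suffix):
            best_suffix, best_resolvers = suffix, resolvers
    return best_resolvers


print(resolvers_for("vm1.internal.cloudapp.net", SPLIT_DNS, DEFAULT_RESOLVERS))
print(resolvers_for("example.com", SPLIT_DNS, DEFAULT_RESOLVERS))
```

Only names under the configured suffixes are sent to the Azure resolver; every other query keeps using the default resolvers.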

Virtual network peering

Virtual network peering is a common strategy for connecting multiple virtual networks together. You can deploy a subnet router (or a set for subnet router failover) within a virtual network to allow access to multiple virtual networks.

If you have virtual networks or subnets with overlapping IPv4 addresses, use 4via6 subnet routers to access resources with unique IPv6 addresses for each overlapping subnet.
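A 4via6 route embeds a site ID and the IPv4 subnet inside an IPv6 prefix, which is how overlapping IPv4 ranges become distinct. The sketch below computes that mapping, assuming the `fd7a:115c:a1e0:b1a::/64` range that Tailscale uses for 4via6 and a site ID between 0 and 65535. The function name and the example subnets are hypothetical, and the `tailscale debug via` command is the authoritative way to generate these routes.

```python
import ipaddress

# Assumption: the IPv6 range Tailscale reserves for 4via6 routes.
VIA_RANGE = ipaddress.IPv6Network("fd7a:115c:a1e0:b1a::/64")


def map_via(site_id, ipv4_cidr):
    """Map an IPv4 subnet and a site ID to a 4via6 IPv6 prefix."""
    v4 = ipaddress.IPv4Network(ipv4_cidr, strict=True)
    if not 0 <= site_id <= 0xFFFF:
        raise ValueError("site ID must be between 0 and 65535")
    # Layout: 64-bit via prefix | 32-bit site ID | 32-bit IPv4 address.
    address = (
        int(VIA_RANGE.network_address)
        | (site_id << 32)
        | int(v4.network_address)
    )
    return ipaddress.IPv6Network((address, 96 + v4.prefixlen))


# Two virtual networks that both use 10.1.1.0/24 get distinct IPv6 routes.
print(map_via(7, "10.1.1.0/24"))  # fd7a:115c:a1e0:b1a:0:7:a01:100/120
print(map_via(8, "10.1.1.0/24"))  # fd7a:115c:a1e0:b1a:0:8:a01:100/120
```

Each subnet router then advertises the IPv6 prefix for its own site ID, so clients can address the same IPv4 host range in either network without ambiguity.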

Subnet routers

Subnet router failover

Multiple subnet routers can be deployed and configured to advertise the same routes to achieve subnet router failover. This ensures users of your network can continue to access resources if one routing device goes offline. Deploy two or more subnet routers across multiple availability zones to have better isolation and protection against zone failures.
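One way to check a failover layout is to confirm that every advertised route is covered by routers in at least two availability zones. The sketch below runs that check over a hypothetical inventory; the router names, zones, routes, and the inventory structure itself are illustrative assumptions rather than output from any Tailscale or Azure API.

```python
from collections import defaultdict


def failover_gaps(routers, min_zones=2):
    """Return routes advertised from fewer than `min_zones` availability zones."""
    zones_by_route = defaultdict(set)
    for router in routers:
        for route in router["routes"]:
            zones_by_route[route].add(router["zone"])
    return {
        route: sorted(zones)
        for route, zones in zones_by_route.items()
        if len(zones) < min_zones
    }


# Hypothetical inventory of subnet routers in one region.
ROUTERS = [
    {"name": "router-a", "zone": "1", "routes": ["10.0.1.0/24", "10.0.2.0/24"]},
    {"name": "router-b", "zone": "2", "routes": ["10.0.1.0/24"]},
]

# 10.0.2.0/24 is only advertised from zone 1, so it has no zone-level failover.
print(failover_gaps(ROUTERS))  # {'10.0.2.0/24': ['1']}
```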

Operating a subnet router within Azure AKS

Organizations often use Tailscale to connect to and administer their Azure AKS clusters. While Tailscale can run within a container and be deployed to AKS, we recommend running your subnet routers externally to these clusters to ensure connectivity is available in the event your cluster is having issues. In other words, run your subnet router on dedicated VMs or on an AKS cluster separate from the cluster you’re administering.

Tailscale SSH session recording

Deploy multiple session recorder instances across multiple availability zones to improve resiliency against zone failures. If your organization operates across multiple regions, consider deploying SSH session recording nodes in each region you operate in, and configure SSH access rules so that each node sends its recordings to a recorder in its own region.
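As a rough sketch of per-region recording rules, the helper below generates one SSH rule per region for a tailnet policy file, pairing servers in a region with recorders in the same region. The `recorder` and `enforceRecorder` fields follow the Tailscale SSH rule syntax as we understand it. The tag naming scheme (`tag:server-<region>`, `tag:recorder-<region>`), the region names, and the choice of `src` and `users` values are assumptions for illustration.

```python
import json


def regional_recording_rules(regions):
    """Build one SSH rule per region that records to that region's recorders."""
    rules = []
    for region in regions:
        rules.append({
            "action": "check",
            "src": ["autogroup:member"],
            "dst": [f"tag:server-{region}"],          # hypothetical tag scheme
            "users": ["autogroup:nonroot"],
            "recorder": [f"tag:recorder-{region}"],   # hypothetical tag scheme
            # Refuse the session if no recorder in the region is reachable.
            "enforceRecorder": True,
        })
    return {"ssh": rules}


policy_fragment = regional_recording_rules(["eastus", "westeurope"])
print(json.dumps(policy_fragment, indent=2))
```

Tagging several recorder nodes in different availability zones with the same regional tag lets a single rule cover all of them.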
