AWS reference architecture

This document details best practices and a reference architecture for Tailscale deployments on Amazon Web Services (AWS). The following guidance applies for all Tailscale modes of operation—devices, exit nodes, subnet routers, and the like.

For the purposes of this document, Tailscale device can refer to a Tailscale node, exit node, subnet router, and the like. Refer to Terminology and concepts for additional terms.

High-level architecture

Potential deployments of Tailscale to access AWS resources

Ways to deploy Tailscale to connect to and from AWS resources

Tailscale provides a few options for connecting to resources within AWS. At a high level, they are:

Agent-to-Agent connectivity—connect to "static" resources such as Amazon Elastic Compute Cloud (EC2) instances. This is recommended where you can install and run Tailscale directly on the resource you wish to connect to.
IP-based connectivity with a Tailscale subnet router—connect to managed AWS resources such as Amazon's Relational Database Service (AWS RDS) or Amazon Redshift. This is recommended where you cannot run Tailscale on the resource you are connecting to, or want to expose an existing subnet or services in a VPC to your tailnet.
DNS-based routing with a Tailscale app connector—connect to software as a service (SaaS) applications or other resources over your tailnet with DNS-based routing.
Kubernetes services and auth proxy with the Tailscale Kubernetes Operator—expose services in your Amazon Elastic Kubernetes Service (EKS) cluster and your EKS cluster control plane directly to your Tailscale network. This is recommended where you are connecting to resources running in a Kubernetes cluster, or to a Kubernetes cluster's control plane.
Lambda and other container services—access resources in your tailnet from Lambda functions and other container solutions.

Agent-to-agent connectivity

We recommend installing the Tailscale agent wherever possible—for example, setting up servers on EC2 instances. This generally provides the best and most scalable connectivity while enabling Tailscale agent-based functionality such as Tailscale SSH.

IP-based connectivity with subnet router

For managed resources where you cannot install the Tailscale agent, such as AWS RDS, Amazon Redshift, and similar services, you can run a subnet router within your VPC to access these resources from Tailscale. Subnet routers can also be used to connect to resources by using AWS PrivateLink and VPC endpoints.

Refer to Access AWS RDS privately using Tailscale for a general guide.
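As a minimal sketch of the subnet router setup described above, the following Python helper validates a list of VPC CIDR ranges and builds the `tailscale up --advertise-routes=...` command to run on the router instance. The function name and the example CIDR are illustrative assumptions, not part of any Tailscale tooling. On Linux the router also needs IP forwarding enabled, and advertised routes must be approved in your tailnet before they take effect.

```python
import ipaddress


def subnet_router_command(vpc_cidrs):
    """Return the argv for advertising the given VPC CIDR ranges.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    # strict=True rejects values with host bits set, such as 10.0.96.5/20.
    routes = [str(ipaddress.ip_network(cidr, strict=True)) for cidr in vpc_cidrs]
    return ["tailscale", "up", "--advertise-routes=" + ",".join(routes)]


# Example VPC CIDR (an assumption; substitute your own ranges).
print(" ".join(subnet_router_command(["10.0.96.0/20"])))
```

Validating the ranges before they reach the command line catches typos early, which matters when the same list is reused across several routers for high availability.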

DNS-based routing with an app connector

App connectors let you route traffic bound for SaaS applications or managed services by proxying DNS for the target domains and advertising the subnet routes for the observed DNS results. This is useful for cases where the application has an allowlist of IP addresses which can connect to it; the IP address of the nodes running an app connector can be added to the allowlist, and all nodes in the tailnet will use that IP address for their traffic egress.

Kubernetes services and API server proxy with Tailscale Kubernetes Operator

The Tailscale Kubernetes Operator lets you expose services in your Kubernetes cluster to your Tailscale network, and use an API server proxy for secure connectivity to the Kubernetes control plane.

Lambda and other container services

Tailscale supports userspace networking, in which processes in the container connect to other resources on your Tailscale network through a SOCKS5 or HTTP proxy. This lets AWS Lambda, AWS App Runner, Amazon Lightsail, and other container-based solutions connect to the Tailscale network with minimal configuration.
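The sketch below shows one way application code in such a container might send requests through the proxy. It assumes `tailscaled` was started with `--tun=userspace-networking` and `--outbound-http-proxy-listen=localhost:1055`; the listen address and the host name in the commented-out call are assumptions chosen for illustration.

```python
import urllib.request

# Assumption: tailscaled exposes its outbound HTTP proxy on this address.
# Match it to the flags you actually pass to tailscaled.
TAILSCALE_HTTP_PROXY = "http://localhost:1055"


def tailnet_opener(proxy_url=TAILSCALE_HTTP_PROXY):
    """Build a urllib opener that sends HTTP and HTTPS requests via the proxy."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy)


opener = tailnet_opener()
# Example call, commented out because it needs a running proxy and a real
# tailnet host (the name below is hypothetical):
# body = opener.open("http://internal-service.example.ts.net/", timeout=5).read()
```

Using a dedicated opener rather than process-wide proxy environment variables keeps traffic to other destinations, such as AWS service endpoints, off the tailnet proxy.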

Production best practices

Below are general recommendations and best practices for running Tailscale in production environments. Much of what is listed below is explained in greater detail throughout this document:

When possible, deploy subnet routers, exit nodes, app connectors, and the like to public subnets with public IP addresses to ensure direct connections and optimal performance.
Run subnet routers, exit nodes, app connectors, and the like, separately from the systems you are administering with Tailscale—for example, run your subnet routers outside of your Amazon EKS clusters.
Deploy dynamically scaled resources (for example, containers or serverless functions) as ephemeral nodes to automatically clean up devices after they shut down.

High availability and regional routing

Run multiple subnet routers and app connectors across multiple AWS availability zones so that high availability failover protects against zone failures. Deploy them across multiple regions to take advantage of regional routing.
Run multiple Tailscale SSH session recorder nodes across multiple AWS availability zones so that recorder node failover protects against zone failures. Deploy them across multiple regions to take advantage of regional routing.
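One way to audit the zone spread recommended above is to group routers by the routes they advertise and flag any route served from fewer than two availability zones. The inventory format here (`name`, `az`, `routes`) is a hypothetical structure for illustration; in practice you would populate it from your own infrastructure records.

```python
from collections import defaultdict


def routes_lacking_zone_redundancy(routers):
    """Return advertised routes served from fewer than two availability zones."""
    zones_by_route = defaultdict(set)
    for router in routers:
        for route in router["routes"]:
            zones_by_route[route].add(router["az"])
    return sorted(route for route, zones in zones_by_route.items() if len(zones) < 2)


# Hypothetical inventory: two routers cover 10.0.96.0/20 from different zones,
# while 10.1.0.0/16 is served from a single zone only.
routers = [
    {"name": "router-a", "az": "us-east-1a", "routes": ["10.0.96.0/20"]},
    {"name": "router-b", "az": "us-east-1b", "routes": ["10.0.96.0/20"]},
    {"name": "router-c", "az": "us-east-1a", "routes": ["10.1.0.0/16"]},
]
print(routes_lacking_zone_redundancy(routers))  # prints ['10.1.0.0/16']
```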

Performance best practices

For general recommendations, refer to Performance best practices.

In-region load balancing

Deploy multiple overlapping connectors within a DERP region to take advantage of in-region load balancing, which spreads load evenly across the connectors on a best-effort basis and provides in-region redundancy.

Recommended instance sizing

When choosing an EC2 instance size, consider whether the device will function as a normal device, or a device running as a subnet router, exit node, or app connector.

Normal usage

When installing Tailscale on an EC2 instance as a "normal" Tailscale device (that is, not a subnet router, exit node, or app connector), you have likely already sized that instance for its workload, and running Tailscale on it will likely add negligible resource usage.

Subnet routers, exit nodes, and app connectors

Many variables affect performance, and workloads vary widely, so we do not have specific size recommendations. We do have general guidance for selecting an instance type for an EC2 instance running as a subnet router, exit node, or app connector:

In general, higher CPU clock speed is more important than more cores.
In general, instances with ARM-based AWS Graviton processors are quite cost effective for packet forwarding.
Use a non-burstable instance type to achieve consistent virtual CPU (vCPU) and network performance.
- Per the AWS Burstable performance instances topic, burstable performance instances (such as T4g, T3a, T2, and the like) use a CPU credit mechanism, which can result in poor performance with more than a single concurrent connection. Therefore, they should only be used for testing. The m7g.medium instance is a good place to start for production routers.
Use an instance type with more than 16 vCPUs. For example, use at least 24 vCPUs to ensure consistent network performance.
- Per the AWS Amazon EC2 instance network bandwidth topic, instances with 16 vCPUs or fewer use a network I/O credit mechanism to burst beyond baseline bandwidth. This provides a marginal performance increase except for very high-load resources.
In certain workloads that are sensitive to throughput, and that require several concurrent high-speed connections, you may realize better network performance by choosing a subnet router instance with dedicated network capacity, which is limited to those with more than 16 vCPUs. We recommend discussing your requirements with our solutions team if you feel this applies to you.
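The guidance above can be turned into a rough pre-deployment check. The sketch below flags an instance type when it belongs to a burstable family or has 16 or fewer vCPUs. The function name is hypothetical, the burstable family list covers only T2, T3, T3a, and T4g, and the caller supplies the vCPU count, so treat the output as a reminder of the guidance rather than an authoritative sizing tool.

```python
# Burstable families named in the guidance above, plus T3 (an assumption that
# it behaves like its T3a sibling). Extend as needed.
BURSTABLE_FAMILIES = {"t2", "t3", "t3a", "t4g"}


def sizing_notes(instance_type, vcpus):
    """Return guidance notes for a candidate router instance type."""
    family = instance_type.split(".")[0].lower()
    notes = []
    if family in BURSTABLE_FAMILIES:
        notes.append("burstable CPU credits: use for testing only")
    if vcpus <= 16:
        notes.append("network bandwidth is credit-based at 16 vCPUs or fewer")
    return notes


for instance_type, vcpus in [("t3a.medium", 2), ("m7g.medium", 1), ("m7g.8xlarge", 32)]:
    print(instance_type, sizing_notes(instance_type, vcpus))
```

A non-empty result is not necessarily a problem: a small non-burstable instance remains a reasonable starting point unless the workload needs several concurrent high-speed connections.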

Using Tailscale with AWS

Depending on how you are using AWS and Tailscale, there are considerations for your security groups and subnets.

Security groups

Tailscale uses various NAT traversal techniques to safely connect to other Tailscale nodes without manual intervention. Usually, you do not need to open any firewall ports for Tailscale. However, if your VPC and security groups are overly restrictive about internet-bound egress traffic, refer to What firewall ports should I open to use Tailscale?

Public vs private subnets

Tailscale devices deployed to a public subnet with a public IP address will benefit from direct connections between nodes for the best performance.

AWS NAT Gateway

Tailscale uses both direct and relayed connections, opting for direct connections where possible. AWS NAT Gateway is a hard NAT and prevents direct connections, so traffic falls back to DERP relay servers. This does not cause connectivity issues, but it can lead to lower throughput and performance than direct connections.

If you must deploy Tailscale such that internet-bound connections go through an AWS NAT Gateway (for example, to reuse existing IP addresses that are allow-listed to third parties), contact your Tailscale account team to discuss more advanced deployment options that use public and private subnet routing on a single EC2 instance.

Egress-only internet gateway

An egress-only IPv6 gateway attached to a private subnet will allow direct connections to peers that have IPv6 addresses. Peers that have only IPv4 available will be reached through DERP relay servers, which have both IPv4 and IPv6 connectivity.
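The subnet guidance in this section can be summarized as a small decision function. This is a deliberately simplified model of the statements above, not a description of how Tailscale actually selects paths: real behavior also depends on the peer's own network, and the subnet labels used here are hypothetical names for the three cases discussed.

```python
def expected_path(node_subnet, peer_has_ipv6=False):
    """Return the connection type this section leads you to expect.

    node_subnet is one of 'public', 'nat-gateway', or 'egress-only-ipv6'
    (hypothetical labels for the cases described above).
    """
    if node_subnet not in {"public", "nat-gateway", "egress-only-ipv6"}:
        raise ValueError(f"unknown subnet kind: {node_subnet}")
    if node_subnet == "public":
        return "direct"
    if node_subnet == "egress-only-ipv6" and peer_has_ipv6:
        return "direct (IPv6)"
    # Hard NAT, or an IPv4-only peer reached from an IPv6 egress path.
    return "relayed (DERP)"


print(expected_path("nat-gateway"))
print(expected_path("egress-only-ipv6", peer_has_ipv6=True))
```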

VPC DNS resolution

Private hosted zones and private addresses for AWS resources can only be resolved using private Route 53 Resolvers. The private Route 53 Resolvers can be accessed through a subnet router. To enable this:

Deploy a subnet router to your VPC.
Enable access to the private Route 53 Resolver by configuring access controls to allow TCP and UDP access on port 53 to the VPC+2 IP address of your VPC. For example, if your VPC's CIDR is 10.0.96.0/20, your VPC+2 IP address is 10.0.96.2.
Forward queries for internal AWS domains to the Amazon Route 53 Resolver by configuring split DNS to the VPC+2 IP address that is now reachable by your subnet router. If you have multiple VPCs, associate additional VPCs with a private hosted zone to enable DNS resolution across all of them.
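The VPC+2 address used in the steps above is the base address of the VPC network range plus two. A minimal sketch of that arithmetic with Python's standard `ipaddress` module follows; the function name is illustrative, and for a VPC with several CIDR blocks the sketch assumes you pass the primary one.

```python
import ipaddress


def vpc_resolver_address(vpc_cidr):
    """Return the VPC+2 address: the VPC network base address plus two."""
    network = ipaddress.ip_network(vpc_cidr, strict=True)
    return network.network_address + 2


print(vpc_resolver_address("10.0.96.0/20"))  # prints 10.0.96.2
```

The resulting address is the value to allow on port 53 in your access controls and to use as the split DNS nameserver.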

VPC peering and transit VPCs

VPC peering and transit VPCs are a common strategy for connecting multiple VPCs together. You can deploy a subnet router (or a set for high availability) within a VPC to allow access to multiple VPCs.

If you have VPCs or subnets with overlapping IPv4 addresses, use 4via6 subnet routers to access resources with unique IPv6 addresses for each overlapping subnet.
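Before deciding whether 4via6 is needed, it can help to check which VPC ranges actually overlap. The sketch below compares every pair of CIDR ranges with `ipaddress` and reports the overlapping pairs. The VPC names and ranges are hypothetical examples.

```python
import ipaddress
from itertools import combinations


def overlapping_cidrs(cidrs_by_vpc):
    """Return sorted pairs of VPC names whose IPv4 ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs_by_vpc.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(sorted(networks.items()), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical VPCs: vpc-b sits inside vpc-a, while vpc-c is separate.
vpcs = {"vpc-a": "10.0.0.0/16", "vpc-b": "10.0.96.0/20", "vpc-c": "10.1.0.0/16"}
print(overlapping_cidrs(vpcs))  # prints [('vpc-a', 'vpc-b')]
```

Any pair reported here cannot be advertised side by side as plain IPv4 subnet routes, which is the situation 4via6 subnet routers are meant to address.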

Subnet routers

Operating a subnet router within Amazon EKS or Amazon ECS

Organizations often use Tailscale to connect to and administer their EKS clusters, Amazon Elastic Container Service (ECS) deployments, and the like. While Tailscale can run within a container and be deployed to EKS or ECS, we recommend running your subnet routers externally to these clusters to ensure connectivity is available in the event your cluster is having issues. In other words, run your subnet router on dedicated EC2 instances or on an EKS cluster separate from the cluster you're administering.

Tailscale SSH session recording

Deploy multiple session recorder instances across multiple availability zones to improve resiliency against zone failures. If your organization operates across multiple regions, consider deploying SSH session recording nodes in each region where you operate, and configure SSH access rules to send recording information to the recorder in the same region as your nodes.

AWS NAT Gateway is a hard NAT. All devices and services behind an AWS NAT Gateway will connect using DERP relay servers. For best performance, put devices in a public subnet to facilitate direct connections.