DNS (Domain Name System) was invented in 1983. DNS is a system that lets you
turn names into IP addresses so that your computer knows how to connect to
websites like tailscale.com. This is a simple service, so the authors of 4.3 BSD
specified a simple configuration file called /etc/resolv.conf:
$ cat /etc/resolv.conf
nameserver 192.168.122.1
In this case, it tells the DNS resolution function to use 192.168.122.1 as the DNS server. This means that when you do lookups for websites like tailscale.com, it will ask 192.168.122.1 to do that lookup for you:
$ nslookup tailscale.com
Server: 192.168.122.1
Address: 192.168.122.1:53
Non-authoritative answer:
Name: tailscale.com
Address: 18.205.143.78
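To make the file format concrete, here is a minimal sketch of how a program might pull the `nameserver` entries out of a resolv.conf-style file. The function name `parse_nameservers` is ours, and the sketch ignores every other directive (`search`, `options`, and so on), so treat it as an illustration rather than a full parser.

```python
def parse_nameservers(resolv_conf_text):
    """Return the addresses from `nameserver` lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines, which start with '#' or ';'.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


print(parse_nameservers("nameserver 192.168.122.1\n"))  # ['192.168.122.1']
```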
If Tailscale were around in the old days, we could probably just modify /etc/resolv.conf safely, and that would be the end of it.
However, things like DHCP came along and added a bunch of needed complexity into
the equation. DHCP is a protocol that lets machines on a network discover what
config they should use by shouting aimlessly at everyone on the network until
someone tells them what they want. One of the things that DHCP provides is the
IP address of the network’s preferred DNS server. The contents of
/etc/resolv.conf need to be managed by some program, and if there are
disagreements, the disagreeing programs (such as a DHCP client and Tailscale)
need to compete for DNS supremacy. Most distributions and custom setups started
using an ungoogleable program called resolvconf to aid this.
resolvconf will helpfully add a comment to the beginning of /etc/resolv.conf
letting you know that resolvconf is managing it:
# Generated by resolvconf
resolvconf is a loose convention for managing DNS, which is implemented in
slightly mutually-incompatible ways by multiple programs. The two common ones
are Debian’s resolvconf and openresolv.
When several things have opinions about the DNS configuration, you need some way
to arbitrate between them. Debian’s resolvconf adopts the strategy of letting
everybody win, and installs a configuration that is a blend of all its inputs.
This is fine until you get into a situation like Tailscale, where you actually
do want to be able to override the DNS configuration entirely (e.g. because an
admin set a forced DNS configuration in the Tailscale admin console). Of course,
we think we’re more right than others, but the others think the same about
themselves, and Debian resolvconf refuses to pick a winner.
openresolv allows you to specify the priority order of DNS servers. Additionally it allows programs to specify an “exclusive” mode where it will always prefer that option and other options will be discarded. If two programs want to be in “exclusive” mode, the last one that provided a configuration wins, and we’re back to competing for DNS supremacy.
However, as Tailscale we actually want this behavior, so we use it to set DNS configuration when we can:
$ cat /etc/resolv.conf
# Generated by resolvconf
search christine.website.beta.tailscale.net akua.xeserv.us
nameserver 100.100.100.100
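As a rough sketch of what "exclusive" mode looks like from a program's point of view: openresolv reads the desired configuration on standard input, `-a` names the interface the configuration belongs to, `-x` marks it exclusive, and `-m` sets the metric. The interface name `tailscale`, the metric value `0`, and the helper names below are our own choices for illustration, and these flags are specific to openresolv (Debian's resolvconf does not accept `-x` or `-m`).

```python
import subprocess


def render_resolv_conf(nameservers, search_domains):
    """Build resolv.conf-style text to feed to resolvconf on stdin."""
    lines = []
    if search_domains:
        lines.append("search " + " ".join(search_domains))
    lines.extend(f"nameserver {ip}" for ip in nameservers)
    return "\n".join(lines) + "\n"


def openresolv_command(interface):
    """Command line for an exclusive openresolv update (assumed flags: -m, -x, -a)."""
    return ["resolvconf", "-m", "0", "-x", "-a", interface]


def apply_exclusive(interface, nameservers, search_domains):
    """Pipe the rendered configuration into resolvconf. Needs root and openresolv."""
    subprocess.run(
        openresolv_command(interface),
        input=render_resolv_conf(nameservers, search_domains),
        text=True,
        check=True,
    )


# Only render here; apply_exclusive() would modify the system's DNS settings.
print(render_resolv_conf(["100.100.100.100"], ["akua.xeserv.us"]), end="")
```

Removing the configuration again would be the matching `resolvconf -d` call with the same interface name.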
After a while people in FreeDesktop noticed that this constant battling for DNS
supremacy was very annoying (not to mention configuring Wi-Fi connections was
even more annoying) and they got together to create a better path forward. They
called this NetworkManager. It uses a protocol called D-Bus
to allow other programs to tell it what to do. This is a marked improvement over
what resolvconf does. To update /etc/resolv.conf with resolvconf you need
to pipe your desired configuration to resolvconf and hope the thing you wanted
actually happens. NetworkManager’s API has a schema and allows introspecting,
which makes things easier on our end.
NetworkManager aimed to be the One Daemon To Rule Them All of network management
on Linux. Even though it has its own ways to manage /etc/resolv.conf,
sometimes NetworkManager can be configured to use resolvconf to manage
/etc/resolv.conf. This happens on more distros than you would think.
NetworkManager did a very good job at hiding a lot of the hard parts and allows
users to configure the network with GUI tools.
NetworkManager was the standard, best-of-breed way of doing DNS configuration for a long time (some distros still prefer it to this day). However, as things got more complicated, there was a need for something a bit more powerful. The systemd project created a solution called systemd-resolved, which allows administrators to have more control over how DNS gets resolved on a per-network-interface basis. Here’s the resolved status on one of our Linux machines:
$ resolvectl status
Global
Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Current DNS Server: 100.100.100.100
DNS Servers: 100.100.100.100 8.8.8.8 1.1.1.1
Fallback DNS Servers: 100.100.100.100 8.8.8.8 1.1.1.1
DNS Domain: akua.xeserv.us christine.website.beta.tailscale.net
Link 2 (enp5s0)
Current Scopes: LLMNR/IPv4 LLMNR/IPv6
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Link 9 (tailscale0)
Current Scopes: LLMNR/IPv4 LLMNR/IPv6
Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Among other things, systemd-resolved allows you to use DNS over TLS. This is an extra ball of fun that is thankfully out of scope for this article. However, systemd-resolved also allows Tailscale to reliably configure it with its D-Bus API (not quite the same API as NetworkManager, of course).
Of course this assumes that we’re treating DNS as a globally consistent namespace, the way DNS was intended when it was first invented. This is not always the case.
Some networks or organizations have their own private DNS server with names that are unable to be resolved over the internet. This makes things a lot more complicated. For lack of a better term we will be calling this setup “split DNS” (if you have a better term in mind we are more than happy to take suggestions, but for the sake of this article we’re going to call it “split DNS”).
IP traffic is routed between other machines using a routing table. This routing table has a list of networks and instructions on what to do with them. To correctly handle a split DNS setup, you need a routing table for DNS, broken down by subdomain instead of by IP address. This is how Windows, macOS, and Linux with systemd-resolved handle these kinds of configurations. For example, you could have a DNS routing table that looks like this:
- If the domain ends in .akua, ask 10.77.2.2 for the answer
- If the domain ends in .local, ask Bonjour for the answer
- Otherwise ask either 1.1.1.1 or 8.8.8.8 for the answer
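The routing table above can be sketched in a few lines. This is only an illustration of suffix matching, not how any particular operating system implements it: the names `ROUTES`, `DEFAULT_RESOLVERS`, and `pick_resolvers` are made up for this example, and the string `"mdns"` stands in for "ask Bonjour".

```python
# Suffix -> resolvers. The longest matching suffix wins.
ROUTES = {
    "akua": ["10.77.2.2"],
    "local": ["mdns"],  # placeholder for "ask Bonjour"
}
DEFAULT_RESOLVERS = ["1.1.1.1", "8.8.8.8"]


def pick_resolvers(name, routes=ROUTES, default=DEFAULT_RESOLVERS):
    """Return the resolvers responsible for `name`, matching whole labels."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then progressively shorter suffixes.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default


print(pick_resolvers("go.akua"))             # ['10.77.2.2']
print(pick_resolvers("computername.local"))  # ['mdns']
print(pick_resolvers("tailscale.com"))       # ['1.1.1.1', '8.8.8.8']
```

Matching on whole labels matters: a name like `notakua` should not be routed to the `.akua` resolver just because the strings share an ending.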
These setups are more common than you would think at first and are in use in
just about every household with a Mac in it. This lets you automatically
discover the IP for computername with the domain computername.local. Most
corporate VPNs will also want this to have internal-facing services (such as
git, database or IRC servers) resolve to an IP address behind the VPN. This
prevents leaking requests to the public DNS service, and Linux lacking this
support out of the box (when running without systemd-resolved, that is) has been
a significant limitation.
/etc/resolv.conf does not have support for routing DNS based on the domain
name, so in the most basic configuration, we implement the routing in an
in-process resolver within the Tailscale daemon, and tell the OS to send all its
DNS traffic to 100.100.100.100. This traffic gets handled locally by your
machine’s tailscaled process, and lets resolv.conf-based systems have split DNS.
We still have to occasionally battle for DNS supremacy, depending on what else
is trying to edit /etc/resolv.conf. resolvconf is a similar story, possibly
with a little less fighting over the configuration.
Then comes NetworkManager. NetworkManager has the ability to control
/etc/resolv.conf, resolvconf and optionally a DNS server called dnsmasq.
The only mode that allows split DNS is dnsmasq mode. This means that Tailscale
needs to care about which mode NetworkManager is in, and we use this code
to do it. We have some extra code in there to handle cases where we should be
using NetworkManager, but it fails to respond to pings (thank $DEITY that the
standard D-Bus way of doing things is to have every object implement a “Ping”
method), in which case we need to get into the trenches again.
As an aside, one major difficulty in all of this is that name resolution on
Linux systems is very poorly specified, and each of these methods results in
slightly different behavior. If we do a resolution for go.akua, what will
happen? Will it go to the resolver for the public internet? Will it go to the
right split server? Will it get sent over Tor for some reason? Will it get sent
to the potentially dodgy DNS server on the public Wi-Fi hotspot at your local
coffee shop? Will it get sent over UDP, TCP or DNS over HTTPS? We don’t know.
This stuff is not documented and as a result, you need to figure out what it
does through blood, tears and heartbreak. For extra fun, the behavior of glibc
and musl differs here too. Please document your behaviors when you write new
software. This saves so many people so much time.
An example of how to do this right is systemd-resolved. It can do everything a modern split-DNS VPN needs natively, so in theory there’s no extra work (except see below, because reality is not quite as clean as we’d like). The systemd team painstakingly wrote down what they do, and made it unambiguously obvious how you should twiddle things to get what you want. This is the kind of documentation that infrastructure programs should strive to have.
Now, if you are in a place where you need to provide a DNS server on Linux, and have to figure out how you should configure the system’s resolver, here is how you do it.
Starting from the top, first you need to check if /etc/resolv.conf exists at
all. If it doesn’t, you can just write the file yourself.
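A minimal sketch of the "just write the file" case follows. Writing to a temporary file and renaming it into place is our own choice here, so that other programs never see a half-written file; the helper name `write_resolv_conf` is hypothetical, and the example deliberately targets a scratch directory rather than /etc.

```python
import os
import tempfile


def write_resolv_conf(path, contents):
    """Write `contents` to `path` via a temporary file and an atomic rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".resolv.conf.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
        # resolv.conf must be readable by every user on the system.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Demonstrate against a scratch directory instead of the real /etc.
scratch = tempfile.mkdtemp()
target = os.path.join(scratch, "resolv.conf")
if not os.path.exists(target):
    write_resolv_conf(target, "nameserver 100.100.100.100\n")
```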
If it does exist, then you need to check who the owner of the file is. You can
check for the owner of /etc/resolv.conf by looking for the magic words at the
top of /etc/resolv.conf, such as these:
# Generated by resolvconf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
# Generated by NetworkManager
These will tell you which service manages your /etc/resolv.conf file. If you
can’t find any owner you need to blow away /etc/resolv.conf and hope for the
best.
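The ownership check described above might look something like this. It only inspects the leading comment block and only knows the three markers quoted in this article; the function name `resolv_conf_owner` and the returned labels are our own.

```python
def resolv_conf_owner(text):
    """Guess which service manages a resolv.conf from its header comments.

    Returns "systemd-resolved", "NetworkManager", "resolvconf", or None.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith("#"):
            # The header comments are over; stop looking.
            break
        if "systemd-resolved" in line:
            return "systemd-resolved"
        if "NetworkManager" in line:
            return "NetworkManager"
        if "resolvconf" in line:
            return "resolvconf"
    return None


print(resolv_conf_owner("# Generated by resolvconf\nnameserver 100.100.100.100\n"))
# resolvconf
```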
If resolvconf is in use, then you should use it too, assuming of course the
resolvconf binary is available on your $PATH.
If the config seems owned by NetworkManager, you need to check if NetworkManager is available over D-Bus, and if so, you can use it. Otherwise, you’re back to overwriting resolv.conf.
NetworkManager also adds a wrinkle to the resolvconf path: if the resolvconf-generated configuration comes from NetworkManager, we want to try and use NetworkManager rather than resolvconf, because NetworkManager is more capable. So, we do an extra detection pass to see if resolvconf is being fed by NetworkManager, and switch to NetworkManager if so.
And if resolvconf seems to be fed by NetworkManager, but we’re unable to talk to NetworkManager, we should fall back to using resolvconf.
If you’re using systemd-resolved, things should be smooth sailing… But there’s a wrinkle. It turns out that NetworkManager, up until very recently, configured systemd-resolved slightly incorrectly, in a way that made it impossible to override the default resolver if you’re talking to systemd-resolved yourself. This was fixed in December 2020 with NetworkManager 1.26.6 (relevant bug report).
So, if systemd-resolved is in use, we need to check if NetworkManager is also present, and whether it’s pushing its configuration into systemd-resolved. If so, we must use NetworkManager to configure DNS, even though its capabilities are slightly less than systemd-resolved.
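Putting the rules from this section together, the decision can be written as one small function. This is a sketch of the logic as described here, not Tailscale's actual code: the `Facts` fields and the returned labels are hypothetical, gathering the facts (D-Bus checks, `$PATH` lookup) is left out, and the behavior when NetworkManager feeds systemd-resolved but cannot be reached is our own assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Facts:
    owner: Optional[str]               # from the resolv.conf header, or None
    resolvconf_on_path: bool = False   # is a resolvconf binary on $PATH?
    nm_reachable: bool = False         # does NetworkManager answer on D-Bus?
    nm_feeds_resolvconf: bool = False  # resolvconf's input comes from NetworkManager
    nm_feeds_resolved: bool = False    # NetworkManager pushes config into systemd-resolved


def choose_dns_manager(f: Facts) -> str:
    """Pick how to configure DNS: a manager name, or "direct" to overwrite the file."""
    if f.owner == "systemd-resolved":
        # Assumption: if NetworkManager is unreachable, talk to systemd-resolved anyway.
        if f.nm_feeds_resolved and f.nm_reachable:
            return "NetworkManager"
        return "systemd-resolved"
    if f.owner == "resolvconf":
        if f.nm_feeds_resolvconf and f.nm_reachable:
            return "NetworkManager"
        if f.resolvconf_on_path:
            return "resolvconf"
        return "direct"
    if f.owner == "NetworkManager":
        return "NetworkManager" if f.nm_reachable else "direct"
    # No file, or no recognizable owner: overwrite /etc/resolv.conf.
    return "direct"


print(choose_dns_manager(Facts(owner="resolvconf", resolvconf_on_path=True)))  # resolvconf
print(choose_dns_manager(Facts(owner=None)))                                   # direct
```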
As far as we understand, this setup will allow you to have a somewhat consistent way to configure DNS on Linux systems. We hope this will save you time when facts and circumstances force you to implement this logic in the future. You will also need to implement a “polyfill” for the DNS routing bits that your service needs, for every case where you don’t have a routing-aware DNS configuration (which on this graph is most of the cases).
If you decide that you want to make some new DNS configuration management service in the future, please make sure it’s documented, including its interactions with the rest of this graph.
If you’re a Linux distro maintainer, you may be wondering what part of this hilarity you should inflict on your users. Our take is that you should use systemd-resolved, and if you need user-friendly network configuration, a very recent version of NetworkManager (1.26.6 or better). This will give your distro state-of-the-art DNS capabilities, and make implementers of networking software much happier. With this setup, the DNS configuration graph looks like this:
The upcoming Tailscale 1.8 release implements all of the above, which should hopefully make DNS on Linux just work, no matter how your machine is choosing to do it.