Blog|April 15, 2021

The Sisyphean Task Of DNS Client Config on Linux


DNS (the Domain Name System) was invented in 1983. It lets you turn names into IP addresses so that your computer knows how to connect to websites like tailscale.com. This is a simple service, so the authors of 4.3 BSD specified a simple configuration file called /etc/resolv.conf:

$ cat /etc/resolv.conf
nameserver 192.168.122.1

In this case, the file tells the DNS resolution function to use 192.168.122.1 as the DNS server. When you look up a name like tailscale.com, the resolver asks 192.168.122.1 to do that lookup for you:

$ nslookup tailscale.com
Server:                192.168.122.1
Address:        192.168.122.1:53

Non-authoritative answer:
Name:        tailscale.com
Address: 18.205.143.78
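
To make the file format concrete, here is a minimal Python sketch of reading the two directives this article cares about, nameserver and search. The function name parse_resolv_conf is made up for illustration; real resolvers also handle options, limits on the number of nameservers, and other details that are ignored here.

```python
def parse_resolv_conf(text):
    """Return (nameservers, search_domains) from resolv.conf-style text.

    Hypothetical helper for illustration; only 'nameserver' and 'search'
    are understood, every other directive is skipped.
    """
    nameservers = []
    search = []
    for raw in text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        keyword, args = fields[0], fields[1:]
        if keyword == "nameserver" and args:
            nameservers.append(args[0])
        elif keyword == "search":
            # If several search lines are present, the last one wins.
            search = args
    return nameservers, search


sample = """# Generated by resolvconf
search christine.website.beta.tailscale.net akua.xeserv.us
nameserver 100.100.100.100
"""

print(parse_resolv_conf(sample))
```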

If Tailscale were around in the old days, we could probably just modify /etc/resolv.conf safely, and that would be the end of it.

However, things like DHCP came along and added a bunch of needed complexity into the equation. DHCP is a protocol that lets machines on a network discover what config they should use by shouting aimlessly at everyone on the network until someone tells them what they want. One of the things that DHCP provides is the IP address of the network’s preferred DNS server. The contents of /etc/resolv.conf need to be managed by some program, and if there are disagreements, the disagreeing programs (such as a DHCP client and Tailscale) need to compete for DNS supremacy. Most distributions and custom setups started using an ungoogleable program called resolvconf to aid this.

resolvconf will helpfully add a comment to the beginning of /etc/resolv.conf letting you know that resolvconf is managing it:

# Generated by resolvconf

resolvconf is a loose convention for managing DNS, which is implemented in slightly mutually-incompatible ways by multiple programs. The two common ones are Debian’s resolvconf and openresolv.

When several things have opinions about the DNS configuration, you need some way to arbitrate between them. Debian’s resolvconf adopts the strategy of letting everybody win, and installs a configuration that is a blend of all its inputs. This is fine until you get into a situation like Tailscale, where you actually do want to be able to override the DNS configuration entirely (e.g. because an admin set a forced DNS configuration in the Tailscale admin console). Of course, we think we’re more right than others, but the others think the same about themselves, and Debian resolvconf refuses to pick a winner.

openresolv allows you to specify the priority order of DNS servers. Additionally it allows programs to specify an “exclusive” mode where it will always prefer that option and other options will be discarded. If two programs want to be in “exclusive” mode, the last one that provided a configuration wins, and we’re back to competing for DNS supremacy.

At Tailscale we actually want this exclusive behavior, so we use it to set DNS configuration when we can:

$ cat /etc/resolv.conf
# Generated by resolvconf
search christine.website.beta.tailscale.net akua.xeserv.us
nameserver 100.100.100.100
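
The difference between the two arbitration strategies described above can be sketched as a toy model in Python. This is not how either program is implemented: the Submission type and both function names are invented for illustration, and real resolvconf implementations also deal with interface ordering, search domains and priorities that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """One program's opinion about DNS (hypothetical type for this sketch)."""
    source: str                      # e.g. "dhcp" or "tailscale"
    nameservers: list = field(default_factory=list)
    exclusive: bool = False          # the openresolv-style "exclusive" request


def blend(submissions):
    """'Everybody wins': merge all inputs, de-duplicated, in submission order."""
    merged = []
    for sub in submissions:
        for ns in sub.nameservers:
            if ns not in merged:
                merged.append(ns)
    return merged


def arbitrate(submissions):
    """If anyone asked for exclusive mode, the most recent such request wins."""
    exclusive = [s for s in submissions if s.exclusive]
    if exclusive:
        return list(exclusive[-1].nameservers)
    return blend(submissions)


subs = [
    Submission("dhcp", ["192.168.122.1"]),
    Submission("tailscale", ["100.100.100.100"], exclusive=True),
]
print(blend(subs))      # both servers end up in the configuration
print(arbitrate(subs))  # only the exclusive submission survives
```

Adding a second exclusive submission after the Tailscale one makes that newer submission win instead, which is the "competing for DNS supremacy" situation again.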

After a while people in FreeDesktop noticed that this constant battling for DNS supremacy was very annoying (not to mention configuring Wi-Fi connections was even more annoying) and they got together to create a better path forward. They called this NetworkManager. It uses a protocol called D-Bus to allow other programs to tell it what to do. This is a marked improvement over what resolvconf does. To update /etc/resolv.conf with resolvconf you need to pipe your desired configuration to resolvconf and hope the thing you wanted actually happens. NetworkManager’s API has a schema and allows introspecting, which makes things easier on our end.

NetworkManager aimed to be the One Daemon To Rule Them All of network management on Linux. Even though it has its own ways to manage /etc/resolv.conf, sometimes NetworkManager can be configured to use resolvconf to manage /etc/resolv.conf. This happens on more distros than you would think. NetworkManager did a very good job at hiding a lot of the hard parts and allows users to configure the network with GUI tools.

NetworkManager was the standard and best-of-breed way of doing DNS configuration for a long time (some distros still prefer it to this day). However, as things got more complicated, there was a need for something a bit more powerful. The systemd project created a solution called systemd-resolved, which allows administrators to have more control over how DNS gets resolved on a per-network-interface basis. Here’s the resolved status on one of our Linux machines:

$ resolvectl status
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: stub
  Current DNS Server: 100.100.100.100
         DNS Servers: 100.100.100.100 8.8.8.8 1.1.1.1
Fallback DNS Servers: 100.100.100.100 8.8.8.8 1.1.1.1
          DNS Domain: akua.xeserv.us christine.website.beta.tailscale.net

Link 2 (enp5s0)
Current Scopes: LLMNR/IPv4 LLMNR/IPv6
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 9 (tailscale0)
Current Scopes: LLMNR/IPv4 LLMNR/IPv6
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Among other things, systemd-resolved allows you to use DNS over TLS. This is an extra ball of fun that is thankfully out of scope for this article. However, systemd-resolved also allows Tailscale to reliably configure it with its D-Bus API (not quite the same API as NetworkManager, of course).

Of course this assumes that we’re treating DNS as a globally consistent namespace, the way DNS was intended when it was first invented. This is not always the case.

Some networks or organizations have their own private DNS server with names that cannot be resolved from the public internet. This makes things a lot more complicated. For lack of a better term we will be calling this setup “split DNS” (if you have a better term in mind we are more than happy to take suggestions, but for the sake of this article we’re going to call it “split DNS”).

IP traffic is routed between other machines using a routing table. This routing table has a list of networks and instructions on what to do with them. To correctly handle a split DNS setup, you need a routing table for DNS, broken down by subdomain instead of by IP address. This is how Windows, macOS, and Linux with systemd-resolved handle these kinds of configurations. For example, you could have a DNS routing table that looks like this:

  • If the domain ends in .akua, ask 10.77.2.2 for the answer
  • If the domain ends in .local, ask Bonjour for the answer
  • Otherwise ask either 1.1.1.1 or 8.8.8.8 for the answer

These setups are more common than you would think at first and are in use in just about every household with a Mac in it. This lets you automatically discover the IP for computername with the domain computername.local. Most corporate VPNs will also want this to have internal-facing services (such as git, database or IRC servers) resolve to an IP address behind the VPN. This prevents leaking requests to the public DNS service, and Linux lacking this support out of the box (when running without systemd-resolved, that is) has been a significant limitation.
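
The DNS routing table described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular OS implements it: pick_resolvers and the routes dictionary are names made up here, and "bonjour" is just a placeholder string standing in for the mDNS responder. The sketch matches on whole labels and prefers the most specific suffix, so go.akua matches the akua route but notakua does not.

```python
def pick_resolvers(name, routes, default):
    """Return the resolvers for `name`, preferring the longest matching suffix.

    `routes` maps a domain suffix (without a leading dot) to a list of
    resolvers; `default` is used when no suffix matches.
    """
    labels = name.rstrip(".").lower().split(".")
    # Try the most specific suffix first: "a.b.c", then "b.c", then "c".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in routes:
            return routes[suffix]
    return default


routes = {
    "akua": ["10.77.2.2"],
    "local": ["bonjour"],  # placeholder, not a real resolver address
}
default = ["1.1.1.1", "8.8.8.8"]

for name in ("go.akua", "computername.local", "tailscale.com"):
    print(name, "->", pick_resolvers(name, routes, default))
```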

/etc/resolv.conf does not have support for routing DNS based on the domain name, so in the most basic configuration, we implement the routing in an in-process resolver within the Tailscale daemon, and tell the OS to send all its DNS traffic to 100.100.100.100. This traffic gets handled locally by your machine’s tailscaled process, and lets resolv.conf-based systems have split DNS. We still have to occasionally battle for DNS supremacy, depending on what else is trying to edit /etc/resolv.conf. resolvconf is a similar story, possibly with a little less fighting over the configuration.

Then comes NetworkManager. NetworkManager has the ability to control /etc/resolv.conf, resolvconf and optionally a DNS server called dnsmasq. The only mode that allows split DNS is dnsmasq mode. This means that Tailscale needs to care about which mode NetworkManager is in, and we use this code to do it. We have some extra code in there to handle cases where we should be using NetworkManager, but it fails to respond to pings (thank $DEITY that the standard D-Bus way of doing things is to have every object implement a “Ping” method), in which case we need to get into the trenches again.

As an aside, one major difficulty in all of this is that name resolution on Linux systems is very poorly specified, and each of these methods results in slightly different behavior. If we do a resolution for go.akua, what will happen? Will it go to the resolver for the public internet? Will it go to the right split server? Will it get sent over Tor for some reason? Will it get sent to the potentially dodgy DNS server on the public Wi-Fi hotspot at your local coffee shop? Will it get sent over UDP, TCP or DNS over HTTPS? We don’t know. This stuff is not documented and as a result, you need to figure out what it does through blood, tears and heartbreak. For extra fun, the behavior of glibc and musl differs here too. Please document your behaviors when you write new software. This saves so many people so much time.

An example of how to do this right is systemd-resolved. It can do everything a modern split-DNS VPN needs natively, so in theory there’s no extra work (except see below, because reality is not quite as clean as we’d like). The systemd team painstakingly wrote down what they do, and made it unambiguously obvious how you should twiddle things to get what you want. This is the kind of documentation that infrastructure programs should strive to have.

Now, if you are in a place where you need to provide a DNS server on Linux, and have to figure out how you should configure the system’s resolver, here is how you do it.

Starting from the top, first you need to check if /etc/resolv.conf exists at all. If it doesn’t, you can just write it yourself:

A graph describing the above sentence

If it does exist, then you need to check who the owner of the file is. You can check for the owner of /etc/resolv.conf by looking for the magic words at the top of /etc/resolv.conf, such as these:

# Generated by resolvconf
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
# Generated by NetworkManager

These will tell you which service manages your /etc/resolv.conf file. If you can’t find any owner you need to blow away /etc/resolv.conf and hope for the best.
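
As a rough sketch, the ownership check can be written as a scan over the leading comment lines of the file. The function name resolv_conf_owner is hypothetical, and the matching is deliberately simple: it only looks for the three markers shown above, and other tools may write different headers.

```python
def resolv_conf_owner(text):
    """Guess which program manages a resolv.conf from its leading comments.

    Returns "systemd-resolved", "NetworkManager", "resolvconf", or None
    when no known marker is found. Hypothetical helper for illustration.
    """
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            break  # the header comments are over, stop looking
        if "systemd-resolved" in line:
            return "systemd-resolved"
        if "NetworkManager" in line:
            return "NetworkManager"
        if "resolvconf" in line:
            return "resolvconf"
    return None


print(resolv_conf_owner("# Generated by NetworkManager\nnameserver 192.168.122.1\n"))
```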

If resolvconf is in use, then you should too, assuming of course the resolvconf binary is available on your $PATH:

A graph describing the above paragraphs

If the config seems owned by NetworkManager, you need to check if NetworkManager is available over D-Bus, and if so, you can use it. Otherwise, you’re back to overwriting resolv.conf.

NetworkManager also adds a wrinkle to the resolvconf path: if the resolvconf-generated configuration comes from NetworkManager, we want to try and use NetworkManager rather than resolvconf, because NetworkManager is more capable. So, we do an extra detection pass to see if resolvconf is being fed by NetworkManager, and switch to NetworkManager if so.

And if resolvconf seems to be fed by NetworkManager, but we’re unable to talk to NetworkManager, we should fall back to using resolvconf.

A graph describing the above paragraphs

If you’re using systemd-resolved, things should be smooth sailing… But there’s a wrinkle. It turns out that NetworkManager, up until very recently, configured systemd-resolved slightly incorrectly, in a way that made it impossible to override the default resolver if you were talking to systemd-resolved yourself. This was fixed in December 2020 with NetworkManager 1.26.6 (relevant bug report).

So, if systemd-resolved is in use, we need to check if NetworkManager is also present, and whether it’s pushing its configuration into systemd-resolved. If so, we must use NetworkManager to configure DNS, even though it is slightly less capable than systemd-resolved.

A graph describing the above paragraphs
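
Putting the steps above together, the whole decision can be sketched as one Python function over facts you have already probed. This is a reading of the prose in this article, not Tailscale’s actual code: the function and parameter names are invented, and falling back to overwriting the file when resolvconf owns it but its binary is missing is an assumption the text does not spell out.

```python
def choose_dns_manager(
    resolv_conf_exists,
    owner,                       # "resolvconf", "NetworkManager", "systemd-resolved" or None
    resolvconf_on_path=False,    # is the resolvconf binary on $PATH?
    nm_on_dbus=False,            # does NetworkManager answer over D-Bus?
    resolvconf_fed_by_nm=False,  # is resolvconf's input coming from NetworkManager?
    nm_feeds_resolved=False,     # is NetworkManager pushing config into systemd-resolved?
):
    """Return which mechanism to use: one of the owner names or "overwrite"."""
    if not resolv_conf_exists or owner is None:
        return "overwrite"
    if owner == "resolvconf":
        # Prefer NetworkManager when it is the one feeding resolvconf.
        if resolvconf_fed_by_nm and nm_on_dbus:
            return "NetworkManager"
        if resolvconf_on_path:
            return "resolvconf"
        return "overwrite"  # assumption: nothing else is left to try
    if owner == "NetworkManager":
        return "NetworkManager" if nm_on_dbus else "overwrite"
    if owner == "systemd-resolved":
        # Go through NetworkManager if it is managing systemd-resolved.
        if nm_feeds_resolved and nm_on_dbus:
            return "NetworkManager"
        return "systemd-resolved"
    return "overwrite"


print(choose_dns_manager(True, "systemd-resolved", nm_feeds_resolved=True, nm_on_dbus=True))
```

Keeping the probing (reading files, talking to D-Bus) separate from the decision makes the decision itself easy to test against every branch of the graph.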

As far as we understand, this setup will allow you to have a somewhat consistent way to configure DNS on Linux systems. We hope this will save you time when facts and circumstances force you to implement this logic in the future. You will also need to implement a “polyfill” for the DNS routing bits that your service needs, for every case where you don’t have a routing-aware DNS configuration (which on this graph is most of the cases).

If you decide that you want to make some new DNS configuration management service in the future, please make sure it’s documented, including its interactions with the rest of this graph.

If you’re a Linux distro maintainer, you may be wondering what part of this hilarity you should inflict on your users. Our take is that you should use systemd-resolved, and if you need user-friendly network configuration, a very recent version of NetworkManager (1.26.6 or later). This will give your distro state-of-the-art DNS capabilities, and make implementers of networking software much happier. With this setup, the DNS configuration graph looks like this:

What we wish we could have

The upcoming Tailscale 1.8 release implements all of the above, which should hopefully make DNS on Linux just work, no matter how your machine is choosing to do it.


Authors

Xe Iaso
David Anderson