If you’re like most people, your answer to this is… “What? Why?”
When ssh was introduced back in the 1990s, its appeal was simple. Passwords are too short, too guessable, too phishable, too often stored incorrectly, too MITM-able, too brute-forceable. Also its primary competition was rsh’s classic “no authentication,” but we don’t talk about that.
Compared to that, ssh public-key authentication was a dream. Generate a keypair, don’t ever share the private key with anyone, and if an attacker ever steals your authorized_keys file, the joke’s on them, because all they can do with your public key is authorize you to break into their server. Suckers!
It’s pretty great. And that greatness might lead you to wonder: why, then, 25 years after we learned this important lesson, are all the most secure websites on the Internet still using password authentication? Well, some good reasons, some not-so-good reasons. TLS client-side certs in web browsers have been tried many times and failed many times. That’s a story for another day.
My story for today is about ssh and how even public keys, while much better than simple passwords, are still not a perfect solution.
The danger is credential theft, which is a fancy way of saying “someone stole your private keys.” Back in the 1990s, that problem was pretty far from our minds; Windows 98 didn’t even have the concept of a separate administrator account, never mind the idea of app sandboxing or the inkling that someone might intentionally want to load malware onto your computer and encrypt all your files for ransom. Those were the days when some people thought ActiveX controls (essentially loading .exe files from web sites) might be a good idea. Actually, maybe even a great idea as long as there was an “are you sure?” dialog box first.
Come to think of it, how did anything ever work? What a time to be alive.
Anyway, public key authentication was an amazingly advanced level of protection at the time. Normal people who used ssh keys didn’t have to worry much about attacks, because there were much easier targets: the “run faster than the bear” solution.
Time has passed, though, and attackers are getting more advanced all the time, and some of the limits are starting to show.
Here’s a story someone told me last year: one company had an employee they fired for bad behavior. A few months later all their production servers got wiped out. They didn’t have enough forensics to identify how it happened, so they hired a consultant to help them rebuild everything with a bit more logging for next time. A month later, it happened again! Everything gone.
But this time they had logs. And what the logs showed is that one of their employees had ssh’d into all the servers and smashed everything. The employee, however, had logged in from some other country where the employee demonstrably wasn’t. And it didn’t seem like an employee who couldn’t be bothered to hide their login would have bothered to use a fake IP. What happened?
What happened was that the rogue employee, months before they were fired, had waited until everyone went to lunch, walked around from one PC to the next, and collected the ssh private keys from any of them that weren’t locked. (Note in case you’re thinking of fixing this with a screen-locking policy: that’s not a bad start, but it’s not quite enough defense. Make sure to at least enable full-disk encryption on all your computers too, and you’re getting closer.) Then, a while after getting fired, he used a co-worker’s ssh key to log in and destroy everything.
When they rebuilt the server cluster after the first breach, they had been super careful, constructed all the servers from scratch, avoided any possibility of installed backdoors… and then promptly dropped in the same set of ssh public keys as before. Because ssh keys are secure, right? If there’s one thing we know for sure about security, it’s that. It’s even in the name!
And they are secure, right up until they’re stolen. But the problem with static ssh keys is that if they are stolen, the theft is undetectable. Many people avoid rotating their ssh keys for 10+ years. Someone might have stolen your private key 9 years ago, and is just waiting for the right moment to strike.
Key rotation is the simple answer to this problem. Like forced password changes (which are out of vogue now), rotating your ssh keys protects, “eventually,” against key theft, which is a lot better than no protection at all. Of course, it has some of the disadvantages of password rotation too: notably, if you have your ssh private key on several machines, you have to remember to copy the new one to all those places. And if you have your ssh public key on many machines (which is the whole point), now you have to correctly delete the old one and add the new one on every single one of them. It’s a lot of tedious, error-prone work, which is why people don’t like to do it.
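To make the “delete the old one and add the new one” step concrete, here is a minimal Python sketch of what has to happen to each authorized_keys file. The function names and the sample keys are made up for illustration, and the parsing is deliberately simplified: it assumes the key type is the first field starting with ssh-, ecdsa-, or sk-, which can misfire on entries whose options contain quoted spaces.

```python
from typing import Optional

# Simplifying assumption: a public key entry looks like
#   [options] keytype base64-blob [comment]
# and the key type starts with one of these prefixes.
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def key_blob(line: str) -> Optional[str]:
    """Return the base64 blob of a public key line, or None if none is found."""
    fields = line.split()
    for i, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            return fields[i + 1]
    return None


def rotate_authorized_keys(text: str, old_key: str, new_key: str) -> str:
    """Return authorized_keys content with old_key removed and new_key added once."""
    old_blob = key_blob(old_key)
    new_blob = key_blob(new_key)
    if old_blob is None or new_blob is None:
        raise ValueError("old_key and new_key must be public key lines")

    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # Compare by blob so a changed comment or option does not
            # leave the retired key behind.
            if key_blob(stripped) in (old_blob, new_blob):
                continue
        kept.append(line)
    kept.append(new_key.strip())
    return "\n".join(kept) + "\n"
```

The function itself is the easy part. The tedious part is running something like it, correctly, against every server that still trusts the old key.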
An even more robust approach is to use some kind of hardware token that can sign short-lived ssh keys (in practice, short-lived ssh certificates), and teach all your servers how to deal with those. That’s neat, but it’s hard to deploy (needs custom ssh settings). The deployment could be fixable: convince upstream ssh to do it in an easier way. But you still have the problem of the obnoxious hardware tokens, which then get into USB (which has too many connector types and mostly doesn’t work on a phone) or Bluetooth (which works maybe 97% of the time, which is simply not good enough for your emergency production login credentials).
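For a rough idea of what the signing step looks like, here is a sketch that builds (but does not run) an OpenSSH `ssh-keygen` command to issue a one-hour user certificate. The `-s`, `-I`, `-n`, and `-V` flags are standard ssh-keygen options; the function name, paths, and identity strings are invented for illustration, and the CA key is assumed to be an ordinary file rather than living on a hardware token. Servers would also need `TrustedUserCAKeys` set in sshd_config, which is the “custom ssh settings” part.

```python
def signing_command(ca_key_path, user_pubkey_path, identity, principal, lifetime="+1h"):
    """Build an ssh-keygen argv that signs a user public key into a short-lived certificate.

    The paths and names passed in are hypothetical. Nothing is executed here;
    the caller decides whether to hand the list to subprocess.run().
    """
    return [
        "ssh-keygen",
        "-s", ca_key_path,   # CA private key used to sign
        "-I", identity,      # certificate identity, shows up in server logs
        "-n", principal,     # login name(s) the certificate is valid for
        "-V", lifetime,      # validity interval; "+1h" means valid from now for one hour
        user_pubkey_path,    # public key to certify
    ]
```

With a one-hour lifetime, a stolen certificate stops working on its own, with no authorized_keys cleanup required.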
An easier approach is to set up an ssh jumphost, where you can contain your ssh fanciness to a single server (and even if you just use normal ssh keys, you only have to rotate them in that one place). This works pretty well, but creates added latency (the jumphost isn’t always near the server you want to jump to; or if you have a jumphost in every region, now you have to maintain a bunch of jumphosts), and of course, means that you have to do everything through ssh; you can’t route traffic directly.
By the way, all these credential theft problems apply to more than just ssh: WireGuard and OpenVPN private keys can be stolen in the same ways, and are a liability unless you rotate them periodically.
The approach we take in our own infrastructure, modeled after TLS certificate infrastructure and Let’s Encrypt in particular, is to authenticate each device+person combination with a separate private key; that way, if a credential is stolen, we always know from where. Each device can rotate that device-specific key as often as it wants, since the private key only lives in one place, and the public keys are all distributed rapidly from our coordination server to every other device that needs them. Because of this highly consistent key rotation (which is hard with ssh keys or passwords), we always know exactly when a particular keypair has been retired, once and for all, so we can always detect if that keypair makes a surprise reappearance: it must mean credential theft, which means a breach, which can trigger forensics. After all that, we hide our ssh keys behind this network-layer key rotation, which means that unrotated ssh keys are mostly harmless; only an authorized person with non-stolen credentials can even try to exploit a stolen ssh key.
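The “surprise reappearance” check can be sketched in a few lines of Python. This is a toy model under my own assumptions, not our actual coordination server: the class, method names, and plain-string “keys” are all hypothetical. It shows only the bookkeeping: one active key per person+device, and a record of every key that has been retired.

```python
from dataclasses import dataclass, field


@dataclass
class KeyRegistry:
    """Toy registry: one active key per (user, device), plus every retired key."""
    current: dict = field(default_factory=dict)  # (user, device) -> active public key
    retired: dict = field(default_factory=dict)  # retired public key -> (user, device)

    def rotate(self, user, device, new_key):
        """Install new_key for this user+device and retire the previous one, if any."""
        old = self.current.get((user, device))
        if old is not None:
            self.retired[old] = (user, device)
        self.current[(user, device)] = new_key

    def check(self, presented_key):
        """Classify a key presented at login time."""
        if presented_key in self.retired:
            user, device = self.retired[presented_key]
            # A retired key should never be seen again, so this implies theft.
            return f"ALERT: retired key for {user} on {device} reappeared"
        if presented_key in self.current.values():
            return "ok"
        return "unknown key"
```

For example, after `reg.rotate("alice", "laptop", "key-v1")` and then `reg.rotate("alice", "laptop", "key-v2")`, a login with "key-v1" returns the alert string and also tells you which device the key was stolen from.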
Anyway, back to our original question: how often should I rotate my ssh keys? More often than never! As often as you can. And make sure you retire old keys when you’re done with them.