When a CA gets distrusted: the mass reissuance runbook nobody has

The threat is real, and it keeps happening

Most teams treat their certificate authority like the electrical grid. Invisible until it fails, and quietly assumed to never fail. Then a root program pulls the plug.

This has already happened, more than once. In 2017 Google moved to distrust Symantec's entire CA operation after years of misissuance, and Symantec wasn't some bit player: it was one of the biggest commercial CAs on the internet. Hundreds of thousands of certificates had to be reissued on DigiCert's infrastructure (DigiCert bought Symantec's CA business that year) before Chrome dropped trust. Then came 2024: Mozilla and Google both went after Entrust over a long pattern of compliance failures and incident reports nobody found convincing. The Chrome Root Store stopped trusting Entrust TLS certs whose earliest Certificate Transparency timestamp falls after October 31, 2024, which in practice means certs issued after that date. Any such cert throws errors in Chrome, even though it is perfectly valid by its own dates.

That last bit is what trips people up. A distrusted cert doesn't expire. It isn't revoked in the usual sense. It just stops being trusted by a browser or OS that carries a huge chunk of your traffic, on a date someone else picked for you.

And the grace periods keep shrinking. The Symantec wind-down stretched over a year of phased deprecation. Don't count on getting that again. Root programs have gotten faster and a lot less patient, and the whole industry is leaning the same direction: shorter lifetimes, shorter reaction windows, shorter everything.

Two failure modes, two clocks

Before you write a single line of runbook, distinguish the two ways a CA can wreck your week. They run on completely different clocks.

Gradual root-program removal. A browser or OS vendor announces they're distrusting a CA, usually with a future cutoff tied to issuance dates. The Entrust pattern. You get weeks to months of warning, and the trigger is "certs issued after date X." Your existing certs may keep working until they expire — the pain is that you can't get new trusted ones from that CA. This is a planning problem.

Emergency Baseline Requirements revocation. Here the CA itself is required to revoke certificates within deadlines fixed by the CA/Browser Forum Baseline Requirements. The timelines are nasty: 24 hours for key compromise and a handful of other serious cases, 5 days for most other misissuance. When a CA hits one of these, it revokes your cert whether you're ready or not. Maybe you get an email. Maybe you get a few days. This is an incident, not a project.
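To see how little room those windows leave, here is a minimal Python sketch of the deadline arithmetic. It assumes the clock starts at `reported_at`, the moment the CA is made aware of the problem, and the reason labels in `TWENTY_FOUR_HOUR_REASONS` are hypothetical placeholders; the authoritative list lives in section 4.9.1.1 of the Baseline Requirements.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical reason labels. The real 24-hour list is in BR section 4.9.1.1
# and contains more cases than these two.
TWENTY_FOUR_HOUR_REASONS = {"key_compromise", "unauthorized_request"}


def revocation_deadline(reported_at: datetime, reason: str) -> datetime:
    """Latest revocation time, assuming the clock starts at reported_at."""
    if reason in TWENTY_FOUR_HOUR_REASONS:
        return reported_at + timedelta(hours=24)
    # Most other misissuance falls under the 5-day window.
    return reported_at + timedelta(days=5)


def hours_left(deadline: datetime, now: Optional[datetime] = None) -> float:
    """Hours remaining before the deadline, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (deadline - now).total_seconds() / 3600)
```

The point of writing it down is the number it produces: if discovery alone eats most of `hours_left`, the plan is already broken.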

If you only prepare for one, prepare for the second. The first gives you time to think. The second gives you a weekend if you're lucky, and you usually find out because someone's already paging you.

Inventory under fire

The first question in any distrust event is dead simple, and for most teams unanswerable in the moment: which of our certs chain to this CA?

If your plan is to SSH into boxes and run openssl by hand, you've already lost. A 24-hour revocation window cannot spend 18 hours on discovery.

What you need is the ability to filter your entire certificate estate by issuer. That means these fields, indexed and queryable across every cert you hold:

Issuer Organization (O) and Common Name (CN). "Entrust," "DigiCert," the intermediate's name. Your coarse filter.
Authority Key Identifier (AKI). The precise pointer to the issuing key. O and CN go ambiguous fast when a CA runs many intermediates or rebrands; the AKI is what actually ties your leaf to a specific issuing intermediate. When a root program says "certs chaining through this intermediate," the AKI is how you match it exactly.
notBefore date. Distrust cutoffs are date-keyed. "Entrust certs issued after November 2024" is an issuer-plus-date query, not an issuer query.
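A minimal Python sketch of what "indexed and queryable" can look like once the data is collected. The `CertRecord` shape and the `affected` helper are hypothetical, and the sketch assumes your collectors have already normalized every provider into this one record type, with the AKI stored as hex.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


def _norm_aki(aki: str) -> str:
    # Compare AKIs as lowercase hex without colon separators.
    return aki.replace(":", "").lower()


@dataclass(frozen=True)
class CertRecord:
    location: str        # where the cert lives: ARN, vault id, host:port, ...
    issuer_o: str        # Issuer Organization (O)
    issuer_cn: str       # Issuer Common Name (CN)
    aki: str             # Authority Key Identifier, hex
    not_before: datetime


def affected(records: Iterable[CertRecord],
             issuer_o: Optional[str] = None,
             aki: Optional[str] = None,
             issued_after: Optional[datetime] = None) -> List[CertRecord]:
    """Filter by issuer O (substring), exact AKI, and notBefore; oldest first."""
    hits = []
    for r in records:
        if issuer_o is not None and issuer_o.lower() not in r.issuer_o.lower():
            continue  # coarse issuer filter
        if aki is not None and _norm_aki(r.aki) != _norm_aki(aki):
            continue  # exact issuing-key match
        if issued_after is not None and r.not_before <= issued_after:
            continue  # distrust cutoffs are date-keyed
        hits.append(r)
    return sorted(hits, key=lambda r: r.not_before)
```

`affected(inventory, issuer_o="Entrust", issued_after=cutoff)` answers the issuer-plus-date question; passing `aki=` instead gives the exact match when a root program names a specific intermediate.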

Now the annoying part. You need all of this across providers that share no schema. ACM exposes issuer and renewal metadata one way. Azure Key Vault models certificates as their own object type with issuer references. GCP Certificate Manager has its own resource model entirely. And then there's everything outside the cloud consoles — load balancers, legacy appliances, vendor-managed hosts — where the only way to learn the issuer is to connect and read the chain it actually presents.
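For endpoints outside the consoles, Python's standard `ssl` module can connect and read the leaf certificate that is actually presented. A sketch with two caveats: `getpeercert()` only returns parsed fields when the chain verifies against the local trust store (otherwise the dict is empty), and the parsed dict carries no AKI, so an exact intermediate match needs a full X.509 parser (for example the `cryptography` package) on the raw DER instead. The helper names are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def _rdn_lookup(name, key: str) -> str:
    # getpeercert() encodes a DN as a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in name:
        for k, v in rdn:
            if k == key:
                return v
    return ""


def summarize_peercert(cert: dict) -> dict:
    """Pull issuer O, issuer CN and notBefore out of a getpeercert() dict."""
    return {
        "issuer_o": _rdn_lookup(cert["issuer"], "organizationName"),
        "issuer_cn": _rdn_lookup(cert["issuer"], "commonName"),
        "not_before": datetime.fromtimestamp(
            ssl.cert_time_to_seconds(cert["notBefore"]), tz=timezone.utc
        ),
    }


def probe(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect with SNI and summarize the leaf the endpoint presents."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return summarize_peercert(tls.getpeercert())
```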

If you don't already have a unified inventory with the issuer field indexed, a distrust event is the worst imaginable time to build one. You'll burn your revocation window writing discovery scripts instead of swapping certs. The teams that come through these clean are the ones who could already answer "show me every cert issued by X, sorted by notBefore" before the announcement landed. You do the discovery work in the boring times or you don't do it at all.

The reissuance mechanics

Say you've found the affected certs. Forty of them. Four hundred. Now you have to move them to a different CA, fast. This is where the gaps live, and there are four.

Your fallback CA has to already be validated. You cannot stand up a fresh ACME account with a new CA and validate every domain under a deadline. Domain validation takes time and DNS propagation does not care about your incident. The second CA needs accounts created and domains authorized ahead of time, so "switch issuers" is an API call rather than an onboarding flow.

Your CAA records have to permit the fallback issuer. This is the one that bites people. If your DNS CAA records only list your primary CA, the fallback CA will refuse to issue, correctly, because that's the entire point of CAA. Both your primary and your fallback should already be authorized, and if you issue wildcards, check any issuewild records too, since they override issue for wildcard names. The same goes for ACME external account binding (EAB): if your fallback CA requires it, provision and store those EAB credentials before the incident, not during it.
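Checking this ahead of time is cheap. A minimal Python sketch of the CAA authorization decision described in RFC 8659, assuming you have already fetched the relevant CAA record set for the name (the DNS tree-climbing is left to your DNS client) as `(flags, tag, value)` tuples. The function and type names are hypothetical.

```python
from typing import Iterable, Tuple

CaaRecord = Tuple[int, str, str]  # (flags, tag, value) from your DNS client

# Tags this sketch understands; an unknown tag marked critical blocks issuance.
KNOWN_TAGS = {"issue", "issuewild", "iodef"}


def _issuer_domain(value: str) -> str:
    # "letsencrypt.org; validationmethods=dns-01" -> "letsencrypt.org"; ";" -> ""
    return value.split(";", 1)[0].strip().lower()


def caa_permits(records: Iterable[CaaRecord], ca_domain: str, wildcard: bool = False) -> bool:
    """Would this CAA record set let ca_domain issue for the name?"""
    records = list(records)
    # Critical flag (128) on a tag we don't understand: no CA may issue.
    if any(flags & 128 and tag.lower() not in KNOWN_TAGS for flags, tag, _ in records):
        return False
    issue = [v for _, t, v in records if t.lower() == "issue"]
    issuewild = [v for _, t, v in records if t.lower() == "issuewild"]
    # For wildcard names, issuewild takes precedence over issue when present.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True  # no restriction of this kind
    return ca_domain.lower() in {_issuer_domain(v) for v in relevant}
```

Run it for every name in your inventory with the CAA identifier your fallback CA documents (Let's Encrypt, for example, uses `letsencrypt.org`); any `False` is a gap to close before the incident.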

Rate limits will throttle a mass reissuance. ACME endpoints enforce them. Let's Encrypt publishes theirs — caps on certificates per registered domain, new orders per account, over rolling windows. Fire 400 reissues in a tight loop and you'll hit a wall partway through, with the rest failing on rate-limit errors. So batch it. Spread issuance across the window, go public-facing and high-traffic first, and know your CA's specific limits before you start instead of discovering them at cert 200 of 400.
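A sketch of the batching idea in Python, assuming a `per_window` budget you set below your CA's published limit for the relevant window and a traffic signal taken from your own metrics. `PendingCert` and `plan_batches` are hypothetical; real limits are often counted per registered domain, so a production version would also group by domain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingCert:
    name: str
    public_facing: bool
    requests_per_day: int  # hypothetical traffic signal from your own metrics


def plan_batches(certs: List[PendingCert], per_window: int) -> List[List[PendingCert]]:
    """Split reissuance into one batch per rate-limit window, most important first."""
    if per_window < 1:
        raise ValueError("per_window must be at least 1")
    # Public-facing first, then busiest first; name breaks ties so the plan is stable.
    ordered = sorted(certs, key=lambda c: (not c.public_facing, -c.requests_per_day, c.name))
    return [ordered[i:i + per_window] for i in range(0, len(ordered), per_window)]
```

Each batch goes out in its own window, so a rate-limit error stalls at most one batch instead of the tail of the whole run.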

Then the deployment gap, which is the one that turns a recoverable incident into an actual outage. A cert gets reissued, gets stored, and never reaches the thing terminating TLS. Issuance is not deployment. ACM handles a lot of this when the cert is attached to a managed resource — but the moment a cert is sitting in Key Vault or some secrets store that a load balancer or ingress controller pulls on its own schedule, you've got a propagation step that can fail silently. Reissuing feels like done. The user staring at the old cert on the load balancer disagrees. Verify at the endpoint, not at the CA.
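Verifying at the endpoint can be as small as comparing fingerprints. A minimal Python sketch using the standard `ssl` module; the helper names are hypothetical, and it only checks the leaf certificate served on a single connection.

```python
import hashlib
import ssl


def pem_fingerprint(pem: str) -> str:
    """SHA-256 over the DER bytes, the same digest openssl reports, as lowercase hex."""
    return hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest()


def endpoint_serves(host: str, expected_pem: str, port: int = 443) -> bool:
    """True if the endpoint currently presents the reissued certificate."""
    # With ca_certs left as None, get_server_certificate does not verify the chain,
    # which is what we want here: we are checking what is served, not whether it is trusted.
    served = ssl.get_server_certificate((host, port))
    return pem_fingerprint(served) == pem_fingerprint(expected_pem)
```

Behind a load balancer with several backends, one probe can land on an updated node while others still serve the old cert, so repeat the check or probe each backend directly.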

CA agility is the actual lesson

Every distrust postmortem lands in the same place. The teams that suffered were the ones for whom changing CAs was a project. The teams that shrugged it off had built so that changing CAs was a config flip.

That's the whole goal. "Change issuer" should be one line in a config, applied, rolled out on its own. On Kubernetes, cert-manager hands you this for free: define multiple ClusterIssuers and moving a workload from one CA to another is an annotation change on the Ingress (or an issuerRef change on a Certificate resource), with reissuance and secret rollout handled for you. The architectural sin is the opposite: hardcoding full chains in app config, pinning specific intermediates in client code, baking a CA's intermediate into a Docker image. Every one of those turns a CA swap into a redeploy of things that should have nothing to do with your CA.
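To make "one line in a config" concrete, here is a Python sketch that builds a cert-manager `Certificate` manifest with the issuer taken from a single setting. The field names (`secretName`, `dnsNames`, `issuerRef`) follow cert-manager's `cert-manager.io/v1` API; the issuer names `primary-acme` and `fallback-acme` and both helper functions are hypothetical.

```python
import copy
import json

# The one value a CA swap should touch. Issuer names are hypothetical.
ACTIVE_ISSUER = "primary-acme"


def certificate_manifest(name: str, namespace: str, dns_names: list, issuer: str) -> dict:
    """Build a cert-manager Certificate whose issuer comes from a single setting."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "secretName": f"{name}-tls",
            "dnsNames": list(dns_names),
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer", "group": "cert-manager.io"},
        },
    }


def swap_issuer(manifest: dict, issuer: str) -> dict:
    """Return a copy pointing at a different ClusterIssuer; nothing else changes."""
    updated = copy.deepcopy(manifest)
    updated["spec"]["issuerRef"]["name"] = issuer
    return updated


if __name__ == "__main__":
    m = certificate_manifest("web", "prod", ["www.example.com"], ACTIVE_ISSUER)
    # kubectl apply accepts JSON as well as YAML.
    print(json.dumps(swap_issuer(m, "fallback-acme"), indent=2))
```

Once the swapped manifest is applied, cert-manager should reissue on its own, because the stored certificate no longer matches the requested issuer; nothing in the application itself changes.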

Don't pin to specific intermediates unless you have a real, threat-modeled reason and an automated way to rotate the pins. Pin a chain that later gets distrusted and you break yourself at the exact moment you most need room to move.

Here's the part nobody expects to like. The CA/Browser Forum's SC-081v3 ballot is dragging maximum TLS certificate lifetimes down: 200 days from March 15, 2026 (already in effect), 100 days from March 15, 2027, and 47 days from March 15, 2029. Everyone reads this as more renewal volume, and sure, it is. But shorter lifetimes also make you more distrust-resilient. When your whole fleet rolls over every 47 days regardless, an "issued after date X" distrust barely registers: you're already reissuing constantly, and the affected population ages out in weeks instead of hanging around for a year. The automation you're forced to build for 47-day certs is the same automation that makes a CA swap a non-event. Build it once, get both.
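A quick Python sketch of the arithmetic behind "ages out in weeks": under SC-081v3 the maximum lifetime depends on the issuance date, so the last cert you obtained from a CA you have moved away from can stay valid at most that many days. It approximates lifetimes as whole calendar days and assumes the pre-ballot 398-day cap before the first step; the function names are hypothetical.

```python
from datetime import date, timedelta

# Maximum TLS lifetimes under SC-081v3, keyed by issuance date (newest step first).
SCHEDULE = [
    (date(2029, 3, 15), 47),
    (date(2027, 3, 15), 100),
    (date(2026, 3, 15), 200),
]
PRE_BALLOT_MAX_DAYS = 398


def max_lifetime_days(issued: date) -> int:
    """Maximum lifetime for a cert issued on the given date (approximate, whole days)."""
    for start, days in SCHEDULE:
        if issued >= start:
            return days
    return PRE_BALLOT_MAX_DAYS


def last_possible_expiry(last_issued: date) -> date:
    """Latest date a cert issued on last_issued can still be valid."""
    return last_issued + timedelta(days=max_lifetime_days(last_issued))
```

For a last issuance on June 1, 2029, that is July 18, 2029; the same issuance under the old 398-day cap would linger into the following summer.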

Pre-write the runbook now

You will not design a good response while the building is on fire. Write the runbook today, while nothing is wrong.

Start with detection. Subscribe to the channels that announce distrust before it reaches you — the Mozilla dev-security-policy list, the Chrome Root Program announcements, your CA's own incident feeds. Then go past announcements and watch Certificate Transparency logs for the issuers in your estate. CT is where misissuance patterns first surface, so monitoring the CAs you actually depend on gives you a leading indicator that one is heading for trouble.

Then the decision tree. First branch: scheduled root-program removal with a future cutoff, or emergency BR revocation? Scheduled means plan the migration, prioritize by expiry, move deliberately. Emergency means trigger the mass-reissuance batch right now, public-facing first, verifying at endpoints as you go. Second branch: do we have a pre-validated fallback CA with CAA already permitting it? If yes, execute. If no, that's the gap you close this week — not during the incident.
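The same tree can be written down as a small Python function so it lives next to the runbook and can be exercised in drills. The step wording and all names here are hypothetical placeholders for your own procedures.

```python
from enum import Enum
from typing import List


class EventKind(Enum):
    SCHEDULED_REMOVAL = "scheduled root-program removal"
    EMERGENCY_REVOCATION = "emergency BR revocation"


def runbook_steps(kind: EventKind, fallback_validated: bool, caa_permits_fallback: bool) -> List[str]:
    """Walk both branches of the decision tree and return the ordered steps."""
    steps: List[str] = []
    # The fallback check goes first in the output: every other step depends on it.
    if not (fallback_validated and caa_permits_fallback):
        steps.append("close the gap: validate fallback CA domains and authorize it in CAA")
    # Which clock are we on?
    if kind is EventKind.SCHEDULED_REMOVAL:
        steps += ["plan the migration", "prioritize by expiry", "move deliberately"]
    else:
        steps += ["trigger mass-reissuance batches now", "public-facing first", "verify at endpoints"]
    return steps
```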

And the flag-it-instantly piece. The moment a distrust lands, you want a saved query that spits back every affected cert. This is exactly where CertPulse's issuer-level inventory earns its keep: every cert across ACM, Key Vault, GCP, and your external endpoints in one place, filterable by issuer, with CT log monitoring already watching your domains. When the announcement drops, "which of our certs are affected" is a filter, not a fire drill. That's the line between a quiet afternoon of batched reissuance and a 2am outage you never saw coming.

The CA you trust today can be distrusted tomorrow by someone you've never met, on a timeline you don't control. You can't stop that. You can make sure it costs you an afternoon instead of an outage.

This is why we built CertPulse

CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.

If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.

