
What happens when your certificate renews but doesn't deploy

March 25, 2026 · 7 min read · CertPulse Engineering

Your certificate renewed successfully. The automation worked. Certbot exited 0, ACM shows a fresh cert with a not-after date comfortably in the future, cert-manager's Certificate resource says Ready. Everything looks green.

Your users are staring at a browser warning.

This failure mode doesn't show up in your renewal dashboards. It doesn't trigger your expiry alerts. The certificate renewed, but it never made it to the thing actually terminating TLS. That gap is where some of the most frustrating outages I've debugged have started.

Renewal and deployment are two separate operations

Most certificate automation treats these as a single atomic operation. They're not.

Renewal is obtaining a new certificate from your CA. Deployment is putting that certificate in front of traffic. Between those two steps, a dozen things can go wrong, and most of them are silent.

Certbot with a cron job handles renewal. ACM auto-renew handles renewal. Cert-manager reissuing before expiry handles renewal. But the certificate sitting in a file on disk, in an ACM ARN, or in a Kubernetes Secret is not the certificate your users see. Your users see whichever cert is loaded into the process terminating their TLS connection. If that process doesn't pick up the new cert, renewal was a no-op.

The dangerous part: every monitoring system pointed at the certificate store will tell you everything is fine. New cert exists. Valid. Months of runway. Meanwhile the load balancer or reverse proxy is still serving the old one, and that clock is ticking toward zero.

The ways this actually breaks

I've stopped being surprised by new variations of this. Here are the ones I keep running into.

ACM renewed, but the ALB never picked it up

ACM handles renewal automatically for certificates it issued. That part works. But ACM renewal and ALB listener association are separate concerns. If someone recreated a listener, swapped target groups, or deployed infrastructure through a pipeline that re-provisions the listener without referencing the current ACM ARN, you end up with a listener pointing at a stale certificate. ACM dutifully renewed. The ALB isn't using it. The ACM console shows valid. The ALB serves expired.

I traced one of these back to a Terraform apply where the listener configuration had drifted from state. Nobody caught it because the ACM dashboard looked healthy.
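One way to catch this class of drift is to ask the ALB directly which certificate ARNs its listener actually references and compare against the ARN you expect. A minimal sketch using boto3's `describe_listener_certificates` call (the listener ARN and account details below are placeholders):

```python
def listener_cert_arns(listener_arn: str) -> set[str]:
    """ARNs of the certificates actually attached to an ALB listener."""
    import boto3  # imported lazily so the pure drift check below works offline
    elbv2 = boto3.client("elbv2")
    resp = elbv2.describe_listener_certificates(ListenerArn=listener_arn)
    return {c["CertificateArn"] for c in resp["Certificates"]}

def drifted(attached: set[str], expected_arn: str) -> bool:
    """True when the listener no longer references the ACM cert renewal produced."""
    return expected_arn not in attached
```

Run `drifted(listener_cert_arns("arn:aws:elasticloadbalancing:..."), current_acm_arn)` after each renewal and alert when it returns True. This checks the association, not the handshake, so it complements rather than replaces endpoint probing.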

Certbot renewed, nginx never reloaded

The classic. Certbot's renew command runs on cron, obtains new PEM files, writes them to disk, exits. Nginx is still holding the old certificate in memory. Unless you've configured a deploy hook that reloads after successful renewal, the new cert sits on the filesystem doing nothing. Renewal logs say success. Nginx keeps serving the old cert until someone manually reloads or the old cert expires and users start seeing errors.
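Certbot runs executables in `/etc/letsencrypt/renewal-hooks/deploy/` (or whatever you pass via `--deploy-hook`) only after a successful renewal, exporting the renewed lineage directory as `RENEWED_LINEAGE`. A minimal hook might look like this sketch; the env-var guard means it only acts when certbot actually invoked it:

```python
#!/usr/bin/env python3
"""Minimal certbot deploy hook: reload nginx so it serves the renewed PEMs.
Install an executable copy in /etc/letsencrypt/renewal-hooks/deploy/
or pass the path with --deploy-hook."""
import os
import subprocess
import sys

def reload_command() -> list[str]:
    # 'reload' makes nginx re-read certificates without dropping live
    # connections, unlike 'restart'.
    return ["systemctl", "reload", "nginx"]

# Certbot sets RENEWED_LINEAGE only when running this as a deploy hook,
# so a stray manual invocation won't reload anything.
if os.environ.get("RENEWED_LINEAGE"):
    print(f"reloading nginx after renewal of {os.environ['RENEWED_LINEAGE']}",
          file=sys.stderr)
    sys.exit(subprocess.call(reload_command()))
```

The point of putting it in the deploy-hook directory rather than a separate cron entry is exactly the failure described above: the reload is coupled to the renewal event itself, so reorganizing cron can't silently decouple them.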

I walked into one environment where the deploy hook had been configured correctly for months. Then someone reorganized the cron jobs and the hook silently stopped firing. Certs kept renewing. Logs stayed clean. Nobody knew the reload wasn't happening until a 2am page.

Cert-manager issued the cert, but the ingress controller served stale state

Kubernetes adds its own flavor. Cert-manager issues a new certificate and updates the Secret. The ingress controller—nginx-ingress, Traefik, whatever—has its own cache and its own reload cycle. If it doesn't watch the Secret for changes, or if there's a race between the Secret update and the controller's sync loop, the new cert exists in the cluster but isn't being served.

I've seen this persist indefinitely when an ingress controller pod restarted without the correct RBAC to read the updated Secret and fell back to cached state. Certificate resource: Ready. Actual TLS handshake: not ready.

CDN edge caches holding the old certificate

If you terminate TLS at a CDN—Cloudflare, CloudFront, Fastly—there's another layer of indirection. You upload or associate a new certificate, but edge nodes worldwide are still serving the cached old one. Propagation delays vary, and there's always a window. Usually this resolves itself. But I've seen configurations where a custom certificate was pinned at the distribution level and the "renewal" happened in the certificate store without the distribution being updated to reference it. Origin had the new cert. Edge didn't.

Why expiry monitoring misses this

Most certificate monitoring checks the certificate object in your CA or store. ACM says 90 days left. Your vault shows a fresh cert. Dashboard is green. None of that tells you what certificate is actually being presented during the TLS handshake.

The certificate in ACM and the certificate served by your ALB can be different objects. The certificate on disk and the certificate in nginx's memory can be different. Expiry monitoring that checks the source of truth but not the endpoint will miss every failure mode I just described.

No error was thrown. No process crashed. The renewal succeeded. The deployment didn't happen, and nothing noticed. When someone reports that their certificate isn't updating on their load balancer, nine times out of ten the cert renewed fine. The last mile broke.

The only reliable way to know what your users see: probe the endpoint. Connect to the hostname and port your users connect to, complete the TLS handshake, inspect the certificate that comes back. Compare its serial number or fingerprint against what you expect. If they don't match, something in the deployment pipeline is broken, and you need to know before your users tell you through a support ticket.
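Probing is a few lines with Python's standard library: complete a real handshake, pull the leaf certificate in DER form, and fingerprint it. A minimal sketch (hostname and port are whatever your users actually connect to):

```python
import hashlib
import socket
import ssl

def fingerprint(der_bytes: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der_bytes).hexdigest()

def served_fingerprint(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Complete a full TLS handshake against the endpoint and return the
    fingerprint of the leaf certificate actually presented to clients."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return fingerprint(der)
```

Compare `served_fingerprint("example.com")` against the fingerprint you recorded when the renewal completed. A mismatch after your propagation window means the last mile is broken.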

Building a verification loop that catches this

The fix is simple in concept: after any renewal event, probe the live endpoint and verify the new certificate is actually being served.

Probe from outside your infrastructure. Connecting from inside your VPC might bypass CDN layers, skip edge caches, or hit a different listener than your users do. Resolve the public DNS name, connect to the public endpoint, complete the full handshake. Mimic a real client.

You need something to compare against. When a renewal happens, record the expected serial number or not-after date. When the probe returns, compare. If the served cert's serial doesn't match the new one after a reasonable propagation window, fire the alert. That window depends on your stack—an nginx reload takes effect in seconds, CDN propagation might take hours.
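The record-then-compare loop can be sketched as a polling function that tolerates the propagation window and only alerts when the window expires without a match. The probe is passed in as a callable so the same loop works for a direct nginx endpoint or a slow CDN edge:

```python
import time
from typing import Callable

def verify_deployment(
    expected_fp: str,
    probe: Callable[[], str],
    window_s: float = 300.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the live endpoint until the served fingerprint matches the one
    recorded at renewal, or the propagation window expires."""
    deadline = time.monotonic() + window_s
    while True:
        if probe() == expected_fp:
            return True   # new cert is live at the edge
        if time.monotonic() >= deadline:
            return False  # window expired: renewed cert never reached the endpoint
        sleep(interval_s)
```

For an nginx reload a `window_s` of a minute or two is generous; for CDN propagation you might set it to hours. A False return is the alert condition.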

Don't just probe once. Certificates can regress. A deployment, a rollback, or a config change can swap a fresh cert back to an old one. Continuous probing catches regression, not just initial deployment failures.

This is the distinction between inventory monitoring and endpoint monitoring. Inventory monitoring tells you what certificates you have. Endpoint monitoring tells you what certificates your users see. You need both. If you're only doing one, endpoint monitoring is the one that prevents outages.

Shorter lifetimes make this a recurring problem

The CA/Browser Forum's SC-081v3 ballot is ratcheting down maximum TLS certificate lifetimes. The 200-day phase started this month. 100-day hits March 2027. By March 2029, maximum lifetime drops to 47 days with 10-day DCV reuse windows.

Think about what this does to deployment failures. With 398-day certificates, a broken deployment pipeline could go unnoticed for months—the old cert still had plenty of life. With 47-day certs, that same broken pipeline gives you at most 47 days before users see an error. In practice much less, because renewal happens before expiry, so the old cert might only have days left when the new one is issued.

If you're renewing roughly every 30 days to stay ahead of a 47-day expiry, and your deployment pipeline fails silently even 5% of the time, that's a failure roughly every 20 cycles. Not a rare edge case. A recurring operational problem.
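The arithmetic behind that claim generalizes to a fleet. Assuming independent failures per renewal cycle, the expected number of silent deployment failures is just renewals per year times the failure rate:

```python
def expected_silent_failures(certs: int, cycle_days: float,
                             failure_rate: float, days: float = 365.0) -> float:
    """Expected silent deployment failures across a fleet over a period,
    assuming each renewal cycle fails independently."""
    renewals_per_cert = days / cycle_days
    return certs * renewals_per_cert * failure_rate

# One cert on a 30-day cycle at a 5% silent failure rate: ~0.61 failures/year.
# A 100-cert fleet under the same assumptions expects ~61 per year.
```

Even a 1% silent failure rate across a modest fleet produces multiple incidents a year once 47-day lifetimes force monthly renewals.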

This is why endpoint monitoring stops being optional. When renewal volume goes up by an order of magnitude, every silent failure mode in your pipeline will eventually fire. You can't wait for user reports.

What to do about it

Treat renewal and deployment as two separate things that need two separate verification steps. Monitor the certificate your endpoint actually serves, not the one in your CA or store. Probe from outside, compare what you get against what you expect, alert on divergence. Tools like CertPulse are built around exactly this—scanning live endpoints on a schedule and comparing what's served against your certificate inventory so you catch the gap before your users do.

The shorter lifetimes under SC-081v3 are going to expose every latent deployment bug in your pipeline. Teams with endpoint monitoring will catch these in minutes. Teams relying on expiry dashboards alone will catch them when the on-call phone rings.

This is why we built CertPulse

CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.

If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.
