OCSP stapling is probably broken on half your endpoints

April 2, 2026 | 11 min read | CertPulse Engineering

If you asked me to bet on it, I'd say fewer than half the endpoints in your infrastructure are returning a valid OCSP stapled response right now. Not because you didn't configure it. Because OCSP stapling fails silently, and almost nobody monitors it.

This isn't theoretical. OCSP stapling determines whether your visitors' browsers can efficiently verify that your certificate hasn't been revoked. When it breaks—and it breaks constantly—you're pushing revocation checking back onto the client, where it's slow, unreliable, and increasingly just skipped entirely. Your certificate is technically valid. Your TLS handshake completes. Everything looks green. Revocation checking is quietly not happening.

OCSP stapling is a privacy and reliability control, not just a performance trick

Most teams think of OCSP stapling as an optimization. Shave a round trip off the TLS handshake, cool, nice to have. That framing massively undersells the problem.

Without stapling, the browser has to contact the CA's OCSP responder directly to check whether your certificate has been revoked. That means the CA sees your visitor's IP address and which of your domains they're hitting, every time the browser doesn't already hold a fresh cached answer. It also means your site's availability now depends on a third-party service you don't control. If the CA's OCSP responder is slow or down, your visitors eat that latency, or, more commonly, the browser gives up and accepts the certificate without checking. This is called "soft-fail," and it's the default in the browsers that still perform online OCSP checks. Firefox can be configured to hard-fail instead, but it doesn't ship that way.

So without working OCSP stapling, you get the worst of both worlds: your visitors' browsing habits leak to the CA when the responder is reachable, and revocation checks silently don't happen when it's not. The certificate revocation system that's supposed to protect your users from compromised certificates? Functionally decorative.

When stapling works, your server fetches the OCSP response, caches it, and includes it in the TLS handshake. The browser gets proof of non-revocation without contacting anyone. No privacy leak, no third-party dependency, no soft-fail. That's how this is supposed to work. The problem is keeping it working.

The ways stapling breaks, and why you won't notice

I've seen OCSP stapling fail in environments that had it correctly configured for years. The failure modes are quiet and varied, and they all share one trait: the TLS handshake still succeeds.

Stale cached responses are the most common culprit. OCSP responses have a validity period—typically somewhere between a few hours and seven days depending on the CA. Your web server fetches a response, caches it, staples it to handshakes. When the cached response expires, the server needs to fetch a fresh one. If that fetch fails because of a network blip, DNS resolution issue, or responder timeout, many servers silently stop stapling rather than serve a stale response. Nginx is particularly bad here. It fetches the OCSP response lazily on the first request after startup, and if that initial fetch fails, it just serves handshakes without any stapled response until the next attempt. There's no error in the default log level. The handshake works fine. You just lost stapling.
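The stale-cache failure comes down to a small decision the server makes each time it looks at its cached response. Here is a minimal sketch of one possible refresh policy. It is not how nginx or any specific server behaves: the function name, the refresh-at-halfway threshold, and the action labels are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def staple_action(now: datetime, this_update: datetime, next_update: datetime) -> str:
    """Decide what to do with a cached OCSP response (hypothetical policy)."""
    if now >= next_update:
        # The response has expired, so clients would reject it: stop and refetch.
        return "stop_stapling_and_refresh"
    halfway = this_update + (next_update - this_update) / 2
    if now >= halfway:
        # Still valid: keep stapling it while a refresh is attempted.
        return "staple_and_refresh"
    return "staple"


# Example with an assumed seven-day validity window.
issued = datetime(2026, 4, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=7)
print(staple_action(issued + timedelta(days=4), issued, expires))
```

Refreshing well before expiry is the point of the sketch: a failed fetch at the halfway mark still leaves days of valid response to staple, while a failed fetch at expiry leaves nothing.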

Firewall rules blocking outbound OCSP are another quiet killer. Your server needs to make outbound HTTP requests to the CA's OCSP responder URL, embedded in the certificate's Authority Information Access extension. In hardened environments, outbound traffic from web servers is often restricted to known endpoints. I've seen teams lock down egress and not realize they've blocked OCSP responder URLs. Certificate works. Stapling doesn't. Nobody notices for months, if ever.
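One way to catch this before it bites is to compare the responder URLs from your certificates against whatever egress allowlist you maintain. The sketch below assumes you have already pulled the responder URLs out of each certificate's Authority Information Access extension and that your allowlist can be expressed as (host, port) pairs. The function name and the hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def blocked_responders(responder_urls, allowed_egress):
    """Return responder URLs whose (host, port) is not in the egress allowlist."""
    blocked = []
    for url in responder_urls:
        parts = urlsplit(url)
        # OCSP responders are usually plain HTTP, so default to port 80.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        if (parts.hostname, port) not in allowed_egress:
            blocked.append(url)
    return blocked


# Hypothetical allowlist and responder URLs.
allowed = {("ocsp.example-ca.test", 80)}
urls = ["http://ocsp.example-ca.test/", "http://ocsp.other-ca.test/"]
print(blocked_responders(urls, allowed))
```

This only compares names. A responder behind a CDN may resolve to addresses that an IP-based firewall rule still blocks, so a real check should also attempt the connection from the web server itself.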

Intermediate certificate chain issues trip people up too. For stapling to work, your server typically needs the full certificate chain available, including the intermediate. Some configurations serve the leaf certificate fine for the TLS handshake but don't have the chain set up in a way that lets the stapling mechanism resolve the issuer. This bites people especially hard after certificate rotations where the intermediate changes and the stapling config still references a stale chain file.

Configuration that looks right but isn't. In nginx, enabling stapling requires both the directive that turns it on (ssl_stapling on) and a resolver directive so nginx can look up the OCSP responder hostname. Miss the resolver, and nginx can't resolve the responder URL, so stapling silently fails. In Apache, the stapling cache (SSLStaplingCache) needs adequate size and a reasonable timeout. The defaults fall over on servers handling many virtual hosts with different certificates. The config syntax is correct. The module is loaded. The feature just isn't functioning.
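The nginx half of this is simple enough to lint for. The following is a rough sketch, not a real nginx parser: it ignores include files and block scoping and treats the whole file as one context. The directive names (ssl_stapling, ssl_stapling_verify, ssl_trusted_certificate, resolver) are real nginx directives, while the function name and warning strings are made up for this example.

```python
import re

DIRECTIVE = re.compile(
    r"\b(ssl_stapling_verify|ssl_stapling|ssl_trusted_certificate|resolver)\s+([^;]+);"
)


def stapling_warnings(nginx_config: str) -> list[str]:
    """Flag stapling setups that parse fine but are unlikely to work."""
    # Drop comments so commented-out directives are not counted.
    text = "\n".join(line.split("#", 1)[0] for line in nginx_config.splitlines())
    seen: dict[str, list[str]] = {}
    for name, value in DIRECTIVE.findall(text):
        seen.setdefault(name, []).append(value.strip())

    if "on" not in seen.get("ssl_stapling", []):
        return ["ssl_stapling is not enabled"]
    warnings = []
    if "resolver" not in seen:
        warnings.append("ssl_stapling is on but no resolver is set")
    if "on" in seen.get("ssl_stapling_verify", []) and "ssl_trusted_certificate" not in seen:
        warnings.append("ssl_stapling_verify is on but ssl_trusted_certificate is missing")
    return warnings


example = """
server {
    listen 443 ssl;
    ssl_stapling on;
    # resolver 1.1.1.1;
}
"""
print(stapling_warnings(example))
```

A clean lint result only says the configuration is plausible. Whether a response is actually being stapled still has to be checked on the wire.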

Restart and reload timing can bite you too. Some servers only fetch the OCSP response at startup or on configuration reload, not on a background timer. If the OCSP responder is unreachable at that exact moment—maybe you're deploying at 3am and the responder is having its own maintenance window—stapling won't activate, and it may not retry until the next restart.

Every one of these scenarios produces a valid TLS connection with no stapled OCSP response. Your uptime monitor says everything is fine. Your certificate isn't expired. But certificate revocation checking for your visitors has silently degraded.

Detecting broken stapling across your fleet

Checking a single endpoint once is easy. Open a connection, look at the handshake, see if there's an OCSP response. The hard part is doing this continuously across every endpoint you operate and actually alerting when stapling drops.
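For the single-endpoint case, a small wrapper around the openssl command line is usually enough. The sketch below assumes the openssl binary is on the PATH and that its s_client output uses the labels shown in the parser ("Cert Status", "This Update", "Next Update", and "OCSP response: no response sent" when nothing is stapled). Those labels match common OpenSSL builds but could differ in yours, so treat the parsing as an assumption to verify. The function names are illustrative.

```python
import re
import subprocess
from datetime import datetime, timezone


def fetch_status_output(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Run `openssl s_client -status` against host and return its text output."""
    cmd = ["openssl", "s_client", "-connect", f"{host}:{port}",
           "-servername", host, "-status"]
    # Empty stdin makes s_client exit once the handshake has completed.
    result = subprocess.run(cmd, input="", capture_output=True, text=True, timeout=timeout)
    return result.stdout


def parse_stapling(output: str) -> dict:
    """Extract stapling fields from s_client output."""
    if "OCSP response: no response sent" in output:
        return {"stapled": False}
    status = re.search(r"Cert Status:\s*(\w+)", output)
    if status is None:
        return {"stapled": False}
    info = {"stapled": True, "cert_status": status.group(1)}
    for key, label in (("this_update", "This Update"), ("next_update", "Next Update")):
        match = re.search(rf"{label}:\s*(.+)", output)
        if match:
            # Collapse the padded day-of-month ("Apr  1") before parsing.
            stamp = " ".join(match.group(1).split())
            parsed = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
            info[key] = parsed.replace(tzinfo=timezone.utc)
    return info
```

Calling parse_stapling(fetch_status_output("example.com")) gives a dictionary you can log or compare over time. Keeping the parser separate from the network call also makes it easy to test against captured output.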

A healthy endpoint returns an OCSP response with a "good" status, a validity window that extends into the future, and a response that chains back to the correct issuer. An unhealthy endpoint either returns no OCSP response at all (the server isn't stapling) or returns one that's expired or otherwise broken.
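That definition translates into a short classifier. This is a sketch under the assumption that some earlier step has already parsed the stapled response and verified its signature against the expected issuer. The Staple type, its field names, and the returned labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Staple:
    cert_status: str        # "good", "revoked", or "unknown"
    this_update: datetime   # start of the response's validity window
    next_update: datetime   # end of the response's validity window
    issuer_matches: bool    # response verified against the expected issuer


def stapling_health(staple: Optional[Staple], now: datetime) -> str:
    """Classify a stapled OCSP response, or its absence."""
    if staple is None:
        return "missing"            # the server is not stapling at all
    if not staple.issuer_matches:
        return "wrong_issuer"
    if now >= staple.next_update:
        return "expired"
    if now < staple.this_update:
        return "not_yet_valid"
    if staple.cert_status != "good":
        return staple.cert_status   # "revoked" or "unknown"
    return "healthy"


now = datetime(2026, 4, 2, tzinfo=timezone.utc)
fresh = Staple("good", now - timedelta(days=1), now + timedelta(days=6), True)
print(stapling_health(fresh, now), stapling_health(None, now))
```

Returning a distinct label per failure, rather than a single boolean, keeps "not stapling" and "stapling something stale" apart in alerts, since they usually have different causes.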

Here's the distinction most monitoring misses: "certificate is valid" and "OCSP stapling is healthy" are independent conditions. A certificate can be perfectly valid with months until expiry while OCSP stapling has been broken for weeks. If your monitoring only checks certificate expiration and basic connectivity, you have a blind spot.

A single check after deploying a new certificate isn't enough either. Stapling can break days or weeks later when a cached OCSP response expires and the refresh fails. It can break after a firewall rule change that has nothing to do with TLS. It can break after a server restart that happens to coincide with an OCSP responder outage. You need continuous monitoring, not point-in-time verification.

What you want is a system that probes your endpoints on a regular schedule, checks whether the stapled OCSP response is present and valid (not just whether the TLS handshake succeeds), and alerts when stapling degrades. This is the gap between certificate lifecycle management and basic uptime checks. CertPulse's endpoint monitoring captures the full TLS handshake details on every scan, including OCSP stapling status—which is how we've seen just how widespread these silent failures are across real infrastructure.

For fleet-wide visibility, track stapling status as a distinct health signal, separate from certificate validity and endpoint reachability. When stapling breaks on one endpoint behind a load balancer but not others, you want to catch the inconsistency before a visitor does.
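Spotting that kind of inconsistency is mostly a grouping problem. The sketch below assumes you can probe each backend individually (for example by connecting to its IP with the public hostname as SNI) and record whether a staple came back. The tuple layout and function name are assumptions, and the addresses are documentation-range placeholders.

```python
from collections import defaultdict


def inconsistent_hosts(probes):
    """Map each hostname with mixed results to the backends that are not stapling.

    probes: iterable of (hostname, backend_ip, stapled) tuples.
    """
    by_host = defaultdict(dict)
    for hostname, backend_ip, stapled in probes:
        by_host[hostname][backend_ip] = stapled
    report = {}
    for hostname, backends in by_host.items():
        # Only flag hosts where some backends staple and others do not.
        if len(set(backends.values())) > 1:
            report[hostname] = sorted(ip for ip, ok in backends.items() if not ok)
    return report


probes = [
    ("www.example.test", "192.0.2.10", True),
    ("www.example.test", "192.0.2.11", False),
    ("api.example.test", "192.0.2.20", True),
]
print(inconsistent_hosts(probes))
```

Hosts where every backend fails are deliberately left out here. They are a different alert (stapling is down everywhere) and a plain per-endpoint health check already covers them.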

Must-Staple: the promise and the production incidents

The OCSP Must-Staple extension (RFC 7633) was supposed to fix soft-fail. If a certificate includes the Must-Staple flag, the browser should hard-fail the connection when no valid stapled response is present. No more silently skipping revocation checks.

In theory, exactly what you want. In practice, Must-Staple has caused enough production outages that adoption effectively stalled.

The failure chain: you issue a certificate with Must-Staple. Stapling works fine. Six weeks later, your cached OCSP response expires while the CA's responder is having a brief outage. Your server can't refresh the stapled response. Now every browser that honors Must-Staple (and support is inconsistent across browsers and versions) hard-fails the connection. Your site is down because a third-party OCSP responder you don't control had a bad hour.
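The difference between soft-fail and Must-Staple is easier to see as a decision table. The function below is a deliberately simplified model of a client, not the logic of any real browser. It assumes the client falls back to a direct OCSP lookup when no staple is present, which not every browser does, and all names are illustrative.

```python
def handshake_outcome(staple_valid: bool, must_staple: bool,
                      client_enforces_must_staple: bool,
                      responder_reachable: bool) -> str:
    """Simplified model of how a client treats a handshake."""
    if staple_valid:
        return "accept: revocation proven by staple"
    if must_staple and client_enforces_must_staple:
        # No valid staple and the certificate demands one: the connection dies.
        return "reject: hard-fail"
    if responder_reachable:
        # Works, but the CA learns who is visiting which site.
        return "accept: client queried the CA directly"
    return "accept: soft-fail, no revocation check"


# The outage scenario: staple expired, responder down, Must-Staple certificate.
print(handshake_outcome(False, True, True, False))
# The same outage without Must-Staple.
print(handshake_outcome(False, False, True, False))
```

The two calls differ only in the Must-Staple flag. One turns a responder outage into a site outage, the other into a skipped revocation check.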

Let's Encrypt supported Must-Staple for years and watched this exact pattern play out. The extension creates a hard dependency on OCSP infrastructure availability, turning the CA's responder into a single point of failure for your production environment. Most teams that tried Must-Staple in production either ripped it out after an incident or decided the operational risk wasn't worth it.

Browser vendors noticed. Chrome never enforced Must-Staple, treating it as informational. Firefox had the most complete implementation but still caught user complaints from broken sites. The extension exists in the spec, but the ecosystem effectively decided it was too dangerous to enforce.

The takeaway: if your OCSP stapling infrastructure isn't rock-solid—and we've just established that it usually isn't—Must-Staple doesn't improve security. It converts a silent monitoring gap into a production outage. Fix the monitoring gap first.

Where revocation is heading as certificates get shorter

The CA/Browser Forum's ballot SC-081v3 is compressing TLS certificate lifetimes from 398 days down to 47 days by March 2029, with intermediate steps at 200 days (already active as of March 2026) and 100 days hitting in March 2027. This changes the calculus on revocation checking in ways the industry is still working through.
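If you want that schedule in tooling, it fits in a lookup table. The sketch below takes the limits and months from the paragraph above. The day of the month (the 15th) is not stated here and comes from my reading of the ballot, so confirm it against the ballot text before relying on it. The function and constant names are hypothetical.

```python
from datetime import date

# (effective date, maximum validity in days), newest first.
# Assumption: each step takes effect on the 15th of the month.
SCHEDULE = [
    (date(2029, 3, 15), 47),
    (date(2027, 3, 15), 100),
    (date(2026, 3, 15), 200),
]


def max_validity_days(issued_on: date) -> int:
    """Maximum certificate lifetime allowed for a given issuance date."""
    for effective, days in SCHEDULE:
        if issued_on >= effective:
            return days
    return 398  # limit before the first step


print(max_validity_days(date(2026, 4, 2)))
```

A renewal pipeline can use this to sanity-check issued certificates or to estimate how many renewals per year each endpoint will need at each step.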

The core question: does OCSP even matter when a certificate lives for 47 days?

Think about the timeline of a revocation event. You discover a key compromise, revoke the certificate, the revocation information propagates through OCSP responders and CRL distribution points, browsers eventually learn about it. With year-long certificates, that propagation window is a tiny fraction of the certificate's remaining lifetime, so revocation checking provides real protection. With 47-day certificates, the math shifts. If propagation takes a day or two and the certificate only has a few weeks left anyway, the window of protection shrinks proportionally. Still not zero—a compromised certificate is dangerous for every hour it's trusted—but the risk profile is different.
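A quick calculation makes the shift concrete. The sketch assumes a two-day propagation delay (the upper end of "a day or two") and a compromise discovered at the midpoint of the certificate's life. Both are illustrative assumptions, as is the function name.

```python
def exposure_fraction(days_remaining: float, propagation_days: float) -> float:
    """Share of a compromised certificate's remaining life that passes
    before revocation information has propagated."""
    if days_remaining <= 0:
        return 0.0
    return min(propagation_days, days_remaining) / days_remaining


for lifetime in (398, 47):
    remaining = lifetime / 2   # assumed: compromise found at mid-life
    share = exposure_fraction(remaining, 2)
    print(f"{lifetime}-day cert: {remaining - 2:.1f} protected days, {share:.1%} exposed")
```

Under these assumptions revocation covers roughly 197 of the remaining 199 days of the long-lived certificate, but only about 21.5 of the remaining 23.5 days of the 47-day one. The unprotected share of remaining life grows from about 1% to about 8.5%, while the absolute protection shrinks from months to weeks.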

Browser vendors are already placing their bets. Mozilla has been building CRLite, a system that compresses the entire set of revoked certificates into a compact filter that is pushed to Firefox clients in the background. Instead of checking revocation per-certificate at connection time, Firefox downloads a complete revocation dataset periodically and checks locally. No OCSP requests, no privacy leaks, no soft-fail. The tradeoff is freshness: the filter updates on a schedule, not in real time. Mozilla argues this is acceptable given how rarely OCSP was actually checked in real time anyway.
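To give a feel for how a "compact filter" can answer revocation queries locally, here is a toy Bloom filter cascade, the structure described in the original CRLite research. It is a teaching sketch only: Firefox's production encoding may differ, and the sizing, hashing scheme, and every name below are my own assumptions. The construction relies on knowing the full set of unexpired certificates up front, and answers for serials outside that set are meaningless.

```python
import hashlib


class Bloom:
    """Tiny Bloom filter over strings; salt varies the hashes per level."""

    def __init__(self, items, bits_per_item=16, num_hashes=3, salt=0):
        self.size = max(16, len(items) * bits_per_item)
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = bytearray((self.size + 7) // 8)
        for item in items:
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{self.salt}:{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_cascade(revoked, valid, max_levels=32):
    """Level 0 holds the revoked set; each later level holds the previous
    level's false positives, alternating between valid and revoked items."""
    levels = []
    include, exclude = set(revoked), set(valid)
    while include:
        if len(levels) >= max_levels:
            raise RuntimeError("cascade did not converge")
        bloom = Bloom(include, salt=len(levels))
        levels.append(bloom)
        false_positives = {item for item in exclude if item in bloom}
        include, exclude = false_positives, include
    return levels


def is_revoked(levels, serial):
    """Exact for any serial that was in the revoked or valid set at build time."""
    for depth, bloom in enumerate(levels):
        if serial not in bloom:
            # Missing at an even level means not revoked, at an odd level revoked.
            return depth % 2 == 1
    return len(levels) % 2 == 1


revoked = {f"serial-{i}" for i in range(50)}
valid = {f"serial-{i}" for i in range(50, 1000)}
cascade = build_cascade(revoked, valid)
print(len(cascade), is_revoked(cascade, "serial-7"), is_revoked(cascade, "serial-700"))
```

Each extra level exists only to correct the false positives of the one before it, which is why the structure stays small while still giving exact answers for every certificate in the known set.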

Apple took a similar path with their own CRL aggregation, pushing revocation data to devices rather than having devices pull it per-connection. Chrome has been openly skeptical of OCSP for years, calling it a privacy and performance cost with minimal real-world security benefit, and has not performed online OCSP checks by default since 2012, relying on its own pushed CRLSets instead.

The direction is clear: the industry is moving away from per-connection OCSP and toward pushed revocation data and shorter certificate lifetimes as the primary defenses against key compromise. OCSP isn't disappearing overnight. Many certificates still carry a responder URL, though CAs are no longer required to offer OCSP, and stapling still helps with handshake performance and the minority of clients that do check. But it's becoming a secondary mechanism.

What this means for your monitoring strategy

If I were rethinking certificate monitoring today, here's where I'd put my attention.

Monitor stapling as a distinct signal now. Even as the industry transitions, OCSP stapling failures today point to operational problems—broken egress rules, stale configurations, misconfigured chains—that affect your TLS posture beyond just revocation. A broken stapling setup is a canary for other certificate management problems.

Skip Must-Staple. Unless you have extremely mature OCSP monitoring and redundant stapling infrastructure, the operational risk outweighs the security benefit. Browser enforcement remains inconsistent anyway.

Prepare for renewal velocity, not just renewal automation. As certificate lifetimes shrink, every certificate-adjacent operation happens more often—including OCSP response refresh cycles. Shorter certificates mean your stapling cache turns over faster, your automation runs more frequently, and the blast radius of a broken renewal pipeline grows. The teams that struggle won't be the ones who can't automate renewal. They'll be the ones who can't see when renewal—or its downstream effects, like stapling—silently broke across hundreds of endpoints.

Track the full handshake, not just the expiry date. Certificate validity is the bare minimum. What you actually need is visibility into TLS handshake details across your entire estate: protocol versions, cipher suites, chain completeness, and yes, OCSP stapling status. CertPulse was built around this gap—the space between "certificate exists and isn't expired" and "TLS is actually healthy" is where real operational risk lives.

Watch the browser vendor roadmaps. CRLite, Apple's CRL push, Chrome's continued non-participation in online OCSP—these are signals about where revocation checking is heading. Your monitoring should evolve with these changes rather than assume today's OCSP infrastructure remains the primary mechanism forever.

OCSP stapling is one of those things that works perfectly in a conference talk demo and breaks in production in ways nobody notices for months. If you haven't checked yours recently, go look. I'll bet you find at least one surprise.

This is why we built CertPulse

CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.

If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.
