9 Months of CertPulse: Metrics, Mistakes, and What 1,400 Monitored Certificates Taught Us About Real-World TLS

CertPulse has monitored 1,400 production TLS certificates across customer fleets for nine months, and this report includes the parts that don't flatter us. This is the build-in-public report we promised at launch. What follows is aggregate fleet data, 134 real expiry incidents, three engineering bugs we shipped, four product features that flopped, and the time we nearly let one of our own certificates expire. No funnel screenshots, no growth-hacking theater.

We're not bragging about 1,400 certificates, and this report explains why: we onboard deliberately because every new fleet breaks an assumption in our scanner.

Why We're Publishing This: The Build-in-Public Contract

CertPulse publishes these numbers because build-in-public posts worth reading report failures with the same precision as wins. This report is aggregate-only: no customer names, no domains, no certificate fingerprints, no per-account figures. Every statistic is rolled up across the full 1,400-certificate fleet so you can benchmark your own inventory without anyone getting doxxed.

The honest reason CertPulse monitors 1,400 certificates and not 14,000: we are nine months old and onboard deliberately. Every new fleet surfaces an edge case that breaks a scanner assumption, and we'd rather find those at 1,400 certificates than at 14,000. Most build-in-public content from devops SaaS founders skips the operational mess and jumps to MRR charts. The mess is the useful part.

What this report delivers:

Aggregate fleet statistics — certificate distribution you can compare against your own inventory
Root-cause data — the breakdown of 134 real expiry incidents, not a survey
Specific bugs — three shipped engineering mistakes with the actual fix and what each cost
Honest metrics — the devops SaaS metrics we kept, and the ones we stopped reporting because they only felt good

If a section doesn't leave you with something you can act on, we failed at writing it.

The Fleet: What 1,400 Real Certificates Actually Look Like

Across the monitored fleet, Let's Encrypt issues 58% of certificates, paid public CAs (DigiCert, Sectigo, GlobalSign) cover 31%, and private/internal CAs account for 11%. Roughly 64% of certificates sit on an automated renewal path; the other 36% still depend on a human remembering. That manual third is an obvious risk, but the next section shows the automated majority is not safe either: the largest single root cause there is automation that fails silently.

The full distribution across 1,400 certificates:

Dimension	Breakdown
Issuers	Let's Encrypt 58%, paid public CAs 31%, private/internal CAs 11%
Validity periods	Mostly 398-day certs; sub-90-day certs grew from 19% to 34% since the SC-081v3 lifetime reductions
Wildcard vs SAN	22% wildcard, 78% SAN or single-name
Automation	64% ACME or cloud-native auto-renewal, 36% manual

Then the long tail. After scanning 1,400 production certificates, we found nine self-signed certs running in production, all on internal services someone swore were "behind the VPN." One certificate carried 47 SANs — a single renewal failure away from taking down 47 hostnames at once. A handful of mismatched chains passed in Chrome and failed in curl. If your fleet looks cleaner than this, either you're smaller than you think or you haven't scanned the hosts you forgot about. The weird stuff is normal. Plan for it.
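A certificate like the 47-SAN one is easy to surface once each scanned certificate's names are counted. A minimal sketch, assuming the certificate has already been parsed into the dictionary shape that Python's ssl.SSLSocket.getpeercert() returns for a validated connection; the function names and the threshold of 20 are illustrative, not CertPulse's.

```python
from typing import List, Optional


def dns_sans(cert: dict) -> List[str]:
    """Return the DNS names from a getpeercert()-style dict."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def blast_radius_warning(cert: dict, threshold: int = 20) -> Optional[str]:
    """Return a warning when one certificate covers many hostnames.

    The threshold is a hypothetical default chosen for illustration.
    """
    names = dns_sans(cert)
    if len(names) >= threshold:
        return f"{len(names)} hostnames depend on this one certificate"
    return None
```

The useful output is the hostname count itself: it says how many services go down together if that one renewal fails.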

What Actually Broke: Expiry Incidents We Caught (and One We Didn't)

CertPulse flagged 134 near-miss expiries in nine months, defined as a certificate inside 14 days of expiry with no renewal in progress. Median lead time between alert and confirmed fix was six days. Six days is comfortable inside a 14-day warning window, but it is slack that shrinks once 47-day certificates make every renewal cycle tighter.
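The near-miss definition above translates into a small predicate. This is a sketch under two assumptions: how "renewal in progress" gets detected is not described here, so it is passed in as a plain boolean, and already-expired certificates are treated as outages rather than near misses.

```python
from datetime import datetime, timedelta, timezone

NEAR_MISS_WINDOW = timedelta(days=14)  # from the definition above


def is_near_miss(not_after: datetime, renewal_in_progress: bool, now: datetime) -> bool:
    """True when the cert expires within 14 days and nothing is renewing it."""
    remaining = not_after - now
    # Already-expired certs (negative remaining) are excluded by assumption.
    in_window = timedelta(0) <= remaining <= NEAR_MISS_WINDOW
    return in_window and not renewal_in_progress


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_near_miss(now + timedelta(days=10), False, now))
```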

Root-cause distribution across 134 incidents:

Automation silently failing — 41%. The renewal job ran, exited zero, and never deployed the new certificate. This is the renew-but-don't-deploy gap, the single most common failure we see.
Ownership gaps — 34%. The certificate had no owner, or the owner left. The spreadsheet said "Bob."
DNS validation breakage — 18%. A CAA record change or a moved DNS zone quietly killed the ACME challenge.
Other — 7%.
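The renew-but-don't-deploy gap can be checked directly: compare the certificate the renewal job wrote with the certificate the endpoint is actually serving. A minimal sketch using only Python's standard library; the function names are illustrative, and it assumes the renewed certificate is available as a PEM string.

```python
import hashlib
import socket
import ssl


def served_der(host: str, port: int = 443, timeout: float = 5.0) -> bytes:
    """Fetch the DER-encoded leaf certificate the endpoint serves right now."""
    ctx = ssl.create_default_context()
    # Validation is disabled on purpose: the goal is to see what is served,
    # even if it is expired or otherwise broken.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


def fingerprint(der: bytes) -> str:
    """SHA-256 fingerprint of a DER certificate, as lowercase hex."""
    return hashlib.sha256(der).hexdigest()


def is_deployed(renewed_pem: str, live_der: bytes) -> bool:
    """True when the live endpoint serves the certificate the renewal produced."""
    renewed_der = ssl.PEM_cert_to_DER_cert(renewed_pem)
    return fingerprint(renewed_der) == fingerprint(live_der)
```

Running a check like this after every renewal, against every endpoint that should serve the certificate, turns "the job exited zero" into "the new certificate is live".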

Now the one CertPulse missed. A customer's internal service ran TLS on port 8443. Our scanner, at the time, only probed port 443 unless told otherwise. The certificate was never in our inventory, so it couldn't be flagged, and it expired. Nobody got paged because, as far as CertPulse knew, the certificate didn't exist. The detection gap wasn't in our alerting logic — it was in discovery. CertPulse has since made port ranges configurable per target and now defaults to scanning common alternate TLS ports. You can't alert on what you never found.
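The discovery fix amounts to building the probe list from defaults plus per-target configuration. A sketch under stated assumptions: the default port list and the Target shape below are hypothetical, since this report does not say which alternate ports CertPulse scans.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Illustrative defaults only; not CertPulse's actual list.
DEFAULT_TLS_PORTS = (443, 8443, 9443)


@dataclass
class Target:
    host: str
    extra_ports: Iterable[int] = ()  # per-target ports or a range(...)


def ports_to_probe(target: Target) -> List[int]:
    """Defaults plus per-target ports, de-duplicated, in first-seen order."""
    combined = DEFAULT_TLS_PORTS + tuple(target.extra_ports)
    return list(dict.fromkeys(combined))
```

For example, Target("internal.example.test", extra_ports=range(8440, 8450)) adds a ten-port range without probing 8443 twice.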

Engineering Mistakes We Shipped

CertPulse shipped three TLS scanner bugs worth naming because the fixes are reusable. All three produced wrong numbers while looking completely healthy — the worst kind. Wrong-and-loud gets fixed in a day. Wrong-and-quiet ships to customers.

The load-balancer double-count. The scan scheduler hit each resolved IP independently, so a certificate served from four load balancer nodes got counted four times. One customer's dashboard showed 312 certificates when they had 78. Fix: deduplicate by SHA-256 fingerprint plus SAN set before anything reaches the inventory count.
SNI multiplexing treated as separate certs. When several virtual hosts shared one IP, SNI certificate detection counted hostnames instead of distinct certificates. A box with one wildcard cert and 30 vhosts reported 30 certificates. Fix: same dedup, keyed on the certificate itself rather than the requesting hostname.
The retry loop that got us WAF-blocked. An over-aggressive retry path hammered failed connections with no backoff. A customer's WAF read it as a scan attack and blocked our scanner IP, silently creating scan gaps across their fleet for about 19 hours. Fix: exponential backoff, honor HTTP 429, cap retries at three, and alert internally when a target starts refusing us.
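The dedup fix shared by the first two bugs can be sketched as follows. The Observation type and function names are illustrative, not the scanner's actual code; the key idea from the fixes above is that the inventory is keyed on the certificate (SHA-256 fingerprint plus SAN set), never on the IP or hostname that served it.

```python
import hashlib
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Observation(NamedTuple):
    endpoint: str          # IP:port or vhost the scanner connected to
    der: bytes             # DER-encoded leaf certificate seen there
    sans: FrozenSet[str]   # SAN entries on that certificate


def dedup_key(obs: Observation) -> Tuple[str, FrozenSet[str]]:
    """Identity of a certificate: SHA-256 fingerprint plus SAN set."""
    return hashlib.sha256(obs.der).hexdigest(), obs.sans


def build_inventory(observations: Iterable[Observation]) -> Dict[tuple, List[str]]:
    """Group observations so each distinct certificate is counted once."""
    inventory: Dict[tuple, List[str]] = {}
    for obs in observations:
        inventory.setdefault(dedup_key(obs), []).append(obs.endpoint)
    return inventory
```

The inventory count is then len(build_inventory(...)), and the list of endpoints under each key still records everywhere the certificate is deployed.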

The load-balancer bug cost a support escalation and a fair "why don't your numbers match reality" email. The WAF block cost a customer most of a day of monitoring coverage. Both were our fault — and both are the kind of thing a vendor whitepaper will never tell you about.
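The retry fix described above (exponential backoff, honoring HTTP 429, a cap of three retries, and an internal alert on refusal) can be sketched like this. Assumptions are labeled: the base delay and ceiling are hypothetical, the 429 hint is taken to be a Retry-After value in seconds, and "cap retries at three" is read as three retries after the first attempt.

```python
import time
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3  # cap from the fix described above


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    base and cap are illustrative values, not CertPulse's settings.
    """
    computed = min(cap, base * 2 ** (attempt - 1))
    if retry_after is not None:
        # The server said when to come back; never retry sooner than that.
        return max(float(retry_after), computed)
    return computed


def probe_with_retries(probe: Callable[[], Tuple[bool, Optional[float]]],
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it succeeds or the retry cap is reached.

    probe() returns (ok, retry_after_seconds_or_None). A False result means
    the caller should raise an internal alert that the target is refusing us.
    """
    for attempt in range(MAX_RETRIES + 1):  # first try plus up to 3 retries
        ok, retry_after = probe()
        if ok:
            return True
        if attempt < MAX_RETRIES:
            sleep(backoff_delay(attempt + 1, retry_after))
    return False
```

Adding random jitter to the computed delay is a common refinement when many scan workers retry against the same target at once.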

Product Mistakes: The Features Nobody Used

CertPulse shipped four features that missed, and the usage data was unambiguous once we looked. Adoption of these nice-to-have features ran close to zero.

The per-cert risk score. A 0–100 number blending expiry, key strength, and chain health. Fewer than 4% of users ever sorted or filtered by it. People wanted "is this going to break," not a credit score for a certificate. Removed.
The weekly email digest. Roughly 12% of recipients unsubscribed within two digests — and unsubscribing also cost us trust on the alert emails that actually mattered. Nobody wants a newsletter from their monitoring tool. Removed.
The six-option integrations page. 94% of configured integrations were Slack or PagerDuty. Webhooks took third. The other three options were build effort spent on nothing.
The alert threshold default. We shipped 30/14/7/1-day alerts on by default. Too noisy. Support tickets asked how to turn it down — exactly the alert fatigue failure mode we'd written a whole post warning other people about. The new default is tiered and quieter; the noisy preset is opt-in.

The lesson isn't "talk to users." It's that we had the right answer published on our own blog and still shipped the wrong default. Writing about a problem doesn't inoculate you against it.

Dogfooding: How We Almost Shipped Our Own Expired Cert

A certificate-monitoring company nearly let its own certificate expire. A CertPulse internal subdomain came within three days of lapsing because the expiry alert routed to a Slack channel, #cert-alerts, that nobody had opened in weeks. The alert fired exactly as designed — into a room with no one in it.

This is the part of dogfooding that actually teaches you something: the system worked and the outcome still nearly failed, because an alert delivered is not an alert received. CertPulse made two fixes:

Acknowledgement tracking. Every alert now carries an explicit ack state. An unacknowledged critical alert escalates to PagerDuty after a set window instead of sitting politely in Slack.
A routing audit. We mapped every alert destination to a channel or rotation with a named human responsible for it. Two of our own routes pointed at channels created during a long-dead project.
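The acknowledgement rule reduces to one check per open alert. A minimal sketch with hypothetical names; the 30-minute window is an assumption, since the fix above only specifies "a set window".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative value; the actual escalation window is not stated.
ESCALATION_WINDOW = timedelta(minutes=30)


@dataclass
class Alert:
    severity: str                        # e.g. "critical" or "warning"
    fired_at: datetime
    acked_at: Optional[datetime] = None  # None means nobody has acknowledged it


def needs_escalation(alert: Alert, now: datetime,
                     window: timedelta = ESCALATION_WINDOW) -> bool:
    """True when a critical alert has sat unacknowledged past the window."""
    return (
        alert.severity == "critical"
        and alert.acked_at is None
        and now - alert.fired_at >= window
    )
```

A scheduler would run this over open alerts and page the on-call rotation for any that return True, so delivery to a quiet channel is no longer the last step.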

After auditing our own alert routing, the pattern was clear: alerts go to die in three places — muted channels, shared inboxes, and the rotation of someone who left. We had one of each. If you run any monitoring at all, audit where your alerts land before you trust that they land anywhere.

The SaaS Metrics That Matter (and the Ones We Stopped Tracking)

CertPulse tracks three operational SaaS metrics that change decisions: median scan time is 90 seconds (cut from 47 minutes by a concurrency rewrite), infrastructure cost runs about $14 per 1,000 certificates monitored per month, and the false-positive rate fell from 8% at launch to 1.2% today.

The numbers we watch:

Infra cost per 1,000 certs: ~$14/month. Linear and boring, which is what you want from a cost structure.
False-positive rate: 1.2%. Every false positive spends a customer's trust, so this metric gates releases.
Monthly churn: 3.1%. Top stated reason is "we moved to cloud-native cert management." We'd rather know that than pretend it isn't happening.
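The cost figure is simple enough to turn into a back-of-envelope estimator. This sketch assumes the roughly linear scaling described above keeps holding at larger fleet sizes, which the data so far only supports up to 1,400 certificates.

```python
COST_PER_1000_CERTS_USD = 14  # approximate monthly figure quoted above


def monthly_infra_cost(cert_count: int) -> float:
    """Estimated monthly infrastructure cost in USD, assuming linear scaling."""
    return cert_count * COST_PER_1000_CERTS_USD / 1000
```

At 1,400 certificates that is about $19.60 a month; at 14,000 it would be about $196 if the linear trend holds.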

The numbers CertPulse stopped reporting internally: total scans run (big, rising, decision-free), "certificates monitored" as a headline figure (the double-count bug showed how easily that lies), and blog page views. None ever changed what we built next. A metric that can only go up and never forces a choice is decoration. If you can't name the decision a metric informs, stop putting it on the dashboard.

What's Next and What We're Still Unsure About

CertPulse's roadmap for the next two quarters comes straight out of this data: SNI and load-balancer deduplication baked into discovery, private CA discovery so the 11% of internal certificates stop being a blind spot, and post-quantum readiness reporting as customers start asking which certificates use vulnerable algorithms.

What we genuinely don't have answers to:

Pricing as cert lifetimes shrink. Per-cert pricing made sense at 398-day validity. At 47-day validity, renewals happen roughly 8x more often: the cost of monitoring a certificate barely changes, but the renewal-tracking work multiplies. We don't know the right model yet.
Whether to build renewal at all. Staying monitoring-only keeps CertPulse honest and vendor-neutral. Adding ACME renewal would close the loop on the 41% of incidents caused by silent automation failure — and would also make us a thing we can't impartially monitor.
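The "8x" in the pricing question comes from comparing validity periods. A quick worked check, assuming one renewal per full validity period; real automation renews earlier, which raises both counts but leaves the ratio about the same.

```python
def renewals_per_year(validity_days: int) -> float:
    """Renewals per year if a cert is renewed once per validity period."""
    return 365 / validity_days


current = renewals_per_year(398)   # about 0.9 renewals per year
future = renewals_per_year(47)     # about 7.8 renewals per year
ratio = future / current           # 398 / 47, roughly 8.5

if __name__ == "__main__":
    print(f"{current:.1f} -> {future:.1f} renewals/year ({ratio:.1f}x)")
```

Per-certificate scanning work stays flat across that change, while every renewal-related event (issuance, deployment check, alert window) occurs about eight and a half times as often.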

We'd rather end with real open questions than a confident roadmap slide. If you run a fleet and have an opinion on either question, we want to hear it.

FAQ

How does CertPulse anonymize customer data in build-in-public posts? Every figure in CertPulse's build-in-public reporting is aggregated across the full 1,400-certificate fleet. CertPulse never publishes customer names, domains, certificate fingerprints, or per-account numbers. A distribution percentage cannot be traced back to an individual fleet.

What's the most common cause of certificate expiry incidents? According to CertPulse's data from 134 near-miss expiries, automation silently failing accounts for 41% of incidents. The renewal job runs, exits successfully, and the new certificate never deploys to the load balancer or CDN. Exit code zero is not proof of a deployed certificate.

How much does it cost to monitor 1,000 TLS certificates? CertPulse's infrastructure cost is roughly $14 per 1,000 certificates per month at current scale. Cost scales close to linearly with fleet size, since scanning is the dominant expense and parallelizes cleanly.

Why does CertPulse monitor only 1,400 certificates after nine months? Deliberate onboarding pace. Each new fleet surfaces an edge case that breaks a scanner assumption, such as TLS on a non-standard port or a 47-SAN certificate. CertPulse would rather find those at 1,400 certificates than at 14,000.

Does CertPulse handle certificate renewal? No. CertPulse is a monitoring tool, not a renewal tool. Whether to add ACME renewal is an open question, since building it would compromise CertPulse's ability to impartially monitor renewal automation.

