Security

Certificate Transparency logs aren't just for browsers — here's how to monitor them for your domains

March 21, 2026 · 9 min read · CertPulse Engineering

Most engineers encounter Certificate Transparency logs indirectly — their browser checks SCTs during the TLS handshake, and if something's wrong, the connection fails. That's the browser enforcement story, and it's well-understood. But CT logs are also a public, append-only record of nearly every certificate issued by a publicly trusted CA. It's an intelligence feed. Most teams aren't watching it.

I've worked in environments where a developer spun up a staging environment on api-staging.example.com, got a Let's Encrypt cert for it, and nobody on the platform team knew it existed until a penetration tester found it six months later. That cert showed up in CT logs the day it was issued. We just weren't looking.

What CT logs actually tell you

Unauthorized issuance is the most obvious signal. If someone (an attacker, a rogue employee, a misconfigured automation) obtains a certificate for your domain from any publicly trusted CA, it shows up in CT logs. This is how you catch domain takeover and cert misissuance early. CAA records help prevent this, but they depend on every CA checking them correctly, and nothing cryptographically proves that a check happened. CT is your verification layer.

Typosquatting and phishing infrastructure. Attackers register domains like exarnple.com or example-login.com and get legitimate DV certs for them. To your users, these sites look credible — they've got the padlock. Monitoring CT logs for certificates that look suspiciously similar to your domains catches this infrastructure as it's being provisioned, often before the phishing campaign launches.

Shadow IT and forgotten assets. Certificates issued for subdomains you didn't know about (jenkins.corp.example.com, legacy-api.example.com, test.acquisitionco.com) map your actual attack surface. Every cert in a CT log represents a service someone thought was important enough to put TLS on. That's a useful inventory signal.

How CT works under the hood

CT is defined in RFC 6962 (and its successor, RFC 9162, for CT v2). The architecture is straightforward: when a CA issues a certificate, it submits it to one or more CT logs. Each log is an append-only Merkle tree, the same family of hash-linked data structure behind git commits and blockchain ledgers. The log returns a Signed Certificate Timestamp (SCT), which reaches the browser embedded in the certificate, in a TLS extension, or in a stapled OCSP response.
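To make the tamper-evidence concrete, here is a minimal sketch of the Merkle tree hash described in RFC 6962: leaves are hashed with a 0x00 prefix, interior nodes with a 0x01 prefix, and the tree splits at the largest power of two smaller than the entry count. The function names are mine, and a real log never recomputes the root from scratch like this; the point is only that changing any entry changes the root.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    # Leaves are domain-separated from interior nodes with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes use a 0x01 prefix over the two child hashes.
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(entries: list[bytes]) -> bytes:
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly less than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))
```

Because the root covers every leaf, a log that rewrote an old entry would have to publish a tree head that no longer matches what monitors already recorded.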

What matters: you can't quietly remove or modify entries, and auditors can verify this via consistency proofs. The logs are publicly readable: anyone can query any log without authentication, though operators do rate-limit heavy readers. And the system is distributed across multiple independent operators. Google runs Argon and Xenon (sharded by expiry year), Let's Encrypt runs Oak, Cloudflare runs Nimbus.

You have two approaches for monitoring.

Polling means hitting each log's REST API (/ct/v1/get-entries). You track your position in each log, fetch new entries in batches, and process them. Complete coverage of a specific log, but you have to track state across dozens of active logs, handle log sharding, and deal with the fact that some logs contain hundreds of millions of entries.
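The position tracking can be sketched as a small range calculator. This assumes you have already fetched the current tree size from the log's get-sth endpoint and that get-entries takes inclusive, zero-based start and end indices, as RFC 6962 specifies. The batch size of 256 and both function names are placeholders of mine, not values any particular log guarantees.

```python
from typing import Iterator


def entry_batches(next_index: int, tree_size: int,
                  batch_size: int = 256) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) ranges covering [next_index, tree_size)."""
    start = next_index
    while start < tree_size:
        end = min(start + batch_size, tree_size) - 1
        yield start, end
        start = end + 1


def get_entries_url(log_url: str, start: int, end: int) -> str:
    # Builds the request URL only; fetching and retrying are left out.
    return f"{log_url.rstrip('/')}/ct/v1/get-entries?start={start}&end={end}"
```

A log may return fewer entries than you asked for, so a real poller should advance by the number of entries actually received rather than by the requested range.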

Streaming uses services like CertStream that aggregate entries from multiple logs and expose them as a WebSocket or SSE feed. Near-real-time firehose of all newly logged certificates. Simpler to consume, but you're trusting a third party for completeness and uptime.

Building a basic CT watcher

You have two practical paths. With streaming, you connect to the CertStream WebSocket feed, filter incoming certificates against your domain list, and alert on matches. You'll start seeing results within seconds: CertStream processes roughly 300-500 certificates per minute on a quiet day, spiking much higher during Let's Encrypt's bulk renewal windows.
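The filtering step of a streaming consumer might look like the sketch below. It assumes the message shape CertStream has published (a message_type of "certificate_update" with the names under data.leaf_cert.all_domains); treat those field names as an assumption to verify against the feed you actually connect to. The WebSocket connection itself is omitted, and matching_names is a hypothetical helper.

```python
def matching_names(message: dict, watched: set[str]) -> list[str]:
    """Return the names in one feed message that fall under a watched domain."""
    # Heartbeats and other message types carry no certificate.
    if message.get("message_type") != "certificate_update":
        return []
    names = message.get("data", {}).get("leaf_cert", {}).get("all_domains", [])
    hits = []
    for name in names:
        # Compare wildcard entries by their base name.
        host = name.lower().removeprefix("*.")
        # Exact match or a true subdomain; a bare substring test is too loose.
        if any(host == d or host.endswith("." + d) for d in watched):
            hits.append(name)
    return hits
```

Matching on a label boundary (the leading dot) is what keeps bad-example.org from matching a watch on example.com.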

The polling approach hits each log's REST API directly. You track your read position in each log, fetch new entries in batches, parse the Merkle tree leaf to extract the X.509 certificate, and check the SAN list against your domains. The full implementation needs tree head tracking, batch sizing to stay within rate limits, and persistent state so you don't re-process entries after a restart. That's for a single log. There are currently 40 or more active logs you'd need to cover.
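The leaf-parsing step is the least obvious part, so here is a rough sketch of it. It follows the MerkleTreeLeaf layout in RFC 6962 (one byte each for version and leaf type, an 8-byte timestamp, a 2-byte entry type, then a 3-byte length before the DER bytes) and takes the base64 leaf_input string from a get-entries response. The function name and the returned dictionary are my own choices.

```python
import base64
import struct


def parse_leaf_input(leaf_input_b64: str) -> dict:
    raw = base64.b64decode(leaf_input_b64)
    # version (1) | leaf_type (1) | timestamp in ms (8) | entry_type (2)
    _version, _leaf_type, timestamp_ms, entry_type = struct.unpack(">BBQH", raw[:12])
    offset = 12
    if entry_type == 0:
        kind = "x509"      # body is the full DER certificate
    elif entry_type == 1:
        kind = "precert"   # body is a TBSCertificate, preceded by a key hash
        offset += 32       # skip the 32-byte issuer_key_hash
    else:
        raise ValueError(f"unknown entry type {entry_type}")
    length = int.from_bytes(raw[offset:offset + 3], "big")
    der = raw[offset + 3:offset + 3 + length]
    if len(der) != length:
        raise ValueError("truncated leaf")
    return {"timestamp_ms": timestamp_ms, "entry_type": kind, "der": der}
```

Extracting the SAN list from the returned DER bytes still needs an X.509 parser, and precertificate entries carry only the to-be-signed portion, so they need slightly different handling than full certificates.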

Streaming is simpler to get running. Polling gives you complete control and doesn't depend on a third party. But "works on my laptop" and "works reliably in production" are different things.

The false positive problem

Naive substring matching is the first thing everyone tries and the first thing everyone regrets. Monitoring for "example" will match example-widgets.com, bad-example.org, myexample.net, and thousands of other unrelated domains. I've seen teams set up substring alerts and disable them within 48 hours because they were generating hundreds of notifications a day.

Here's what actually works.

Registered domain extraction. Before any matching, pull out the registered domain using a public suffix list. You want to compare exarnple.com against example.com, not login.cdn.exarnple.com against example.com. The publicsuffix package in Go or tldextract in Python handles this, including tricky TLDs like .co.uk and .com.au.
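To show what the extraction does without pulling in a dependency, here is a deliberately simplified stand-in. SAMPLE_SUFFIXES is a tiny hand-written set, not the real public suffix list, and the function ignores the list's wildcard and exception rules. In production you would use one of the packages above instead.

```python
from typing import Optional

# Illustrative subset only; the real public suffix list has thousands of rules.
SAMPLE_SUFFIXES = {"com", "org", "net", "co.uk", "com.au"}


def registered_domain(hostname: str,
                      suffixes: set[str] = SAMPLE_SUFFIXES) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none."""
    labels = hostname.lower().rstrip(".").removeprefix("*.").split(".")
    # Walking from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the hostname is itself a bare suffix
            return ".".join(labels[i - 1:])
    return None
```

With this, login.cdn.exarnple.com reduces to exarnple.com, which is the string you actually want to compare against example.com.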

Levenshtein distance with a tight threshold catches most typosquats (exarnple, exmple, exampl3) without matching everything on the internet. Edit distance of 1-2 is the sweet spot. Distance 3 starts getting noisy. Start at 2 and only widen if you're missing things.
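For reference, a plain dynamic-programming version of Levenshtein distance fits in a few lines. This is a textbook sketch rather than a tuned implementation, and is_typosquat with its default threshold of 2 is a hypothetical helper reflecting the advice above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_typosquat(label: str, brand: str, max_distance: int = 2) -> bool:
    # Distance 0 is your own name, so only near misses count.
    return 0 < levenshtein(label, brand) <= max_distance
```

Against "example", the three typosquats above come out at distance 2 (exarnple), 1 (exmple), and 1 (exampl3), so all of them fall inside a threshold of 2.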

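Homoglyph detection. Attackers use characters that look visually similar: exаmple.com with a Cyrillic 'а' instead of Latin 'a'. Levenshtein alone handles this badly. Certificates carry such names in Punycode form (an xn-- label), which is many edits away from the original, and on the decoded name a distance of 1 can't tell a deliberate look-alike from any other single-character change. A homoglyph table can. The confusables.txt dataset from Unicode is the canonical source. Normalize both strings through a confusable mapping before comparing.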
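A minimal sketch of that normalization follows. The CONFUSABLES table here is a hand-picked handful of Cyrillic look-alikes for illustration, not the Unicode dataset, and skeleton is my own name for the helper. Because internationalized names appear in certificates as xn-- labels, the sketch first decodes them with Python's built-in idna codec.

```python
# Tiny illustrative subset; load the full Unicode confusables data in practice.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
})


def skeleton(name: str) -> str:
    """Decode any Punycode labels, then map look-alikes to Latin letters."""
    try:
        decoded = name.encode("ascii").decode("idna")
    except UnicodeError:
        decoded = name  # already Unicode, or not valid Punycode
    return decoded.lower().translate(CONFUSABLES)
```

Comparing skeleton(candidate) against skeleton(your_domain) turns the Cyrillic variant into an exact match, which you can then score more severely than an ordinary near miss.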

Keyword-plus-TLD scoring. If your brand is "acme", then acme-login.com is more suspicious than acme-plumbing-supplies.com. Weighting matches that combine your brand with security-sensitive keywords (login, secure, verify, account, update) cuts noise significantly. Score the domain based on whether it contains your brand name, whether it includes security-related keywords, and whether it uses hyphenated brand combinations — a classic phishing pattern. Domains that genuinely target your users tend to score high on multiple signals at once.
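One way to express that scoring is sketched below. The weights (2 for the brand, 1 for a hyphenated combination, 3 for a sensitive keyword) are arbitrary starting points I chose for illustration, and the function expects an already extracted registered domain.

```python
SENSITIVE_KEYWORDS = ("login", "secure", "verify", "account", "update")


def suspicion_score(domain: str, brand: str) -> int:
    """Score a registered domain; higher means more likely to target the brand."""
    label = domain.lower().split(".")[0]
    if brand not in label:
        return 0
    score = 2  # contains the brand name
    parts = label.split("-")
    if len(parts) > 1 and brand in parts:
        score += 1  # brand used as a hyphen-separated component
    if any(keyword in label for keyword in SENSITIVE_KEYWORDS):
        score += 3  # paired with a security-sensitive keyword
    return score
```

With these weights acme-login.com scores 6 and acme-plumbing-supplies.com scores 3, so an alert threshold between the two separates them.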

Where DIY breaks down

Building a CT monitor that works on your laptop is a weekend project. Running one that's reliable enough to be a security control is a different beast.

CertStream's WebSocket connection drops. Your consumer process gets OOM-killed. The network blips. Every minute your consumer is down, certificates are being logged that you'll never see. You need reconnection logic with position tracking, which means falling back to polling the logs directly for the window you missed, which means you need the polling infrastructure anyway.

Log coverage is a real problem. Google alone operates multiple sharded logs (Argon, Xenon, split by year). Let's Encrypt, Cloudflare, DigiCert, and Sectigo each run their own. A certificate only needs SCTs from enough logs to satisfy browser policy, so different CAs submit to different logs. If you're only watching Google's logs via CertStream, you might miss certificates logged exclusively elsewhere.

Volume adds up. CT logs collectively see north of 10 million new entries per day. Fuzzy matching with Levenshtein distance calculations against a list of 50 domains means at least 500 million string comparisons daily (10 million × 50), and more once you count the multiple names on each certificate. Not impossible, but enough to matter for cost and architecture. Pre-filter on TLD, string length, or other cheap checks before hitting the expensive comparisons.
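The string-length check is the cheapest of those pre-filters, and it is exact: two strings whose lengths differ by more than k cannot be within edit distance k. A small sketch, with hypothetical helper names:

```python
def could_be_within(candidate: str, target: str, max_distance: int) -> bool:
    # Each edit changes the length by at most one, so the length gap
    # is a lower bound on the edit distance.
    return abs(len(candidate) - len(target)) <= max_distance


def shortlist(candidate: str, targets: list[str],
              max_distance: int = 2) -> list[str]:
    """Targets still worth an expensive edit-distance comparison."""
    return [t for t in targets if could_be_within(candidate, t, max_distance)]
```

Only the names that survive shortlist need the full distance calculation, which trims a large share of the daily comparison count for almost no CPU.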

Logs also have a lifecycle. They use temporal shards (e.g., argon2025h1 covers certs expiring in the first half of 2025). New shards spin up, old ones freeze and become read-only. Your monitor needs to track which logs are currently active. The Chrome CT log list is the canonical source, and it changes.
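Picking the logs to poll out of that list could look like this. The sketch assumes the v3 log list JSON layout (an operators array, each with a logs array whose entries carry a url and a single-key state object such as usable or retired); check the field names against the file you actually download. Which states count as worth reading is my own choice here.

```python
# Assumption: these states mean the log can still be read by a monitor.
READABLE_STATES = {"qualified", "usable", "readonly"}


def monitorable_logs(log_list: dict) -> list[tuple[str, str]]:
    """Return (url, state) for every log a monitor should still read."""
    found = []
    for operator in log_list.get("operators", []):
        for log in operator.get("logs", []):
            # The state object has one key naming the current state.
            state = next(iter(log.get("state", {})), None)
            if state in READABLE_STATES:
                found.append((log["url"], state))
    return found
```

Read-only shards are kept deliberately: they no longer grow, but a monitor that started late may still need to finish reading them.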

And you need to persist your last-read tree position for each log. Lose that state and you either re-process millions of entries or accept a gap in coverage. Neither is great.
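A simple way to avoid losing that state is to write it atomically, so a crash mid-write never leaves a half-written file. The sketch below stores a mapping of log URL to next index as JSON; the file format and function names are assumptions of mine, and a database would do the same job.

```python
import json
import os
import tempfile


def save_positions(path: str, positions: dict[str, int]) -> None:
    """Write positions to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(positions, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach disk
        os.replace(tmp_path, path)     # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_positions(path: str) -> dict[str, int]:
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return {}  # first run: start every log from its chosen origin
```

Save after each processed batch rather than after each entry; re-processing one batch on restart is cheap, while an fsync per entry is not.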

This is where most teams either accept the coverage gaps or decide the operational overhead isn't justified for a monitoring function. Tools like CertPulse handle the CT firehose as part of a broader certificate monitoring setup — watching CertStream for your configured domains with fuzzy matching and scoring already tuned, surfacing matches alongside your cloud provider inventory and endpoint scan results. Build or buy, the point is that somebody is watching.

Getting started

If nobody on your team is monitoring CT logs today, go to crt.sh. It's a free CT search engine run by Sectigo. Search for %.yourdomain.com and look at what's already been logged. You'll probably find certificates you didn't know about.
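If you'd rather script that first look, crt.sh can return JSON when you add output=json to the query. The sketch below only builds the URL and flattens a result set; it assumes each returned row has a name_value field holding newline-separated names, which is worth confirming against a live response. The HTTP request is left to whatever client you prefer.

```python
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    # The % wildcard is percent-encoded to %25 by urlencode.
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})


def unique_names(rows: list[dict]) -> list[str]:
    """Collapse crt.sh result rows into a sorted list of distinct names."""
    names = set()
    for row in rows:
        for name in row.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return sorted(names)
```

Diffing the output of unique_names against your known inventory is a quick way to surface the subdomains nobody told you about.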

Then set up a basic CertStream consumer. Run it for a week. See what hits. This tells you your baseline noise level and whether fuzzy matching at distance 2 works for your specific domains.

Figure out what you're actually trying to detect. Unauthorized issuance of your exact domains is tight matching, low noise, straightforward. Catching phishing infrastructure is fuzzy matching, noisier, needs real tuning. Different problems, different engineering requirements.

Then decide whether to invest in reliability or hand it off. With SC-081 driving certificate lifetimes down to 47 days, the volume of legitimate certificate activity is about to jump — which means more noise to filter and more urgency to catch the signal. You need coverage you can count on. That means building the polling infrastructure with proper state management and log tracking, or using a service that's already solved those problems.

CT logs are one of the few truly public signals in the certificate ecosystem. The data is free, it's real-time, and it tells you things about your own infrastructure that you might not know yet.

This is why we built CertPulse

CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.

If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.