Hunting Threats in Certificate Transparency Logs: Catching Phishing Domains, Rogue Certs, and Brand Impersonation Before They Land

May 18, 2026 | 12 min read | CertPulse Engineering

Certificate Transparency is best understood not as a compliance tool but as a free, real-time threat intelligence feed. Most teams file CT under compliance: it's what browsers check so a CA can't quietly mis-issue a cert for your domain, and what your security team queries once a year to find forgotten certificates. That framing is half the picture. Certificate transparency monitoring is also one of the cheapest threat intelligence feeds available, because every certificate an attacker provisions for a phishing site targeting your brand lands in a public, append-only log — usually hours before the campaign goes live. This post is about reading that feed the way an attacker would hate you reading it.

CT Logs Are a Free Threat Intel Feed You're Probably Not Reading

CT logs are a real-time record of nearly every publicly trusted TLS certificate issued on the internet, and they give you advance warning of brand impersonation for free. Industry data indicates tens of millions of entries are logged per day across the major CT log operators. If an attacker registers a lookalike domain and provisions TLS for it — and almost all of them do — the certificate's existence becomes public the moment it's logged.

Most teams that touch CT use it inward, querying crt.sh to find shadow IT or forgotten certs nobody renewed. That's the inventory use case. The recon use case is different:

  • Inventory question: "What certs exist for domains I own?" This is housekeeping.
  • Recon question: "What certs exist for domains that look like mine, or claim to be mine?" This is an early-warning system.

Timing is what makes the recon use case valuable. A phishing operator's sequence is predictable: register the domain, point DNS, request a Let's Encrypt cert over ACME, stand up the cloned login page, then send the lure emails. The certificate request happens before the emails go out — often by hours, sometimes by days. That gap is your detection window. CT log threat intelligence works because attackers need valid TLS to look legitimate, and valid TLS means a public log entry they cannot opt out of.

Takeaway: if you have a brand worth impersonating and you're not watching CT logs for lookalikes, you're ignoring a free feed attackers cannot suppress.

The Three Attacks CT Logs Actually Reveal

CT monitoring surfaces three distinct threats, and conflating them is the most common mistake in writeups on this topic. One means "someone is pretending to be you"; the other two mean "someone obtained a publicly trusted certificate for a name you actually own." These are different incidents with different urgency and different runbooks.

Threat | What the log entry shows | Severity
Lookalike phishing domain | Brand-new domain you don't own, with a SAN containing a string close to your brand, almost always issued by Let's Encrypt or another free CA | Common, low urgency
Unauthorized cert for a domain you control | Your real domain in the SAN, issued by a CA you never use, on a date you didn't request anything | Rare, active incident
Subdomain takeover prep | A legitimate subdomain of yours getting a fresh cert, frequently right after you decommissioned the service it pointed to | Rare, active incident

Lookalike phishing domains include typosquats (paypa1.com), combosquats (paypal-security-team.com), homoglyph swaps using Unicode confusables, and TLD swaps like .com to .app or .io. This class is the bulk of typosquatting detection volume.

Attackers obtain unauthorized certs through a compromised registrar, a DNS hijack, or CA mis-issuance. Catching one is unauthorized certificate detection proper, and it is an active incident, not a phishing nuisance. Subdomain takeover prep starts when an attacker finds a dangling CNAME pointing at a deprovisioned cloud resource, claims that resource, and provisions a cert for the subdomain.

The volume skew matters. Research shows roughly 80% of phishing pages now serve over HTTPS, so lookalike domains will dominate your alerts. Unauthorized issuance against your own domains is rare but severe. Build your pipeline so the rare-and-severe case is never buried under the common-and-annoying one.
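One way to keep the severe cases on top is to classify every entry before anything else looks at it. The sketch below is a minimal illustration, not a reference implementation: `OWNED_DOMAINS`, `EXPECTED_ISSUERS`, `RETIRED_HOSTS`, and the `brand_match` flag are hypothetical names, and the issuer comparison assumes you have already extracted the issuer organization from the certificate.

```python
from dataclasses import dataclass

# Hypothetical configuration values, for illustration only.
OWNED_DOMAINS = {"example.com"}           # domains you control
EXPECTED_ISSUERS = {"Let's Encrypt"}      # CAs you actually use
RETIRED_HOSTS = {"old-app.example.com"}   # recently decommissioned services

@dataclass
class Entry:
    sans: list[str]
    issuer_org: str
    brand_match: bool = False  # assumed to be set by your matching engine

def is_owned(name, owned=OWNED_DOMAINS):
    """True if name is an owned domain or a subdomain of one."""
    name = name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    # Match on a label boundary so "evil-example.com" does not count as owned.
    return any(name == d or name.endswith("." + d) for d in owned)

def classify(entry):
    owned = [n.lower() for n in entry.sans if is_owned(n)]
    if owned:
        if any(n in RETIRED_HOSTS for n in owned):
            return "takeover_prep"
        if entry.issuer_org not in EXPECTED_ISSUERS:
            return "unauthorized_cert"
        return "expected"
    return "lookalike" if entry.brand_match else "unrelated"

# Lower number sorts first: rare-and-severe ahead of common-and-annoying.
PRIORITY = {"unauthorized_cert": 0, "takeover_prep": 0,
            "lookalike": 1, "expected": 2, "unrelated": 3}

def triage(entries):
    return sorted(entries, key=lambda e: PRIORITY[classify(e)])
```

With this ordering, a single unauthorized cert sorts ahead of any number of lookalike hits in the same batch.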

Building a Certificate Transparency Monitoring Pipeline: certstream, the Static API, and Why Polling Beats Streaming

Polling the CT static-tile API directly beats consuming a streaming firehose for any monitoring you need to be reliable. A practical pipeline has three stages: ingest CT entries, normalize the SANs into a clean candidate list, and run those candidates through a matching engine. The ingest stage is the hard architectural decision.
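The normalize stage is the simplest of the three, and a rough sketch helps show what "clean candidate list" means here. The function name and the exact cleanup rules are assumptions; the idea is only that every SAN reaches the matching engine lowercased, without wildcard prefixes or trailing dots, and once.

```python
def normalize_sans(raw_sans):
    """Turn raw SAN dNSName values into a deduplicated candidate list."""
    seen = set()
    candidates = []
    for name in raw_sans:
        name = name.strip().lower().rstrip(".")
        if name.startswith("*."):
            name = name[2:]          # a wildcard covers the parent name
        if not name or name in seen:
            continue
        seen.add(name)
        candidates.append(name)      # preserves first-seen order
    return candidates
```

For example, `normalize_sans(["*.Example.com", "example.com."])` collapses to a single `example.com` candidate.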

Your three ingest options:

  • certstream-style live feed. A WebSocket firehose aggregating entries from many logs. Easy to prototype against in ten lines of Python. The problem: the public certstream server and naive consumers drop entries under load. When the feed bursts to several hundred entries per second, a slow consumer falls behind and the server skips ahead. You get no error — you get silent gaps, the worst failure mode for a security control.
  • The CT static-tile API (RFC 6962 logs and the newer static-CT logs). You poll each log's signed tree head (a checkpoint, in static-CT terms), then fetch entry ranges: through get-entries on RFC 6962 logs, as tiles on static-CT logs. More code and more state to track, but you control the pace and can prove you've seen every entry between two tree sizes. No silent drops.
  • crt.sh queries. Excellent for ad-hoc investigation and backfill. Not built to be your primary real-time pipeline; crt.sh is a shared service and it will rate-limit you.
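A minimal sketch of checkpointed polling against an RFC 6962 log, using the `get-sth` and `get-entries` endpoints that RFC defines. The function names, the page size of 256, and the injectable `fetch` parameter are assumptions for illustration. Signature and consistency-proof verification are left out, and a static-CT log would read its checkpoint and tiles instead of these two endpoints.

```python
import json
import urllib.request

def get_json(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def poll_once(log_url, checkpoint, handle_entry, fetch=get_json, page=256):
    """Consume entries from index `checkpoint` up to the current tree size.

    log_url is the log's base URL without a trailing slash.
    Returns the new checkpoint (index of the next unseen entry).
    """
    tree_size = fetch(f"{log_url}/ct/v1/get-sth")["tree_size"]
    next_index = checkpoint
    while next_index < tree_size:
        end = min(next_index + page, tree_size) - 1   # end index is inclusive
        url = f"{log_url}/ct/v1/get-entries?start={next_index}&end={end}"
        entries = fetch(url)["entries"]
        if not entries:
            break                      # nothing returned: retry on the next poll
        for entry in entries:
            handle_entry(entry)
        # Logs may return fewer entries than requested, so advance by the
        # number actually received rather than by the size of the request.
        next_index += len(entries)
    return next_index
```

Persist the returned checkpoint only after `handle_entry` has succeeded for the whole batch; that ordering is what lets you claim complete coverage between two tree sizes.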

Polling the static API directly is the design that survives contact with production. You tail each log independently, checkpoint the tree size you've consumed, and dedupe across logs, because the same precertificate and final certificate get submitted to multiple logs. Expect heavy duplication: a single issuance can produce four or more log entries. The certificates themselves are small, but if you keep full history for correlation, plan for tens of gigabytes per month. Most teams keep 90 days hot and discard the rest.
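Cross-log deduplication can be sketched as a seen-set keyed on something that stays constant across the precertificate, the final certificate, and every log they land in. The key below, issuer key hash plus serial number, is an assumption: a precertificate and its final certificate share a serial number, but you would need to confirm the issuer side holds for CAs that sign precertificates with a dedicated precertificate signing certificate.

```python
class Deduper:
    """Tracks which issuances have already been processed."""

    def __init__(self):
        self._seen = set()

    def is_new(self, issuer_key_hash, serial_number):
        key = (issuer_key_hash, serial_number)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

In production the set would be bounded or backed by a store with expiry; an unbounded in-memory set is only reasonable for a prototype.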

Writing Match Rules That Don't Drown You in False Positives

Score and rank candidates — never alert on a binary substring match. Pure substring matching ("does the SAN contain acme?") fails immediately: you'll page yourself about acmecorp-unrelated-startup.io and every other company that shares a token with you. The matching logic, not the ingest, is the actual engineering problem here.

A scoring engine that holds up in practice combines four signals:

  • Levenshtein / edit distance against your brand terms, weighted so a one-character edit on a short brand scores high and a three-edit distance on a long string scores low.
  • Homoglyph and confusable normalization. Map Unicode confusables to a canonical form before comparing, and decode punycode. An xn-- domain that normalizes to your brand with a Cyrillic а swapped in is a near-certain hit. Raw punycode IDN detection without normalization misses these entirely.
  • Pre-generated permutation lists. Run a tool like dnstwist against your brand domains ahead of time to enumerate typos, bitsquats, TLD swaps, and combosquat patterns. A six-character brand can generate well over 2,000 permutations; because the list is built ahead of time, matching at ingest is a fast set lookup, not a live computation.
  • Keyword-in-SAN rules for combosquatting, scored lower because yourbrand-login and yourbrand-blog both match but only one is hostile.
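The four signals above can be sketched as one scoring function. Everything specific here is an assumption made for illustration: the tiny `CONFUSABLES` table (a real deployment would load the full Unicode confusables data), the numeric weights, the `hostile_words` list, and the shortcut of treating the second-to-last label as the registrable name, which is wrong for suffixes like co.uk.

```python
import unicodedata

# Tiny illustrative subset of lookalike characters (assumption).
CONFUSABLES = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0445": "x", "\u0456": "i",   # Cyrillic lookalikes
    "0": "o", "1": "l",                            # digit swaps
})

def to_skeleton(domain):
    """Decode punycode labels, then fold confusables to a canonical form."""
    labels = []
    for label in domain.lower().rstrip(".").split("."):
        if label.startswith("xn--"):
            try:
                label = label[4:].encode("ascii").decode("punycode")
            except ValueError:  # malformed punycode: keep the raw label
                pass
        labels.append(label)
    return unicodedata.normalize("NFKC", ".".join(labels)).translate(CONFUSABLES)

def edit_distance(a, b):
    """Classic Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def score(domain, brand, permutations=frozenset(),
          hostile_words=("login", "secure", "security", "verify", "account")):
    """Return a 0.0-1.0 suspicion score; the weights are hypothetical."""
    domain = domain.lower().rstrip(".")
    if domain in permutations:
        return 1.0                               # pre-generated list hit
    labels = to_skeleton(domain).split(".")
    label = labels[-2] if len(labels) >= 2 else labels[0]
    target = to_skeleton(brand)
    if label == target:
        return 0.95                              # homoglyph, digit, or TLD swap
    edit_score = max(0.0, 1.0 - edit_distance(label, target) / 3)
    keyword_score = 0.0
    if target in label:                          # combosquat candidate
        keyword_score = 0.6 if any(w in label for w in hostile_words) else 0.3
    return max(edit_score, keyword_score)
```

Under these weights `paypa1.com` scores 0.95 against the brand `paypal`, `paypal-login.com` scores 0.6, and `paypal-blog.com` scores 0.3. Filter out the domains you own before scoring, since your real domain would otherwise score as a perfect match against itself.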

The tradeoff is unavoidable: tight rules miss combosquatting and creative homoglyph attacks; loose rules wake you up about unrelated companies. After building CertPulse's expiry alerting, we learned a lesson that applies directly here — an alert nobody trusts is worse than no alert. Route matches by confidence:

  • High-confidence → page a responder.
  • Medium-confidence → daily digest a human triages.
  • Low-confidence → searchable log.
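As a sketch, the routing is a pair of thresholds over whatever score your matching engine produces. The cutoffs 0.9 and 0.5 are placeholders, not recommendations; tune them against your own alert history.

```python
def route(score, page_at=0.9, digest_at=0.5):
    """Map a 0.0-1.0 match score to a destination tier."""
    if score >= page_at:
        return "page"      # wake a responder
    if score >= digest_at:
        return "digest"    # daily human triage
    return "log"           # searchable, never alerts
```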

Binary alerting on CT matches is how this capability gets switched off within a month.

From Detection to Response: What to Do When a Match Fires

The response depends entirely on which threat type fired. Lookalike impersonation is an abuse-reporting workflow you can run from a laptop. An unauthorized cert for a domain you own is a security incident with a clock on it. Keep the runbooks separate so responders don't apply the wrong urgency.

Runbook A: a lookalike phishing domain

  1. Visit it safely. Use an isolated browser or sandbox, never a corp machine. Confirm whether the page is live and whether it's actually cloning your login.
  2. Preserve evidence. Screenshot the cert details, the page, and the hosting fingerprints — you'll need them for the reports.
  3. Report in parallel. Submit to Google Safe Browsing, file abuse with the hosting provider and registrar, and send the cert to the issuing CA's abuse channel. Brief your SOC so inbound user reports get matched to a known case.
  4. Notify support and comms if the clone is convincing, so they're ready when a customer asks.

The uncomfortable truth: takedowns are slow. Provider abuse desks measure response in days, and many phishing sites are live for under 24 hours anyway. You will rarely beat the campaign by getting the site removed. Detection speed is the edge you actually have — catching the cert at issuance and warning users before the lure lands beats a takedown that completes after the operator has already moved on.

Runbook B: an unauthorized cert for your own domain

This is an incident. Treat it like one.

  1. Audit your CAA records immediately. If the issuing CA isn't in your CAA set, that's either a CAA misconfiguration or a CA that ignored it — both need filing.
  2. Pull DNS and registrar change logs around the issuance timestamp. A cert means someone passed a domain control check, which means they controlled your DNS, your registrar account, or a web path long enough to answer an ACME challenge.
  3. Assume the access that produced the cert may still exist. Rotate registrar and DNS credentials, review API tokens.
  4. Report the mis-issuance to the CA and decide on revocation. Revocation is weak protection given how little clients check it, but the paper trail matters.

If you don't run CAA records today, add them before you finish reading this. They're the cheapest control that turns "any CA can issue for us" into "only the CAs we named."
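For the audit step, it helps to encode the CAA decision so a responder isn't reading RFC text mid-incident. This is a simplified sketch of the RFC 8659 issue and issuewild logic over records you have already fetched. The function name and the `(tag, value)` tuple shape are assumptions, and it does not perform the DNS lookup or the climb to parent domains that a full check needs.

```python
# Example zone-file records this function would evaluate (illustrative):
#   example.com.  IN  CAA  0 issue "letsencrypt.org"
#   example.com.  IN  CAA  0 issuewild ";"

def caa_allows(records, ca_domain, wildcard=False):
    """records: list of (tag, value) tuples from the relevant CAA RRset."""
    issue = [v for t, v in records if t.lower() == "issue"]
    issuewild = [v for t, v in records if t.lower() == "issuewild"]
    # Wildcard requests use issuewild when present, otherwise fall back to issue.
    relevant = issuewild if (wildcard and issuewild) else issue
    if not relevant:
        return True   # no applicable property: CAA does not restrict issuance
    # Drop any parameters after ";" and compare issuer domain names.
    allowed = {v.split(";", 1)[0].strip().lower() for v in relevant}
    return ca_domain.lower() in allowed
```

If this returns `False` for the CA that issued the cert, you have a CAA violation to report. If it returns `True` for a CA you never use, your CAA set is looser than you thought.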

The Limits: What Certificate Transparency Monitoring Won't Catch

Certificate transparency monitoring only sees publicly trusted WebPKI certificates, so anything outside that scope leaves no trace in the logs you're watching. Most posts on the subject oversell CT as a complete answer. It is not.

Where CT monitoring goes blind:

  • Self-signed certs and plain HTTP. An attacker who skips TLS, or uses a self-signed cert, never appears in a log. Roughly one in five phishing pages still runs without valid HTTPS.
  • Private and internal CA mis-issuance. A compromised internal CA issuing a rogue cert for an internal service won't show up. Private CA logging is yours to build; nobody publishes it for you.
  • Log inclusion lag. Logs have a Maximum Merge Delay — commonly 24 hours — between accepting a cert and merging it into the visible tree. Your detection window is real but not instant.
  • Rising noise. Shorter certificate lifetimes mean far more renewals and log entries. The move toward 47-day certificates multiplies issuance volume, and post-quantum migration adds more churn. Your matching engine has to scale with that.
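To put a rough number on the rising-noise point, here is a back-of-envelope calculation. It assumes certificates are renewed at two-thirds of their lifetime, which is a common convention but not something every issuer follows, and it compares against 90-day certificates as the baseline.

```python
def renewals_per_year(lifetime_days, renew_fraction=2 / 3):
    """Issuances per hostname per year if renewed at a fixed lifetime fraction."""
    return 365 / (lifetime_days * renew_fraction)

# Volume multiplier when a 90-day fleet moves to 47-day certificates.
ratio = renewals_per_year(47) / renewals_per_year(90)   # about 1.9x
```

The renewal fraction cancels out, so the multiplier is simply 90/47 for a fleet already on 90-day certs. Fleets on longer-lived certificates see a larger jump.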

CT monitoring is one input. Pair it with passive DNS to catch domains that resolve before they ever get a cert, and with broader brand monitoring for impersonation that never touches a certificate at all. Treat CT as your earliest reliable signal for the attacks that do use TLS, not as the whole sensor grid.

FAQ

How fast can certificate transparency monitoring detect a phishing domain?

Detection is bounded by the log's Maximum Merge Delay — typically up to 24 hours between cert issuance and visibility — plus your own polling interval. In practice you'll see most lookalike certs within an hour or two of issuance, which still beats the average phishing campaign's email send by a useful margin.
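The bound in this answer is simple addition, sketched below. The five-minute polling interval and one-minute processing time are placeholder values, not measurements.

```python
from datetime import timedelta

def worst_case_latency(mmd=timedelta(hours=24),
                       poll_interval=timedelta(minutes=5),
                       processing=timedelta(minutes=1)):
    """Upper bound from issuance to alert: merge delay + polling + matching."""
    return mmd + poll_interval + processing
```

The merge delay dominates the worst case, which is why tightening your polling interval below a few minutes buys very little.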

Is certstream reliable enough for production monitoring?

No. certstream is fine for prototyping and demos, but not for a security control you depend on: consumers drop entries silently under load, and you get no error when gaps appear. Poll the CT static-tile API directly with tree-size checkpoints so you can prove complete coverage.

What's the difference between a typosquat and a combosquat?

A typosquat is a misspelling of your domain (gogle.com). A combosquat keeps your brand intact and adds words around it (google-account-verify.com). Combosquats are harder for typosquatting detection because edit-distance scoring won't flag them — you need keyword-in-SAN rules and permutation lists for that class.
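The difference shows up clearly in code. The sketch below generates a few typo classes (omission, repetition, transposition) and detects combosquats by looking for the intact brand as a hyphen-separated token. It is a deliberately small illustration with hypothetical function names; dnstwist covers many more permutation classes.

```python
def typo_permutations(label):
    """A small set of single-typo variants of a label."""
    out = set()
    for i in range(len(label)):
        out.add(label[:i] + label[i + 1:])               # omission
        out.add(label[:i] + label[i] + label[i:])        # repetition
        if i < len(label) - 1:                           # transposition
            out.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    out.discard(label)   # swapping identical neighbours reproduces the input
    return out

def squat_class(label, brand):
    """Classify a domain label relative to a brand, or return None."""
    if label == brand:
        return None
    if label in typo_permutations(brand):
        return "typosquat"
    if brand in label.split("-"):
        return "combosquat"   # brand intact, extra words around it
    return None
```

Here `squat_class("gogle", "google")` returns `"typosquat"`, while `squat_class("google-account-verify", "google")` returns `"combosquat"` even though its edit distance from the brand is far too large for distance scoring to flag.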

Does finding an unauthorized cert mean my domain is compromised?

Usually yes, at least partially. A publicly trusted cert means someone passed a domain control check, which requires control of your DNS, registrar account, or a web path. Treat an unauthorized certificate as evidence of access that may still be active, and rotate credentials accordingly.

Will CAA records stop attackers from getting certs for my domain?

Partially. CAA records restrict which CAs may issue for your domain, and compliant CAs honor them — they stop accidental and casual mis-issuance and give you a violation to report when ignored. They do not stop an attacker who already controls your DNS, since that attacker can edit the CAA record too.

Closing

Certificate transparency monitoring shifts CT from a compliance checkbox into an attacker-visibility feed. The certificates are public, the attackers can't opt out, and the detection window opens before the phishing emails do. The work is honest engineering: poll the logs directly instead of trusting a firehose, score matches instead of alerting on substrings, and keep two separate runbooks because impersonation and unauthorized issuance are different incidents. Pair it with passive DNS and brand monitoring, accept that it won't catch everything, and use the head start it gives you. Detection speed is the only real edge you get — so build the pipeline that's fast and quiet enough to actually keep running.

This is why we built CertPulse

CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.

If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.