Inheriting a certificate inventory is rarely the clean handover anyone promises. You get a half-maintained spreadsheet, a Slack thread from 2022, and someone saying "I think Marcus used to handle the F5 stuff." Within a week you'll discover three certs nobody knew existed, two that expired last quarter and are still deployed, and one wildcard that touches forty services nobody documented. This is the actual starting state most platform engineers walk into during a new role or post-acquisition cleanup. Forget the textbook playbook. Here's what works when you're handed someone else's mess.
Day One: What You're Actually Walking Into
Inherited certificate inventories typically contain 30-60% stale data, certs from three or more CAs with no naming convention, and a handful of "temporary" certs that quietly became load-bearing. The mess compounds during M&A activity, team churn, and any production environment older than four years.
In my experience running SSL audits after handovers, the same patterns repeat:
- A spreadsheet titled `certs_v3_FINAL.xlsx` with 80 rows, of which roughly 35 are accurate
- Certs sitting in S3 buckets named `prod-stuff-old`, deployed via a Jenkins job nobody has access to anymore
- One F5 with 22 certs, last touched in 2021, GUI password in a 1Password vault belonging to a former employee
- Wildcards that expired six months ago but are still deployed, because the renewed version only landed on one of three load balancers
- Two ACM certs with identical CNs in different regions, neither tagged, only one actually serving traffic
Three reasons this happens:
- Org churn: whoever set up renewals usually left before the cert came due
- M&A: you absorbed someone else's legacy TLS infrastructure without their tribal knowledge
- "Temporary" certs: issued during incidents, become permanent because nobody schedules cleanup work for things that aren't currently broken
Honest takeaway: assume the documentation is wrong until verified. Don't budget your first month based on the spreadsheet's cert count; budget for 1.5x to 2x that number.
The First 48 Hours: Triage Before Inventory
Before building a complete inventory, find what's expiring in the next 30 days. According to CT log data, roughly 4% of public certs expire in any given 30-day window. On a 500-cert fleet, that's 20 certs requiring immediate attention. Triage first prevents 2am pages while you're still drawing the org chart.
The fastest triage sequence:
- Pull every public cert via crt.sh: `curl -s 'https://crt.sh/?q=%25.example.com&output=json' | jq '.[] | {cn: .common_name, not_after: .not_after}'`
- Sort and cut: order by `not_after` ascending, cut off at 30 days out
- Cross-reference DNS: confirm the host still resolves and serves traffic
- Check what's deployed: `openssl s_client -servername host -connect host:443 </dev/null 2>/dev/null | openssl x509 -noout -serial`
The serial number lets you match what's actually deployed against what crt.sh thinks exists. CT logs tell you what was issued, not what's live, and the gap between "issued" and "deployed" surprises people in both directions.
The "bleeding now" list goes into a pinned Slack message with a name next to each. The complete inventory comes later. War story: I once spent three weeks building a beautiful asset graph for an inherited fleet, then got paged on cert number 4 because the graph wasn't deployed yet. Triage first.
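The first two triage steps can be sketched in a few lines, assuming you've saved the crt.sh response to a string. The `common_name` and `not_after` field names match crt.sh's actual JSON output; the timestamp parsing is a sketch, since formats can vary by endpoint version:

```python
import json
from datetime import datetime, timedelta, timezone

def expiring_soon(crtsh_json, days=30, now=None):
    """Parse crt.sh JSON output; return (common_name, not_after) pairs
    expiring within `days`, soonest first. Already-expired certs are
    included deliberately: expired-but-deployed is the worst case."""
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(days=days)
    hits = []
    for cert in json.loads(crtsh_json):
        # crt.sh emits not_after like "2024-01-10T00:00:00" (UTC, no offset)
        not_after = datetime.fromisoformat(cert["not_after"]).replace(tzinfo=timezone.utc)
        if not_after <= cutoff:
            hits.append((cert["common_name"], not_after))
    return sorted(hits, key=lambda h: h[1])
```

The sorted output is your "bleeding now" list; everything else waits for the full inventory.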
Discovery: Five Sources That Actually Find Certs
Complete cert discovery requires five sources, because no single one finds everything. CT logs typically catch 70-80% of public-facing certs. Cloud provider APIs add internal and unused ones. Load balancer configs reveal what's actually serving. Kubernetes secrets find the application layer. Humans fill the gaps the tools miss.
| Source | Catches | Misses |
|---|---|---|
| CT logs (crt.sh, Censys, Google CT API) | Every publicly trusted cert | Internal CA-issued, mTLS client, self-signed |
| Cloud provider APIs (`aws acm list-certificates`, `az keyvault certificate list`, `gcloud certificate-manager certificates list`) | Cloud-managed certs per region/account | Certs uploaded to EC2 or VMs as files |
| Load balancer configs (ALB/NLB listeners, F5 `tmsh list ltm profile client-ssl`, HAProxy `bind`, nginx `ssl_certificate`) | What's deployed | What's staged but not active |
| Kubernetes secrets (`kubectl get secrets -A -o json \| jq '.items[] \| select(.type=="kubernetes.io/tls")'`) | cert-manager output, manually loaded certs | Certs baked into container images |
| Humans (#help-platform archives, ticket history, DigiCert/Sectigo/GlobalSign invoices) | Tribal knowledge | Nothing, once you've asked enough people |
Each source silently misses things. This is why a practical walkthrough of cross-account discovery is worth more than any single tool. The gaps are where 2am pages live.
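The cross-referencing itself is mechanical once each source is flattened to a set of hostnames. A minimal sketch; the category names and failure-mode comments are my framing, not a standard taxonomy:

```python
def discovery_gaps(ct_logs, cloud_apis, lb_configs):
    """Diff three discovery sources (each a set of hostnames).
    Each gap category maps to a likely failure mode."""
    return {
        # Issued publicly but not serving anywhere we can see:
        # rotate-and-pray residue, or a host that quietly died
        "issued_not_deployed": ct_logs - lb_configs,
        # Serving traffic but absent from CT: internal CA or self-signed
        "deployed_not_in_ct": lb_configs - ct_logs,
        # In a cloud account but nowhere else: unused spend, forgotten region
        "cloud_only": cloud_apis - ct_logs - lb_configs,
        "total": ct_logs | cloud_apis | lb_configs,
    }
```

Run the diff per apex domain; the non-empty gap sets are your follow-up queue.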
Ownership Archaeology: Figuring Out Who Owns What
Finding certs is mechanical. Establishing certificate ownership is forensic. In a handover I ran last year, 43 of 187 discovered certs (23%) had no clear owner after the first pass. Ownership archaeology is the underrated half of any infrastructure handover, and it's where most guides wave their hands and say "tag your certs."
Investigation techniques that actually surface owners:
- `git blame` the deployment manifest. Whoever last touched the cert reference is your first lead, even if they've left. Their team usually inherits.
- AWS CloudTrail filtered on `eventSource = acm.amazonaws.com` for the last 12 months. The IAM principal who imported or requested the cert is a real signal.
- DNS record history. Who created the CNAME for the host this cert serves? Cloudflare, Route53, and most managed DNS providers keep audit logs.
- Payment records. Finance has the invoice for that paid wildcard. The PO number maps to a cost center, which maps to a team.
- Issuance approval emails. Security teams that gate public CA issuance keep these. Search the security inbox for the CN.
When all five fail, you have an orphan cert. Do not delete it. Set up monitoring, deploy a non-production canary that watches for traffic, and wait one full renewal cycle. If nothing complains, schedule decommission. If something does, congratulations, you found the owner.
The temptation is to declare orphans abandoned and move on. Don't. Load-bearing certs disguise themselves as orphans more often than actually-orphaned ones do.
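The wait-one-renewal-cycle rule reduces to a small decision function. A sketch, assuming you record the last observed TLS handshake per cert; the field name and return values are illustrative:

```python
from datetime import datetime, timedelta, timezone

def orphan_disposition(last_traffic_seen, renewal_cycle_days, now=None):
    """Decide what to do with an unowned cert after a monitoring period.

    last_traffic_seen: datetime of the last observed handshake, or None
    if the canary never saw traffic.
    """
    now = now or datetime.now(timezone.utc)
    if last_traffic_seen is not None and now - last_traffic_seen < timedelta(days=renewal_cycle_days):
        # Something used it within one renewal cycle: it has an owner, go find them
        return "find-owner"
    # Silent for a full cycle: schedule decommission (never delete outright)
    return "schedule-decommission"
```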
From Spreadsheet Chaos to a Real Source of Truth
The inherited spreadsheet is fine for week one and fatal by month three. Once triage and discovery are done, migrate to a real system. The schema matters more than the tool. Internal data from teams I've worked with shows that tracking fewer than eight fields per cert produces roughly 3x the renewal-related incidents of tracking ten or more.
Columns that pay rent:
| Column | Why it matters |
|---|---|
| Common name and SANs | Both. SAN coverage is where wildcards hide. |
| Expiry date | UTC, ISO 8601, no ambiguity |
| Issuer | CA name plus intermediate. Useful when one CA has a revocation event. |
| Deployment location | ALB ARN, F5 partition, k8s namespace plus secret name. Specific enough to pull the cert without asking. |
| Owner team | A team, not a person. People leave; teams persist. |
| Renewal mechanism | ACME, manual, vendor portal, "we don't know yet." |
| Downstream consumers | What breaks if this cert is wrong. Mobile apps with pinned certs go here. |
| Last verified | The most-skipped column and the one that saves you. A cert not verified in 90 days is closer to a guess than a record. |
Skip vanity columns: cert size, signature algorithm version, color codes. They look thorough and decay into noise.
Build-vs-buy is the obvious next question. Under 100 certs growing slowly: a Postgres table and a renewal cron beats any tool. Past 200 certs across multiple cloud accounts: the spreadsheet-evolved-into-Airtable approach starts costing more engineering time than buying. We've covered tradeoffs in cert tracking tooling at length elsewhere.
Renewal Strategy for Inherited Certs You Don't Trust
You cannot flip an inherited fleet to ACME on day one. The graduated approach: renew shortest-expiry first, lowest-blast-radius first, and never test in production. With 47-day cert lifetimes arriving by 2029, renewal cadence on any inherited fleet will increase roughly eightfold, so the renewal pipeline you build now is the one that has to scale.
The sequence that survives contact with reality:
- Tier the fleet by blast radius. Internal dev cert with three users? Tier 3. Public API serving 40% of revenue? Tier 1.
- Start with Tier 3, shortest expiry. Renew it manually using the existing process. Confirm it works end-to-end before changing anything.
- Migrate Tier 3 to ACME or cloud-native renewal. Validate in staging. Validate again. Watch the renewal complete on its own.
- Promote to Tier 2 with the proven pipeline. Add monitoring before the first auto-renewal.
- Tier 1 last, with a manual approval gate on the first auto-renewal even after pipeline maturity.
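The sequencing rule behind that list — highest tier number (lowest blast radius) first, soonest expiry within each tier — is simple enough to encode directly. A sketch, with certs as plain dicts:

```python
def renewal_order(certs):
    """Order certs for migration work: tier 3 (lowest blast radius) before
    tier 1, and within a tier, soonest expiry first. Each cert is a dict
    with 'tier' (int, 1 = highest risk) and 'expiry' (ISO 8601 string)."""
    return sorted(certs, key=lambda c: (-c["tier"], c["expiry"]))
```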
Two anti-patterns to avoid:
- Rotate-and-pray: renewing without verifying deployment landed. The new cert sits in ACM while the old one keeps serving until expiry. We've covered the renewal-deployment gap as its own failure mode because it happens constantly.
- Recovering unreachable keys: renewing certs whose private keys live on a host you can't access. Don't try to be clever. Issue a fresh cert, deploy it parallel to the old one, cut traffic over, and decommission. Trying to recover an unreachable key path always ends in a 3am call.
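Rotate-and-pray is cheap to detect: compare the serial you issued against the serial each endpoint actually serves (the `openssl s_client` one-liner from the triage section yields the deployed serial). A sketch with hypothetical inputs:

```python
def renewal_landed(issued_serial, deployed_serials):
    """True only if the freshly issued cert is serving on every endpoint.
    deployed_serials: one serial per load balancer / endpoint, as reported
    by `openssl x509 -noout -serial`. Normalizes colon and case differences."""
    norm = lambda s: s.replace(":", "").lower()
    return all(norm(d) == norm(issued_serial) for d in deployed_serials)
```

Run it against every endpoint the cert is supposed to serve; one stale serial means the deployment only partially landed, which is exactly the three-load-balancer wildcard failure from the first section.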
The Documentation You Wish Your Predecessor Left
Write the documentation you wanted on day one. In my experience auditing inherited fleets, roughly 70% of cert runbooks omit the single most useful field: why does this cert exist. Not what it does, but why it was issued and what would break if it weren't. Without that field, the next inheritor will delete a load-bearing cert because it looks redundant.
A certificate runbook template that holds up:
- CN, SANs, current cert serial. Identity, not config.
- Why this cert exists. One paragraph of context. The specific service, the specific consumer, the reason a wildcard wasn't enough.
- Deployment procedure. Exact commands. Not "update via Terraform" but the actual `terraform apply -target=...` line.
- Rollback steps. How to revert if the renewal breaks something. Include the previous cert's location.
- Who to call. Team Slack channel, oncall rotation, fallback contact. Not an individual's name.
- Renewal mechanism and monitoring. Where the alert fires when this fails.
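A skeleton of that `RUNBOOK.md`, with every value a placeholder:

```markdown
# Runbook: api.example.com

## Identity
CN: api.example.com — SANs: api.example.com, api-internal.example.com
Current serial: <serial>

## Why this cert exists
One paragraph: the specific service, the specific consumer, and why the
wildcard wasn't enough (e.g. a mobile app pins this leaf).

## Deployment
    terraform apply -target=<module path for this cert>

## Rollback
Previous cert location and the exact re-associate step.

## Who to call
Team Slack channel and oncall rotation. No individual names.

## Renewal and monitoring
Mechanism (ACME / manual / vendor portal) and where the alert fires.
```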
Store the runbook next to the code that deploys the cert. Confluence rots. A RUNBOOK.md in the same repo as the Terraform module survives team changes, tooling migrations, and the inevitable "we're moving wikis" project.
One more thing. Write the runbook as if you've already left the company. The you-shaped reader who knows the system is the wrong audience. The audience is whoever inherits this in 18 months when you've moved on.
Frequently Asked Questions
How long does it take to clean up an inherited certificate inventory?
For a fleet of 200-500 certs, plan on 6-12 weeks of part-time work. The first two weeks are triage and discovery. Weeks three through six are ownership archaeology and migration to a real source of truth. The remaining weeks cover renewal pipeline build-out and runbook documentation.
What's the most important field in a cert tracking spreadsheet?
"Last verified." Every other field can be out of date and the system still works. Without a verification timestamp, you don't know which records to trust, and the inventory drifts back into spreadsheet-of-lies territory within a quarter.
Should I migrate everything to ACME immediately?
No. ACME is the right destination, but day-one migration of inherited certs you don't fully understand is how outages happen. Build the ACME pipeline on a low-blast-radius cert first, prove it, then migrate in tiers.
How do I handle certs whose private keys are on a host I can't access?
Issue a new cert, deploy it in parallel, cut traffic over, and decommission the old one. Don't try to recover the original key. Treat unreachable key paths as a forcing function for migration, not a problem to solve in place.
What's the fastest way to find certs nobody documented?
Combine CT log queries on your apex domains with cloud provider API listings across every account, then cross-reference both against your DNS records. The diff between what crt.sh sees, what your cloud APIs return, and what your DNS resolves usually surfaces the undocumented ones within a day.
If you're staring down a fresh handover and want a head start on building the certificate inventory you'll actually trust, CertPulse handles the discovery and ownership-tracking parts so you can spend your first weeks on triage and renewal strategy rather than spreadsheet archaeology.
This is why we built CertPulse
CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.
If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.