Inheriting a certificate inventory is rarely the clean handover anyone promises. You get a half-maintained spreadsheet, a Slack thread from 2022, and someone saying "I think Marcus used to handle the F5 stuff." Within a week you'll discover three certs nobody knew existed, two that expired last quarter and are still deployed, and one wildcard that touches forty services nobody documented. This is the actual starting state most platform engineers walk into during a new role or post-acquisition cleanup. Forget the textbook playbook. Here's what works when you're handed someone else's mess.
Day One: What You're Actually Walking Into
Inherited certificate inventories typically contain 30-60% stale data, certs from three or more CAs with no naming convention, and a handful of "temporary" certs that quietly became load-bearing. The mess compounds during M&A activity, team churn, and any production environment older than four years.
In my experience running SSL audits after handovers, the same patterns repeat:
- A spreadsheet titled `certs_v3_FINAL.xlsx` with 80 rows, of which roughly 35 are accurate
- Certs sitting in S3 buckets named `prod-stuff-old`, deployed via a Jenkins job nobody has access to anymore
- One F5 with 22 certs, last touched in 2021, GUI password in a 1Password vault belonging to a former employee
- Wildcards that expired six months ago but are still deployed, because the renewed version only landed on one of three load balancers
- Two ACM certs with identical CNs in different regions, neither tagged, only one actually serving traffic
Three reasons this happens:
- Org churn: whoever set up renewals usually left before the cert came due
- M&A: you absorbed someone else's legacy TLS infrastructure without their tribal knowledge
- "Temporary" certs: issued during incidents, become permanent because nobody schedules cleanup work for things that aren't currently broken
Honest takeaway: assume the documentation is wrong until verified. Don't budget your first month based on the spreadsheet's cert count; budget for 1.5x to 2x that number.
The First 48 Hours: Triage Before Inventory
Before building a complete inventory, find what's expiring in the next 30 days. According to CT log data, roughly 4% of public certs expire in any given 30-day window. On a 500-cert fleet, that's 20 certs requiring immediate attention. Triage first prevents 2am pages while you're still drawing the org chart.
The fastest triage sequence:
- Pull every public cert via crt.sh: `curl -s 'https://crt.sh/?q=%25.example.com&output=json' | jq '.[] | {cn: .common_name, not_after: .not_after}'`
- Sort and cut: order by `not_after` ascending, cut off at 30 days out
- Cross-reference DNS: confirm the host still resolves and serves traffic
- Check what's deployed: `openssl s_client -servername host -connect host:443 </dev/null 2>/dev/null | openssl x509 -noout -serial`
The serial number lets you match what's actually deployed against what crt.sh thinks exists. CT logs tell you what was issued, not what's live, and the gap between "issued" and "deployed" surprises people in both directions.
The "bleeding now" list goes into a pinned Slack message with a name next to each. The complete inventory comes later. War story: I once spent three weeks building a beautiful asset graph for an inherited fleet, then got paged on cert number 4 because the graph wasn't deployed yet. Triage first.
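The first two triage steps can be sketched in a few lines, assuming you've saved the crt.sh response to a string. The `common_name` and `not_after` field names match crt.sh's actual JSON output; the timestamp parsing is a sketch, since formats can vary by endpoint version:

```python
import json
from datetime import datetime, timedelta, timezone

def expiring_soon(crtsh_json, days=30, now=None):
    """Parse crt.sh JSON output; return (common_name, not_after) pairs
    expiring within `days`, soonest first. Already-expired certs are
    included deliberately: expired-but-deployed is the worst case."""
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(days=days)
    hits = []
    for cert in json.loads(crtsh_json):
        # crt.sh emits not_after like "2024-01-10T00:00:00" (UTC, no offset)
        not_after = datetime.fromisoformat(cert["not_after"]).replace(tzinfo=timezone.utc)
        if not_after <= cutoff:
            hits.append((cert["common_name"], not_after))
    return sorted(hits, key=lambda h: h[1])
```

The sorted output is your "bleeding now" list; everything else waits for the full inventory.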
Discovery: Five Sources That Actually Find Certs
Complete cert discovery requires five sources, because no single one finds everything. CT logs typically catch 70-80% of public-facing certs. Cloud provider APIs add internal and unused ones. Load balancer configs reveal what's actually serving. Kubernetes secrets find the application layer. Humans fill the gaps the tools miss.
| Source | Catches | Misses |
|---|---|---|
| CT logs (crt.sh, Censys, Google CT API) | Every publicly trusted cert | Internal CA-issued, mTLS client, self-signed |
| Cloud provider APIs (`aws acm list-certificates`, `az keyvault certificate list`, `gcloud certificate-manager certificates list`) | Cloud-managed certs per region/account | Certs uploaded to EC2 or VMs as files |
| Load balancer configs (ALB/NLB listeners, F5 `tmsh list ltm profile client-ssl`, HAProxy `bind`, nginx `ssl_certificate`) | What's deployed | What's staged but not active |
| Kubernetes secrets (`kubectl get secrets -A -o json \| jq '.items[] \| select(.type=="kubernetes.io/tls")'`) | cert-manager output, manually loaded certs | Certs baked into container images |
| Humans (#help-platform archives, ticket history, DigiCert/Sectigo/GlobalSign invoices) | Tribal knowledge | Nothing, once you've asked enough people |
Each source silently misses things. This is why a practical walkthrough of cross-account discovery is worth more than any single tool. The gaps are where 2am pages live.
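The cross-referencing itself is mechanical once each source is flattened to a set of hostnames. A minimal sketch; the category names and failure-mode comments are my framing, not a standard taxonomy:

```python
def discovery_gaps(ct_logs, cloud_apis, lb_configs):
    """Diff three discovery sources (each a set of hostnames).
    Each gap category maps to a likely failure mode."""
    return {
        # Issued publicly but not serving anywhere we can see:
        # rotate-and-pray residue, or a host that quietly died
        "issued_not_deployed": ct_logs - lb_configs,
        # Serving traffic but absent from CT: internal CA or self-signed
        "deployed_not_in_ct": lb_configs - ct_logs,
        # In a cloud account but nowhere else: unused spend, forgotten region
        "cloud_only": cloud_apis - ct_logs - lb_configs,
        "total": ct_logs | cloud_apis | lb_configs,
    }
```

Run the diff per apex domain; the non-empty gap sets are your follow-up queue.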
Ownership Archaeology: Figuring Out Who Owns What
Finding certs is mechanical. Establishing certificate ownership is forensic. In a handover I ran last year, 43 of 187 discovered certs (23%) had no clear owner after the first pass. Ownership archaeology is the underrated half of any infrastructure handover, and it's where most guides wave their hands and say "tag your certs."
Investigation techniques that actually surface owners:
- `git blame` the deployment manifest. Whoever last touched the cert reference is your first lead, even if they've left. Their team usually inherits.
- AWS CloudTrail filtered on `eventSource = acm.amazonaws.com` for the last 12 months. The IAM principal who imported or requested the cert is a real signal.
- DNS record history. Who created the CNAME for the host this cert serves? Cloudflare, Route53, and most managed DNS providers keep audit logs.
- Payment records. Finance has the invoice for that paid wildcard. The PO number maps to a cost center, which maps to a team.
- Issuance approval emails. Security teams that gate public CA issuance keep these. Search the security inbox for the CN.
When all five fail, you have an orphan cert. Do not delete it. Set up monitoring, deploy a non-production canary that watches for traffic, and wait one full renewal cycle. If nothing complains, schedule decommission. If something does, congratulations, you found the owner.
The temptation is to declare orphans abandoned and move on. Don't. Load-bearing certs disguise themselves as orphans more often than actually-orphaned ones do.
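The wait-one-renewal-cycle rule reduces to a small decision function. A sketch, assuming you record the last observed TLS handshake per cert; the field name and return values are illustrative:

```python
from datetime import datetime, timedelta, timezone

def orphan_disposition(last_traffic_seen, renewal_cycle_days, now=None):
    """Decide what to do with an unowned cert after a monitoring period.

    last_traffic_seen: datetime of the last observed handshake, or None
    if the canary never saw traffic.
    """
    now = now or datetime.now(timezone.utc)
    if last_traffic_seen is not None and now - last_traffic_seen < timedelta(days=renewal_cycle_days):
        # Something used it within one renewal cycle: it has an owner, go find them
        return "find-owner"
    # Silent for a full cycle: schedule decommission (never delete outright)
    return "schedule-decommission"
```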
From Spreadsheet Chaos to a Real Source of Truth
The inherited spreadsheet is fine for week one and fatal by month three. Once triage and discovery are done, migrate to a real system. The schema matters more than the tool. Internal data from teams I've worked with shows that tracking fewer than eight fields per cert produces roughly 3x the renewal-related incidents of tracking ten or more.
Columns that pay rent:
| Column | Why it matters |
|---|---|
| Common name and SANs | Both. SAN coverage is where wildcards hide. |
| Expiry date | UTC, ISO 8601, no ambiguity |
| Issuer | CA name plus intermediate. Useful when one CA has a revocation event. |
| Deployment location | ALB ARN, F5 partition, k8s namespace plus secret name. Specific enough to pull the cert without asking. |
| Owner team | A team, not a person. People leave; teams persist. |
| Renewal mechanism | ACME, manual, vendor portal, "we don't know yet." |
| Downstream consumers | What breaks if this cert is wrong. Mobile apps with pinned certs go here. |
| Last verified | The most-skipped column and the one that saves you. A cert not verified in 90 days is closer to a guess than a record. |
Skip vanity columns: cert size, signature algorithm version, color codes. They look thorough and decay into noise.
Build-vs-buy is the obvious next question. Under 100 certs growing slowly: a Postgres table and a renewal cron beats any tool. Past 200 certs across multiple cloud accounts: the spreadsheet-evolved-into-Airtable approach starts costing more engineering time than buying. We've covered tradeoffs in cert tracking tooling at length elsewhere.
Renewal Strategy for Inherited Certs You Don't Trust
You cannot flip an inherited fleet to ACME on day one. The graduated approach: renew shortest-expiry first, lowest-blast-radius first, and never test in production. With 47-day cert lifetimes arriving by 2029, renewal cadence on any inherited fleet will increase roughly eightfold, so the renewal pipeline you build now is the one that has to scale.
The sequence that survives contact with reality:
- Tier the fleet by blast radius. Internal dev cert with three users? Tier 3. Public API serving 40% of revenue? Tier 1.
- Start with Tier 3, shortest expiry. Renew it manually using the existing process. Confirm it works end-to-end before changing anything.
- Migrate Tier 3 to ACME or cloud-native renewal. Validate in staging. Validate again. Watch the renewal complete on its own.
- Promote to Tier 2 with the proven pipeline. Add monitoring before the first auto-renewal.
- Tier 1 last, with a manual approval gate on the first auto-renewal even after pipeline maturity.
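The sequencing rule behind that list — highest tier number (lowest blast radius) first, soonest expiry within each tier — is simple enough to encode directly. A sketch, with certs as plain dicts:

```python
def renewal_order(certs):
    """Order certs for migration work: tier 3 (lowest blast radius) before
    tier 1, and within a tier, soonest expiry first. Each cert is a dict
    with 'tier' (int, 1 = highest risk) and 'expiry' (ISO 8601 string)."""
    return sorted(certs, key=lambda c: (-c["tier"], c["expiry"]))
```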
Two anti-patterns to avoid:
- Rotate-and-pray: renewing without verifying deployment landed. The new cert sits in ACM while the old one keeps serving until expiry. We've covered the renewal-deployment gap as its own failure mode because it happens constantly.
- Recovering unreachable keys: renewing certs whose private keys live on a host you can't access. Don't try to be clever. Issue a fresh cert, deploy it parallel to the old one, cut traffic over, and decommission. Trying to recover an unreachable key path always ends in a 3am call.
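Rotate-and-pray is cheap to detect: compare the serial you issued against the serial each endpoint actually serves (the `openssl s_client` one-liner from the triage section yields the deployed serial). A sketch with hypothetical inputs:

```python
def renewal_landed(issued_serial, deployed_serials):
    """True only if the freshly issued cert is serving on every endpoint.
    deployed_serials: one serial per load balancer / endpoint, as reported
    by `openssl x509 -noout -serial`. Normalizes colon and case differences."""
    norm = lambda s: s.replace(":", "").lower()
    return all(norm(d) == norm(issued_serial) for d in deployed_serials)
```

Run it against every endpoint the cert is supposed to serve; one stale serial means the deployment only partially landed, which is exactly the three-load-balancer wildcard failure from the first section.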
The Documentation You Wish Your Predecessor Left
Write the documentation you wanted on day one. In my experience auditing inherited fleets, roughly 70% of cert runbooks omit the single most useful field: why does this cert exist. Not what it does, but why it was issued and what would break if it weren't. Without that field, the next inheritor will delete a load-bearing cert because it looks redundant.
A certificate runbook template that holds up:
- CN, SANs, current cert serial. Identity, not config.
- Why this cert exists. One paragraph of context. The specific service, the specific consumer, the reason a wildcard wasn't enough.
- Deployment procedure. Exact commands. Not "update via Terraform" but the actual `terraform apply -target=...` line.
- Rollback steps. How to revert if the renewal breaks something. Include the previous cert's location.
- Who to call. Team Slack channel, oncall rotation, fallback contact. Not an individual's name.
- Renewal mechanism and monitoring. Where the alert fires when this fails.
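A skeleton of that `RUNBOOK.md`, with every value a placeholder:

```markdown
# Runbook: api.example.com

## Identity
CN: api.example.com — SANs: api.example.com, api-internal.example.com
Current serial: <serial>

## Why this cert exists
One paragraph: the specific service, the specific consumer, and why the
wildcard wasn't enough (e.g. a mobile app pins this leaf).

## Deployment
    terraform apply -target=<module path for this cert>

## Rollback
Previous cert location and the exact re-associate step.

## Who to call
Team Slack channel and oncall rotation. No individual names.

## Renewal and monitoring
Mechanism (ACME / manual / vendor portal) and where the alert fires.
```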
Store the runbook next to the code that deploys the cert. Confluence rots. A RUNBOOK.md in the same repo as the Terraform module survives team changes, tooling migrations, and the inevitable "we're moving wikis" project.
One more thing. Write the runbook as if you've already left the company. The you-shaped reader who knows the system is the wrong audience. The audience is whoever inherits this in 18 months when you've moved on.
Frequently Asked Questions
How long does it take to clean up an inherited certificate inventory?
For a fleet of 200-500 certs, plan on 6-12 weeks of part-time work. The first two weeks are triage and discovery. Weeks three through six are ownership archaeology and migration to a real source of truth. The remaining weeks cover renewal pipeline build-out and runbook documentation.
What's the most important field in a cert tracking spreadsheet?
"Last verified." Every other field can be out of date and the system still works. Without a verification timestamp, you don't know which records to trust, and the inventory drifts back into spreadsheet-of-lies territory within a quarter.
Should I migrate everything to ACME immediately?
No. ACME is the right destination, but day-one migration of inherited certs you don't fully understand is how outages happen. Build the ACME pipeline on a low-blast-radius cert first, prove it, then migrate in tiers.
How do I handle certs whose private keys are on a host I can't access?
Issue a new cert, deploy it in parallel, cut traffic over, and decommission the old one. Don't try to recover the original key. Treat unreachable key paths as a forcing function for migration, not a problem to solve in place.
What's the fastest way to find certs nobody documented?
Combine CT log queries on your apex domains with cloud provider API listings across every account, then cross-reference both against your DNS records. The diff between what crt.sh sees, what your cloud APIs return, and what your DNS resolves usually surfaces the undocumented ones within a day.
If you're staring down a fresh handover and want a head start on building the certificate inventory you'll actually trust, CertPulse handles the discovery and ownership-tracking parts so you can spend your first weeks on triage and renewal strategy rather than spreadsheet archaeology.
This is why we built CertPulse
CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.
If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.