Certificate Renewal: The Engineering Guide to Renewals at Scale

April 4, 2026 · 14 min read · CertPulse Engineering

Every team has a certificate renewal story that ends with a 2am page and a scramble through a wiki page last updated in 2019. The process sounds simple until you're managing certificates across three cloud providers, two CAs, and a Kubernetes cluster that somebody set up before they left the company. Certificate renewal at scale isn't a single operation. It's a category of operations, each with its own failure modes, and the industry is about to make all of them more frequent.

This guide covers what actually happens during renewal, how to automate it, and what breaks when you're responsible for more than a handful of certs. If you manage fewer than ten certificates, the vendor docs will serve you fine. If you manage fifty or more, keep reading.

What certificate renewal actually involves

Certificate renewal replaces an expiring TLS certificate with a new one for the same identity, but the mechanics vary significantly depending on the CA, cert type, and whether you reuse keys. Industry data indicates that roughly 60% of certificate-related outages trace back to renewal process failures, not initial provisioning. Understanding the distinction between renewal, reissuance, and rekeying prevents the confusion that leads to those outages.

TLS/SSL certificate renewal vs. reissuance

Renewal, reissuance, and rekeying are three distinct operations, though most CAs use the terms loosely:

  • SSL certificate renewal extends coverage with a new certificate and new validity period. The CA may or may not require a new CSR.
  • Certificate reissuance generates a new certificate mid-term, typically because you need to change SANs or your key was compromised.
  • Certificate rekeying specifically means generating a new key pair and getting a cert issued against it.

The practical difference matters when you're automating. If your pipeline assumes renewal never changes the key, you'll break certificate pinning configurations. If it assumes the SANs stay identical, you'll miss cases where a reissuance added a subdomain that your monitoring doesn't cover.
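
A pipeline can test both assumptions before it deploys a new cert. The following is a minimal sketch, assuming OpenSSL 1.1.1 or later (for `x509 -ext`); the function names are illustrative and not part of any tool mentioned here.

```shell
#!/bin/sh
# Sketch: detect key and SAN changes between an old and a new certificate.
# Function names are hypothetical; requires OpenSSL 1.1.1+ for `x509 -ext`.

# Print a SHA-256 digest of a certificate's public key.
pubkey_digest() {
  openssl x509 -in "$1" -noout -pubkey | openssl sha256
}

# Print a certificate's DNS SANs, one per line, sorted.
cert_sans() {
  openssl x509 -in "$1" -noout -ext subjectAltName \
    | tr ',' '\n' | sed -n 's/^ *DNS://p' | sort -u
}

# Compare two newline-separated SAN lists.
# Prints "+name" for each added SAN and "-name" for each removed SAN.
san_diff() {
  old=$(mktemp) new=$(mktemp)
  printf '%s\n' "$1" | sort -u > "$old"
  printf '%s\n' "$2" | sort -u > "$new"
  comm -13 "$old" "$new" | sed 's/^/+/'
  comm -23 "$old" "$new" | sed 's/^/-/'
  rm -f "$old" "$new"
}
```

A typical call would be `san_diff "$(cert_sans old.pem)" "$(cert_sans new.pem)"`, alongside a comparison of `pubkey_digest old.pem` and `pubkey_digest new.pem`. Empty diff output plus matching digests means a plain renewal with key reuse; anything else deserves a look at pinning configs and monitoring coverage.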

Certificate types and their renewal workflows

DV, OV, and EV certificates each follow different renewal workflows due to their validation requirements:

| Certificate type | Automation level | Validation required | Typical renewal time |
|---|---|---|---|
| DV certs | Fully automatable via ACME | Domain control only (HTTP-01, DNS-01, or email) | Minutes |
| OV/EV certs | Partially automatable | Organization validation with human review (typically annual) | Hours to days |
| Internal/mTLS certs | Fully automatable with your own CA | Controlled by your step-ca or Active Directory CA policy | Minutes |

DV certificates renew with domain validation only, which is why ACME automates them end to end. OV and EV certificates require organization validation steps involving human review, making full automation impossible. Internal PKI and mTLS certificates follow whatever policy your CA enforces — cert-manager or step-ca can automate these, but you own the root of trust and the rotation logic.

Why certificates expire (and why 90-day lifetimes are winning)

Certificates expire because revocation doesn't work reliably enough to be the only safety net. CRL distribution is slow, OCSP has availability problems, and according to Netcraft's measurements, OCSP stapling fails silently in roughly 8% of configurations. Short certificate lifetimes reduce the window during which a compromised key remains trusted. This isn't theoretical — it's the actual security model the industry has converged on.

The security case for short-lived certificates

Let's Encrypt set the standard at 90 days in 2015 and proved that short-lived certificates work at internet scale, now protecting over 360 million domains. The security logic is straightforward:

  • A 90-day certificate compromised on day one gives an attacker at most 90 days of exposure
  • A one-year certificate compromised on day one gives an attacker up to 365 days of exposure
  • Revocation mechanisms (CRL, OCSP) frequently fail to close that window in practice

If a key is compromised and revocation fails — which it often does — the exposure window is bounded only by the certificate's remaining validity period.

CA/Browser Forum changes and what's coming

The CA/Browser Forum passed ballot SC-081 in 2025, setting a phased reduction in maximum TLS certificate validity:

| Effective date | Maximum certificate validity |
|---|---|
| Before March 2026 | 398 days |
| March 2026 | 200 days |
| March 2027 | 100 days |
| March 2029 | 47 days |

The operational impact is significant. If you're renewing certificates manually today, you're doing it once a year per cert. By 2029, you'll be doing it roughly eight times per year per cert. For a fleet of 200 certificates, that's 1,600 renewal events annually. The math makes the case for automated certificate renewal better than any blog post can.
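
The arithmetic behind those numbers can be sketched in a few lines. This assumes each cert is renewed only once per validity period, which is a lower bound: renewing ahead of expiry, as you should, pushes the count higher.

```shell
#!/bin/sh
# Sketch: estimate renewal events per year from the maximum validity period.
# Assumes one renewal per validity period (a lower bound).

# Ceiling of 365 / validity_days.
renewals_per_year() {
  echo $(( (365 + $1 - 1) / $1 ))
}

# Total renewal events per year: fleet_renewals <cert_count> <validity_days>
fleet_renewals() {
  echo $(( $1 * $(renewals_per_year "$2") ))
}

for days in 398 200 100 47; do
  echo "$days-day max validity: $(renewals_per_year "$days") renewals/year," \
       "$(fleet_renewals 200 "$days") for a 200-cert fleet"
done
```

For the 47-day limit this yields 8 renewals per cert and 1,600 per year across 200 certs, matching the figures above.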

Manual certificate renewal: step by step

Manual TLS certificate renewal follows four steps: generate a CSR, submit it to your CA with validation, install the new cert, and verify the chain. The entire process takes 15–60 minutes per certificate depending on the validation type. Multiply that by your cert count to understand why this section exists mainly so you know what to automate.

Step 1: Generate a CSR

openssl req -new -newkey rsa:2048 -nodes \
  -keyout example.com.key \
  -out example.com.csr \
  -subj "/CN=example.com/O=Your Org/L=City/ST=State/C=US"

If you're renewing with the same key (not recommended, but sometimes required by policy), drop -newkey rsa:2048 and use -key existing.key instead. Key reuse saves you from updating pinning configs but extends the exposure window if that key was ever compromised.
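
The command above sets only a CN. Modern clients match hostnames against subjectAltName entries, and many CAs let you specify SANs in the CSR, so it can help to include them explicitly. A minimal sketch, assuming OpenSSL 1.1.1 or later (for `-addext`); `build_san_ext` and `make_csr` are illustrative names.

```shell
#!/bin/sh
# Sketch: build a CSR that carries subjectAltName entries.
# Assumes OpenSSL 1.1.1+ (for -addext); function names are illustrative.

# Turn "example.com www.example.com" into
# "subjectAltName=DNS:example.com,DNS:www.example.com".
build_san_ext() {
  ext="subjectAltName="
  sep=""
  for host in "$@"; do
    ext="${ext}${sep}DNS:${host}"
    sep=","
  done
  echo "$ext"
}

# Generate a fresh key and a CSR; the first hostname becomes the CN
# and the base name of the output files.
make_csr() {
  openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$1.key" -out "$1.csr" \
    -subj "/CN=$1" \
    -addext "$(build_san_ext "$@")"
}
```

Calling `make_csr example.com www.example.com` would write `example.com.key` and `example.com.csr`; `openssl req -in example.com.csr -noout -text` shows whether the SANs made it in.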

Step 2: Submit to your CA and validate

Upload the CSR to your CA's portal or API. Validation methods differ by cert type:

  • HTTP-01: Place a file on your webserver at a CA-specified path
  • DNS-01: Create a TXT record in your domain's DNS
  • Email: Respond to a verification email sent to a domain admin address
  • OV/EV: One of the domain checks above, plus organization vetting such as phone verification and document review

Step 3: Install the renewed certificate

The installation step is where most manual renewals fail. The cert file alone isn't enough — you need the full chain in the correct order.

# Combine cert and chain for Nginx
cat example.com.crt intermediate.crt > fullchain.pem

# Reload Nginx without downtime
nginx -t && systemctl reload nginx

For AWS ALB, upload via the CLI: aws acm import-certificate. For Kubernetes Ingress, update the TLS secret. The critical gotcha: forgetting to restart or reload the service after installing the new cert. After monitoring thousands of renewal events, I've seen teams update the file on disk and close the ticket, only to get paged when the old cert still in memory expires.

Step 4: Verify the chain and test

# Check cert dates and chain
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null | openssl x509 -noout -dates -issuer

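# Verify the leaf against the roots, supplying the intermediate separately
# (openssl verify checks only the first certificate in each file)
openssl verify -CAfile ca-bundle.crt -untrusted intermediate.crt example.com.crt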

Test from outside your network. CDNs and load balancers cache certificates, and a successful local test doesn't mean your edge nodes picked up the change.

Automated certificate renewal with ACME

The ACME protocol (RFC 8555) automates domain validation and certificate issuance, and ACME clients wrap key generation and installation around it to cover the whole lifecycle. According to Let's Encrypt's published data, over 300 million certificates are currently managed via ACME through Let's Encrypt alone. If you're still renewing DV certs manually, this section is your exit ramp.

How the ACME protocol works

ACME is a challenge-response protocol that automates certificate issuance in four steps:

  1. Client contacts the CA and requests a certificate for a specific domain
  2. CA issues a challenge (HTTP-01 or DNS-01) to prove domain control
  3. Client completes the challenge and notifies the CA
  4. CA validates the challenge and issues the signed certificate; every exchange runs over HTTPS with signed JSON payloads

The client handles CSR generation internally, removing the manual step entirely.
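
It helps when debugging to know where the CA looks for each challenge response. The sketch below encodes the two locations defined in RFC 8555; the function names are illustrative.

```shell
#!/bin/sh
# Sketch: where ACME (RFC 8555) expects challenge responses to appear.
# Function names are illustrative.

# HTTP-01: the CA fetches this URL over plain HTTP on port 80.
# Usage: http01_url <domain> <token>
http01_url() {
  echo "http://$1/.well-known/acme-challenge/$2"
}

# DNS-01: the CA queries a TXT record at this name.
# A wildcard such as *.example.com is validated at the base domain.
# Usage: dns01_record_name <domain>
dns01_record_name() {
  echo "_acme-challenge.${1#\*.}"
}
```

If a validation fails, fetching the `http01_url` result with curl or querying the `dns01_record_name` result with dig from outside your network is usually the fastest way to see what the CA saw.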

Certbot, acme.sh, and alternatives

Choosing an ACME client depends on your environment:

| ACME client | Language | Best for | Key advantage |
|---|---|---|---|
| Certbot | Python | Traditional VM deployments | Reference client with Nginx/Apache plugins |
| acme.sh | Shell | Minimal or containerized environments | Zero dependencies, supports 70+ DNS providers |
| lego | Go | CI/CD pipelines | Single binary, easy to embed |
| step-ca | Go | Internal PKI | A CA server that speaks ACME, so it covers private certificates, not just public |

A working Certbot renewal with hooks:

certbot renew --deploy-hook "systemctl reload nginx" \
  --pre-hook "echo 'Starting renewal' | logger" \
  --post-hook "echo 'Renewal complete' | logger"

The certbot renew command checks all managed certs and renews those within 30 days of expiry. Add it to a daily cron and the process runs unattended.
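
For anything beyond a one-line reload, a deploy hook script is easier to maintain than an inline command. Certbot exports `RENEWED_LINEAGE` (the live directory of the renewed cert) to deploy hooks. The sketch below is one possible shape; `RELOAD_CMD` is a hypothetical override added here so the hook can be exercised without touching Nginx.

```shell
#!/bin/sh
# Sketch of a Certbot deploy hook. Certbot sets RENEWED_LINEAGE to the live
# directory of the renewed cert (e.g. /etc/letsencrypt/live/example.com).
# RELOAD_CMD is a hypothetical override for dry runs, not a Certbot feature.

deploy_hook() {
  chain="$RENEWED_LINEAGE/fullchain.pem"
  # Refuse to reload if the renewed chain is missing or empty.
  if [ ! -s "$chain" ]; then
    echo "deploy hook: $chain missing or empty, not reloading" >&2
    return 1
  fi
  # Validate the config and reload together so the new cert is actually served.
  sh -c "${RELOAD_CMD:-nginx -t && systemctl reload nginx}"
}
```

Saved as an executable script that ends with a call to `deploy_hook`, this can be passed to `--deploy-hook` or dropped into `/etc/letsencrypt/renewal-hooks/deploy/`. A non-zero exit gives your cron wrapper or monitoring something to alert on.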

DNS-01 vs HTTP-01 challenge tradeoffs

HTTP-01 is simpler but requires port 80 access on every server. DNS-01 works for wildcard certs and servers behind firewalls, but introduces DNS API dependencies. At scale, DNS-01 has specific pain points:

  • Rate limits: Cloudflare limits API requests to 1,200 per 5 minutes
  • Propagation delays: TXT record propagation can cause validation timeouts
  • Credential sprawl: Managing DNS API credentials for multiple providers across environments adds complexity
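
One common way to soften propagation delays is to poll the authoritative nameserver until the TXT record is visible, and only then ask the CA to validate. A minimal sketch, assuming `dig` is installed; the function names and the nameserver in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: wait for a DNS-01 TXT record before asking the CA to validate.
# Function names are illustrative; txt_visible assumes `dig` is installed.

# Run a command up to <attempts> times, sleeping <delay> seconds in between.
# Usage: retry_until <attempts> <delay> <command> [args...]
retry_until() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeed if <nameserver> returns TXT value <value> for <record>.
# Usage: txt_visible <record> <value> <nameserver>
txt_visible() {
  dig +short TXT "$1" "@$3" | grep -q -F "\"$2\""
}
```

For example, `retry_until 30 10 txt_visible _acme-challenge.example.com "$value" ns1.example.com` would poll for up to five minutes. Most ACME clients have a built-in propagation wait; a loop like this is mainly useful in custom hooks.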

For a deeper look at protocol mechanics, see our ACME protocol guide.

Certificate renewal in Kubernetes and cloud environments

cert-manager is the de facto standard for Kubernetes certificate renewal, running in over 40% of Kubernetes clusters according to CNCF survey data. It watches Certificate resources and renews at 2/3 of the certificate's lifetime by default. Cloud providers offer their own auto-renewal for managed certificates, but each has different behaviors and silent failure modes.

cert-manager for Kubernetes

cert-manager manages Certificate resources backed by Issuers (namespace-scoped) or ClusterIssuers (cluster-wide). When a cert reaches the renewal window, cert-manager automatically:

  1. Generates a new CSR
  2. Completes the ACME challenge
  3. Updates the Kubernetes Secret with the new certificate

The critical failure mode to watch for: if the Issuer's credentials expire or the DNS solver loses permissions, cert-manager logs errors but your certs silently age toward expiry. For the full setup, see our Kubernetes certificate renewal guide.
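
For reference, a Certificate resource with an explicit renewal window looks roughly like the manifest below. This is a sketch: the resource names and the `letsencrypt-prod` ClusterIssuer are placeholders, and public ACME CAs may ignore the requested `duration` and issue their standard lifetime.

```shell
#!/bin/sh
# Sketch: print a cert-manager Certificate manifest with an explicit renewal window.
# All names (example-com, example-com-tls, letsencrypt-prod) are placeholders.

certificate_manifest() {
  cat <<'EOF'
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  secretName: example-com-tls
  duration: 2160h      # 90 days
  renewBefore: 720h    # renew 30 days before expiry, i.e. at 2/3 of the lifetime
  dnsNames:
    - example.com
    - www.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
EOF
}
```

Piping it through `certificate_manifest | kubectl apply -f -` creates the resource, and `kubectl get certificate -A` shows whether each one is Ready. To close the silent-failure gap, alert on cert-manager's `certmanager_certificate_expiration_timestamp_seconds` metric instead of relying on logs.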

AWS ACM, GCP Certificate Manager, and Azure Key Vault auto-renewal

| Provider | Service | Auto-renewal | Failure notification | Covers |
|---|---|---|---|---|
| AWS | ACM | Yes, for DNS-validated certs | CloudWatch event on failure | ALB, CloudFront, API Gateway |
| GCP | Certificate Manager | Yes, for Google-managed certs | Cloud Monitoring alert | Load Balancers |
| Azure | Key Vault | Yes, configurable at 80% lifetime | Event Grid notification | App Gateway, Front Door |

The common trap: assuming "auto-renewal" means "never think about it." In practice, every provider has silent failure scenarios:

  • AWS ACM auto-renewal fails silently if the CNAME validation record gets deleted
  • Azure Key Vault won't renew if the cert policy doesn't match the issuer's requirements
  • GCP Certificate Manager requires the domain authorization to remain valid

Every cloud provider's auto-renewal has at least one scenario where it fails without an obvious alert.
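
Polling the provider for renewal state is one way to avoid waiting for an alert that may never fire. As an example for ACM, the sketch below reads the `RenewalSummary.RenewalStatus` field from `aws acm describe-certificate`; it assumes a configured AWS CLI, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch: flag ACM certificates whose managed renewal needs a human.
# Function names are illustrative; acm_renewal_status assumes a configured AWS CLI.

# Fetch the managed-renewal status for one certificate ARN.
acm_renewal_status() {
  aws acm describe-certificate --certificate-arn "$1" \
    --query 'Certificate.RenewalSummary.RenewalStatus' --output text
}

# Succeed (exit 0) when a status means renewal is stuck or has failed.
needs_attention() {
  case "$1" in
    PENDING_VALIDATION|FAILED) return 0 ;;
    *) return 1 ;;
  esac
}
```

A scheduled job could loop over the ARNs from `aws acm list-certificates` and page when `needs_attention "$(acm_renewal_status "$arn")"` succeeds. A certificate stuck in `PENDING_VALIDATION` is the usual symptom of a deleted CNAME record.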

Service mesh and mTLS certificate rotation

Istio and Linkerd handle mTLS certificate rotation for workload identities automatically, but the root CA and intermediate certs still require manual rotation. Istio's default root cert expires after 10 years, which sounds like someone else's problem until you realize your cluster is four years old and nobody documented the rotation procedure. Workload certificate rotation happens automatically; trust anchor rotation is a manual, high-risk operation.

Certificate renewal at scale: what breaks after 50 certs

Managing certificate renewal across a fleet means tracking expiration dates, CA relationships, and deployment targets for every cert in your certificate inventory. In our experience managing enterprise certificate estates, the average mid-market company has 15–20% more certificates than it thinks it does, and at least one of them is a wildcard cert that somebody provisioned through a personal account three years ago.

Tracking expiration across multiple CAs and environments

The spreadsheet approach breaks down around 50 certificates. Beyond that threshold, you need programmatic discovery:

  • Prometheus blackbox exporter probes endpoints and exports probe_ssl_earliest_cert_expiry as a metric
  • Certificate Transparency logs via crt.sh provide a view of publicly issued certs for your domains
  • Network scanning catches certs on servers not exposed to external monitoring
  • CA API integration pulls renewal status directly from each certificate authority

Neither CT logs nor endpoint probing catches internal certs or certs sitting on servers that aren't exposed to your monitoring. For certificate monitoring that actually covers your full estate, you need a combination of all four approaches. This is the operational problem that motivated us to build CertPulse: the gap between "we have monitoring" and "we know about every cert."
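
Whatever the discovery source, the common denominator is an expiry timestamp you can compare against a threshold. A minimal sketch that turns a Unix timestamp, the unit `probe_ssl_earliest_cert_expiry` reports, into days remaining; the function names are illustrative.

```shell
#!/bin/sh
# Sketch: convert an expiry timestamp (Unix seconds, as exported by
# probe_ssl_earliest_cert_expiry) into whole days remaining.

# Usage: days_left <expiry_epoch> [now_epoch]
# Whole days, truncated toward zero; now_epoch defaults to the current time.
days_left() {
  now=${2:-$(date +%s)}
  echo $(( ($1 - now) / 86400 ))
}

# Succeed when fewer than <threshold_days> remain.
# Usage: expiring_soon <expiry_epoch> <threshold_days>
expiring_soon() {
  [ "$(days_left "$1")" -lt "$2" ]
}
```

The same comparison in a Prometheus alert rule would be along the lines of `(probe_ssl_earliest_cert_expiry - time()) / 86400 < 30`. Feeding every source through one threshold check keeps the alerting logic identical whether the cert came from a probe, a CT log, or a CA API.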

Renewal failures you won't catch without monitoring

After monitoring certificate renewals across thousands of environments, these are the most common silent failure patterns:

| Failure type | What happens | Why it's hard to detect |
|---|---|---|
| CDN cache masking | CDN serves cached cert after origin renewal fails | Everything looks fine until the CDN cache expires and clients see the expired cert |
| Intermediate chain rot | CA rotates intermediates; server still serves the old one | Android clients break first because they don't fetch intermediates automatically |
| Orphaned non-ACME certs | 95% of certs auto-renew via Certbot; the five OV certs from a vendor portal three years ago do not | They're not in your automation inventory |
| DNS permission drift | ACME DNS-01 validation fails because someone tightened IAM policies | Renewal service silently lost write access to Route 53 |
| Silent cert-manager failures | cert-manager logs renewal failed but no alert fires | Nobody configured alerting on CertificateRequest denied events |

Building a renewal runbook

Your renewal runbook should answer three questions for every certificate in your fleet:

  1. What's expiring? — Certificate identity, SANs, and expiration date
  2. Who owns it? — Team, individual, and escalation path
  3. What's the renewal method? — ACME automated, cloud managed, or manual with specific CA

Keep the runbook next to your incident response docs, not buried in a wiki. Include rollback procedures for the scenario where a renewed cert breaks clients.
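
If the inventory lives in a file, a small check can flag entries that cannot answer those three questions. The sketch below assumes a hypothetical CSV layout of `name,expiry,owner,method` with a header row; adapt the columns to whatever your inventory actually stores.

```shell
#!/bin/sh
# Sketch: find inventory rows that cannot answer the three runbook questions.
# The CSV layout (name,expiry,owner,method) is a hypothetical format.

# Read CSV on stdin (header row first); print the name of every cert
# that is missing an expiry date, an owner, or a renewal method.
incomplete_entries() {
  awk -F',' 'NR > 1 && ($2 == "" || $3 == "" || $4 == "") { print $1 }'
}
```

Running `incomplete_entries < inventory.csv` in CI keeps the runbook honest: a cert with no owner or no renewal method fails the build instead of surfacing during an incident.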

Certificate renewal checklist

| Phase | Step | Manual | ACME automated | Cloud managed |
|---|---|---|---|---|
| Pre-renewal | Inventory cert and confirm owner | Yes | Verify automation config | Verify auto-renewal enabled |
| Pre-renewal | Decide: new key or reuse | Yes | Client decides (default: new) | Provider decides |
| Pre-renewal | Check SAN list is current | Yes | Review Certbot config | Review ACM/Key Vault settings |
| During renewal | Generate CSR | openssl req | Automatic | Automatic |
| During renewal | Complete validation | Manual DNS/HTTP/email | Automatic challenge | Automatic (if CNAME intact) |
| During renewal | Install cert + full chain | Manual copy + reload | Deploy hook | Automatic propagation |
| Post-renewal | Verify chain externally | openssl s_client | Monitoring check | Endpoint probe |
| Post-renewal | Confirm monitoring picks up new expiry | Update tracking | Auto-detected | CloudWatch/Event Grid |
| Post-renewal | Document what changed | Update runbook | Commit config changes | Tag resource |

FAQ

How far in advance should I renew a certificate? Start renewal 30 days before expiry for manual renewals to leave room for validation delays and troubleshooting. Certbot defaults to renewing at 30 days remaining. cert-manager renews at 2/3 of the total lifetime — for 90-day certs, that means renewal happens around day 60.

Does certificate renewal generate a new private key? It depends on your configuration. Certbot generates a new key by default on each renewal. Some CAs allow key reuse during renewal. Generating a new key is generally recommended because it limits the impact window if the previous key was compromised without your knowledge.

Will my site go down during certificate renewal? No, not if you reload rather than restart your web server. Both Nginx and Apache support graceful reloads that swap the certificate without dropping active connections. The risk is in the gap between installing the cert and reloading the service — automate both steps together to eliminate it.

What happens if a certificate renewal fails silently? The old certificate continues serving until it expires, then clients see ERR_CERT_DATE_INVALID or equivalent errors. If a CDN sits in front of your origin, the CDN's cached cert may mask the failure for hours or days. This is why external certificate expiration monitoring matters more than checking your ACME client's logs.

How do I handle certificate renewal for hundreds of certificates across multiple CAs? You need three things: a complete inventory (discovered, not just documented), automated renewal for everything that supports it, and monitoring that alerts on expiry regardless of the renewal method. The SSL certificate management challenge isn't any single renewal; it's knowing that every renewal across your fleet actually succeeded.

This is why we built CertPulse

CertPulse connects to your AWS, Azure, and GCP accounts, enumerates every certificate, monitors your external endpoints, and watches Certificate Transparency logs. One dashboard for every cert. Alerts when auto-renewal fails. Alerts when certs approach expiry. Alerts when someone issues a cert for your domain that you didn't request.

If you're looking for complete certificate visibility without maintaining scripts, we can get you there in about 5 minutes.
