Engineering · March 28, 2026 · 11 min read · By Marcus Webb

Webhook Delivery Guarantees: At-Least-Once vs Exactly-Once Semantics

Most API platforms promise webhook delivery, but few guarantee it. This deep-dive covers retry strategies, idempotency keys, exponential backoff, and how to build webhook infrastructure with 99.99% delivery confidence.

Handling Failures

Network drops and receiver downtime are inevitable. A robust webhook system must retry failed deliveries with exponential backoff and use idempotency keys so that the duplicate deliveries those retries produce are processed only once.

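At-least-once delivery is the practical standard for most webhook systems: the sender keeps retrying until the receiver acknowledges the event, so every event is delivered at least once (as long as the receiver recovers within the retry window), but some events will occasionally be delivered more than once. Exactly-once delivery, meaning each event is processed exactly one time regardless of network conditions, requires coordination between sender and receiver and is significantly more complex to implement. In practice it is built as exactly-once processing on top of at-least-once delivery: when an acknowledgement goes missing, the sender cannot tell whether the request or only the response was lost, so it must resend, and the receiver must deduplicate. Understanding which guarantee your system provides (or needs) is the starting point for architecture decisions.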
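To make the distinction concrete, here is a small Python simulation, a sketch rather than a production sender. It assumes every request reaches the receiver but that some acknowledgements are lost on the way back; the class and function names are hypothetical. The sender sees a timeout either way and resends, and only the receiver's deduplication on the event ID keeps the side effect from being applied twice.

```python
import uuid
from typing import List, Set


class DedupReceiver:
    """Hypothetical receiver that applies each event's side effect only once."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.side_effects: List[str] = []

    def handle(self, event: dict) -> None:
        if event["id"] in self.seen_ids:
            return  # duplicate delivery: effect already applied, do nothing
        self.seen_ids.add(event["id"])
        self.side_effects.append(event["type"])


def deliver_at_least_once(receiver: DedupReceiver, event: dict, lost_acks: int) -> int:
    """Resend until the sender sees an acknowledgement.

    Assumes every request reaches the receiver but the first `lost_acks`
    acknowledgements are lost. The sender cannot distinguish that from a lost
    request, so it resends. Returns how many deliveries the receiver saw.
    """
    if lost_acks < 0:
        raise ValueError("lost_acks must be non-negative")
    deliveries = 0
    while True:
        receiver.handle(event)
        deliveries += 1
        if lost_acks == 0:
            return deliveries  # acknowledgement reached the sender
        lost_acks -= 1         # acknowledgement lost: sender times out and resends


event = {"id": str(uuid.uuid4()), "type": "order.created"}
receiver = DedupReceiver()
print(deliver_at_least_once(receiver, event, lost_acks=2))  # 3
print(receiver.side_effects)                                # ['order.created']
```

The delivery count grows with every lost acknowledgement while the side effect is applied once, which is what "exactly-once" usually means in practice.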

The cost of getting webhook delivery wrong is high and often invisible. A webhook that silently drops events during receiver downtime produces data inconsistencies that may not surface for days. A webhook that delivers duplicates without idempotency handling can trigger double-charges, duplicate notifications, or corrupted state. Building delivery guarantees correctly is a reliability investment that pays dividends in data integrity.

Retry Architecture: Exponential Backoff and Jitter

The fundamental retry pattern for webhook delivery is exponential backoff with jitter. After a failed delivery attempt, wait before retrying, and increase the wait time sharply (roughly exponentially) with each subsequent failure. A typical schedule: attempt 1 immediately, attempt 2 after 1 minute, attempt 3 after 5 minutes, attempt 4 after 30 minutes, attempt 5 after 2 hours, attempt 6 after 12 hours, attempt 7 after 24 hours, then stop retrying and move the event to a dead letter queue rather than discarding it.
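The schedule above can be encoded as a lookup table. This Python sketch assumes each delay is measured from the previous attempt (the text does not specify); scheduled_delay and formula_delay are illustrative names, and the formula variant (base 1 minute, factor 2, capped at 24 hours) is an assumed alternative for teams that prefer a strictly exponential curve.

```python
from datetime import timedelta
from typing import Optional

# Delay before each attempt, taken from the example schedule.
# Assumption: each delay is measured from the previous attempt.
RETRY_DELAYS = [
    timedelta(0),           # attempt 1: immediately
    timedelta(minutes=1),   # attempt 2
    timedelta(minutes=5),   # attempt 3
    timedelta(minutes=30),  # attempt 4
    timedelta(hours=2),     # attempt 5
    timedelta(hours=12),    # attempt 6
    timedelta(hours=24),    # attempt 7
]


def scheduled_delay(attempt: int) -> Optional[timedelta]:
    """Wait before the given 1-based attempt, or None once the schedule is exhausted."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt > len(RETRY_DELAYS):
        return None  # give up: hand the event to the dead letter queue
    return RETRY_DELAYS[attempt - 1]


def formula_delay(attempt: int, base: timedelta = timedelta(minutes=1),
                  factor: float = 2.0, cap: timedelta = timedelta(hours=24)) -> timedelta:
    """Strictly exponential alternative: base * factor**(attempt - 2), capped at `cap`."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt == 1:
        return timedelta(0)
    return min(cap, base * factor ** (attempt - 2))


print(scheduled_delay(4))              # 0:30:00
print(scheduled_delay(8))              # None
print(sum(RETRY_DELAYS, timedelta()))  # 1 day, 14:36:00
```

Summed, the table spans about 38.6 hours, well inside the 7-day TTL suggested for processed event IDs in the idempotency section.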

Jitter — adding a random offset to each retry delay — is critical when you're delivering webhooks to many receivers simultaneously. Without jitter, all receivers whose webhooks failed at time T will retry at T+1m, T+5m, etc. in synchronized waves, creating load spikes that can cause the very failures they're trying to recover from. Adding ±30% random jitter to each retry delay spreads the load and breaks this synchronization.
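A minimal jitter helper, assuming the ±30% spread from the text is applied multiplicatively to whatever base delay the schedule produces. The function name and the injectable random.Random instance (used so results are reproducible in tests) are illustrative choices, not part of any particular library.

```python
import random
from datetime import timedelta
from typing import Optional


def with_jitter(delay: timedelta, spread: float = 0.30,
                rng: Optional[random.Random] = None) -> timedelta:
    """Scale `delay` by a random factor drawn uniformly from [1 - spread, 1 + spread].

    spread=0.30 corresponds to the ±30% jitter described above.
    """
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    rng = rng if rng is not None else random.Random()
    return delay * rng.uniform(1.0 - spread, 1.0 + spread)


# Ten endpoints that all failed at the same moment no longer retry in lockstep.
rng = random.Random(7)
retry_offsets = sorted(with_jitter(timedelta(minutes=5), rng=rng) for _ in range(10))
# Each value lies between roughly 210 and 390 seconds.
print([round(d.total_seconds()) for d in retry_offsets])
```

Multiplicative jitter keeps the schedule's overall shape while breaking synchronization; other variants, such as drawing the whole delay uniformly between zero and the backoff value, trade predictability for more spread.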

Implement circuit breakers per receiver endpoint. If a receiver has failed 5 consecutive deliveries, pause webhook delivery to that endpoint for a backoff period rather than continuing to send. This protects your delivery infrastructure from being overwhelmed by a single broken receiver and provides a clear signal to the receiver that they need to investigate their endpoint health.
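One way to implement the per-endpoint breaker, sketched in Python. The threshold of 5 consecutive failures comes from the text; the 10-minute cooldown, the class name, and the injectable clock are assumptions. After the cooldown the breaker lets deliveries through again as probes: a success closes it, and another failure reopens it for a fresh cooldown.

```python
import time
from typing import Callable, Dict


class EndpointCircuitBreaker:
    """Per-endpoint breaker: opens after N consecutive failures, probes after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # 5, per the text
        self.cooldown_seconds = cooldown_seconds    # 10 minutes: an assumed value
        self.clock = clock
        self._failures: Dict[str, int] = {}
        self._opened_at: Dict[str, float] = {}

    def allow(self, endpoint: str) -> bool:
        """True if a delivery to this endpoint may be attempted now."""
        opened_at = self._opened_at.get(endpoint)
        if opened_at is None:
            return True  # breaker closed
        # Open: block until the cooldown has elapsed, then allow probe deliveries.
        return self.clock() - opened_at >= self.cooldown_seconds

    def record_success(self, endpoint: str) -> None:
        """A successful delivery closes the breaker and resets the failure count."""
        self._failures[endpoint] = 0
        self._opened_at.pop(endpoint, None)

    def record_failure(self, endpoint: str) -> None:
        """Count a failure; at or above the threshold, (re)open the breaker from now."""
        count = self._failures.get(endpoint, 0) + 1
        self._failures[endpoint] = count
        if count >= self.failure_threshold:
            self._opened_at[endpoint] = self.clock()


now = [0.0]  # fake clock for the example
breaker = EndpointCircuitBreaker(clock=lambda: now[0])
url = "https://receiver.example.com/webhooks"  # hypothetical endpoint
for _ in range(5):
    breaker.record_failure(url)
print(breaker.allow(url))  # False: delivery paused
now[0] += 600
print(breaker.allow(url))  # True: cooldown over, probe allowed
```

As a design choice, events skipped while a breaker is open are best left queued rather than counted against their retry budget.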

Idempotency Keys: Making Retries Safe

Idempotency means that processing the same webhook event multiple times has the same effect as processing it once. Without idempotency, retries that successfully deliver a duplicate event can corrupt state: a payment webhook delivered twice could trigger two charges; an 'order created' webhook delivered twice could create duplicate orders.

Implement idempotency on both sides of the webhook. As a sender: include a unique event ID in every webhook payload (e.g., a UUID generated when the event is created). This ID should be stable across retries — the same event, whether delivered once or five times, always carries the same ID. As a receiver: before processing a webhook, check whether the event ID has been seen before. If it has, return 200 OK (acknowledge receipt) but skip processing. Store processed event IDs in a durable store with a TTL of at least 7 days to cover your retry window.
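Both halves of that contract, sketched in Python with hypothetical names. The sender assigns a UUID when the event is created and re-sends the same stored payload on every retry; the receiver remembers processed IDs with a 7-day TTL. ProcessedEventIds is an in-memory stand-in for a durable store such as Redis or a database table, and this simple version is not safe under concurrent deliveries (see the comment in the handler).

```python
import json
import time
import uuid
from typing import Callable, Dict

SEVEN_DAYS = 7 * 24 * 60 * 60  # TTL from the text, in seconds


def create_event(event_type: str, data: dict) -> dict:
    """Sender: assign the event ID once, at creation time, and store the event."""
    return {"id": str(uuid.uuid4()), "type": event_type, "data": data}


def delivery_body(event: dict) -> bytes:
    """Sender: every attempt serializes the same stored event, so the ID is stable."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


class ProcessedEventIds:
    """Receiver: in-memory stand-in for a durable store of processed IDs with a TTL.

    Expired entries are ignored but not purged in this sketch.
    """

    def __init__(self, ttl_seconds: float = SEVEN_DAYS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._expires_at: Dict[str, float] = {}

    def contains(self, event_id: str) -> bool:
        expires = self._expires_at.get(event_id)
        return expires is not None and expires > self.clock()

    def add(self, event_id: str) -> None:
        self._expires_at[event_id] = self.clock() + self.ttl_seconds


def handle_webhook(body: bytes, processed: ProcessedEventIds,
                   process: Callable[[dict], None]) -> int:
    """Receiver: return the HTTP status code to send back."""
    event = json.loads(body)
    if processed.contains(event["id"]):
        return 200  # seen before: acknowledge so the sender stops retrying, skip work
    process(event)               # if this raises, the ID is not recorded and a retry reprocesses
    processed.add(event["id"])   # record only after processing succeeded
    # Caveat: contains() followed by add() is not atomic, so two concurrent
    # deliveries of the same event could both get past the check.
    return 200


store = ProcessedEventIds()
applied = []
body = delivery_body(create_event("order.created", {"order_id": 42}))
handle_webhook(body, store, applied.append)
handle_webhook(body, store, applied.append)  # a retry of the same event
print(len(applied))  # 1
```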

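The receiver's idempotency check should be the first operation in your webhook handler, before any database writes, API calls, or business logic, and it must be atomic: a single check-and-claim step rather than a read followed by a separate write. Otherwise two concurrent deliveries of the same event can both pass the check and both run the business logic. A Redis SETNX (or SET with the NX and EX options, which also applies the TTL in the same command) or a database UPSERT with a unique constraint on the event ID are both clean implementations. If processing fails after the ID has been claimed, release the claim, or perform the claim in the same transaction as the business writes, so that the sender's next retry is not mistaken for a duplicate.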
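An atomic version of the check using a unique constraint, sketched with Python's built-in sqlite3 module so it runs without extra services. The table names, columns, and the order-creation side effect are hypothetical. The claim (INSERT ... ON CONFLICT DO NOTHING, which requires SQLite 3.24 or newer) and the business write share one transaction, so a failure rolls back the claim and the next retry is processed normally. PostgreSQL supports the same ON CONFLICT clause, and there a concurrent insert of the same event ID waits for the first transaction to commit or roll back.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) idempotency and business tables."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS processed_events (
            event_id     TEXT PRIMARY KEY,  -- the unique constraint does the dedup
            processed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TABLE IF NOT EXISTS orders (
            order_id INTEGER PRIMARY KEY,
            status   TEXT NOT NULL
        );
        """
    )


def handle_order_created(conn: sqlite3.Connection, event: dict) -> str:
    """Claim the event ID first, then apply the side effect, in one transaction."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO processed_events (event_id) VALUES (?) "
            "ON CONFLICT(event_id) DO NOTHING",
            (event["id"],),
        )
        if cur.rowcount == 0:
            return "duplicate"  # already claimed: acknowledge with 200, skip work
        conn.execute(
            "INSERT INTO orders (order_id, status) VALUES (?, ?)",
            (event["data"]["order_id"], "created"),
        )
    return "processed"


conn = sqlite3.connect(":memory:")
init_db(conn)
evt = {"id": "evt_123", "type": "order.created", "data": {"order_id": 42}}
print(handle_order_created(conn, evt))  # processed
print(handle_order_created(conn, evt))  # duplicate
```

A periodic DELETE of processed_events rows older than 7 days would implement the TTL; it is omitted here.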

Monitoring Webhook Health in Production

Webhook delivery failures are silent by default — unless you instrument them. Build a delivery health dashboard that tracks: delivery success rate per receiver endpoint (target >99.5%), retry rate (what percentage of deliveries required more than one attempt), and dead letter queue volume (events that exhausted all retries without successful delivery).
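A sketch of how the three dashboard metrics could be computed from per-event delivery records. The DeliveryRecord shape is hypothetical, and "success" is taken to mean eventual delivery within the retry schedule (an assumption; first-attempt success is a separate, stricter metric). The 99.5% target from the text is applied per endpoint.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class DeliveryRecord:
    """Outcome of one event's delivery to one endpoint (hypothetical shape)."""
    endpoint: str
    attempts: int    # attempts made, including the first
    delivered: bool  # False: retries exhausted, event dead-lettered


SUCCESS_TARGET = 0.995  # the >99.5% target from the text


def delivery_health(records: Iterable[DeliveryRecord]) -> Dict[str, dict]:
    """Per-endpoint success rate, retry rate, and dead-lettered count."""
    by_endpoint: Dict[str, List[DeliveryRecord]] = defaultdict(list)
    for record in records:
        by_endpoint[record.endpoint].append(record)

    report = {}
    for endpoint, rs in by_endpoint.items():
        total = len(rs)
        delivered = sum(r.delivered for r in rs)
        retried = sum(r.attempts > 1 for r in rs)
        success_rate = delivered / total
        report[endpoint] = {
            "success_rate": success_rate,
            "retry_rate": retried / total,  # share needing more than one attempt
            "dead_lettered": total - delivered,
            "meets_target": success_rate > SUCCESS_TARGET,
        }
    return report


records = [
    DeliveryRecord("https://a.example/hook", attempts=1, delivered=True),
    DeliveryRecord("https://a.example/hook", attempts=3, delivered=True),
    DeliveryRecord("https://b.example/hook", attempts=7, delivered=False),
]
for endpoint, stats in delivery_health(records).items():
    print(endpoint, stats)
```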

Dead letter queue handling is often neglected. When an event exhausts all retries, it should be written to a dead letter queue — not silently dropped. This queue gives your operations team visibility into missed events and provides raw material for manual redelivery or investigation. Alert when the dead letter queue grows beyond a threshold; a spike in DLQ volume often indicates a systematic receiver issue rather than transient network problems.
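A simplified sender loop that dead-letters an event instead of dropping it, plus the threshold alert. It is a sketch under stated assumptions: send is a hypothetical callable that returns True on a 2xx response, the function names are illustrative, and the in-process sleep exists only so the example is self-contained and testable. A production sender would persist the event and schedule each attempt on a queue instead of sleeping.

```python
import time
from typing import Callable, List, Optional, Sequence


def deliver_with_retries(event: dict, send: Callable[[dict], bool],
                         delays: Sequence[float], dead_letters: List[dict],
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt delivery once per scheduled delay; dead-letter the event if all fail.

    Exceptions raised by `send` (timeouts, connection errors) count as failed attempts.
    """
    last_error: Optional[str] = None
    for delay in delays:
        sleep(delay)
        try:
            if send(event):
                return True
            last_error = "non-2xx response"
        except Exception as exc:  # any transport error counts as a failed attempt
            last_error = repr(exc)
    # Retries exhausted: keep the event for investigation or manual redelivery.
    dead_letters.append({"event": event, "attempts": len(delays), "last_error": last_error})
    return False


def dlq_needs_alert(dead_letters: List[dict], threshold: int) -> bool:
    """Alert once the dead letter queue grows beyond `threshold` entries."""
    return len(dead_letters) > threshold


dead_letters: List[dict] = []
ok = deliver_with_retries({"id": "evt_1"}, send=lambda e: False,
                          delays=[0, 60, 300], dead_letters=dead_letters,
                          sleep=lambda s: None)  # skip real waiting in the example
print(ok, len(dead_letters), dlq_needs_alert(dead_letters, threshold=0))  # False 1 True
```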

For receiving webhooks from third parties — like PulsAPI's alert webhooks — implement a dedicated health check endpoint that confirms your receiver is processing correctly. Test your webhook receiver with PulsAPI's built-in test event functionality after any deployment or configuration change. A broken receiver that fails silently is worse than no webhook integration at all, because it creates false confidence that alerts are being delivered.
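A minimal sketch of the state behind such a health check endpoint. Everything here is hypothetical: the class, the 3-failure threshold, and the dependency checks you would register (for example, a ping to your idempotency store). It does not use any PulsAPI API; your web framework's health route would simply return the status code and JSON body that status() produces.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReceiverHealth:
    """Hypothetical health state for a webhook receiver."""

    def __init__(self, max_consecutive_failures: int = 3,
                 checks: Optional[Dict[str, Callable[[], bool]]] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self.checks = checks or {}
        self.clock = clock
        self.consecutive_failures = 0
        self.last_success_at: Optional[float] = None

    def record_processed(self) -> None:
        """Call after a webhook was fully processed."""
        self.consecutive_failures = 0
        self.last_success_at = self.clock()

    def record_failed(self) -> None:
        """Call when the handler raised or could not finish processing."""
        self.consecutive_failures += 1

    def status(self) -> Tuple[int, Dict[str, object]]:
        """Return (HTTP status, JSON-serializable body) for the health route."""
        results = {}
        for name, check in self.checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False  # a check that raises counts as failing
        healthy = (all(results.values())
                   and self.consecutive_failures < self.max_consecutive_failures)
        body = {
            "healthy": healthy,
            "checks": results,
            "consecutive_failures": self.consecutive_failures,
            "last_success_at": self.last_success_at,
        }
        return (200 if healthy else 503), body


health = ReceiverHealth(checks={"idempotency_store": lambda: True})  # hypothetical check
health.record_processed()
print(health.status()[0])  # 200
```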

About the Author

Marcus Webb, Head of Product
