Circuit Breakers for Third-Party APIs: A Developer's Guide
When a third-party API degrades, your application shouldn't degrade with it. Here's how to implement circuit breakers that protect your users from cascading failures during vendor outages.
Why Your Application Needs Circuit Breakers
When a third-party API becomes slow or unavailable, applications without circuit breakers exhibit a predictable failure pattern: outbound requests start timing out, each timeout holds a thread or connection for the full timeout duration, connection pools fill up, and suddenly your entire application is degraded because 10% of your requests are waiting on a failing external dependency. This cascading failure is often more damaging to your users than the original vendor outage — and it's entirely preventable.
The circuit breaker pattern, popularized by Michael Nygard's 'Release It!' and integral to Netflix's Hystrix library, prevents this cascade. A circuit breaker wraps calls to an external dependency and monitors for failures. When the failure rate exceeds a threshold, the circuit 'opens' — subsequent calls fail immediately without actually calling the dependency, protecting your resources and returning a degraded-mode response to users. After a cooldown period, the circuit enters a 'half-open' state and allows a test request through; if it succeeds, the circuit closes and normal operation resumes.
Circuit breakers are not a substitute for monitoring — they're a complement to it. PulsAPI tells you when a vendor is having issues and routes alerts to your on-call team. Circuit breakers protect your application in the seconds before your team can respond, and during maintenance windows when you've pre-emptively opened them. Both layers are necessary: monitoring drives human response, circuit breakers drive automated resilience.
Implementing a Circuit Breaker: The Three States
The Closed state is normal operation. Requests pass through to the external API, and the circuit breaker records successes and failures. While closed, the breaker tracks a rolling window of recent requests — typically the last 10 to 100 calls or the last 60 to 120 seconds. If the failure rate within this window exceeds your threshold (commonly 50%), the circuit transitions to Open.
The Open state is the protection mode. All calls to the dependency fail immediately without network contact. Your application code catches the circuit-open exception and executes its fallback: return a cached response, show a degraded UI, queue the operation for retry, or return a structured error that your frontend handles gracefully. The Open state has a configurable timeout — typically 30 to 120 seconds — after which the circuit transitions to Half-Open to test recovery.
The Half-Open state is the recovery probe. A single request is allowed through to the dependency. If it succeeds, the circuit closes and normal operation resumes. If it fails, the circuit returns to Open for another timeout period. The Half-Open state prevents a thundering herd: without it, every waiting caller would retry at the same moment the timeout expired, potentially overwhelming a recovering service. A single probe request is far more respectful of the recovering service's capacity.
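To make the three states concrete, here is a minimal, single-threaded Python sketch of the state machine described above. It is illustrative only: it treats every exception as a dependency failure and is not thread-safe, both of which production libraries handle properly.

```python
import time
from collections import deque

class CircuitOpenError(Exception):
    """Raised instead of contacting the dependency while the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=0.5, window_size=20,
                 open_timeout=60.0, on_state_change=None):
        self.failure_threshold = failure_threshold  # e.g. 0.5 opens at 50% failures
        self.window = deque(maxlen=window_size)     # rolling window of recent outcomes
        self.open_timeout = open_timeout            # seconds in Open before a probe
        self.on_state_change = on_state_change      # optional hook for logging/metrics
        self.state = "closed"
        self.opened_at = 0.0

    def _transition(self, new_state):
        old_state, self.state = self.state, new_state
        if self.on_state_change:
            self.on_state_change(old_state, new_state)

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at < self.open_timeout:
                raise CircuitOpenError("failing fast: circuit is open")
            self._transition("half_open")  # cooldown elapsed; this call is the probe
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record(success=False)
            raise
        self._record(success=True)
        return result

    def _record(self, success):
        if self.state == "half_open":
            # The single probe decides: success closes the circuit, failure reopens it.
            if success:
                self.window.clear()
                self._transition("closed")
            else:
                self.opened_at = time.monotonic()
                self._transition("open")
            return
        self.window.append(success)
        # Only evaluate the failure rate once the window is full, so one early
        # failure cannot trip a fresh breaker.
        if (len(self.window) == self.window.maxlen
                and self.window.count(False) / len(self.window) >= self.failure_threshold):
            self.opened_at = time.monotonic()
            self._transition("open")
```

Requiring a full window before evaluating the failure rate is one way to implement the minimum-calls guard most libraries expose; Resilience4j, for example, makes it explicit with a minimumNumberOfCalls setting.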
Configuring Thresholds and Fallbacks
Threshold configuration determines how sensitive your circuit breaker is. Too aggressive (opens on a single failure) and it trips on transient network blips, degrading your application unnecessarily. Too lenient (opens only when 80% of requests fail) and it allows significant degradation before activating. The right threshold depends on your dependency: for a payment processor, open at a 30% failure rate over the last 20 requests. For a non-critical analytics API, you might tolerate a 60% failure rate before opening.
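Using the sketch above, those two profiles might be configured like this (the window sizes and timeouts are illustrative starting points, not recommendations):

```python
# Critical dependency: trip early and retest quickly.
payment_breaker = CircuitBreaker(
    failure_threshold=0.3,  # open at 30% failures
    window_size=20,         # over the last 20 requests
    open_timeout=30.0,
)

# Non-critical dependency: tolerate more noise before opening.
analytics_breaker = CircuitBreaker(
    failure_threshold=0.6,
    window_size=50,
    open_timeout=120.0,
)
```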
Fallback design is where circuit breakers deliver their real value. A circuit breaker without a meaningful fallback just converts slow failures into fast failures — slightly better, but not genuinely resilient. For each third-party dependency, design a fallback that preserves user experience as much as possible: for a payment processor outage, disable checkout and show a clear message; for a search API outage, fall back to basic database queries; for a recommendation engine outage, show a static curated list; for an email delivery outage, queue emails locally for delivery on recovery.
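As a concrete example of the search fallback, here is a sketch using the CircuitOpenError from above; vendor_search_api and basic_database_search are hypothetical stand-ins for your own clients.

```python
search_breaker = CircuitBreaker(failure_threshold=0.5, window_size=50, open_timeout=60.0)

def search_products(query):
    try:
        # Normal path: full-featured vendor search, guarded by the breaker.
        return search_breaker.call(vendor_search_api, query)  # hypothetical vendor client
    except CircuitOpenError:
        # Degraded path: slower and less relevant, but the feature stays up.
        return basic_database_search(query)  # hypothetical local fallback
```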
Combine circuit breaker state with your monitoring setup for maximum operational clarity. When a circuit opens, that event should be logged and surfaced in your incident tooling — it's a signal that something is wrong with a dependency, even if your end-users aren't yet seeing degradation. Cross-reference circuit open events with PulsAPI's status data for the same vendor: if your Stripe circuit breaker opens at the same moment PulsAPI shows Stripe API degradation, attribution is immediate. If your circuit opens but PulsAPI shows Stripe as fully operational, the problem is likely in your integration layer, not the vendor.
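One way to wire this up, using the on_state_change hook from the sketch above; the logger name and emit_metric are hypothetical stand-ins for whatever logging and metrics tooling you use:

```python
import logging

logger = logging.getLogger("circuit_breakers")

def report_stripe_state_change(old_state, new_state):
    # Log every transition so it can be correlated with vendor status data
    # (e.g. PulsAPI's Stripe feed) during incident triage.
    logger.warning("stripe circuit: %s -> %s", old_state, new_state)
    if new_state == "open":
        emit_metric("circuit_breaker.opened", tags={"vendor": "stripe"})  # hypothetical metrics client

stripe_breaker = CircuitBreaker(
    failure_threshold=0.3,
    window_size=20,
    open_timeout=30.0,
    on_state_change=report_stripe_state_change,
)
```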
Language and Framework Implementations
In Java and the JVM ecosystem, Resilience4j is the modern standard for circuit breakers, replacing Hystrix, which Netflix placed in maintenance mode in 2018. It provides thread-safe circuit breaker state machines, configurable failure rate thresholds, slow call rate tracking, and built-in metrics integration with Micrometer and Prometheus. Spring Boot applications can use Resilience4j's Spring Boot starter for annotation-based circuit breakers with near-zero boilerplate: a single @CircuitBreaker annotation and a fallback method are enough to protect an external API call.
In Node.js, the opossum library is a well-maintained circuit breaker implementation that wraps any promise-returning function. For Python, pybreaker provides a straightforward implementation, and tenacity handles retry logic with backoff strategies that complement circuit breaker patterns. Go developers often implement circuit breakers manually given the language's simplicity, but gobreaker is a solid library option. In all ecosystems, the key implementation detail is ensuring that circuit breaker state is shared across all instances of your application: a per-instance circuit breaker provides less protection in horizontally scaled deployments, because each instance must independently accumulate enough failures before its own circuit opens.
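For Python, a minimal pybreaker setup might look like the following. Note that pybreaker trips on consecutive failures (fail_max) rather than a failure rate; charge_via_vendor and show_payment_unavailable are hypothetical.

```python
import pybreaker

# Open after 5 consecutive failures; allow a half-open probe after 60 seconds.
stripe_breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=60)

@stripe_breaker
def charge_customer(customer_id, amount_cents):
    return charge_via_vendor(customer_id, amount_cents)  # hypothetical vendor call

try:
    charge_customer("cus_123", 4999)
except pybreaker.CircuitBreakerError:
    show_payment_unavailable()  # hypothetical fallback while the circuit is open
```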
For teams using service meshes like Istio or Linkerd, circuit breaking can be configured at the infrastructure level rather than the application level. Istio's DestinationRule resource supports connection pool limits and outlier detection that implement circuit breaker behavior for any service-to-service communication — including egress calls to external APIs when routed through the mesh. This approach moves circuit breaking out of application code entirely, making it uniformly applied across all services without per-service implementation work. The tradeoff is less granular fallback behavior — mesh-level circuit breaking can protect your infrastructure, but application-level fallbacks still require code.
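As a sketch of what that looks like in practice, assuming egress to the vendor is routed through the mesh via a matching ServiceEntry, a DestinationRule along these lines enables connection limits and outlier detection for an external host (field values are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: stripe-egress
spec:
  host: api.stripe.com        # external host; assumes a matching ServiceEntry exists
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 32  # bound queued requests to the dependency
    outlierDetection:
      consecutive5xxErrors: 5        # eject after 5 consecutive 5xx responses
      interval: 30s                  # how often hosts are evaluated
      baseEjectionTime: 60s          # minimum ejection duration (analogous to Open)
      maxEjectionPercent: 100
```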
About the Author
James is CTO of PulsAPI. Before PulsAPI he was a staff engineer at a Series C infrastructure company where third-party outages were a constant operational pain. He started PulsAPI to solve the problem once and for all.
Start monitoring your stack
Aggregate real-time operational data from every service your stack depends on into a single dashboard. Free for up to 10 services.