Engineering · March 26, 2026 · 7 min read

The Missing Layer in Your Observability Stack: Third-Party Cloud Dependencies

You have logs, metrics, and traces covered. But most observability stacks have a blind spot: the cloud services your application depends on but doesn't control.

James Okafor, Co-founder & CTO

James is co-founder and CTO of PulsAPI. Before PulsAPI he was a staff engineer at a Series C infrastructure company where third-party outages were a constant operational pain. He started PulsAPI to solve the problem once and for all.

The Three Pillars... and the Missing Fourth

The observability community has converged on three pillars: logs (what happened), metrics (how the system is performing), and traces (how requests flowed through your system). Tools like Datadog, Honeycomb, and Grafana have made these three pillars accessible and powerful for engineering teams of any size.

But there's a fourth dimension that the three pillars don't cover: the health of the third-party systems your application depends on. Your logs, metrics, and traces tell you everything about your code. They tell you almost nothing about why Stripe's webhook delivery suddenly started failing, or why AWS Lambda invocations are timing out in eu-west-2.

For most applications, 30 to 60 percent of production incidents have a third-party root cause. Yet most teams' observability investments cover only the remaining 40 to 70 percent. This isn't a failure of tools — it's a conceptual gap in how teams think about what they need to observe.

What Existing Observability Misses

When Stripe has a partial outage, your Datadog dashboard will show elevated error rates in your payments service and increased latency on Stripe API calls. But it won't tell you that these errors are caused by a Stripe incident, that 50,000 other businesses are experiencing the same thing, that Stripe's engineering team acknowledged it 8 minutes ago, or that the estimated resolution time is 45 minutes.

That contextual gap changes everything about how you respond. With only internal observability, you're debugging your own code for a problem that isn't in your code. With third-party observability, you immediately know this is an external issue, can shift from debugging to customer communication, and have a reasonable sense of how long to expect it to last.

Your existing APM and logging tools are also limited in what they can observe: your own services and your own infrastructure. Third-party status data has to come from the third parties themselves — their status pages and their APIs — supplemented by community signals from other engineers experiencing the same issues.

Adding the Fourth Pillar to Your Stack

The fourth pillar — external dependency observability — requires different tools and a different approach than the first three. You can't instrument third-party services with your agent, and you can't query their internal metrics. What you can do is monitor their public status signals, crawl their status pages, and aggregate community reports from engineers who share your dependencies.
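As a concrete sketch of what monitoring public status signals looks like: many providers host Statuspage-style status pages that expose a JSON endpoint (conventionally `/api/v2/status.json`) with an overall severity indicator. The payload shape below follows that common convention but is an assumption — verify it against your provider's actual status API before relying on it.

```python
# Sketch: classify a Statuspage-style /api/v2/status.json payload.
# The payload shape follows the common Statuspage convention; the
# sample values are illustrative, not real incident data.

def classify_status(payload: dict) -> str:
    """Map a status payload's severity indicator to a simple label."""
    indicator = payload.get("status", {}).get("indicator", "unknown")
    return {
        "none": "operational",
        "minor": "degraded",
        "major": "partial_outage",
        "critical": "major_outage",
    }.get(indicator, "unknown")

# Example payload in the Statuspage format (values are illustrative):
sample = {
    "page": {"name": "Example Provider"},
    "status": {"indicator": "major", "description": "Partial System Outage"},
}

print(classify_status(sample))  # partial_outage
```

In practice you would fetch this endpoint on an interval for every provider you depend on and emit a state-change event when the indicator moves away from operational — which is exactly the signal your internal tools cannot produce on their own.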

PulsAPI is purpose-built for this fourth pillar. It's not a replacement for Datadog or Grafana — it's the layer that sits alongside them and fills the gap. When your internal metrics alert on symptoms, PulsAPI tells you whether those symptoms have a known external cause. When it does, your response changes from investigation to communication.

The practical setup is straightforward: add PulsAPI to your observability stack, configure monitoring for every service your application depends on, and integrate PulsAPI alerts into the same on-call tooling (PagerDuty, Slack, etc.) you use for internal alerts. The goal is that when an incident starts, your on-call engineer sees both the internal symptom (from Datadog) and the external cause (from PulsAPI) in the same notification feed.

Making the Case Internally

If you're trying to get buy-in for adding third-party observability to your stack, the most compelling argument is a recent incident retrospective. Look at your last 10 production incidents and identify how many had a third-party root cause. Estimate the time spent on triage before the external cause was identified. Multiply by your team's engineering cost per hour. For most teams, that number justifies any reasonable monitoring tool cost within a single incident.

The secondary argument is proactive communication. Engineering teams that can tell customers "we're aware of a payment processing delay due to our provider, we're monitoring and will update you" before customers notice — rather than after — have measurably better retention and lower support volume during incidents. That's a product quality argument, not just an engineering efficiency one.

Third-party observability is not a nice-to-have for teams that run on cloud services. It's as fundamental as having error tracking. The only reason it's underinvested is that it doesn't fit neatly into the traditional three pillars framework — but that's an artifact of history, not of what engineering teams actually need.

Start monitoring your stack

Free for up to 10 services. No credit card required.

Create Free Dashboard