APM vs Infrastructure Monitoring vs Status Page Monitoring: What Each One Actually Sees
APM watches your code, infrastructure monitoring watches your servers, and status page monitoring watches everyone else's services. They overlap less than most teams assume — and the gaps are where outages live.
Three Tools, Three Disjoint Views of Reality
Application Performance Monitoring (APM), infrastructure monitoring, and status page monitoring sound like adjacent tools that should overlap heavily. In practice, they look at completely different layers and the union of all three is still incomplete. Most production incidents fall somewhere in the gaps between them.
APM (Datadog APM, New Relic, Dynatrace, Honeycomb, Sentry) instruments your application code. It traces a request through your services, measures the latency of each function call and database query, and surfaces errors in the code paths your traffic actually takes. Its scope is your code and the services your code calls directly.
Infrastructure monitoring (Datadog Infra, Prometheus + node_exporter, CloudWatch, Grafana) watches the hardware and orchestration layer underneath your code. CPU, memory, disk, network, container health, Kubernetes pod state. Its scope is the boxes your code runs on and the platform that schedules them.
Status page monitoring (StatusGator, IsItDownRightNow, PulsAPI) watches the third-party services you depend on but do not operate. AWS, Stripe, Twilio, Auth0, GitHub, your CDN, your DNS provider. Its scope is everyone else's outages — and how those outages map onto your product.
What APM Actually Sees (and Misses)
APM is unmatched for diagnosing problems inside code that you wrote. A spike in API latency? APM shows you which method calls grew slower, which database query is the culprit, which downstream service is taking 300ms longer than yesterday. For applications with complex internal flows, no other tool answers 'why is this request slow?' as efficiently.
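The attribution step can be sketched in a few lines. This is not a real APM agent (Datadog, New Relic, and OpenTelemetry do this via auto-instrumentation across processes); it is a hand-rolled illustration of how latency gets attributed to individual operations inside one request, with simulated timings:

```python
import time
from contextlib import contextmanager

# Collected spans as (name, duration_seconds). A real APM agent builds a
# full distributed trace; this sketch only shows the attribution principle.
spans = []

@contextmanager
def span(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))

def handle_request():
    with span("handle_request"):
        with span("db.query"):   # simulated slow database query
            time.sleep(0.02)
        with span("render"):
            time.sleep(0.005)

handle_request()
# The slowest child span points at the culprit inside the request.
children = [s for s in spans if s[0] != "handle_request"]
culprit = max(children, key=lambda s: s[1])[0]
```

With the simulated timings above, `culprit` comes out as `db.query` — exactly the 'which query is the culprit?' answer APM gives for real traffic.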
What APM misses: anything that happens before the request reaches your code. A misbehaving CDN that adds 2 seconds of latency before forwarding the request, a load balancer holding connections, a DNS resolution that intermittently times out, a TLS handshake that occasionally fails — APM sees the request only after all of these complete. From APM's perspective, the request just took a while to arrive, with no obvious cause.
APM also has a coverage blind spot. A service can look fine for the 5% of users APM happens to instrument and be broken for the 95% behind a feature flag, in a different region, or served stale data by a CDN cache. APM samples real traffic but doesn't know what traffic should be flowing, only what is. Pair APM with active probes, which know what should be flowing, to close that gap.
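The difference is that a probe carries an expectation with it. A minimal sketch of that idea, with a hypothetical endpoint and made-up thresholds (the actual HTTP call is left out so the logic stands alone):

```python
from dataclasses import dataclass

@dataclass
class Probe:
    name: str
    url: str               # hypothetical endpoint, for illustration only
    expect_status: int
    max_latency_ms: float

def evaluate(probe, status, latency_ms):
    """Judge an observed response against what SHOULD have happened.
    Passive APM sampling only sees traffic that arrives; a probe also
    notices when the expected behaviour is missing entirely."""
    if status != probe.expect_status:
        return f"{probe.name}: expected HTTP {probe.expect_status}, got {status}"
    if latency_ms > probe.max_latency_ms:
        return f"{probe.name}: {latency_ms:.0f}ms exceeds {probe.max_latency_ms:.0f}ms budget"
    return None  # probe passed

checkout = Probe("checkout", "https://example.com/api/checkout", 200, 500.0)
```

In a real deployment the probe fires on a schedule from outside your network, so a feature-flag or region-specific breakage surfaces even when sampled production traffic looks normal.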
What Infrastructure Monitoring Actually Sees (and Misses)
Infrastructure monitoring catches a category of failure that APM is structurally blind to: when the boxes themselves are unhealthy. Memory pressure leading to OOM kills, disk filling up, a node losing network connectivity, a Kubernetes scheduler failing to place pods, container runtime crashes. These manifest as application errors eventually, but the root cause is visible at the infrastructure layer first — sometimes by 5–15 minutes.
Infrastructure monitoring also handles capacity planning, which APM cannot. CPU at 60% sustained means the next traffic spike will hurt; memory growing 100MB per hour against 1.8GB of remaining headroom means a leak will cause an outage in 18 hours; disk at 85% on a logging volume means you have 48 hours before logs start failing. None of these are 'currently broken' signals, but all of them predict near-future incidents.
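The prediction behind those numbers is just linear extrapolation. A sketch, using assumed capacity figures to reproduce the memory-leak example from the text:

```python
import math

def hours_until_exhausted(capacity_gb, used_gb, growth_gb_per_hour):
    """Linear extrapolation of resource growth. Crude, but it turns a
    'not currently broken' metric into a deadline you can page on."""
    if growth_gb_per_hour <= 0:
        return math.inf   # flat or shrinking usage: no deadline
    return (capacity_gb - used_gb) / growth_gb_per_hour

# The leak from the text: 0.1 GB/hour of growth with 1.8 GB of headroom
# (assumed figures) hits the wall in 18 hours.
leak_eta = hours_until_exhausted(capacity_gb=4.0, used_gb=2.2, growth_gb_per_hour=0.1)
```

Real monitoring systems fit the growth rate over a window rather than taking a single sample, but the deadline arithmetic is the same.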
What infrastructure monitoring misses: the actual user experience. A node can sit at 30% CPU and 40% memory with a perfectly healthy network while the application running on it returns 500 errors for every request because of an internal logic bug or a third-party dependency outage. Infrastructure monitoring will show all green and your customers will be furious. This is the most common 'green dashboard, broken product' scenario in 2026: infrastructure-only visibility on a dependency-heavy stack.
What Status Page Monitoring Actually Sees (and Misses)
Status page monitoring exists because APM and infrastructure monitoring are blind to everything outside your network. When AWS S3 in us-east-1 has an issue, your APM will show elevated error rates somewhere in your stack and your infrastructure monitoring will show all green — but neither will tell you the cause is a third-party service you don't control. You'll spend the first 20 minutes of the incident debugging your own code.
Status page monitoring (PulsAPI, StatusGator, and similar services) aggregates the official status pages of every service you depend on, plus community-reported issues, plus its own active probes against those services. When Stripe's payment API starts failing, you see it in seconds — usually before Stripe themselves update their status page, because PulsAPI's probes detect the issue independently. That attribution speed turns a 30-minute 'is it us?' triage into a 30-second 'it's them, here's the workaround' alert.
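PulsAPI's internals aren't public; the idea of merging a vendor's self-reported status with an independent probe can be sketched like this (state names are illustrative):

```python
def dependency_state(official_status, probe_ok):
    """Merge a vendor's self-reported status with an independent probe.
    Probes usually flip first, so a failing probe overrides a green page."""
    if not probe_ok:
        if official_status == "operational":
            return "degraded (detected by probe, page still green)"
        return "degraded (confirmed by vendor)"
    return official_status

# The Stripe scenario from the text: their page still says "operational",
# but the independent probe is already failing.
state = dependency_state("operational", probe_ok=False)
```

The value is in the first branch: the probe flags the degradation during the window before the vendor updates their page, which is exactly when the 'is it us?' question is most expensive.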
What status page monitoring misses: anything that's happening only inside your stack. A bug in your code, a memory leak on your servers, a misconfigured deployment — these are invisible to a tool watching third parties. Status monitoring is exclusively about the dependency layer; it doesn't try to replace APM or infrastructure tools, just fills the gap they leave.
The Three-Layer Setup That Actually Works
The teams that triage incidents fastest run all three and use them in a specific order. When an alert fires, the on-call engineer asks three questions in sequence: (1) is a third-party service degraded right now? (status page monitoring answers in seconds); (2) is the underlying infrastructure healthy? (infrastructure dashboards answer in a minute); (3) where in our code is the request slowing down or failing? (APM answers in 2–5 minutes once the first two are ruled out).
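The triage order above can be written down as a function. A sketch, with inputs assumed to come from the three tools (a status monitoring alert, an infra health check, an APM hotspot name):

```python
def triage(third_party_incident, infra_healthy, apm_hotspot=None):
    """The three questions from the text, asked in cost order: attribution
    (seconds) before infrastructure (a minute) before code-level diagnosis
    (minutes)."""
    if third_party_incident:                  # 1. status page monitoring
        return f"third-party: {third_party_incident}, apply workaround"
    if not infra_healthy:                     # 2. infrastructure dashboards
        return "infrastructure: inspect node/pod health before blaming code"
    # 3. only now pay the cost of a code-level trace
    return f"application: {apm_hotspot or 'pull the slow trace in APM'}"
```

The ordering matters because each branch is both cheaper and more common as a false lead for the next one: ruling out third parties and infrastructure first is what keeps the expensive APM dig focused.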
Skipping any layer makes the next layer's work harder. If you don't know whether Stripe is up, you'll waste time chasing internal causes for a third-party outage. If you don't know whether your nodes are healthy, you'll waste time blaming application code for an infrastructure issue. If you don't have APM, you'll know something is wrong but won't be able to pinpoint where.
The cost of running all three is real but bounded. APM and infrastructure monitoring are typically a single combined vendor (Datadog, New Relic) or a self-hosted Prometheus + Grafana setup. Status page monitoring is a separate, lightweight tool because it has different data sources and a different operational model — it watches services you don't own. PulsAPI is purpose-built for this layer and is designed to integrate with whatever APM and infrastructure tools you already run, so the three views show up side by side in a single incident dashboard.
If you're auditing your current setup, the fastest tell that you're missing a layer: count how often, post-incident, the root cause was 'a third-party service had an issue we didn't know about.' If that number is more than zero per quarter, you have a status page monitoring gap. If post-incident reviews regularly land on 'we ran out of memory and didn't see it coming,' you have an infrastructure monitoring gap. If they land on 'a slow database query took 30 minutes to identify,' you have an APM gap. Each layer pays for itself by closing one specific category of blind spot.
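That audit is a tally over your post-incident reviews. A sketch, with made-up entries standing in for a quarter's worth of incident tickets:

```python
from collections import Counter

# Tag each post-incident review with the layer its root cause lived in.
# These entries are invented; yours come from your incident tracker.
quarter_root_causes = [
    "third_party",    # "AWS had an issue we didn't know about"
    "code",           # "a slow database query took 30 minutes to identify"
    "third_party",
    "infrastructure"  # "we ran out of memory and didn't see it coming"
]

gaps = Counter(quarter_root_causes)
needs_status_monitoring = gaps["third_party"] > 0
needs_infra_monitoring = gaps["infrastructure"] > 0
needs_apm = gaps["code"] > 0
```

With the sample data, all three flags come back true, which is the usual result the first time a team runs this exercise honestly.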
About the Author
Marcus leads product at PulsAPI, where he focuses on making operational awareness effortless for engineering teams. Previously at Datadog and PagerDuty.
Start monitoring your stack
Aggregate real-time operational data from every service your stack depends on into a single dashboard. Free for up to 10 services.