Every modern application depends on third party APIs. Payment processing through Stripe, email delivery through SendGrid, maps through Google, authentication through OAuth providers. The average SaaS product we ship integrates with 5 to 12 external services. The difference between an application that runs smoothly and one that wakes your team up at 3 AM is how those integrations are built.
Third party APIs will go down. They will change their response formats. They will rate limit you during your busiest hour. They will have regional outages that affect half your users. The question is not whether these things will happen, but whether your system handles them gracefully.
The Wrapper Pattern: Never Call an API Directly
The single most important pattern for third party integration is the service wrapper. Never scatter direct API calls throughout your codebase. Every external service gets its own module with a consistent interface.
This means your payment processing module exposes functions like `createCharge`, `refundCharge`, and `getCustomer`, not Stripe specific method calls. Your email module exposes `sendTransactional` and `sendBulk`, not SendGrid specific payloads. When Stripe changes their API or you switch from SendGrid to Postmark, you update one file instead of hunting through 40 controllers.
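As a sketch, the wrapper can be as simple as an abstract interface the rest of the codebase depends on, with one provider-specific implementation behind it. The names below (`PaymentGateway`, `StripeGateway`, `ChargeResult`) are illustrative, not from any particular codebase, and the provider call is stubbed out:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class ChargeResult:
    charge_id: str
    status: str  # e.g. "succeeded", "pending", "failed", "refunded"

class PaymentGateway(ABC):
    """Provider-agnostic interface; the rest of the app only sees this."""

    @abstractmethod
    def create_charge(self, amount_cents: int, currency: str, customer_id: str) -> ChargeResult: ...

    @abstractmethod
    def refund_charge(self, charge_id: str) -> ChargeResult: ...

class StripeGateway(PaymentGateway):
    """Stripe-specific details live here and nowhere else."""

    def create_charge(self, amount_cents, currency, customer_id):
        # In real code this calls the Stripe SDK; stubbed for illustration.
        return ChargeResult(charge_id="ch_stub", status="succeeded")

    def refund_charge(self, charge_id):
        return ChargeResult(charge_id=charge_id, status="refunded")
```

Switching processors then means writing a new implementation of the same interface, and nothing else in the application changes.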
On a project we delivered through our full stack development work, a client needed to switch payment processors from Braintree to Stripe mid project. Because we had wrapped Braintree behind a clean interface, the migration took 3 days instead of the 3 weeks their previous team estimated. The rest of the application had no idea which processor was behind the wrapper.
Retry Logic: The Details Matter
A naive retry strategy (try again immediately, three times) is worse than no retry at all. It amplifies load on an already struggling service and can trigger rate limits that make your outage longer.
Exponential backoff with jitter is the standard we implement on every integration:
- First retry after 1 second plus random 0 to 500ms
- Second retry after 2 seconds plus random 0 to 500ms
- Third retry after 4 seconds plus random 0 to 500ms
- Maximum of 3 to 5 retries depending on the operation
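The schedule above can be sketched in a few lines. This assumes the wrapper raises a hypothetical `TransientError` for retryable failures, and the base delay and jitter match the numbers listed; the injectable `sleep` makes the loop testable:

```python
import random
import time

class TransientError(Exception):
    """Raised by the wrapper for errors worth retrying (timeouts, 5xx)."""

def backoff_delay(attempt, base=1.0, jitter_max=0.5):
    """Delay before retry `attempt` (1-based): base * 2**(attempt-1) plus 0-500ms jitter."""
    return base * (2 ** (attempt - 1)) + random.uniform(0, jitter_max)

def call_with_retry(fn, max_retries=3, sleep=time.sleep):
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # exhausted: surface the failure to the caller
            sleep(backoff_delay(attempt))
```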
The jitter is critical. Without it, if 1,000 requests fail simultaneously, all 1,000 retry at exactly the same intervals, creating a thundering herd that overwhelms the recovering service. Jitter spreads retries across time windows and dramatically improves recovery behavior.
Not all errors are retryable. A 400 Bad Request will return 400 forever. Retrying it wastes time and resources. A 429 Too Many Requests should be retried after the `Retry-After` header interval. A 500 Internal Server Error is a candidate for retry. A 401 Unauthorized means your credentials are wrong, and retrying will get you rate limited. We maintain explicit retry policies per status code, not a blanket retry on failure approach.
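One way to encode such a policy is a small function the retry loop consults before sleeping. The function name and return shape here are illustrative:

```python
def retry_decision(status_code, retry_after=None):
    """Return (should_retry, delay_hint) for an HTTP status code."""
    if status_code == 429:
        # Honor the Retry-After header when the provider sends one.
        return True, retry_after if retry_after is not None else 1.0
    if 500 <= status_code < 600:
        return True, None  # retryable; delay comes from exponential backoff
    # 400, 401, 404, etc.: retrying will never help.
    return False, None
```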
Circuit Breakers: Stop Calling a Dead Service
When a third party API is down, continuing to send requests to it harms your application in two ways: your users wait for timeout periods on every request, and you consume connection pool resources that could serve healthy requests.
A circuit breaker tracks failure rates and trips open after a threshold. In the open state, requests fail immediately without attempting the API call, typically returning a cached response or graceful degradation. After a cooldown period, the circuit enters half open state and allows a single test request through. If it succeeds, the circuit closes and normal operation resumes.
We typically configure circuit breakers with:
- Failure threshold: 50 percent failure rate over a 30 second window
- Open duration: 30 to 60 seconds before testing recovery
- Minimum sample size: At least 10 requests before evaluating the failure rate
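A simplified breaker along those lines might look like the sketch below. For brevity the counts reset when the circuit trips rather than tracking a true 30 second sliding window, which a production implementation would add; the injectable clock keeps it testable:

```python
import time

class CircuitBreaker:
    """closed -> open when failure rate crosses the threshold; half-open after cooldown."""

    def __init__(self, threshold=0.5, min_samples=10, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.min_samples = min_samples
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.total = 0
        self.state = "closed"
        self.opened_at = None

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.cooldown:
                self.state = "half_open"  # let one test request through
                return True
            return False  # fail fast: serve cached data or degrade
        return True

    def record(self, success):
        if self.state == "half_open":
            self._reset() if success else self._trip()
            return
        self.total += 1
        if not success:
            self.failures += 1
        if self.total >= self.min_samples and self.failures / self.total >= self.threshold:
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = self.total = 0

    def _reset(self):
        self.state = "closed"
        self.failures = self.total = 0
        self.opened_at = None
```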
This pattern saved a client significant downtime when their geocoding provider had a 45 minute outage. Instead of 45 minutes of degraded performance with slow timeouts on every request, users saw cached location data within 2 seconds of the circuit tripping, and fresh data resumed automatically when the service recovered.
Webhook Handling: The Forgotten Integration Point
Webhooks are how third party services push data to your application: payment confirmations, subscription changes, delivery notifications. They are also the most poorly implemented integration point in most codebases we audit.
Webhook handlers must be idempotent. Services will send the same webhook multiple times. Stripe explicitly documents this. If your handler processes a payment confirmation twice and credits the user twice, you have a financial bug that might not surface for weeks.
Every webhook handler we build follows this pattern:
1. Verify the signature immediately. Reject unsigned or tampered webhooks before any processing.
2. Store the raw payload with the webhook ID. This gives you an audit trail and replay capability.
3. Check for duplicate delivery using the webhook ID. If you have seen it before, acknowledge receipt and skip processing.
4. Process asynchronously when possible. Return a 200 response quickly and handle the business logic in a background job. If your handler takes 15 seconds, the webhook provider may time out and retry, leading to duplicate processing.
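The four steps above can be sketched as a single handler, assuming an HMAC-SHA256 signature scheme similar to Stripe's. The secret value, the in-memory `seen_ids` set, and the `job_queue` list are stand-ins for real storage and a real background job system:

```python
import hashlib
import hmac
import json

SECRET = b"whsec_example"   # shared signing secret (hypothetical value)
seen_ids = set()            # in production: a database table or Redis set
job_queue = []              # in production: a real background job queue

def verify_signature(payload, signature):
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def handle_webhook(payload, signature):
    # 1. Verify before any processing; reject tampered payloads.
    if not verify_signature(payload, signature):
        return 401
    event = json.loads(payload)
    # 2./3. Store-and-dedupe on the provider's event ID makes the handler idempotent.
    if event["id"] in seen_ids:
        return 200  # already processed: acknowledge and skip
    seen_ids.add(event["id"])
    # 4. Return quickly; the business logic runs in a background job.
    job_queue.append(event)
    return 200
```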
Rate Limit Management
Every API has rate limits. The question is whether you discover them during development or during a production incident.
Proactive rate limit tracking means monitoring your consumption against known limits and throttling outbound requests before you hit the ceiling. On projects where we handle system architecture, we implement a token bucket or sliding window rate limiter on the client side that matches the provider's published limits.
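A minimal client-side token bucket, with an injectable clock so it can be tested deterministically (names are illustrative):

```python
import time

class TokenBucket:
    """Client-side throttle matched to the provider's published limit."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self):
        # Refill based on elapsed time, capped at bucket capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue or delay the request
```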
For APIs with aggressive rate limits, we use request queuing. Instead of firing API calls synchronously from request handlers, we enqueue them and process at a controlled rate. This adds latency but prevents the cascade where hitting a rate limit causes retries, which cause more rate limit hits.
Testing Integrations Without the Third Party
You cannot depend on third party APIs for your test suite. They are slow, they have rate limits, and they are flaky in CI environments. But you also cannot skip testing integrations entirely.
Our approach uses three layers:
1. Unit tests with recorded responses. Tools like Polly.js or VCR record real API responses and replay them in tests. This validates your parsing and error handling against real data shapes.
2. Contract tests that verify your wrapper's expectations match the provider's actual API. These run less frequently (daily or weekly) and catch breaking changes in third party APIs before they hit production.
3. Staging environment with sandbox APIs. Stripe Test Mode, SendGrid sandbox, and similar environments let you test full flows without real side effects. Every integration we ship has a sandbox configuration.
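The first layer can be approximated without any library: capture a response once, then replay it through a fake transport so the test exercises real parsing logic against a real data shape. Everything below (the recorded fixture, `ReplayTransport`, `get_customer_email`) is a hypothetical sketch:

```python
RECORDED = {  # captured once from the real API, then replayed in tests
    ("GET", "/v1/customers/cus_1"): {
        "status": 200,
        "body": {"id": "cus_1", "email": "a@example.com"},
    },
}

class ReplayTransport:
    """Serves recorded responses; fails loudly on any unrecorded call."""

    def request(self, method, path):
        try:
            return RECORDED[(method, path)]
        except KeyError:
            raise AssertionError(f"unrecorded call: {method} {path}")

def get_customer_email(transport, customer_id):
    """Wrapper logic under test: parse the provider's response shape."""
    resp = transport.request("GET", f"/v1/customers/{customer_id}")
    if resp["status"] != 200:
        return None
    return resp["body"]["email"]
```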
We covered related testing considerations in our guide on how to build a SaaS product, where integration testing is a recurring theme.
Graceful Degradation: What Happens When the API Is Gone
The best integrations degrade gracefully instead of failing catastrophically. This means defining, for every third party dependency, what the user experience should be when that service is unavailable.
- Payment processing down: Queue the charge attempt and notify the user that processing is delayed. Do not lose the order.
- Email service down: Queue emails for delivery when service recovers. Show a UI notification instead of a confirmation email.
- Search provider down: Fall back to basic database search with reduced quality but functional results.
- Analytics service down: Buffer events locally and flush when service recovers. Never let analytics failures affect user experience.
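As a sketch of the search fallback, assuming hypothetical `primary_search` and `db_search` callables; the `degraded` flag lets the UI tell the user results may be reduced:

```python
def search_products(query, primary_search, db_search):
    """Prefer the external search provider; fall back to basic DB search on failure."""
    try:
        return {"results": primary_search(query), "degraded": False}
    except Exception:
        # Reduced quality, but the feature keeps working.
        return {"results": db_search(query), "degraded": True}
```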
This sounds obvious, but in our experience, fewer than 20 percent of applications we audit have defined degradation paths for their critical integrations. Most simply show an error page or silently fail.
Versioning and Migration
Third party APIs deprecate versions regularly. Stripe deprecates API versions roughly every 12 to 18 months. If you are calling the API with a hardcoded version, you will eventually face a forced migration.
Pin your API version explicitly in your wrapper configuration, not in scattered API calls. When a new version is available, you update the version in one place, update your response parsing, run your contract tests, and deploy. This is another benefit of the wrapper pattern: version migrations are isolated changes.
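In practice the pin can live in the wrapper's one configuration object. The version string below is only an example of the date-based format Stripe uses, not a recommendation:

```python
STRIPE_CONFIG = {
    # Pinned in exactly one place; a version migration is a one-line change
    # followed by re-running the contract tests.
    "api_version": "2023-10-16",  # example pin, not a current recommendation
    "timeout_seconds": 10,
    "max_retries": 3,
}
```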
For teams weighing whether to build custom integrations or buy a pre built solution, the answer almost always favors custom wrappers for critical integrations. Pre built connectors trade initial speed for long term fragility. When you need to handle edge cases, retry logic, or graceful degradation, you need code you control.
Monitoring and Alerting
Every third party integration needs its own monitoring:
- Response time percentiles (p50, p95, p99). A third party slowing down will slow down your application.
- Error rate by status code. Distinguish between your bugs (400s) and their outages (500s).
- Rate limit consumption. Alert at 80 percent of your limit, not 100 percent.
- Webhook delivery latency. If webhooks are arriving late, your data is stale.
If you are building an application that depends on multiple third party services and want integrations that survive real world conditions, let us know what you are working on. We build integrations that handle failure as a normal operating condition, not an afterthought.