Load Testing: Find Your Breaking Point Before Users Do

Veld Systems · 7 min read

Every application has a breaking point: some number of concurrent users, volume of requests per second, or size of database result set at which performance degrades from acceptable to unusable. The question is whether you discover that number in a controlled test or during your biggest traffic day of the year.

We load test every system we build before it goes to production, and we regularly load test systems we manage for clients. The results are always illuminating. Applications that feel fast with 50 users often collapse at 500. Database queries that return in 20 milliseconds with 10,000 rows take 8 seconds with 1 million rows. APIs that handle 100 requests per second gracefully start dropping connections at 300.

Finding these limits before your users do is one of the most valuable investments you can make in reliability.

What Load Testing Actually Measures

Load testing is not just "hit the application with a lot of traffic and see what happens." There are distinct types of load tests, each answering a different question:

Baseline testing establishes your normal performance characteristics. What are response times at typical traffic levels? What does CPU, memory, and database utilization look like during a normal business day? You need this baseline before you can identify degradation.

Stress testing gradually increases load until the system breaks. This tells you your capacity ceiling, the exact point where response times become unacceptable or errors start appearing. Knowing this number lets you set autoscaling thresholds and capacity planning targets.

Spike testing simulates sudden traffic surges: a product launch, a viral social media post, or a marketing campaign that performs better than expected. The question here is not just whether the system handles the load, but how quickly it recovers after the spike subsides.

Soak testing runs sustained load over an extended period (12 to 72 hours) to identify memory leaks, connection pool exhaustion, disk space accumulation, and other issues that only manifest over time. A system that handles 200 requests per second for 5 minutes might fail at the same rate after 6 hours because of a slow memory leak.

Breakpoint testing increases load in steps until the system fails completely, then measures how the system behaves during and after failure. Does it degrade gracefully? Does it crash hard and require a restart? Does it recover automatically when load decreases?
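Stress and breakpoint runs are usually expressed as a staged ramp: hold a load level for a while, then step up and hold again. A minimal, tool-agnostic sketch in Python of generating such a schedule (the step sizes and durations here are illustrative, not a recommendation):

```python
def ramp_stages(start_users, step, max_users, step_duration_s):
    """Build a stepwise load schedule: hold each user level for
    step_duration_s seconds, then add `step` users until max_users."""
    stages = []
    users = start_users
    while users <= max_users:
        stages.append({"duration_s": step_duration_s, "target_users": users})
        users += step
    return stages

# Ramp from 50 to 250 virtual users in steps of 50, two minutes per step.
schedule = ramp_stages(50, 50, 250, 120)
```

Most tools accept an equivalent ramp directly, for example k6's `stages` option or a Locust `LoadTestShape`.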

Choosing the Right Tools

The load testing tool market is crowded, but for most teams we recommend starting with one of these:

k6 is our default choice. It uses JavaScript for test scripts, integrates with CI/CD pipelines, and can generate significant load from a single machine. For distributed testing, k6 Cloud handles orchestration across multiple regions.

Artillery is another strong option, particularly for teams already invested in Node.js tooling. It supports HTTP, WebSocket, and Socket.IO protocols out of the box.

Locust is the go-to for Python teams. It defines user behavior in Python code, which makes complex test scenarios easy to write if your team is already comfortable with Python.

Gatling is the standard in JVM-based environments. It produces excellent reports and handles complex scenarios well.

The tool matters less than the practice. Pick one your team will actually use and maintain. A simple k6 script that runs in every CI/CD pipeline is worth more than a complex Gatling suite that nobody runs because it takes an hour to set up.

Writing Realistic Test Scenarios

The most common load testing mistake is testing endpoints in isolation. Real users do not send 10,000 GET requests to your homepage. They log in, browse products, add items to a cart, check out, view their order history, and update their profile. A realistic load test simulates this entire user journey.

Model your test scenarios on actual user behavior:

- What percentage of traffic is authenticated versus anonymous? Most applications see 60 to 80% of traffic from logged-in users

- What is the distribution of requests across endpoints? Your API probably has a few hot endpoints that receive 80% of traffic

- What is the typical think time between actions? Real users pause between clicks. If your test fires requests with zero delay, you are testing a scenario that will never happen

- What data sizes are realistic? Test with production scale data, not an empty database
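One way to encode these behavior parameters, independent of any particular tool, is a weighted action table plus randomized think time. The endpoint names and weights below are illustrative placeholders, not measurements:

```python
import random

# Illustrative traffic mix: a few hot actions receive most of the traffic.
ACTIONS = {
    "browse_listings": 0.45,
    "search": 0.30,
    "view_item": 0.15,
    "add_to_cart": 0.07,
    "checkout": 0.03,
}

def next_action(rng):
    """Pick the next user action according to the observed traffic mix."""
    names = list(ACTIONS)
    weights = list(ACTIONS.values())
    return rng.choices(names, weights=weights, k=1)[0]

def think_time(rng, low_s=1.0, high_s=5.0):
    """Real users pause between clicks; sample a randomized delay."""
    return rng.uniform(low_s, high_s)

rng = random.Random(42)
journey = [next_action(rng) for _ in range(10)]
```

In practice the weights come from your analytics data, and each action issues the real sequence of requests that step of the journey would trigger.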

For Traderly, our load testing scenarios modeled the actual user flow: browse listings, search with filters, view item details, make an offer, complete a purchase. We weighted each action based on analytics data. This revealed that the search endpoint, not the checkout flow, was the bottleneck, because every product listing page triggered multiple filtered queries against a catalog with hundreds of thousands of items.

Where Bottlenecks Actually Live

After running hundreds of load tests across different applications, here is where performance bottlenecks typically appear, ranked by frequency:

Database queries, the number one cause by a wide margin: unindexed queries, N+1 problems, missing connection pooling, and full table scans under load. A query that returns in 5 milliseconds with 10 concurrent users can take 2 seconds with 500 concurrent users if it is locking rows or scanning without indexes. This is why we are strong advocates for PostgreSQL with proper indexing and query optimization.

Connection pool exhaustion. Your application server has a connection pool to the database. When all connections are in use, new requests queue. If the queue exceeds its limit, requests start failing. Default connection pool sizes (often 10 to 20) are far too small for applications handling hundreds of concurrent users.
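Little's Law gives a quick first estimate of the pool size a workload actually needs: connections in use ≈ query throughput × average time a connection is held. A sketch, with illustrative numbers:

```python
import math

def pool_size_estimate(queries_per_second, avg_query_seconds, headroom=1.5):
    """Little's Law: concurrent connections ~= throughput x time in use.
    `headroom` pads the estimate for bursts and slow outlier queries."""
    in_use = queries_per_second * avg_query_seconds
    return math.ceil(in_use * headroom)

# 400 queries/s holding a connection for 50 ms on average means ~20
# connections in use at any instant, so a pool of ~30 with headroom,
# well above a default of 10 to 20.
size = pool_size_estimate(400, 0.050)
```

A load test then validates the estimate: watch pool wait times as concurrency climbs, and tune from measured behavior rather than defaults.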

Memory leaks. Applications that allocate memory without properly releasing it will eventually exhaust available RAM. This often manifests as gradually increasing response times followed by a sudden crash. Soak tests catch these.

Third-party API rate limits. If your application calls external APIs on the hot path (payment processing, geolocation, email verification), those APIs have rate limits. Your load test will hit them before your users do, which is exactly the point.

Serialization bottlenecks. Rendering large JSON responses, compressing payloads, or processing file uploads can become CPU-bound under load. This is especially common in Node.js applications, where a single CPU-intensive operation blocks the event loop.

Setting Performance Budgets

A load test without acceptance criteria is just a benchmarking exercise. Define your performance budgets before you test:

- p50 response time (median): Under 200 milliseconds for API calls, under 1 second for page loads

- p95 response time: Under 500 milliseconds for API calls, under 2.5 seconds for page loads

- p99 response time: Under 1 second for API calls, under 5 seconds for page loads

- Error rate: Under 0.1% at expected load, under 1% at peak load

- Throughput: Minimum requests per second the system must handle
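Checking a latency sample against these budgets takes only a few lines. This sketch uses the nearest-rank percentile; the thresholds mirror the API-call budgets above:

```python
import math

# API-call budgets from the list above, in milliseconds.
BUDGET_MS = {"p50": 200, "p95": 500, "p99": 1000}

def percentile(samples_ms, p):
    """Nearest-rank percentile of a latency sample (p in 0..100)."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def check_budgets(samples_ms):
    """Return a mapping of violated budgets to the observed value."""
    violations = {}
    for name, limit_ms in BUDGET_MS.items():
        p = int(name[1:])
        observed = percentile(samples_ms, p)
        if observed > limit_ms:
            violations[name] = observed
    return violations
```

An empty result means the run passed; a non-empty one is exactly what a CI gate should fail on.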

These are starting points. Adjust based on your application type and user expectations. A real-time trading platform needs sub-100-millisecond p99 latency. A content management system can tolerate higher latency. Your website performance goals should inform these budgets.

Integrate these budgets into your CI/CD pipeline as automated gates. If a deployment causes p95 latency to exceed your budget, the pipeline fails and the deployment does not proceed. This catches performance regressions before they reach production.

Load Testing in CI/CD

Load tests should not be something you run once before launch and never again. They should run automatically on every significant change.

Our recommended approach:

- On every pull request: Run a lightweight load test (60 seconds, moderate traffic) against a preview environment. This catches obvious regressions

- On every deployment to staging: Run a full load test (5 to 10 minutes, production level traffic) and compare results against the previous baseline

- Weekly or monthly: Run extended soak tests and stress tests against a production mirror to catch gradual degradation

Store results historically. A slow, steady increase in p95 latency over 3 months is just as dangerous as a sudden spike, but you will only notice it if you are tracking the trend.

Capacity Planning from Load Test Data

Load test results feed directly into capacity planning decisions. If your stress test shows the system handles 500 concurrent users on 2 application servers, and your growth projections suggest you will hit 500 concurrent users in 6 months, you know exactly when you need to scale.

This data also informs cloud infrastructure decisions. If your load test shows linear scaling (doubling servers doubles capacity), horizontal autoscaling is the right strategy. If your bottleneck is a single database instance, throwing more application servers at the problem will not help. You need database read replicas, caching layers, or query optimization.
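The same arithmetic turns a measured ceiling and a growth rate into a date on the calendar. A sketch, with the growth figures as illustrative inputs:

```python
import math

def months_until_capacity(current_peak_users, capacity_users, monthly_growth):
    """Months until peak concurrency reaches the measured ceiling,
    assuming compounding growth (e.g. 0.10 = 10% per month)."""
    if current_peak_users >= capacity_users:
        return 0.0
    ratio = capacity_users / current_peak_users
    return math.log(ratio) / math.log(1 + monthly_growth)

# 280 peak concurrent users today, a stress-tested ceiling of 500,
# growing 10% per month: roughly half a year of runway.
runway_months = months_until_capacity(280, 500, 0.10)
```

Re-running the stress test after each significant change keeps the ceiling number honest, so the runway calculation stays trustworthy.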

Do Not Wait for the Traffic Spike

The worst time to discover your application cannot handle load is during a product launch, a marketing campaign, or an organic traffic surge. Those are the moments when performance matters most, and they are the moments when you have the least time to fix problems.

Load testing is insurance that pays for itself the first time it catches a problem. A few hours of testing can prevent hours of downtime, thousands of dollars in lost revenue, and immeasurable damage to user trust.

We run load testing as part of our cloud and DevOps engagements because we believe every production system should know its limits before users find them. If you do not know what happens to your application at 10x your current traffic, let us find out for you.

Ready to Build?

Let us talk about your project

We take on 3-4 projects at a time. Get an honest assessment within 24 hours.