Every product decision you make gets filtered through your database schema. A well-designed schema makes features easy to build, queries fast to execute, and migrations painless to run. A poorly designed schema does the opposite, and the pain compounds over time. Every new feature has to work around the decisions you made in month one, and the workarounds get uglier as the product grows.
We have inherited enough codebases to know that schema problems are rarely obvious when they are introduced. They show up months later when a query that used to take 20 milliseconds now takes 4 seconds. Or when a "simple" feature requires migrating a table with 50 million rows. Or when you realize your multi-tenant data is not actually isolated and you have a security incident on your hands.
This is not a tutorial on SQL. It is a guide to the architectural decisions that separate schemas you can live with from schemas that slowly strangle your product.
Start With Your Access Patterns, Not Your Data Model
The most common schema design mistake is starting with the data model. You think about the entities in your system (users, orders, products, invoices) and create a table for each one with all the columns you can think of. This produces a normalized, theoretically clean schema that is miserable to query for your actual use cases.
Design your schema around how your application reads and writes data, not around an abstract entity relationship diagram.
Ask these questions before creating a single table:
- What are the 10 most common queries this application will run?
- Which of those queries need to be fast (under 50ms)?
- What data is always accessed together?
- What data is written frequently vs. read frequently?
- What data needs to be filtered, sorted, or aggregated?
The answers to these questions should drive your schema decisions. If your application always displays a user's name alongside their most recent order, and that query runs on every page load, your schema should make that query trivial, not require a three-table join with a subquery.
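As a sketch, suppose the hot query is "show the user's name and their most recent order" (table and column names here are hypothetical). A lateral join plus one composite index turns it into a cheap indexed lookup rather than a scan:

```sql
-- Page-load query: the user's name plus their most recent order.
SELECT u.name, o.id AS order_id, o.total, o.created_at
FROM users u
JOIN LATERAL (
    SELECT id, total, created_at
    FROM orders
    WHERE orders.user_id = u.id
    ORDER BY created_at DESC
    LIMIT 1
) o ON true
WHERE u.id = $1;

-- The index that turns the lateral lookup into a single index probe:
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
```

The point is not this particular query shape; it is that the index exists because this query exists, not because the columns happened to be there.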
Normalization Is a Starting Point, Not a Religion
Database normalization (eliminating redundancy by splitting data into related tables) is taught as the correct way to design schemas. And it is a good starting point. Third normal form prevents update anomalies and keeps your data consistent.
But normalization has a cost: joins. Every time you split data into a separate table for normalization, you add a join to every query that needs that data. Joins are not free. They get more expensive as tables grow, especially when they involve multiple large tables.
The right approach is to normalize by default and denormalize intentionally. When you identify a read path that is critical to your product's performance and it requires expensive joins, consider denormalizing. Store the user's name on the order record. Store the product price at the time of purchase on the line item. Store the organization name on every row that belongs to that organization.
Denormalization trades storage (cheap) for query performance (valuable). The cost is that you need to keep denormalized data in sync, which means updating multiple records when the source data changes. This is a manageable tradeoff when you do it deliberately for specific access patterns. It becomes unmanageable when you do it haphazardly across your entire schema.
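For example, a line-items table that snapshots the product price at purchase time might look like this (schema and column names are illustrative):

```sql
-- order_items stores the price as it was when the order was placed,
-- so later price changes on products never rewrite historical orders.
CREATE TABLE order_items (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    order_id   uuid NOT NULL REFERENCES orders (id),
    product_id uuid NOT NULL REFERENCES products (id),
    quantity   integer NOT NULL CHECK (quantity > 0),
    unit_price numeric(10, 2) NOT NULL  -- denormalized from products.price
);

-- Copy the current price into the line item at insert time:
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT $1, p.id, $2, p.price
FROM products p
WHERE p.id = $3;
```

This is deliberate denormalization: the copy exists because the access pattern (and the business rule of historical pricing) demands it.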
Keys, Indexes, and Constraints
Use UUIDs for primary keys in any table that will be exposed through an API. Sequential integer IDs leak information (how many users you have, how many orders were placed between two dates) and enable enumeration attacks, where someone can guess valid IDs simply by incrementing. UUIDs solve both problems. In PostgreSQL, the performance difference between UUID and integer primary keys is negligible for most workloads with proper indexing; if insert volume is very high, prefer time-ordered UUIDs (such as UUIDv7) over fully random ones to reduce B-tree index fragmentation.
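A minimal sketch, assuming PostgreSQL 13 or later, where gen_random_uuid() is built in (older versions need the pgcrypto extension):

```sql
-- UUID primary key generated by the database, not the application.
CREATE TABLE users (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    email      text NOT NULL UNIQUE,
    name       text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);
```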
Index based on your queries, not your columns. Do not create an index on every column. Indexes speed up reads but slow down writes and consume storage. For every index you create, ask: what query does this serve? If you cannot point to a specific query, the index probably should not exist. Use EXPLAIN ANALYZE to validate that your indexes are actually being used.
Use composite indexes deliberately. A composite index on (tenant_id, created_at) is not the same as two separate indexes on tenant_id and created_at. The composite index is useful when you query with both columns together (which, in a multi-tenant SaaS, you almost always do). Understand that column order matters: the index is most effective when you filter on the leftmost columns.
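To illustrate with a hypothetical orders table: create the composite index, then let EXPLAIN ANALYZE confirm which queries it actually serves.

```sql
CREATE INDEX idx_orders_tenant_created
    ON orders (tenant_id, created_at);

-- Served by the index: filters on the leftmost column,
-- then uses the second column for ordering.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE tenant_id = $1
ORDER BY created_at DESC
LIMIT 20;

-- NOT served efficiently: this query skips the leftmost column,
-- so the composite index cannot be used for the filter.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at > now() - interval '1 day';
```

If the second query matters, it needs its own index on created_at; the composite one will not save it.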
Never skip foreign key constraints. We have seen teams omit foreign keys "for performance" and end up with orphaned records, broken joins, and data inconsistencies that take weeks to clean up. Foreign keys are not just for referential integrity. They document the relationships in your schema and prevent entire categories of bugs. The write performance impact is minimal compared to the debugging time you save.
Add CHECK constraints for business rules. If a price should never be negative, add a CHECK constraint. If a status column should only contain specific values, use an ENUM type or a CHECK constraint. Pushing business rules into the database means they are enforced regardless of which application, migration script, or admin tool touches the data. This is part of what we cover in our PostgreSQL vs MongoDB comparison, where schema enforcement is a core differentiator.
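A sketch of both patterns on an illustrative products table:

```sql
CREATE TABLE products (
    id     uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name   text NOT NULL,
    -- Business rule: prices are never negative.
    price  numeric(10, 2) NOT NULL CHECK (price >= 0),
    -- Business rule: status is one of a fixed set of values.
    status text NOT NULL DEFAULT 'draft'
        CHECK (status IN ('draft', 'active', 'archived'))
);
```

A CHECK with an IN list is often easier to evolve than a native ENUM type, since adding a value is a constraint change rather than a type change; either works as long as the rule lives in the database.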
Multi-Tenancy From Day One
If you are building a SaaS product, every table that contains tenant data needs a tenant_id column from the very first migration. Not "we will add it later." Not "we will handle it in the application layer." Every table, every query, every index.
Row-Level Security (RLS) in PostgreSQL is the strongest pattern for tenant isolation. It enforces tenant scoping at the database level, which means even a bug in your application code cannot leak data across tenants. Your application sets the current tenant on each request, and every query automatically filters to that tenant.
The schema decisions that support this:
- Every tenant-scoped table has a tenant_id column with a foreign key to your tenants table
- Every index on a tenant-scoped table includes tenant_id as the first column
- RLS policies reference the tenant_id and the current session variable
- Your migration process includes adding RLS policies for every new table
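A minimal RLS setup for a hypothetical orders table, assuming the application stores the tenant in a custom session variable named app.current_tenant, might look like:

```sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
ALTER TABLE orders FORCE ROW LEVEL SECURITY;  -- apply policies to the table owner too

-- Every query against orders is automatically filtered to the current tenant.
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- In each request's transaction, the application binds the tenant
-- (:tenant_id stands for the bound parameter):
BEGIN;
SELECT set_config('app.current_tenant', :tenant_id, true);  -- true = local to this transaction
-- ... all queries here see only that tenant's rows ...
COMMIT;
```

Using a transaction-local setting (the third argument to set_config) matters with connection pooling, since pooled connections are reused across tenants.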
We have migrated products from application-level tenant filtering to database-level RLS, and it is always painful. The query changes are manageable, but finding every place in the codebase where a query was missing the tenant filter is terrifying. Architect for this from the start.
Timestamps, Soft Deletes, and Audit Trails
Every table should have created_at and updated_at columns. These are invaluable for debugging, analytics, and data reconciliation. Use database defaults (now() for created_at, a trigger for updated_at) so they are set automatically regardless of which code path inserts or updates the row.
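One way to wire this up in PostgreSQL (trigger and function names are illustrative):

```sql
-- Defaults handle created_at; a trigger keeps updated_at current.
ALTER TABLE orders
    ADD COLUMN created_at timestamptz NOT NULL DEFAULT now(),
    ADD COLUMN updated_at timestamptz NOT NULL DEFAULT now();

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_set_updated_at
    BEFORE UPDATE ON orders
    FOR EACH ROW
    EXECUTE FUNCTION set_updated_at();
```

Because the trigger runs in the database, updated_at stays correct even for rows touched by migration scripts or an admin console that never goes through your ORM.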
Think carefully before implementing soft deletes. The pattern of adding a deleted_at column and filtering it out of every query seems simple, but it has cascading effects. Every query, every index, every unique constraint, and every count needs to account for deleted records. If you need soft deletes, consider moving deleted records to an archive table instead. This keeps your active tables clean and your queries simple.
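If you go the archive-table route, PostgreSQL can clone the table structure and move rows atomically in one statement (table names are illustrative):

```sql
-- Mirror the orders table, plus a timestamp recording the deletion.
CREATE TABLE deleted_orders (
    LIKE orders INCLUDING ALL,
    deleted_at timestamptz NOT NULL DEFAULT now()
);

-- Delete and archive in a single atomic statement.
WITH moved AS (
    DELETE FROM orders
    WHERE id = $1
    RETURNING *
)
INSERT INTO deleted_orders
SELECT moved.*, now()
FROM moved;
```

The active table keeps its simple queries and constraints, and "deleted" data remains queryable when support or compliance needs it.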
Build audit trails for sensitive operations. If your product handles financial data, healthcare records, or any regulated domain, you need an immutable log of who changed what and when. This is a separate table (or set of tables) that captures the old and new values for every significant change. PostgreSQL triggers are excellent for this: they capture changes at the database level regardless of which application made them.
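A sketch of a trigger-based audit trail (table and function names are illustrative; the invoices table is a hypothetical example of a sensitive table):

```sql
CREATE TABLE audit_log (
    id         bigserial PRIMARY KEY,
    table_name text NOT NULL,
    row_id     uuid NOT NULL,
    operation  text NOT NULL,           -- INSERT, UPDATE, or DELETE
    old_values jsonb,
    new_values jsonb,
    changed_by text NOT NULL DEFAULT current_user,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION audit_changes() RETURNS trigger AS $$
BEGIN
    -- Branch on operation: OLD is undefined for INSERT, NEW for DELETE.
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, row_id, operation, old_values, new_values)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP, NULL, to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, row_id, operation, old_values, new_values)
        VALUES (TG_TABLE_NAME, NEW.id, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        RETURN NEW;
    ELSE
        INSERT INTO audit_log (table_name, row_id, operation, old_values, new_values)
        VALUES (TG_TABLE_NAME, OLD.id, TG_OP, to_jsonb(OLD), NULL);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoices_audit
    AFTER INSERT OR UPDATE OR DELETE ON invoices
    FOR EACH ROW
    EXECUTE FUNCTION audit_changes();
```

For true immutability, also revoke UPDATE and DELETE on audit_log from application roles.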
Schema Migration Strategy
Your schema will change. Products evolve, requirements shift, and you will need to add columns, create tables, and modify constraints in production. How you handle migrations is as important as the initial schema design.
Every migration must be reversible. Write both the up and down migration. Even if you never run the down migration in production, the discipline of writing it forces you to think about what the migration actually changes and whether it can be safely rolled back.
Never run destructive migrations without a maintenance window. Dropping columns, changing column types, and adding NOT NULL constraints to existing columns can lock tables and break running queries. For large tables, these operations can take minutes or hours. Plan accordingly with your cloud and DevOps setup to minimize downtime.
Use expand and contract for breaking changes. If you need to rename a column, do not rename it in one migration. Add the new column, backfill the data, update the application to use the new column, and then drop the old column in a separate migration. This lets you deploy the application change and the schema change independently, which is essential for zero downtime deployments.
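Sketched as migrations for a hypothetical rename of users.name to users.full_name:

```sql
-- Migration 1 (expand): add the new column; existing code ignores it.
ALTER TABLE users ADD COLUMN full_name text;

-- Migration 2 (backfill): run in batches so large tables are not
-- locked for long. Repeat until zero rows are updated.
UPDATE users
SET full_name = name
WHERE id IN (
    SELECT id FROM users
    WHERE full_name IS NULL
    LIMIT 10000
);

-- Deploy application code that reads and writes full_name.

-- Migration 3 (contract), only after nothing references the old column:
ALTER TABLE users DROP COLUMN name;
```

Each step is independently deployable and independently reversible, which is what makes the pattern compatible with zero-downtime deployments.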
Test migrations against production sized data. A migration that runs in 200ms on your development database with 100 rows might lock a production table with 10 million rows for 20 minutes. Always test migrations against a copy of production data before deploying.
The Schema Review Checklist
Before you ship any new table or significant schema change, run through this:
- Does every tenant-scoped table have a tenant_id with a foreign key and an RLS policy?
- Are the primary keys appropriate for the use case (UUIDs for API-exposed entities)?
- Do the indexes match the actual query patterns?
- Are business rules enforced with constraints, not just application code?
- Is the migration reversible and tested against production sized data?
- Are created_at and updated_at present with proper defaults?
If you are designing a database schema for a new product, or your existing schema is creating problems you did not anticipate, let us know. We will review your data model and make sure it supports where your product is headed, not just where it is today.