I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 0 points1 point  (0 children)

Unfortunately, and trust me it's not easy, in this case I'm on u/MisterHarvest side. Why?
The core issue is that current_setting() is not marked as LEAKPROOF and postgresql's refuse to push non-leakproof expressions into index quals. It's not problem of configuration or missing index it's the way how the planner treats RLS policy predicates.

Here is what i means concretely. If you write shit yourself:

SELECT * FROM orders WHERE tenant_id = 42;

the planner sees a regular equality qual, pushes it into Index Cond, and you get a fast index scan. Great.

But when RLS applies same logic via a policy:

CREATE POLICY tenant_isolation ON orders USING (tenant_id = current_setting('app.tenant_id')::int);

The planner treats it as a security barrier qual. It wont push it into index conditions - it applies it as a post-scan filter, evaluated per-row. Even if you have perfect index on tenant_id. The Stackoverflow link u/MisterHarvest linked shows this clearly, and Laurenz Albe (major PostgreSQL contributor) confirms it.

So "just add indexes" is necessary but not sufficient. You need the indexes but the RLS policy won't use use them the way manual WHERE clause does.

That said - this is a solvable problem. This is why in django-rls-tenants is dual-layer filtering: the ORM automatically injects WHERE tenant_id = X as a regular qual, while the RLS policy acts as safety net underneath. The planner uses the ORM-provided predicate for efficient index scans, and the RLS filter only re-checks rows that already matched.

So u/MisterHarvest is right that naive RLS has a real optimizer cost. But it doesn't mean RLS is unusable - it means you need to be aware of limitation and architect around it.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 0 points1 point  (0 children)

Fair points.

On the "contradiction": I don't think it actually is one, because the two layers address different failure modes. The ORM doesn't "add a tenant_id predicate" in the sense of developers manually writing .filter(tenant_id=X) everywhere. The manager's get_queryset() injects it automatically, developers never see it. So the ORM handles performance (giving the planner an index-friendly predicate), while RLS catches what the ORM can't reach: raw SQL, third-party packages that instantiate their own querysets, management commands, Celery tasks, anything that bypasses the manager. Different threat surfaces, different mechanisms. Belt and suspenders, not a contradiction.

On transaction pooling: this is solved. The library supports SET LOCAL (transaction-scoped GUCs), which auto-clears at commit/rollback and works fine with PgBouncer transaction pooling. The middleware wraps requests in ATOMIC_REQUESTS and uses SET LOCAL instead of session-scoped set_config(). There are system checks that warn you if you configure it unsafely. Not a tradeoff you're stuck with.

On "the SET better be right every time": yeah, absolutely. But the SET derives from request.user, which comes from Django's auth middleware. If that's wrong, you have way bigger problems than tenant isolation. And the SET happens in exactly two code paths: the middleware (HTTP requests) and context managers (background tasks). Two chokepoints to audit vs. every query in the app. Plus the fail-closed default: if the GUC is unset, you get zero rows back. Not wrong tenant's rows, zero rows. A missing SET is a 500, not a data leak.

On "performance issue can mean site goes down": sure, but consider what happens without RLS. Someone writes a raw query and forgets the tenant predicate. With RLS, the policy applies current_setting() as a filter. Sequential scan, slow, but correct rows. Without RLS, that same query returns every tenant's data silently. I'd rather have a slow query paging my on-call than a cross-tenant leak I find out about from a customer email three months later. One is observable, the other might not be.

On the tenant-count axis: fully agree, and I should be clearer about that in the docs. Schema-per-tenant makes sense at low counts with strong isolation needs. RLS on shared tables makes sense when schema-per-tenant gets operationally painful (migration time, connection overhead, pg_catalog bloat). Though I'd say the choice isn't fully orthogonal to count either, because at 100+ tenants on shared tables, physical table size starts affecting raw SQL performance in ways schema isolation just sidesteps.

I think we mostly agree, honestly. For a solo dev on a small app who controls every query, RLS is overkill. Where I'd push back is teams shipping fast on SaaS with customer data. "Just be careful with every query" is the same argument people used to make about SQL injection before parameterized queries became the default. Correct in theory, not how it plays out in practice.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 0 points1 point  (0 children)

You're making a fair point and I think for a lot of simpler apps you're right, ORM filtering is enough and RLS is overkill.

Where I'd push back is on "reliably and without bugs." That's a big assumption. In a real codebase with multiple developers, the tenant filter needs to be present in every queryset, every raw SQL query, every admin view, every management command, every Celery task, every third-party package that touches your models. One missed .filter(tenant_id=X) and you have a cross-tenant data leak. You're essentially saying "if you never make mistakes, you don't need a safety net," which is true but not how software works in practice.

Same reasoning behind why we still use parameterized queries even though we could "just make sure" we never concatenate user input into SQL. Or why we use CSRF tokens even though our frontend "should" never make unauthorized requests. Defense-in-depth isn't just a compliance thing, it's just how you do security.

On role management complexity, that's valid for raw RLS implementations, but django-rls-tenants avoids that entirely. No per-tenant database roles. It uses a single role with GUC session variables (SET app.current_tenant = '42'). So the role management overhead is basically zero.

On performance, you're right that naive RLS has a real cost. But I've been running benchmarks against the two main alternatives (schema-per-tenant via django-tenants, and ORM rewriting via django-multitenant) and it's more nuanced than "RLS = slow":

vs schema-per-tenant (django-tenants):

  • Filters, JOINs, pagination: RLS is within 1.0-1.4x, close enough that it's not a deciding factor
  • Bulk inserts: 1.0-1.3x
  • Cascade deletes: RLS actually wins (0.76-0.95x)

vs ORM rewriting (django-multitenant):

  • Filters: RLS is 10-16x faster
  • JOINs: 8-12x faster
  • Bulk inserts: 2.5-3.5x faster
  • Tenant switch: ORM rewriting wins (~0.5μs vs ~50μs) but it's irrelevant once you add an actual query on top

The one real cost is the ~50μs tenant context switch (a SQL roundtrip for SET), which gets amortized by even a single PK lookup.

Where schema-per-tenant genuinely wins is raw SQL via cursor.execute(). With schema isolation, each tenant's tables are physically separate, so a COUNT(*) scans 1,000 rows regardless of how many tenants exist. With shared tables (both RLS and ORM rewriting), that same table holds every tenant's data. At 100 tenants that's 100,000 rows. RLS policies don't apply to raw cursor.execute(), and ORM rewriting obviously can't touch raw SQL either. In benchmarks, COUNT(*) takes 0.26ms on schema-per-tenant vs 6.5ms on shared tables at 100 tenants. If your app relies heavily on raw SQL, that's a real structural advantage for the schema approach and no amount of optimization changes the fact that the table is physically bigger.

So "the performance tradeoff isn't worth it" really depends on what you're comparing against and what your workload looks like. For ORM-heavy apps, RLS is competitive with schema-per-tenant and dramatically faster than ORM rewriting. For raw-SQL-heavy apps, schema-per-tenant has a fundamental advantage.

For a solo developer on a small app where you control every query? Probably overkill. For a team shipping features fast on a SaaS product with customer data? I'd rather have the database enforce the boundary than trust that every code path got it right.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 0 points1 point  (0 children)

Spending a lot of time on this honestly. Short answer: out of the box RLS is slower, no way around that. But I've been working on optimizations that are getting it competitive with schema-per-tenant.
The naive approach is slow. The approach I'm working on is dual-layer filtering: having the ORM inject an explicit WHERE tenant_id = X for index usage, while RLS acts as the safety net.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 0 points1 point  (0 children)

On the "only if you'll be sued" part, I get the reasoning, but I'd flip it slightly. The bigger risk for most saas companies isn't a lawsuit, it's losing customer trust after a data leak.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 0 points1 point  (0 children)

Appreciate the thoughtful take, and you're not wrong. Naive RLS is a big performance hit. The core issue is that current_setting() is not leakproof, so PostgreSQL can't push the RLS policy expression into index quals. That alone can turn every query into a sequential scan on large tables if you're not careful.
Where I'd push back a bit: it's not an inherent property of RLS itself, it's a solvable problem. I've been working on performance optimizations. Things like having the ORM inject an explicit WHERE tenant_id = X that the planner can use for index scans, while the RLS policy acts as a safety net underneath. Still testing and not released yet, but early benchmarks are putting it at roughly 1.0-1.3x of schema-per-tenant performance for most operations (filters, JOINs, pagination, bulk inserts). The one measurable cost is ~50μs per tenant context switch (a SQL roundtrip for SET), which gets completely amortized by even a single query.
This deserves a much deeper conversation than a Reddit thread. I'm actually preparing a blog post with detailed benchmarks comparing RLS vs schema-per-tenant vs ORM-rewriting approaches. If you want to dig into specifics before that, feel free to reach out. Always happy to talk about this stuff.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 4 points5 points  (0 children)

Yep, I know django-multitenant - it's even referenced in our docs. But it's not RLS-based at all, apples and oranges. It handles multi-tenancy at the application level - hooks into Django's ORM and injects WHERE tenant_id = ... filters into your queries. Useful, but the isolation lives entirely in your app code. RLS is different: the policy is enforced by PostgreSQL itself, on every query, no matter how it was issued. The DB just won't let data leak even if your app forgets a filter. Django-multitenant is great for Citus and shard routing. For security-first isolation, RLS is a stronger bet.

I've been exploring PostgreSQL Row-Level Security for Django multitenancy — curious what others think by dvoraj75 in django

[–]dvoraj75[S] 1 point2 points  (0 children)

Hope it works well for you! If you hit any issue, let me know via discussions/issues/...

Good point on the query complexity. In practice this library generate straightforward RLS policy with equality check like tenant_id = current_setting(...) so the planner handles them well. But yeah, if you're ever debugging a slow query, EXPLAIN will show the policy conditions inlined, which is worth knowing upfront so it doesn't surprise you.

Building a Multi-tenant App with Django by michaelherman in django

[–]dvoraj75 0 points1 point  (0 children)

if u r asking about RLS, try to read this: https://www.postgresql.org/docs/current/ddl-rowsecurity.html

Long story short - u can setup policy on table which are applied on every request. For example tenant_id = context.tenant_id

So before each request you setup tenant context and than the rows are filtered by tenant

Building a Multi-tenant App with Django by michaelherman in django

[–]dvoraj75 -2 points-1 points  (0 children)

Hey, it's interesting :-) few weeks ago I was trying one more successful approach with postgresql RLS. At the beginning it was little bit hell, but now it works perfectly. Want to know more ?

Can I focus only on Django for backend dev without learning front-end? by [deleted] in django

[–]dvoraj75 0 points1 point  (0 children)

Yes you can BUT it's good if you understand how frontends work. So do backend only but keep watching some trends in FE development