5mins of Postgres E28: Row Level Security in Postgres, security invoker views and why LEAKPROOF functions matter
Today, we're going to talk about Postgres Row Level Security, the BYPASSRLS attribute, security invoker views, and Postgres performance problems when using LEAKPROOF functions and Row Level Security policies.
Let's jump in.
In this blog post by Miki from the Nile team, Miki describes how to use RLS for a multi-tenant database. Multi-tenant databases are databases where you have different customers, aka tenants, and they each own a portion of the database. There are other approaches that you can use, for example you could use a schema per tenant or database per tenant approach.
Miki describes how to use a single database with multiple tenants. We can look at an example of how RLS works with these three tables here. We have a
users table, an
orgs table and an
org_members table. We want to limit access to the orgs table so that only users who are actually members of the organization can see it.
First, we'll enable the
ROW LEVEL SECURITY flag on the
orgs table. Then, we'll create a function that determines the current user that is logged in. In Postgres, you can define arbitrary settings on the database that are kept local for the connection. So here, we use this
app namespace, and then we use the
You just do a
SET command on the connection, and then this is read using this
current_app_user function. We then create a policy on the
orgs table that says, if the
SELECT from the
org_members table matches the current org ID and then also the user that we're currently logged in as is a member of this organization then we return the org.
In this case, you only have a single database user, but then you have the setting that's used to control which user is actually logged into your application. When you authenticate a customer in the application, then you would do the
SET app.current_app_user to the current user, and that tells the database who's currently authorized.
RLS is pretty cool. The thing to know about RLS that's mentioned here is that when you have a user that has a BYPASSRLS attribute, then the RLS attributes don't apply. Which makes sense, cause you might want to be able to query the database or dump and restore the database for internal processes. Superusers, by default, have this BYPASSRLS attribute applied, which means if you're testing things with superuser, don't be surprised if RLS doesn't work.
The other thing that was mentioned in regards to security in the Hacker News discussion about Miki's article was something that I actually had forgotten that was a problem, which is when you use RLS together with a view.
If you define a view in Postgres, a view is actually defined as what's called a security definer. That means when you create a view, the user who creates the view defines the permissions for which table can be accessed or which part of the table can be loaded. If you create a view and then you query it with a user that you would expect to have access to just a subset of the view, that doesn't actually apply. Views, by default, are loading the whole data, not taking any RLS policies into account.
Views by default are loading the whole data, not taking any RLS policies into account.
Postgres 15 actually improves on this. Postgres 15 adds what's called a security invoker view. I can now say
CREATE VIEW security_invoker = true. That means when the view is queried, then it would actually apply everything as if you were the user just querying those tables directly. For RLS that's in most cases the better choice.
There is a way to do this in older Postgres versions as well. You can use what's described in this blog post by Ben Burwell here, where you essentially define a function. This is a bit obscure, but it does apply RLS restrictions correctly.
The other thing I want to mention is when to not use RLS. Somebody mentioned in this thread on Hacker News they used RLS and it can cause performance headaches. Even for the simplest of queries! RLS had some historic issues with complex queries and they've been improved, the planner has better logic these days.
RLS can cause performance headaches, Even for the simplest of queries.
But there is still a current problem, and it's around when functions are
LEAKPROOF. When Postgres works with a row level security policy in place and it plans your query, it has to decide when to execute functions and expressions. First, you have to filter down the result set and then you can apply functions that may leak data from other rows. The function might be looking at not just the row it's actually matching, but other rows as well. When something is not marked as
LEAKPROOF it means that it may leak data from outside of the actual row you are returning.
LEAKPROOF helps the optimizer, the planner, choose to apply functions ahead of the row level security check. For performance reasons, that's actually very important.
Here's a Stack Overflow post from somebody two years ago, who said "Postgres is ignoring my index". The reason was they were using an operator that was not
LEAKPROOF. Operators underneath the hood are functions. ILIKE wasn't using the index when used inside RLS. They actually changed the existing function in Postgres to apply the LEAKPROOF-ness attribute. You may choose to do that.
There is actually an interesting discussion that occurred roughly around the same time, a couple of years ago, where they were discussing, upstream, which internal functions in Postgres are marked LEAKPROOF and which are not. The conclusion was mostly that they didn't really change, which is unfortunate, because sometimes you can't actually change the
LEAKPROOF attribute, for example, when you're using RDS or other managed database services.
RLS can be very useful, but it does have performance implications, even for simple queries.
That's it for today's episode. Thank you so much for listening. This was 5mins of Postgres. Subscribe to our YouTube channel or follow us on Twitter to hear about next week's episodes. Talk to you next week!