Creating custom extensions on Amazon RDS and Aurora with pg_tle
In today's episode 46 of 5mins of Postgres, we talk about trusted language extensions and how to use them on Amazon RDS and Aurora, as well as the new pg_tle extension. We also take a look at pg_tle's support for the "password check hook".
Let's have a look!
This is an announcement about Trusted Language Extensions for PostgreSQL on Amazon Aurora and Amazon RDS that came out during the AWS re:invent event this week. Have a look at the article if you’d like, but I'll jump right into what this is about and why they created this.
In Postgres, when you have the notion of "Trusted", what this refers to is languages that do not allow access to the database server internals or the file system. For example, PL/pgSQL gets created automatically, but behind the scenes it calls
CREATE TRUSTED LANGUAGE plpgsql. So, PL/pgSQL is considered trusted versus PL/PythonU is not, because PL/PythonU actually allows you to access the server file system and ultimately become root or super-user or something else on the system. This, of course, could be a problem.
The other thing to understand is that Postgres has the CREATE EXTENSION mechanism. This takes an extension name which is loaded from a file. You're essentially running a SQL file in the server to create this extension and all the objects created by the extension are also associated to it. So, if you run a
DROP EXTENSION later, it will remove all the schema objects that the extension file contained.
You're essentially running a SQL file in the server to create this extension and all the objects created by the extension are also associated to it.
Many extensions actually contain not just a SQL schema definition, but also a shared library object where the extension is written in C. What we're talking about here does not relate at all to shared library extensions or C extensions though. It just relates to trusted languages such as PL/pgSQL. This will not allow you to run TimescaleDB on Amazon RDS, but it will let you create your own extensions with a custom schema object and PL/pgSQL or PL/v8 extension definitions.
AWS made this an open source project under the Apache 2 license, called pg_tle. I think that's great, because it shows that they have actually learned from the past and are now trying to be better citizens in the Postgres community. Microsoft Azure and GCP could take this and run this in their own environments as well.
Specifically, pg_tle is a C extension that lets you then run trusted language extensions. The idea here is that it makes things accessible in the context of a managed service provider where you don't have file system access. When you run
CREATE EXTENSION, instead of loading information from a file, it lets you load the extension definition from a metadata table that pg_tle maintains.
Let's take a look at how this looks like in practice. Here we call
pgtle.install_extension. This is something that you would do once, when you create your extension. It’s when you register it into the system. You give it a certain name, for example in the link above we use the name
pg_distance, and we then have an extension definition file. This is the same that you would see today - if you looked at any Postgres extension, you will see a file with these SQL definitions in them. We then just do regular
CREATE EXTENSION pg_distance, and then we can use the functions that are contained within that pg_distance definition.
This is really cool because it works just the same way as a regular extension. With the one difference being that first step, where you're creating the extension through the SQL function, instead of having to create it on the file system.
As Jonathan Katz, who is one of the authors of this new project, notes: one use case that he's particularly excited about is this being a new way to package stored procedures. You can think about this as, not necessarily a way to bring existing extensions into Postgres on a managed service provider, but rather as a way for you to have a set of packages of common stored procedures you're using.
The other thing I want to mention here is that they added support for hooks. For now, they added support for one hook - the password check hook - but it shows the promise of this also being able to do more than you were able to do before!
The password check hook lets you hook into a
CREATE ROLE or
ALTER ROLE statement in Postgres, and lets you verify if the password fulfills certain characteristics based on a function you created. We can have a list of bad passwords for example, and then we create our function and then at the end, what we say is "register this particular function as a hook into Postgres".
The password check hook lets you hook into a
ALTER ROLEstatement in Postgres, and lets you verify if the password fulfills certain characteristics based on a function you created.
What Postgres does well in terms of extensibility is it provides these hooks that let you override mechanisms of the core database system. And what pg_tle does for you is to essentially proxy that hook integration, so that even though you're not a C library - you're a trusted language - you still can access the hook functionality in Postgres.
If you extend this to not just the password checks but a lot of the other hooks in Postgres that run during regular server operations, one day, you may actually be able to have an extension like PostGIS, or maybe one of the more simpler extensions, be integrated through a mechanism like this without requiring providers to constantly update the extensions for you and adding the support.
Today, pg_tle does not let you do most of that just yet. But, I do think it's a step in the right direction as it will let people on Amazon RDS and Aurora already do more today than they could have done before.