5mins of Postgres E20: An important bug in Postgres 14 with REINDEX / CREATE INDEX CONCURRENTLY, and using the amcheck extension
Today, we're going to talk about an important indexing bug that was discovered in Postgres 14 and how you can verify if this affects you with the amcheck extension.
Let's dive in.
We'll start with this tweet by Michael Paquier that talks about
REINDEX CONCURRENTLY and
CREATE INDEX CONCURRENTLY. If you are using Postgres 14, there is a risk of those two commands creating corrupted indexes.
Important: Do not use these two commands on Postgres 14 until this bug is fixed and if you have used
REINDEX CONCURRENTLY, or
CREATE INDEX CONCURRENTLY on your own production database, as have I, then you do want to make sure to use amcheck on these B-tree indexes to ensure that they haven't actually run into the problem. More on amcheck below.
Now, let's dive deeper into what actually happened here. On May 18th, somebody reported a bug to the Postgres bugs mailing list, and what they saw in their own production database was that they were rebuilding their primary key index. This was on a very busy database with a lot of inserts and updates and that's likely going to contribute to the chance of this happening.
They were able to reproduce this problem that they saw, on a smaller test case. They're describing this test case here. Where they are essentially running a pgbench, and then during the pgbench run, they're running a
REINDEX INDEX CONCURRENTLY command and after that has run, they are using the amcheck extension to check whether there are any errors in the index.
The amcheck extension runs through your index and it makes sure that all the pointers in the index to the table are consistent. So for example, if something was existing in a table, but it was not existing in the index, then it will throw an error. And that's exactly the error that they got here in this bug report, which is that amcheck was reporting that a heap tuple, so that's essentially an entry into the main table, did not have a matching index tuple, a matching index entry, within the index that they had re-indexed.
They then also continued to verify whether there were actually missing records in the index. The way they did that, is they essentially forced a sequential scan and compared that with the results of an index scan, through that discovered that the index scan was not returning some of the records, versus the sequential scan was. Clearly, there was actually missing data in the index.
This is pretty much the worst case situation that can happen with bugs around indexes, which is returning bad data. Obviously, the community has prioritized making sure that this can be addressed quickly and that the root cause has been understood. Now, this discussion goes through a lot of additional emails, but the conclusion is that earlier this week they found the main problematic change in Postgres 14. They reverted that change.
The community is currently discussing, most likely, pulling forward the next patch release dates, at least for Postgres 14, so that they can get the release into people's hands quickly so you can use Postgres 14 and
CREATE INDEX CONCURRENTLY safely.
Now, if you are on Postgres 14, and you have run
CREATE INDEX CONCURRENTLY, or
REINDEX CONCURRENTLY previously what is really essential is the amcheck extension. The amcheck extension has a
bt_index_check function. And so this
bt_index_check function lets you verify whether an index is corrupted or not. So far, the community seems to be in consensus that that's actually the right way to test for this problem.
The way to resolve this issue, until the patch release is out, is investigate your indexes that you've run
REINDEX CONCURRENTLY on, especially on busy tables, run that
bt_index_check function, and if the
bt_index_check function tells you the index is broken, drop the index, recreate it and do not use
CONCURRENTLY, or if you do then make sure to verify to index again afterwards.
Hopefully you have no corrupted indexes and can update your Postgres system soon to the new patch release!