VACUUM: Bloat - VACUUM Blocked by Xmin Horizon

Check Frequency

Every 30 minutes

Default Configuration

Detects when the xmin horizon on the server was assigned more than 24 hours ago and creates an issue with severity "warning". The issue will be created even if no VACUUM is currently blocked by this, as this will potentially block any future VACUUMs. Resolves once the assignment age of the xmin horizon is under the threshold.

This check is enabled by default. These parameters can be tuned in the Configure section of the Alerts & Check-Up page.

Guidance

Impact

VACUUM is potentially blocked by the xmin horizon.

The xmin horizon tells you up to which point the vacuum process can clean up dead rows. When this value is behind and not advancing, VACUUMs will be blocked and will not be able to clean up dead rows.

When VACUUM is blocked and dead rows can't be cleaned, it can result in table bloat and slow queries.

The xmin horizon tracked by pganalyze is the oldest xmin across all databases, summarized as a server-wide metric. Note that while some types of xmin horizon delays affect all databases on a server, such as the xmin for each physical replication slots, in other cases not all databases of a server will be impacted, as Postgres can still clean up dead rows of tables on unrelated databases. For example, if the xmin horizon is held back by a long running transaction on database1, Postgres can still clean up dead rows of tables in database2, as the transaction of database1 will never be able to see database2.

Common Causes and Solutions

  • Long-running transactions

    Long-running transactions may still need to access rows that could otherwise be considered dead, so they can block cleanup.

    Solution

    You can find the transaction holding back the xmin horizon and its connection's pid by running the following command:

    SELECT pid, datname, usename, state, backend_xmin, backend_xid
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL
    ORDER BY greatest(age(backend_xmin), age(backend_xid)) DESC;

    You can cancel it by running either of commands:

    SELECT pg_cancel_backend('<query_pid>');
  • Lagging or stale physical replication slots

    With physical streaming replication with hot_standby_feedback is on, when replication is lagging or a replica server is stale (e.g. down), the oldest transaction that the replication slot needs the database to retain can be "stuck", holding back the xmin horizon.

    Solution

    You can check the replication status on the Replication page.

    If the replication slot is no longer used, you can remove it by running the following command:

    SELECT pg_drop_replication_slot('<slot_name>');
  • Lagging or stale logical replication slots

    With logical replication slots, replication can also get stale when DDL changes (database migrations) don't get applied to a replica server (subscriber). When the subscriber was unable to replicate data due to a schema mismatch, replication will error and get stale. This causes the xmin of the system catalogs of the primary (publisher) to be held back until replication resumes.

    Solution

    You can check the replication status on the Replication page. You may also want to check logs on both the publisher and the subscriber for any error messages regarding logical replication, such as schema differences.

    If the replication slot is no longer used, remove it it by running the following command:

    SELECT pg_drop_replication_slot('<slot_name>');
  • Long-running queries on standbys

    When hot_standby_feedback is on with physical streaming replication, queries on standbys will hold back the xmin horizon just as if they were running on the primary.

    Solution

    You can find the xmin of all standby servers by running the following command:

    SELECT application_name, client_addr, backend_xmin
    FROM pg_stat_replication
    ORDER BY age(backend_xmin) DESC;

    Once the standby is identified, you can find the query holding back the xmin horizon and its connection's pid in that standby by running the following command:

    SELECT pid, datname, usename, state, backend_xmin, backend_xid
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL OR backend_xid IS NOT NULL
    ORDER BY greatest(age(backend_xmin), age(backend_xid)) DESC;
  • Abandoned prepared transactions

    A transaction prepared for a two-phase commit will prevent cleanup until it is either committed or rolled back.

    Solution

    You can find the prepared transaction by running the following command:

    SELECT gid, prepared, owner, database, transaction AS xmin
    FROM pg_prepared_xacts
    ORDER BY age(transaction) DESC;

    Once identified, you can either commit or cancel the transaction by running either of commands:

    COMMIT PREPARED <gid_from_above>;
    ROLLBACK PREPARED <gid_from_above>;

Couldn't find what you were looking for or want to talk about something specific?
Start a conversation with us →