S3: Database system was interrupted (server starting)

Category: Server Events
SQLSTATE: n/a
Urgency: medium

Example Postgres Log Output:

LOG: database system was interrupted; last known up at 2017-05-07 22:33:02 UTC
LOG: database system was not properly shut down; automatic recovery in progress
LOG: database system shutdown was interrupted; last known up at 2017-05-05 20:17:07 UTC
LOG: database system was interrupted while in recovery at 2017-05-05 20:17:07 UTC
HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: database system was interrupted while in recovery at log time 2017-05-05 20:17:07 UTC
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

Explanation:

The server crashed or was forcefully shut down. This log event often indicates that a problem occurred just before, e.g. S1 - Server Crashed.

On primary servers, Postgres can usually recover from this without problems, because it can replay the WAL from the last known safe data directory state, i.e. the last checkpoint. Startup might take quite a bit longer than after a clean shutdown (S2 - Server Start), because of the time it takes to replay the WAL.

On standby servers, this event can indicate a problem of its own, and as the HINT message indicates, you might have to create a new standby from a fresh base backup. This is especially the case if you see the message recur several times in quick succession.
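
The variants of this event differ in what they imply, so it can help to tell them apart when scanning logs. The following is a minimal Python sketch, assuming log lines that contain the message texts shown in the example output above. The function name classify_s3 and the label strings are hypothetical and not part of Postgres or any tool.

```python
import re

# Patterns are checked in order, so the more specific "at log time" variant
# must come before the generic "while in recovery at" variant.
# The labels on the right are illustrative names chosen for this sketch.
S3_PATTERNS = [
    (re.compile(r"database system was interrupted while in recovery at log time"),
     "interrupted_in_recovery_at_log_time"),
    (re.compile(r"database system was interrupted while in recovery at"),
     "interrupted_in_recovery"),
    (re.compile(r"database system shutdown was interrupted"),
     "shutdown_interrupted"),
    (re.compile(r"database system was not properly shut down"),
     "automatic_recovery"),
    (re.compile(r"database system was interrupted;"),
     "interrupted"),
]


def classify_s3(line):
    """Return a label for an S3-style log line, or None if it does not match."""
    for pattern, label in S3_PATTERNS:
        if pattern.search(line):
            return label
    return None


def count_recovery_interruptions(lines):
    """Count lines reporting an interruption while recovery was running."""
    recovery_labels = {
        "interrupted_in_recovery",
        "interrupted_in_recovery_at_log_time",
    }
    return sum(1 for line in lines if classify_s3(line) in recovery_labels)
```

A count above one from count_recovery_interruptions over a short time window would correspond to the recurring case described above, where a fresh base backup may be needed.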

Recommended Action:

It is often worth reviewing the database logs from just before a restart that needs to run recovery. It's likely that another event, e.g. S1 - Server Crashed, happened just before, and its log output will help you determine the root cause.
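
As a rough illustration of reviewing the logs just before recovery, the following Python sketch returns the lines that precede the first interruption message. It assumes the log has already been read into a list of strings in chronological order. The function name lines_before_recovery and the context parameter are hypothetical.

```python
# Message fragments taken from the example log output for this event.
RECOVERY_MARKERS = (
    "database system was interrupted",
    "database system was not properly shut down",
    "database system shutdown was interrupted",
)


def lines_before_recovery(lines, context=5):
    """Return up to `context` lines preceding the first interruption message.

    `lines` must be a list of log lines in chronological order.
    Returns an empty list if no interruption message is found.
    """
    for index, line in enumerate(lines):
        if any(marker in line for marker in RECOVERY_MARKERS):
            return lines[max(0, index - context):index]
    return []
```

For example, calling lines_before_recovery(log_lines, context=50) on a log file loaded with readlines() would return up to the 50 lines written before the server reported the interruption, which is where a preceding crash message would be expected to appear.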

If you want to optimize for a faster recovery time, it sometimes makes sense to lower checkpoint_timeout. Note that this will result in higher I/O usage. You can monitor the frequency and cause of checkpoints using the W40 - Checkpoint starting event.
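
To get a sense of what triggers checkpoints before changing checkpoint_timeout, one option is to tally the causes reported in "checkpoint starting" log lines. The following Python sketch assumes that log_checkpoints is enabled and that each such line lists its causes as space-separated words after the colon (for example "time", or "xlog" / "wal" depending on the Postgres version). The function name checkpoint_causes is hypothetical.

```python
import re
from collections import Counter

# Captures everything after "checkpoint starting:" on a log line.
CHECKPOINT_RE = re.compile(r"checkpoint starting:\s*(.+)$")


def checkpoint_causes(lines):
    """Count how often each cause word appears in 'checkpoint starting' lines."""
    causes = Counter()
    for line in lines:
        match = CHECKPOINT_RE.search(line.strip())
        if match:
            # A single checkpoint can report several words,
            # e.g. "immediate force wait", so each word is counted.
            causes.update(match.group(1).split())
    return causes
```

If most checkpoints report "time", they are driven by checkpoint_timeout, and lowering it should shorten the amount of WAL to replay after a crash. If most report "xlog" or "wal", they are triggered by WAL volume instead, and changing checkpoint_timeout alone may have little effect.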

If the event occurs on a standby/follower server that can no longer start up, you might have to rebuild the standby from a fresh base backup.
