pganalyze Collector settings

The collector can be configured either through an INI config file (in a package-based install) or environment variables (typically for running via its Docker image). Most settings can be configured through either mechanism. If both are present, the config file takes precedence. Note that a single collector instance can monitor more than one database server, though this is not supported when configuring through environment variables.

For the INI-based setup, the [pganalyze] section describes settings that apply to all servers. Other sections describe the servers to monitor, how to connect to them, and server-specific configuration settings. You should name the other sections after the servers they correspond to, though note these are not the names that will appear in the app. In-app names are based on hostname settings as determined during monitoring. When you set environment variables in addition to specifying a configuration file, the environment variable settings apply to all monitored servers.
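
As a rough illustration of the layout described above, a config file that monitors two servers might look like the following sketch. The section names server_primary and server_replica, the hostnames, and all values are hypothetical placeholders, and the key: value form simply mirrors the recommended PII filtering snippet further down this page.

```ini
; Settings that apply to all monitored servers
[pganalyze]
api_key: your_api_key_here

; One section per monitored server. The section name is for your own
; reference and is not the name shown in the app.
[server_primary]
db_host: primary.db.example.com
db_name: app_production
db_username: pganalyze
db_password: change_me

[server_replica]
db_host: replica.db.example.com
db_name: app_production
db_username: pganalyze
db_password: change_me
```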

After you make changes, you can run pganalyze-collector --test --reload to verify the new configuration and, if it works correctly, load it into the collector background process. This minimizes monitoring interruptions and simplifies config file updates.

The tables below list configuration settings, their defaults if not set, and their descriptions. If a setting is configurable through environment variables, the environment variable name follows the setting in parentheses. Environment variables for boolean settings expect 1 for true and 0 for false.

Note that these settings apply to the latest version of the collector.

General settings

Common settings for configuring collector behavior, independent of the platform.

| Setting | Default | Description |
| --- | --- | --- |
| api_key (PGA_API_KEY) | n/a, required | API key to authenticate the collector to the pganalyze app. We will show this when adding a server in the app, and you can review it in your organization's API keys page. |
| api_base_url (PGA_API_BASEURL) | | Base URL for contacting the pganalyze API. You typically do not need to change this, unless you are running pganalyze Enterprise Edition. |
| ignore_schema_regexp (IGNORE_SCHEMA_REGEXP) | [none] | Skip collecting metadata for all matching tables, schemas, or functions; match is checked against schema-qualified object names (e.g. to ignore table "foo" only in the public schema, set to ^public\.foo$) |
| skip_if_replica (SKIP_IF_REPLICA) | false | Skip all metadata collection and snapshot submission while this server is a replica (according to pg_is_in_recovery). When the server is promoted and is no longer a replica, automatically start collecting and submitting metadata as configured. |
| enable_log_explain (PGA_ENABLE_LOG_EXPLAIN) | false | Enable log-based EXPLAIN. See setup instructions, but note we recommend using auto_explain instead if possible. |
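
A minimal sketch of how two of these settings might appear in a config file, assuming they are placed in a server section (placing them in [pganalyze] should apply them to all servers instead). The section name and values are hypothetical, and the boolean is written as true on the assumption that the INI file accepts the same true/false spelling the table uses for defaults.

```ini
[pganalyze]
api_key: your_api_key_here

[server_replica]
db_host: replica.db.example.com
db_name: app_production
db_username: pganalyze
; Ignore only the table "foo" in the public schema
ignore_schema_regexp: ^public\.foo$
; Collect nothing while the server is in recovery; resume after promotion
skip_if_replica: true
```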

Server connection settings

How to connect to the server(s) pganalyze will be monitoring. Note that when monitoring multiple databases on the same server, the first specified is considered the primary database. This is where helper functions are expected to be defined, and the one we connect to for server-wide metrics.

| Setting | Default | Description |
| --- | --- | --- |
| db_url (DB_URL) | n/a, either this or individual settings below are required | URL of the database server to monitor |
| db_name (DB_NAME) | n/a, either this or db_url is required | Name of database to monitor; or, comma-separated list of all databases to monitor, starting with primary (last entry can be * to monitor all databases on server) |
| db_username (DB_USERNAME) | n/a, either this or db_url is required | Postgres user to connect as (we recommend using the pganalyze monitoring user here) |
| db_password (DB_PASSWORD) | [none] | Password for the Postgres user |
| db_host (DB_HOST) | n/a, either this or db_url is required | Host to connect to |
| db_port (DB_PORT) | 5432 | Port to connect on |
| db_sslmode (DB_SSLMODE) | prefer | The sslmode setting to connect with (see Postgres documentation for more details) |
| db_sslrootcert (DB_SSLROOTCERT) | system certificate store | Path to SSL certificate authority (CA) certificate(s) to use to verify the server's certificate, or one of rds-ca-2015-root or rds-ca-2019-root to use the built-in AWS RDS certificates |
| db_sslrootcert_contents (DB_SSLROOTCERT_CONTENTS) | n/a, see above | Alternative to above, using actual contents of the certificate(s) instead |
| db_sslcert (DB_SSLCERT) | [none] | Path to the client SSL certificate (optional, usually not required) |
| db_sslcert_contents (DB_SSLCERT_CONTENTS) | [none] | Alternative to above, using actual contents of the certificate instead |
| db_sslkey (DB_SSLKEY) | [none] | Path to the secret key used for the client certificate |
| db_sslkey_contents (DB_SSLKEY_CONTENTS) | [none] | Alternative to above, using actual contents of the key |
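
To make the multi-database and SSL settings more concrete, here is a hedged sketch of a server section that uses individual connection settings rather than db_url. The section name, hostname, database names, and certificate path are hypothetical; verify-full is one of the standard Postgres sslmode values.

```ini
[server_primary]
db_host: primary.db.example.com
db_port: 5432
; The first entry is the primary database; a trailing * monitors
; all databases on the server
db_name: app_production,app_analytics,*
db_username: pganalyze
db_password: change_me
; Require an encrypted connection and verify the server certificate
db_sslmode: verify-full
db_sslrootcert: /etc/ssl/certs/example_ca.pem
```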

PII Filtering settings

We take the responsibility of access to your database very seriously. As discussed above, we already limit the direct access we have to your data, but some personally-identifiable information or other sensitive values can still come up in query text or logs. To address this, the collector has several settings to filter these before we collect them.

| Setting | Default | Description |
| --- | --- | --- |
| filter_log_secret (FILTER_LOG_SECRET) | none | One or more of none/all/credential/parsing_error/statement_text/statement_parameter/table_data/ops/unidentified (comma separated) |
| filter_query_sample (FILTER_QUERY_SAMPLE) | none | Either none or all |
| filter_query_text (FILTER_QUERY_TEXT) | unparsable | Either none or unparsable |

Our recommended configuration for servers containing highly sensitive data is as follows:

filter_log_secret: all
filter_query_sample: all
filter_query_text: unparsable

Note that this automatically turns off the query sample and EXPLAIN plan features, as they may contain sensitive data in the query text. We are working to provide more fine-grained PII filtering options in the future, including the option to sanitize EXPLAIN plans automatically.
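
Where filtering everything is stricter than needed, filter_log_secret also accepts a comma-separated subset of the categories listed in the table. The following sketch only illustrates the syntax; the section name is hypothetical and the chosen categories are an arbitrary example, not a recommendation.

```ini
[server_primary]
; Filter credentials, statement text, statement parameters, and table data
; from collected logs, leaving the other categories unfiltered
filter_log_secret: credential,statement_text,statement_parameter,table_data
filter_query_sample: none
filter_query_text: unparsable
```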

AWS settings

Only relevant if you are running your database in Amazon RDS or Amazon Aurora. See our RDS/Aurora setup instructions for details.

Note that the aws_endpoint_* settings are only relevant if you are using custom AWS endpoints. See the AWS documentation for details.

| Setting | Default | Description |
| --- | --- | --- |
| aws_region (AWS_REGION) | auto-detected from hostname | Region your AWS server is running in |
| aws_db_instance_id (AWS_INSTANCE_ID) | auto-detected from hostname | Instance ID of your AWS server; may need to be set manually when using IP addresses or custom DNS records |
| aws_access_key_id (AWS_ACCESS_KEY_ID) | [none] | Only necessary if not using recommended instance roles configuration |
| aws_secret_access_key (AWS_SECRET_ACCESS_KEY) | [none] | See above |
| aws_account_id (AWS_ACCOUNT_ID) | [none] | If specified, and api_system_scope (see below) is not specified, this is prepended to the auto-generated system scope (optional, can be used to, e.g., differentiate staging from production) |
| aws_assume_role (AWS_ASSUME_ROLE) | [none] | If using cross-account role delegation, the ARN of the role to assume; see the AWS documentation for details |
| aws_endpoint_signing_region (AWS_ENDPOINT_SIGNING_REGION) | [none] | Region to use for signing requests (optional, usually not required) |
| aws_endpoint_rds_url (AWS_ENDPOINT_RDS_URL) | [none] | URL of RDS service (optional, usually not required) |
| aws_endpoint_ec2_url (AWS_ENDPOINT_EC2_URL) | [none] | URL of EC2 service (optional, usually not required) |
| aws_endpoint_cloudwatch_url (AWS_ENDPOINT_CLOUDWATCH_URL) | [none] | URL of CloudWatch service (optional, usually not required) |
| aws_endpoint_cloudwatch_logs_url (AWS_ENDPOINT_CLOUDWATCH_LOGS_URL) | [none] | URL of CloudWatch log service (optional, usually not required) |
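
As an example of when the AWS settings need to be set by hand, the sketch below assumes an RDS instance reached through a custom DNS record, so the region and instance ID cannot be detected from the hostname. All names and values are hypothetical, and credentials are omitted on the assumption that the recommended instance roles configuration is in use.

```ini
[rds_primary]
; Custom DNS record that does not reveal the region or instance ID
db_host: db.internal.example.com
db_name: app_production
db_username: pganalyze
db_password: change_me
aws_region: us-east-1
aws_db_instance_id: my-rds-instance
```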

Azure settings

Only relevant if you are running your database in Azure using Azure Database for PostgreSQL. See our Azure setup instructions for details.

| Setting | Default | Description |
| --- | --- | --- |
| azure_db_server_name (AZURE_DB_SERVER_NAME) | auto-detected from hostname | Name of your server; may need to be set manually when using IP addresses or custom DNS records |
| azure_eventhub_namespace (AZURE_EVENTHUB_NAMESPACE) | n/a, required for Log Insights | Event Hub namespace to use for log handling |
| azure_eventhub_name (AZURE_EVENTHUB_NAME) | n/a, required for Log Insights | Event Hub name to use for log handling |
| azure_ad_tenant_id (AZURE_AD_TENANT_ID) | [none] | The "Directory (tenant) ID" on your application. Only necessary if not using the recommended Managed Identity setup; see these setup instructions for details |
| azure_ad_client_id (AZURE_AD_CLIENT_ID) | [none] | The "Application (client) ID" on your application |
| azure_ad_client_secret (AZURE_AD_CLIENT_SECRET) | [none] | When using client secrets, specify the generated secret here |
| azure_ad_certificate_path (AZURE_AD_CERTIFICATE_PATH) | [none] | When using certificates, specify the path to your certificate here |
| azure_ad_certificate_password (AZURE_AD_CERTIFICATE_PASSWORD) | [none] | When using certificates, specify your certificate password here, if required |
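
A hedged sketch of a server section for Azure Database for PostgreSQL with Log Insights enabled, assuming the recommended Managed Identity setup so that no azure_ad_* settings are needed. The section name, server name, and Event Hub names are hypothetical placeholders.

```ini
[azure_primary]
; Custom DNS record, so the server name is set explicitly
db_host: db.internal.example.com
db_name: app_production
db_username: pganalyze
db_password: change_me
azure_db_server_name: my-azure-server
; Both Event Hub settings are required for Log Insights
azure_eventhub_namespace: my-eventhub-namespace
azure_eventhub_name: my-eventhub
```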

Google Cloud Platform

Only relevant if you are running your database in GCP using Google Cloud SQL. See the GCP setup instructions for details.

| Setting | Default | Description |
| --- | --- | --- |
| gcp_cloudsql_instance_id (GCP_CLOUDSQL_INSTANCE_ID) | n/a, required for Log Insights | Google Cloud SQL instance ID |
| gcp_project_id (GCP_PROJECT_ID) | n/a, required | GCP project ID; see Google documentation for details |
| gcp_pubsub_subscription (GCP_PUBSUB_SUBSCRIPTION) | n/a, required for Log Insights | See GCP setup instructions for details |
| gcp_credentials_file (GCP_CREDENTIALS_FILE) | [none] | Only necessary if not using the recommended method of assigning the Service Account to the VM directly; see these setup instructions for details |
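
For orientation, a Cloud SQL server section might be laid out as in the sketch below. Every value is a hypothetical placeholder. In particular, the subscription is written as a full Pub/Sub resource name (projects/PROJECT/subscriptions/NAME), which is an assumption about the expected format; defer to the GCP setup instructions for the exact value.

```ini
[cloudsql_primary]
db_host: 10.0.0.5
db_name: app_production
db_username: pganalyze
db_password: change_me
gcp_project_id: my-gcp-project
; The next two settings are required for Log Insights
gcp_cloudsql_instance_id: my-cloudsql-instance
; Assumed format; confirm against the GCP setup instructions
gcp_pubsub_subscription: projects/my-gcp-project/subscriptions/my-log-subscription
```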

Self-managed servers

If running on your own infrastructure, a platform other than the cloud providers listed above, or in a self-managed VM on a cloud provider, the configuration settings here may be useful.

| Setting | Default | Description |
| --- | --- | --- |
| db_log_location (LOG_LOCATION) | n/a, required for Log Insights | database log file or directory location (must be readable by pganalyze system user) |
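
A minimal sketch for a self-managed server where the collector runs on the database host. The log directory shown is a hypothetical path; use the location your Postgres instance actually writes to, and make sure the pganalyze system user can read it.

```ini
[server_primary]
db_host: localhost
db_name: app_production
db_username: pganalyze
db_password: change_me
; Directory (or single file) containing the Postgres logs
db_log_location: /var/log/postgresql
```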

Additional settings

Like the general settings above, but less commonly used. We only recommend using these settings after talking to pganalyze support.

| Setting | Default | Description |
| --- | --- | --- |
| api_system_id (PGA_API_SYSTEM_ID) | Automatically detected | Overrides the ID of the system, used for uniquely identifying the server with the pganalyze API. This is commonly what's used to refer to a single server, and defaults to the instance ID for managed database providers. |
| api_system_type (PGA_API_SYSTEM_TYPE) | Automatically detected | Overrides the type of the system, used for uniquely identifying the server with the pganalyze API. Must be one of the following: amazon_rds, azure_database, google_cloudsql, self_hosted or heroku. |
| api_system_scope (PGA_API_SYSTEM_SCOPE) | Automatically detected | Overrides the scope of the system, used for uniquely identifying the server with the pganalyze API. Can be used for auxiliary identifying characteristics (e.g., region of a server ID that's re-used). |
| api_system_scope_fallback (PGA_API_SYSTEM_SCOPE_FALLBACK) | [none] | When the pganalyze backend receives a snapshot with a fallback scope set, and there is no server created with the regular scope, it will first search the servers with the fallback scope. If found, that server's scope will be updated to the (new) regular scope. If not found, a new server will be created with the regular scope. The main goal of the fallback scope is to avoid creating a duplicate server when changing the scope value. |
| db_log_docker_tail | [none] | Experimental: name of docker container to collect logs from using docker logs -t. This requires that the collector runs on the Docker host. |
| query_stats_interval (QUERY_STATS_INTERVAL) | 60 | How often to collect query statistics, in seconds; supported values are 60 (once a minute) and 600 (once every ten minutes) |
| max_collector_connections (MAX_COLLECTOR_CONNECTION) | 10 | Maximum connections allowed to the database with the collector application_name, in order to protect against accidental connection leaks in the collector |
| http_proxy (HTTP_PROXY) | [none] | Proxy to be used for all HTTP connections, such as API calls. Use for proxies that do not support SSL. Example: http://username:password@myproxy |
| https_proxy (HTTPS_PROXY) | [none] | Proxy to be used for all HTTP connections, such as API calls. Use for proxies that support SSL. Example: https://username:password@myproxy |
| no_proxy (NO_PROXY) | [none] | Comma-delimited list of hostnames that should be accessed directly, without using a configured proxy. Has no effect unless either HTTP_PROXY or HTTPS_PROXY is specified. |
| disable_logs (PGA_DISABLE_LOGS) | false | Disable Log Insights data collection |
| disable_activity (PGA_DISABLE_ACTIVITY) | false | Disable activity snapshot data collection (VACUUM, connection traces, etc.) |
| error_callback | [none] | Script to call if snapshot fails (learn more in the collector README) |
| success_callback | [none] | Script to call if snapshot succeeds (learn more in the collector README) |
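
The sketch below shows how a few of these settings might be combined, assuming the proxy settings are placed in [pganalyze] so that they apply to all servers. The section name, proxy address, and chosen values are hypothetical, and as noted above these settings are best changed only after talking to pganalyze support.

```ini
[pganalyze]
api_key: your_api_key_here
; Route API calls through a proxy that does not support SSL
http_proxy: http://username:password@myproxy
; Hosts listed here bypass the proxy
no_proxy: localhost

[server_primary]
db_host: primary.db.example.com
db_name: app_production
db_username: pganalyze
db_password: change_me
; Collect query statistics every ten minutes instead of every minute
query_stats_interval: 600
; Skip activity snapshots; Log Insights collection stays enabled
disable_activity: true
```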
