pganalyze Collector settings

The collector can be configured through one of three mechanisms:

Package-based install: INI config file, commonly in /etc/pganalyze-collector.conf
Docker-based install: INI config passed via the CONFIG_CONTENTS environment variable
Docker-based install: Individual environment variables for each setting (e.g. DB_HOST)

Most settings can be configured through either INI config settings or individual environment variables. When you set individual environment variables in addition to the INI config, the environment variable settings apply to all monitored servers. If both are present, the INI config takes precedence.

Note that a single collector instance can monitor more than one database server, by utilizing separate configuration sections in the INI config. Utilizing individual environment variables only supports configuring a single server.

For the INI config, the [pganalyze] section describes settings that apply to all servers. Other sections describe the servers to monitor, how to connect to them, and server-specific configuration settings. You should name the other sections after the servers they correspond to, though note these are not the names that will appear in the app. In-app names are based on hostname settings as determined during monitoring.

After you make changes, you can run pganalyze-collector --test to verify the new configuration. For package-based installs, if the test succeeds, this will automatically load the new configuration in the collector background process (equivalent to using the --reload flag). This minimizes monitoring interruptions and simplifies config file updates.

The tables below list configuration settings, their defaults if not set, and their descriptions. If a setting is configurable through environment variables, the environment variable name follows the setting in parentheses. Boolean settings in both the INI config and environment variables can either be specified as true/false or 1/0 values.

Note that these settings apply to the latest version of the collector.

General settings
Server connection settings
PII Filtering settings
Schema filter settings
AWS settings
Azure settings
Google Cloud Platform
Crunchy Bridge settings
Self-managed servers
OpenTelemetry exporter settings
Additional settings

General settings

Common settings for configuring collector behavior, independent of the platform.

Setting	Default	Description
api_key (`PGA_API_KEY`)	n/a, required	API key to authenticate the collector to the pganalyze app. We will show this when adding a server in the app, and you can review it on your organization's Settings page under the API keys tab.
api_base_url (`PGA_API_BASE_URL`)	https://api.pganalyze.com	Base URL for contacting the pganalyze API. You typically do not need to change this, unless you are running pganalyze Enterprise Server. Note: The name of the environment variable was `PGA_API_BASEURL` prior to collector release v0.55.0.
skip_if_replica (`SKIP_IF_REPLICA`)	false	Skip all metadata collection and snapshot submission while this server is a replica (according to `pg_is_in_recovery`). When the server is promoted and is no longer a replica, automatically start collecting and submitting metadata as configured.
enable_log_explain (`PGA_ENABLE_LOG_EXPLAIN`)	false	Enable log-based EXPLAIN. See setup instructions, but note we recommend using auto_explain instead if possible.
max_buffer_cache_monitoring_gb (`MAX_BUFFER_CACHE_MONITORING_GB`)	200	Threshold in GB for enabling the Buffer Cache Statistics feature. If the Postgres `shared_buffers` setting exceeds the threshold it is disabled.

Server connection settings

How to connect to the server(s) pganalyze will be monitoring. Note that when monitoring multiple databases on the same server, the first specified is considered the primary database. This is where helper functions are expected to be defined, and the one we connect to for server-wide metrics.

Setting	Default	Description
db_url (`DB_URL`)	n/a, either this or individual settings below are required	URL of the database server to monitor
db_name (`DB_NAME`)	n/a, either this or db_url is required	Name of database to monitor; or, comma-separated list of all databases to monitor, starting with primary (last entry can be `*` to monitor all databases on server)
db_username (`DB_USERNAME`)	n/a, either this or db_url is required	Postgres user to connect as (we recommend using the pganalyze monitoring user here)
db_password (`DB_PASSWORD`)	[none]	Password for the Postgres user
db_host (`DB_HOST`)	n/a, either this or db_url is required	Host to connect to. Or, if connecting to a local server using peer authentication, the path to a local unix domain socket as configured by unix_socket_directories.
db_port (`DB_PORT`)	5432	Port to connect on
db_sslmode (`DB_SSLMODE`)	prefer	The `sslmode` setting to connect with (see Postgres documentation for more details)
db_sslrootcert (`DB_SSLROOTCERT`)	system certificate store	Path to SSL certificate authority (CA) certificate(s) to use to verify the server's certificate, or `rds-ca-global` to use the built-in Amazon RDS global CA certificate set
db_sslrootcert_contents (`DB_SSLROOTCERT_CONTENTS`)	[none], see above	Alternative to above, using actual contents of the certificate(s) instead
db_sslcert (`DB_SSLCERT`)	[none]	Path to the client SSL certificate (optional, usually not required)
db_sslcert_contents (`DB_SSLCERT_CONTENTS`)	[none]	Alternative to above, using actual contents of the certificate instead
db_sslkey (`DB_SSLKEY`)	[none]	Path to the secret key used for the client certificate
db_sslkey_contents (`DB_SSLKEY_CONTENTS`)	[none]	Alternative to above, using actual contents of the key
`DB_ALL_NAMES`	false	Alternative to setting `*` as the last entry of db_name (`DB_NAME`) to monitor all databases on the server. Can only be set using an environment variable. This is helpful when db_url (`DB_URL`) is used to specify a monitoring server (db_name is unused) and you want to monitor all databases on that server.

PII Filtering settings

We take the responsibility of access to your database very seriously. As discussed above, we already limit the direct access we have to your data, but some personally-identifiable information or other sensitive values can still come up in query text or logs. To address this, the collector has several settings to filter these before we collect them. See additional documentation for details.

Setting	Default	Description
filter_log_secret (`FILTER_LOG_SECRET`)	credential, parsing_error, unidentified	Either `none`, `all`, or one or more of `credential`/`parsing_error`/`statement_text`/`statement_parameter`/`table_data`/`ops`/`unidentified` (comma separated)
filter_query_sample (`FILTER_QUERY_SAMPLE`)	none	Either `none`, `normalize` or `all`
filter_query_text (`FILTER_QUERY_TEXT`)	unparsable	Either `none` or `unparsable`

Schema filter settings

The pganalyze collector limits the number of schema objects (tables, views, etc.) that can be monitored on each database server. This limit is currently 5,000 tables or views per database server. If this limit is exceeded, no schema information will be collected.

You can avoid reaching this limit by using the following setting to select which tables/views should be excluded:

Setting	Default	Description
ignore_schema_regexp (`IGNORE_SCHEMA_REGEXP`)	[none]	Skip collecting metadata for all matching tables, schemas, or functions; match is checked against schema-qualified object names (e.g. to ignore table "foo" only in the public schema, set to `^public\.foo$`)

To validate whether the setting is working as intended, you can run a query on your database server to count the number of monitored tables and views. Note you would have to run this on each database on the server and then summarize the counts, which should not exceed the 5,000 limit in aggregate:

SELECT current_database() AS dbname,
       COUNT(*) AS table_and_view_count
  FROM pg_class c
       LEFT JOIN pg_catalog.pg_namespace n ON (n.oid = c.relnamespace)
 WHERE c.relkind IN ('r','v','m','p')
       AND c.relpersistence <> 't'
       AND c.relname NOT IN ('pg_stat_statements')
       AND n.nspname NOT IN ('pg_catalog','pg_toast','information_schema')
       AND (n.nspname || '.' || c.relname) !~* 'REGEXP';

Make sure to replace REGEXP with the value of your ignore_schema_regexp setting.

AWS settings

Only relevant if you are running your database in Amazon RDS or Amazon Aurora. See our RDS/Aurora setup instructions for details.

Note that the aws_endpoint_* settings are only relevant if you are using custom AWS endpoints. See the AWS documentation for details.

Setting	Default	Description
aws_region (`AWS_REGION`)	auto-detected from hostname	Region your AWS server is running in
aws_db_instance_id (`AWS_DB_INSTANCE_ID`)	auto-detected from hostname	Instance ID of your Amazon RDS instance; may need to be set manually when using IP addresses or custom DNS records
aws_db_cluster_id (`AWS_DB_CLUSTER_ID`)	auto-detected from hostname	Cluster ID of your Amazon Aurora cluster (either cluster or reader endpoint); may need to be set manually when using IP addresses or custom DNS records
aws_access_key_id (`AWS_ACCESS_KEY_ID`)	[none]	Only necessary if not using recommended instance roles configuration
aws_secret_access_key (`AWS_SECRET_ACCESS_KEY`)	[none]	See above
aws_account_id (`AWS_ACCOUNT_ID`)	[none]	If specified, and api_system_scope (see below) is not specified, this is prepended to the auto-generated system scope (optional, can be used to, e.g., differentiate staging from production)
db_use_iam_auth (`DB_USE_IAM_AUTH`)	false	Fetches a short-lived token for logging into the database instance from the AWS API, instead of using a hardcoded password in the collector configuration file. To use this setting, IAM authentication needs to be enabled on the database instance / cluster, the pganalyze IAM policy needs to cover the "rds-db:connect" privilege for the pganalyze user, and the user needs to be granted the "rds_iam" role in Postgres.
aws_assume_role (`AWS_ASSUME_ROLE`)	[none]	If using cross-account role delegation, the ARN of the role to assume; see the AWS documentation for details
aws_web_identity_token_file (`AWS_WEB_IDENTITY_TOKEN_FILE`)	[none]	If running the collector inside EKS, can be used with aws_role_arn in order to access AWS resources; see the AWS documentation for details
aws_role_arn (`AWS_ROLE_ARN`)	[none]	If running the collector inside EKS, can be used with aws_web_identity_token_file in order to access AWS resources; see the AWS documentation for details
aws_endpoint_signing_region (`AWS_ENDPOINT_SIGNING_REGION`)	[none]	Region to use for signing requests (optional, usually not required)
aws_endpoint_rds_url (`AWS_ENDPOINT_RDS_URL`)	[none]	URL of RDS service (optional, usually not required)
aws_endpoint_ec2_url (`AWS_ENDPOINT_EC2_URL`)	[none]	URL of EC2 service (optional, usually not required)
aws_endpoint_cloudwatch_url (`AWS_ENDPOINT_CLOUDWATCH_URL`)	[none]	URL of CloudWatch service (optional, usually not required)
aws_endpoint_cloudwatch_logs_url (`AWS_ENDPOINT_CLOUDWATCH_LOGS_URL`)	[none]	URL of CloudWatch log service (optional, usually not required)

Azure settings

Only relevant if you are running your database in Azure using Azure Database for PostgreSQL. See our

Azure setup instructions for details.

Setting	Default	Description
azure_db_server_name (`AZURE_DB_SERVER_NAME`)	auto-detected from hostname	Name of your server; may need to be set manually when using IP addresses or custom DNS records
azure_subscription_id (`AZURE_SUBSCRIPTION_ID`)	n/a, required for system stats	Subscription ID of your Azure Database server
azure_eventhub_namespace (`AZURE_EVENTHUB_NAMESPACE`)	n/a, required for Log Insights	Event Hub namespace to use for log handling
azure_eventhub_name (`AZURE_EVENTHUB_NAME`)	n/a, required for Log Insights	Event Hub name to use for log handling
azure_ad_tenant_id (`AZURE_AD_TENANT_ID`)	[none]	The "Directory (tenant) ID" on your application. Only necessary if not using the recommended Managed Identity setup; see the setup instructions for details
azure_ad_client_id (`AZURE_AD_CLIENT_ID`)	[none]	The "Application (client) ID" on your application. Only necessary if not using the recommended Managed Identity setup (see above). Alternatively, when using a Managed Identity, this can also be used to define which identity to use when a VM has multiple identities assigned.
azure_ad_client_secret (`AZURE_AD_CLIENT_SECRET`)	[none]	When using client secrets, specify the generated secret here
azure_ad_certificate_path (`AZURE_AD_CERTIFICATE_PATH`)	[none]	When using certificates, specify the path to your certificate here
azure_ad_certificate_password (`AZURE_AD_CERTIFICATE_PASSWORD`)	[none]	When using certificates, specify your certificate password here, if required

Google Cloud Platform

Only relevant if you are running your database in GCP using Google Cloud SQL or Google AlloyDB. See the GCP setup instructions for details.

Setting	Default	Description
gcp_cloudsql_instance_id (`GCP_CLOUDSQL_INSTANCE_ID`)	n/a, required (for Cloud SQL)	Google Cloud SQL instance ID
gcp_alloydb_cluster_id (`GCP_ALLOYDB_CLUSTER_ID`)	n/a, required (for AlloyDB)	Google AlloyDB cluster ID
gcp_alloydb_instance_id (`GCP_ALLOYDB_INSTANCE_ID`)	n/a, required (for AlloyDB)	Google AlloyDB instance ID (within the given cluster)
gcp_project_id (`GCP_PROJECT_ID`)	n/a, required	GCP project ID; see Google documentation for details
gcp_pubsub_subscription (`GCP_PUBSUB_SUBSCRIPTION`)	n/a, required for Log Insights	See GCP setup instructions for details
gcp_credentials_file (`GCP_CREDENTIALS_FILE`)	[none]	Only necessary if not using the recommended method of assigning the Service Account to the VM directly; see these setup instructions for details
db_use_iam_auth (`DB_USE_IAM_AUTH`)	false	Fetches a short-lived token for logging into the database instance from the Google Cloud API, instead of using a hardcoded password in the collector configuration file.
gcp_region (`GCP_REGION`)	n/a, required for IAM Authentication	GCP region of the Cloud SQL instance
gcp_use_public_ip (`GCP_USE_PUBLIC_IP`)	false	Set to `true` if the CloudSQL proxy connection should use the available public IP rather than the default private IP. Setting to `true` when no public IP is assigned will result in a connection error.

Crunchy Bridge settings

Only relevant if you are running your database in Crunchy Bridge.

Setting	Default	Description
crunchy_bridge_api_key (`CRUNCHY_BRIDGE_API_KEY`)	n/a, required for accurate Storage Space information	The API key of Crunchy Bridge. When specified, the collector uses the Crunchy Bridge API to obtain accurate Storage Space information, as well as cluster information.
crunchy_bridge_cluster_id (`CRUNCHY_BRIDGE_CLUSTER_ID`)	auto-detected from hostname	ID of your cluster; may need to be set manually when using IP addresses or custom DNS records

Self-managed servers

If running on your own infrastructure, a platform other than the cloud providers listed above, or in a self-managed VM on a cloud provider, the configuration settings here may be useful.

Setting	Default	Description
db_log_location (`LOG_LOCATION`)	[none]	database log file or directory location (must be readable by pganalyze system user)
db_data_directory (`DB_DATA_DIRECTORY`)	auto-detected	The data_directory for this server
db_log_syslog_server (`LOG_SYSLOG_SERVER`)	[none]	local address (host:port) to listen on for syslog messages (see syslog server instructions)
db_log_syslog_server_ca_file (`LOG_SYSLOG_SERVER_CA_FILE`)	[none]	Path to TLS certificate authority (CA) certificate(s) to use to verify the certificate of the collector syslog server (optional, not required if a certificate is signed by a trusted CA)
db_log_syslog_server_ca_contents (`LOG_SYSLOG_SERVER_CA_CONTENTS`)	[none]	Alternative to above, using actual contents of the CA certificate(s) instead
db_log_syslog_server_cert_file (`LOG_SYSLOG_SERVER_CERT_FILE`)	[none]	Path to the collector syslog server's TLS certificate (required for receiving logs over TLS)
db_log_syslog_server_cert_contents (`LOG_SYSLOG_SERVER_CERT_CONTENTS`)	[none]	Alternative to above, using actual contents of the certificate instead
db_log_syslog_server_key_file (`LOG_SYSLOG_SERVER_KEY_FILE`)	[none]	Path to the secret key used for the collector syslog server's certificate (required for receiving logs over TLS)
db_log_syslog_server_key_contents (`LOG_SYSLOG_SERVER_KEY_CONTENTS`)	[none]	Alternative to above, using actual contents of the secret key instead
db_log_syslog_server_client_ca_file (`LOG_SYSLOG_SERVER_CLIENT_CA_FILE`)	[none]	Path to TLS certificate authority (CA) certificate(s) to use to verify the certificate of the client that is sending logs to the collector syslog server, such as rsyslog (optional, not required if a client doesn't provide a certificate or a client certificate is signed by a trusted CA)
db_log_syslog_server_client_ca_contents (`LOG_SYSLOG_SERVER_CLIENT_CA_CONTENTS`)	[none]	Alternative to above, using actual contents of the CA certificate(s) instead
always_collect_system_data (`PGA_ALWAYS_COLLECT_SYSTEM_DATA`)	false	Always gather local system metrics, regardless of whether the database address is local or remote. This is useful for setups which connect to a local database with a non-local IP address.
db_log_otel_server (`LOG_OTEL_SERVER`)	[none]	local address (host:port) to listen on for OpenTelemetry log data model messages (see Kubernetes with OpenTelemetry instructions)
db_log_otel_k8s_pod (`LOG_OTEL_K8S_POD`)	[none]	The pod name (e.g., `mypod`) or `namespace/mypod` to filter matching logs
db_log_otel_k8s_labels (`LOG_OTEL_K8S_LABELS`)	[none]	Comma-separated Kubernetes label selectors to filter matching logs. Only support equality-based selectors.

OpenTelemetry exporter settings

To integrate with application performance monitoring (APM) tools, the collector can optionally act as an OpenTelemetry tracing exporter. This adds data from pganalyze into existing application traces, and allows application engineers to have a quick way to find out if slow database queries are causing performance problems in the app.

The exporter, if enabled, will send a tracing span for each collected EXPLAIN plan to the configured OpenTelemetry endpoint, with a link back to the full plan in pganalyze via the db.postgresql.plan span attribute.

Note that this requires queries to have parent span information propagated using the traceparent query tag, as defined in the W3 Trace Context specification. This is enabled on the application side by libraries like sqlcommenter.

Setting	Default	Description
otel_exporter_otlp_endpoint (`OTEL_EXPORTER_OTLP_ENDPOINT`)	[none]	OpenTelemetry endpoint to export tracing data to, for example `http://localhost:4318`. Both HTTP (secure and insecure) and gRPC (secure) protocols are supported. The endpoint can either be an OpenTelemetry collector, or a third-party service providing an OTLP endpoint such as Honeycomb or New Relic.
otel_exporter_otlp_headers (`OTEL_EXPORTER_OTLP_HEADERS`)	[none]	OpenTelemetry OTLP headers, for example to set API keys for managed services like Honeycomb (`x-honeycomb-team=KEY`) and New Relic (`api-key=KEY`).
otel_service_name (`OTEL_SERVICE_NAME`)	Postgres (pganalyze)	The logical name of the service. The tracing span sent by the collector will use this name.

Additional settings

Like the general settings above, but less commonly used. We only recommend using these settings after talking to pganalyze support.

Setting	Default	Description
api_system_id (`PGA_API_SYSTEM_ID`)	Automatically detected	Overrides the ID of the system, used for uniquely identifying the server with the pganalyze API. This is commonly what's used to refer to a single server, and defaults to the instance ID for managed database providers.
api_system_type (`PGA_API_SYSTEM_TYPE`)	Automatically detected	Overrides the type of the system, used for uniquely identifying the server with the pganalyze API. Must be one of the following: amazon_rds, azure_database, google_cloudsql, self_hosted or heroku.
api_system_scope (`PGA_API_SYSTEM_SCOPE`)	Automatically detected	Overrides the scope of the system, used for uniquely identifying the server with the pganalyze API. Can be used for auxiliary identifying characteristics (e.g., region of a server ID that's re-used).
api_system_scope_fallback (`PGA_API_SYSTEM_SCOPE_FALLBACK`)	[none]	When the pganalyze backend receives a snapshot with a fallback scope set, and there is no server created with the regular scope, it will first search the servers with the fallback scope. If found, that server's scope will be updated to the (new) regular scope. If not found, a new server will be created with the regular scope. The main goal of the fallback scope is to avoid creating a duplicate server when changing the scope value.
db_log_docker_tail	[none]	Experimental: name of docker container to collect logs from using `docker logs -t`. This requires that the collector runs on the Docker host.
disable_citus_schema_stats (`DISABLE_CITUS_SCHEMA_STATS`)	none	Either `none`, `index`, or `all`. If using the Citus extension in your database, turn off the collection of statistics for distributed indexes, or both distributed indexes and distributed tables. For very large schemas, this collection can error out due to timeouts or locks. When using this option it's recommended to instead monitor the workers directly for table and index sizes.
query_stats_interval (`QUERY_STATS_INTERVAL`)	60	How often to collect query statistics, in seconds; supported values are `60` (once a minute) and `600` (once every ten minutes)
max_collector_connections (`MAX_COLLECTOR_CONNECTION`)	10	Maximum connections allowed to the database with the collector application_name, in order to protect against accidental connection leaks in the collector
http_proxy (`HTTP_PROXY`)	[none]	Proxy to be used for all HTTP connections, such as API calls. Use for proxies that do not support SSL. Example: `http://username:password@myproxy`
https_proxy (`HTTPS_PROXY`)	[none]	Proxy to be used for all HTTP connections, such as API calls. Use for proxies that support SSL. Example: `https://username:password@myproxy`
no_proxy (`NO_PROXY`)	[none]	Comma-delimited list of hostnames that should be accessed directly, without using a configured proxy. Has no effect unless either HTTP_PROXY or HTTPS_PROXY is specified.
disable_logs (`PGA_DISABLE_LOGS`)	false	Disable Log Insights data collection
disable_activity (`PGA_DISABLE_ACTIVITY`)	false	Disable activity snapshot data collection (VACUUM, connection traces, etc.)
error_callback	[none]	Script to call if snapshot fails (learn more in the collector README)
success_callback	[none]	Script to call if snapshot succeeds (learn more in the collector README)

Couldn't find what you were looking for or want to talk about something specific?
Start a conversation with us →