Note our Terms of Service update as of Mar. 5, 2025

Postgres vs. SQL Server: B-Tree Index Differences & the Benefit of Deduplication

When it comes to optimizing query performance, indexing is one of the most powerful tools available to database engineers. Both PostgreSQL and Microsoft SQL Server (or Azure SQL) use B-Tree indexes as their default indexing structure, but the way each system implements, maintains, and uses those indexes varies in subtle but important ways.

In this blog post, we explore key areas where PostgreSQL and SQL Server diverge: how their B-Tree indexes implementations behave under the hood and how they store and access data on disk. We'll also benchmark the impact of deduplication of values on index size in each database system.

We've also included a comprehensive reference guide at the end (see Postgres vs. SQL Server Index Comparison Table). Whether you're optimizing queries or planning a migration, these differences can have a meaningful impact on both performance and indexing strategy.

How B-Tree indexing works in PostgreSQL vs. SQL Server

At a high level, both databases use B-Tree indexes to speed up equality and range queries. B-Trees maintain sorted order and are balanced for consistent read performance. But while the concept is similar in both databases, the way it's implemented has important performance consequences.

SQL Server: Clustered vs Nonclustered Index

SQL Server uses clustered indexes to physically order the table's data by the indexed column. When a clustered index is defined, the rows in the table are stored in the same order as the index itself. Nonclustered indexes are stored separately and point to rows using a row locator, either a RID or the clustered key. This physical ordering can be beneficial for range scans or pagination queries, but it also means you're limited to one clustered index per table. More importantly, SQL Server stores each index entry in full, even if multiple entries have identical values on the same page. There's no deduplication, so indexes with many repeated values can grow large and consume excessive I/O.

Postgres B-Tree Index

PostgreSQL does not have clustered indexes in the SQL Server sense. All PostgreSQL tables are stored as unordered heaps, and indexes are purely logical structures that point to tuples in the heap. This design gives PostgreSQL some flexibility: it allows for easier index maintenance and avoids the complications of physical reordering.

However, it also means that you can't rely on an index to define how the table is physically laid out. If query performance depends on reading data in a particular order, Postgres does allow you to run the CLUSTER command, but it requires a full table lock. In production environments, you can use tools like pg_repack to achieve a similar result.

So while both databases use B-Tree indexes as their default, SQL Server's tight coupling between index and physical storage creates a different set of expectations and limitations. PostgreSQL's index model has some performance downsides (since there is no clustered index implementation), but distinct features like deduplication make it perform better in other situations.

PostgreSQL's B-Tree deduplication

Deduplication was introduced in PostgreSQL version 13 and addresses a common inefficiency in traditional B-Tree indexes. When many rows share the same indexed value—think status codes, boolean flags, or timestamps—standard B-Trees store each value and its corresponding tuple pointer individually. This results in bloated index pages and increased maintenance cost, especially for write-heavy workloads.

PostgreSQL deduplicates repeated values within a single index page by default. Instead of storing the same key value multiple times, it stores it once and maintains a compact structure that tracks all matching heap pointers. This reduces index size significantly and improves cache performance, since more index entries fit in memory.

SQL Server does not support deduplication. Each index entry is stored independently, even if the values are identical. In datasets with skewed distributions, PostgreSQL's approach leads to more compact, more efficient indexes, with fewer pages and less disk I/O.

Benchmarking B-Tree indexes on PostgreSQL vs. SQL Server

To understand how PostgreSQL's index deduplication affects real-world performance and storage, we ran a benchmark comparing B-Tree index sizes across PostgreSQL and SQL Server under varying levels of data duplication. Each test created a table of 10 million rows with differing levels of value repetition, ranging from entirely unique values to repeated values at a 1000x factor.

Here's how we structured the test in both databases, so you can reproduce it yourself.

PostgreSQL Test Setup

CREATE TABLE factor_1(col int);  
CREATE TABLE factor_10(col int);  
CREATE TABLE factor_100(col int);  
CREATE TABLE factor_1000(col int);

INSERT INTO factor_1 SELECT * FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_10 SELECT val / 10 FROM GENERATE_SERIES(1, 10000000) x(val);  
INSERT INTO factor_100 SELECT val / 100 FROM GENERATE_SERIES(1, 10000000) x(val);  
INSERT INTO factor_1000 SELECT val / 1000 FROM GENERATE_SERIES(1, 10000000) x(val);

CREATE INDEX factor_1_idx ON factor_1(col);  
CREATE INDEX factor_10_idx ON factor_10(col);  
CREATE INDEX factor_100_idx ON factor_100(col);  
CREATE INDEX factor_1000_idx ON factor_1000(col);

CREATE INDEX factor_1_idx_no_dup_fill100 ON factor_1(col) WITH (deduplicate_items = off, fillfactor = 100);  
CREATE INDEX factor_10_idx_no_dup_fill100 ON factor_10(col) WITH (deduplicate_items = off, fillfactor = 100);  
CREATE INDEX factor_100_idx_no_dup_fill100 ON factor_100(col) WITH (deduplicate_items = off, fillfactor = 100);  
CREATE INDEX factor_1000_idx_no_dup_fill100 ON factor_1000(col) WITH (deduplicate_items = off, fillfactor = 100);  

SQL Server Test Setup

CREATE TABLE factor_1(col int);  
CREATE TABLE factor_10(col int);  
CREATE TABLE factor_100(col int);  
CREATE TABLE factor_1000(col int);

INSERT INTO factor_1 SELECT * FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_10 SELECT value / 10 FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_100 SELECT value / 100 FROM GENERATE_SERIES(1, 10000000);  
INSERT INTO factor_1000 SELECT value / 1000 FROM GENERATE_SERIES(1, 10000000);

CREATE INDEX factor_1_idx ON factor_1(col);  
CREATE INDEX factor_10_idx ON factor_10(col);  
CREATE INDEX factor_100_idx ON factor_100(col);  
CREATE INDEX factor_1000_idx ON factor_1000(col);  

Benchmark results: PostgreSQL's deduplication reduces index size

When we benchmarked index sizes across PostgreSQL and SQL Server, we saw a sharp divergence as data duplication increased. With values repeated 1,000 times, a PostgreSQL index using deduplication was 3x smaller than the same index created with deduplication turned off. Compared to SQL Server, which does not support deduplication and stores each repeated value in full, PostgreSQL consistently produced smaller, more efficient indexes.

pganalyze-sql-server-postgresql-btree-index-size-benchmark.png

This difference matters. High-cardinality columns like status flags, timestamps, and categorical fields are common in production systems. When these values repeat across millions of rows, large indexes can quickly become a performance bottleneck, slowing scans, increasing I/O, and inflating memory usage.

PostgreSQL's deduplication reduces index size significantly, making it easier to keep indexes in memory and reduce disk pressure. For teams moving from SQL Server to PostgreSQL, or simply scaling out workloads with heavily used indexes, this optimization isn't just theoretical. It has a direct impact on resource usage, query performance, and overall operational efficiency.

Comparison Table: PostgreSQL vs. SQL Server Indexing

Index implementations for both B-Tree and other index types vary significantly between PostgreSQL and SQL Server. We've put together a comprehensive index comparison table to help you as a reference in your SQL Server to PostgreSQL migrations.

(Certain index types exist in SQL Server but not in PostgreSQL or vice versa. We've noted supportability as follows: 🟢 Supported index type 🔴 Not supported index type.)

Index Type Use Case Example PostgreSQL SQL Server
B-Tree Best for general-purpose indexing, equality and range queries (e.g., filtering users by age or date). 🟢 Default index type, supports equality & range queries, sorting, and pattern matching with prefixes. 🟢 On SQL Server the default structure for clustered and nonclustered indexes is a B-Tree.
Clustered Automatically orders table rows by the index key; best for frequently sorted queries. 🔴 PostgreSQL does not have clustered indexes; instead, you can use the CLUSTER command to order the table based on a nonclustered index; however, this order will not be preserved as new data gets inserted. 🟢 Equivalent to PostgreSQL B-Tree; sorts & stores data in order based on key.
Nonclustered Useful for indexes that speed up searches without affecting physical storage order. 🟢 In PostgreSQL all indexes are nonclustered. 🟢 Can be created on heap or a clustered index; stores data separately from the table.
Hash Optimized for exact match lookups, like searching by user ID or email address. 🟢 In PostgreSQL, hash indexes can only index a single column. While you can create multiple indexes to support a query, typically a multi-column B-Tree index is more effective. 🟢 Used for memory-optimized tables; requires a fixed bucket count.
Filtered / Partial Efficient for indexing a subset of data, such as active users only. 🟢 PostgreSQL can use Partial Indexes to index only a subset of rows. 🟢 A Filtered Index is a nonclustered index that indexes only a subset of table rows.
BRIN Best for very large tables where data is naturally ordered, such as time-series data. 🟢 Stores summaries of block ranges; best for large, sequentially stored data. 🔴 N/A
Full-text Used for natural language searches, such as searching text in articles or product reviews. 🟢 PostgreSQL supports Full-Text Search using GIN indexes on tsvector columns. 🟢 SQL Server uses an inverted index for text-based queries, similar to PostgreSQL GIN.
GIN Great for indexing JSONB, arrays, and full-text search (e.g., searching product descriptions). 🟢 Inverted index; best for JSON, full-text search, and arrays. 🔴 Partial capability via Full-text index.
Vector Efficiently perform similarity search or nearest neighbor search across high-dimensional data, most commonly in AI and machine learning applications. 🟢 PostgreSQL doesn't include vector support natively, but the open-source extension pgvector enables vector storage and indexing. 🔴 SQL Server does not natively support vector indexing or search. Microsoft recommends using its Azure AI Search instead.
XML Optimized for querying and storing XML documents. 🔴 PostgreSQL does not support indexes directly on XML types; however, expression indexes can be used on subsets of the XML data. For unstructured documents, JSONB is the recommended data type. 🟢 SQL Server has dedicated indexes on XML data types.
Spatial Used for geographic queries, e.g., finding locations within a radius. 🟢 In PostgreSQL spatial indexing queries are provided by the open source PostGIS extension. 🟢 SQL Server has built in spatial data types.
SP-GiST Used for hierarchical data structures like tree-based searches (e.g., routing networks). 🟢 Supports non-balanced tree structures like quadtrees & k-d trees, good for hierarchical data. 🔴 N/A
GiST Ideal for geometric and full-text search queries, e.g., finding nearby locations. 🟢 Infrastructure for specialized indexes; used for geometric & full-text search. 🔴 N/A
Columnstore Best for OLAP workloads and analytical queries (e.g., data warehousing). 🔴 While PostgreSQL has different extensions that offer columnar storage, like Citus and Timescale, it's a relatively recent implementation and may be limited by use case. 🟢 SQL Server has built-in columnar storage implemented as an index type since SQL Server 2012.

Choosing the right index for your workload

Understanding the differences between PostgreSQL and SQL Server indexing is crucial when optimizing query performance, planning a migration, or designing a high-performance database. Choosing the right indexing strategy requires deep knowledge of query execution patterns and performance trade-offs. Many teams manually experiment with different indexing strategies, which can lead to over-indexing, redundant indexes, or missed optimization opportunities.

Instead of trial and error, pganalyze Index Advisor automatically detects missing indexes, redundant indexes, and optimal column order for multicolumn indexes by applying a constraint programming model against real query execution data. This removes the guesswork and ensures that PostgreSQL databases are indexed for maximum performance.

References:


Enjoy blog posts like this?

Get them once a month to your inbox