Index-Only Scan Strategies for PostGIS

Index-Only Scan (IOS) execution is one of the most impactful performance levers available in the Advanced GiST Indexing & Optimization toolkit: when the query planner can satisfy every column reference — filter predicates and projected attributes alike — from the index leaf pages without touching the heap, the result is dramatically lower I/O, reduced shared-buffer pressure, and faster response times for spatial APIs and telemetry pipelines.

Achieving true IOS for spatial workloads has historically been awkward. PostGIS geometry values are large, often stored out-of-line via TOAST, and traditional GiST indexes carry only the indexed key. PostgreSQL 12 added INCLUDE support to GiST, and PostgreSQL 14 improved visibility-map tracking for high-write tables. When those two ingredients are combined with autovacuum discipline and strict Python query construction, teams can systematically engineer heap-fetch-free paths for their spatial filter-and-project hotspots.

The diagram below illustrates the difference between a conventional Index Scan (which must visit the heap for every qualifying tuple) and an Index Only Scan (which serves the result entirely from index leaf pages when the visibility map confirms all-visible pages):

Prerequisites and Infrastructure Validation

Before implementing IOS patterns in production, confirm that your stack meets these baseline requirements.

PostgreSQL and PostGIS versions:

sql

SELECT version();
SELECT PostGIS_Full_Version();

PostgreSQL 14 or later is recommended. GiST INCLUDE support arrived in PostgreSQL 12; PostgreSQL 14 improved visibility-map bit maintenance under concurrent writes.
PostGIS 3.2 or later for current geometry operator behaviour.

Required extensions:

sql

SELECT name, installed_version
FROM pg_available_extensions
WHERE name IN ('postgis', 'postgis_topology');

Python packages: psycopg (v3) or asyncpg, plus SQLAlchemy 2.0 or later. For ORM model mapping, geoalchemy2 0.14+.

Existing GiST index audit — check whether the target table already has a GiST spatial index, and whether it has INCLUDE columns:

sql

SELECT
    i.relname                          AS index_name,
    am.amname                         AS access_method,
    ix.indkey                         AS key_columns,
    ix.indnkeyatts                    AS n_key_atts,
    (ix.indnatts - ix.indnkeyatts)    AS n_include_atts,
    pg_get_indexdef(i.oid)            AS definition
FROM pg_index ix
JOIN pg_class i  ON i.oid = ix.indexrelid
JOIN pg_am    am ON am.oid = (SELECT relam FROM pg_class WHERE oid = ix.indrelid LIMIT 1)
                   -- actually get the am from the index itself:
JOIN pg_class t  ON t.oid = ix.indrelid
JOIN pg_am    am2 ON am2.oid = i.relam
WHERE t.relname = 'sensor_readings'
  AND am2.amname = 'gist';

A simpler check that works for most PostgreSQL versions:

sql

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'sensor_readings'
  AND indexdef ILIKE '%gist%';

If the INCLUDE (...) clause is absent, the index cannot support Index-Only Scans. Proceed to Step 3 to create a covering index.

Autovacuum readiness — confirm autovacuum is active and recently ran on the target table:

sql

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'sensor_readings';

Core Execution Workflow

Step 1 — Identify Candidate Query Patterns

An Index-Only Scan is viable only when every column referenced in SELECT, WHERE, ORDER BY, and GROUP BY clauses is present in a single index — either as a key column or an INCLUDE column. In spatial workloads, this translates to:

Filtering on bounding-box operators (&&, ~, @) or ST_DWithin radius searches
Projecting lightweight scalar attributes: id, status, recorded_at, category
Avoiding SELECT * or any column not in the INCLUDE list
Not calling geometry functions on the result (e.g. ST_AsGeoJSON(geom) forces a heap fetch)

Projecting the geometry column itself (SELECT geom FROM ...) immediately forces a heap fetch. If your application must return geometry payloads from the same query, IOS is not applicable; instead look at composite spatial indexes to speed up the filter pass before the heap access.

Step 2 — Validate or Create a Covering GiST Index

If no covering index exists, create one. The geom column drives the spatial filter; the INCLUDE columns are stored in leaf pages and served directly to the executor:

sql

CREATE INDEX CONCURRENTLY idx_sensors_gist_covering
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, reading_type, recorded_at);

CONCURRENTLY allows the index build to run without locking the table for writes — essential on live systems.

Keep INCLUDE columns narrow. INTEGER, SMALLINT, TIMESTAMP WITH TIME ZONE, and short VARCHAR are ideal. Every included column adds to leaf-page size; bloated leaf pages evict shared-buffer cache entries and undermine the I/O benefit you are trying to achieve.

Combine this pattern with partial GiST indexes when your table contains logically inactive rows:

sql

CREATE INDEX CONCURRENTLY idx_sensors_active_covering
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, reading_type, recorded_at)
WHERE status = 'active';

A partial covering index is smaller, warms faster in shared buffers, and produces Heap Fetches: 0 on queries that include the WHERE status = 'active' predicate.

Step 3 — Tune Autovacuum for Visibility Map Hygiene

PostgreSQL relies on the visibility map (VM) to decide whether a heap page must be visited. If the VM marks a page as all-visible, the executor skips the heap entirely. Dead tuples from UPDATE and DELETE operations invalidate VM bits.

For high-write spatial tables, tighten autovacuum thresholds at the table level:

sql

ALTER TABLE sensor_readings SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay   = 0,
    autovacuum_analyze_scale_factor = 0.005
);

autovacuum_vacuum_scale_factor = 0.01 triggers autovacuum when 1% of rows are dead (versus the 20% default). On a 10-million-row table that means vacuum fires after 100,000 dead tuples accumulate rather than 2,000,000.

After a heavy write batch, run VACUUM manually to restore VM bits before benchmarking:

sql

VACUUM (ANALYZE, VERBOSE) sensor_readings;

Monitor VM health before and after:

sql

SELECT
    heap_blks_total,
    heap_blks_scanned,
    heap_blks_vacuumed,
    index_vacuum_count,
    num_dead_tuples
FROM pg_stat_progress_vacuum
WHERE relid = 'sensor_readings'::regclass;

Heap-Only Tuple (HOT) updates are a natural ally. When an UPDATE modifies only non-indexed columns, PostgreSQL stores the new version on the same heap page, leaving index entries and VM bits for other pages intact. Design update patterns to favour HOT where possible by avoiding modifications to the geom column or any INCLUDE column in routine updates.

Step 4 — Construct Python Queries with Explicit Column Selection

A perfectly designed covering index is defeated by application code that requests extra columns. ORM frameworks frequently add primary keys, foreign keys, or relationship columns to SELECT clauses for object identity management; those additions silently break IOS.

SQLAlchemy 2.0 — explicit Core-style column selection:

python

from sqlalchemy import select, func, create_engine, text
from geoalchemy2 import Geometry
from sqlalchemy import Column, Integer, String, DateTime
from sqlalchemy.orm import DeclarativeBase, Session

class Base(DeclarativeBase):
    pass

class SensorReading(Base):
    __tablename__ = "sensor_readings"
    sensor_id    = Column(Integer, primary_key=True)
    reading_type = Column(String(32))
    recorded_at  = Column(DateTime(timezone=True))
    status       = Column(String(16))
    geom         = Column(Geometry("POINT", srid=4326))

# Only select the three columns present in the INCLUDE clause.
# Do NOT call .options(lazyload(...)) — disable relationship loading globally
# for IOS-critical endpoints or use Core-style queries.
stmt = (
    select(
        SensorReading.sensor_id,
        SensorReading.reading_type,
        SensorReading.recorded_at,
    )
    .where(
        SensorReading.geom.op("&&")(
            func.ST_MakeEnvelope(-122.5, 37.7, -122.4, 37.8, 4326)
        )
    )
    .where(SensorReading.status == "active")
)

engine = create_engine("postgresql+psycopg://user:pass@localhost/gisdb")
with engine.connect() as conn:
    # yield_per streams results without loading all rows into memory
    result = conn.execute(stmt.execution_options(yield_per=500))
    for row in result:
        process_reading(row)

psycopg 3 — parameterised raw SQL:

python

import psycopg
from psycopg.rows import namedtuple_row

QUERY = """
    SELECT sensor_id, reading_type, recorded_at
    FROM sensor_readings
    WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
      AND status = 'active'
"""

with psycopg.connect("postgresql://user:pass@localhost/gisdb") as conn:
    conn.row_factory = namedtuple_row
    with conn.cursor(name="ios_cursor") as cur:  # server-side cursor
        cur.itersize = 1000
        cur.execute(QUERY, (-122.5, 37.7, -122.4, 37.8))
        for row in cur:
            process_reading(row)

asyncpg — async server-side cursor:

python

import asyncpg

QUERY = """
    SELECT sensor_id, reading_type, recorded_at
    FROM sensor_readings
    WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)
      AND status = 'active'
"""

async def stream_active_sensors(pool: asyncpg.Pool, bbox: tuple[float, ...]):
    async with pool.acquire() as conn:
        async with conn.transaction():
            async for row in conn.cursor(QUERY, *bbox):
                yield row

Key discipline points for all three drivers:

List only columns present in the INCLUDE clause.
Use server-side cursors (name= in psycopg, cursor() in asyncpg, yield_per in SQLAlchemy) to avoid loading the full result set into memory.
Parameterise spatial inputs — consistent parameter types help the planner cache the execution plan and reuse the IOS path.

Step 5 — Validate Execution Plans

Never assume IOS is occurring from query syntax alone. After creating the index and running at least one VACUUM, validate with EXPLAIN (ANALYZE, BUFFERS):

sql

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT sensor_id, reading_type, recorded_at
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(-122.5, 37.7, -122.4, 37.8, 4326)
  AND status = 'active';

A successful plan looks like:

Index Only Scan using idx_sensors_active_covering on sensor_readings
  (cost=0.42..18.33 rows=47 width=24) (actual time=0.223..0.891 rows=52 loops=1)
  Index Cond: (geom && '0103000020E610000...'::geometry)
  Filter: (status = 'active')
  Heap Fetches: 0
  Buffers: shared hit=14
Planning Time: 1.2 ms
Execution Time: 0.9 ms

The critical indicators:

Index Only Scan — not Index Scan and not Seq Scan
Heap Fetches: 0 — definitive confirmation of zero heap access
Buffers: shared hit=... — the index pages were served from shared buffers, not disk

If Heap Fetches is greater than zero, the visibility map has stale bits. Run VACUUM (ANALYZE) sensor_readings and re-run EXPLAIN.

For a deeper walkthrough of interpreting spatial EXPLAIN output — including cost estimation and buffer accounting — see reading EXPLAIN ANALYZE output for spatial joins.

Performance Considerations

EXPLAIN Plan Comparison: IOS vs Sequential Scan

A table without a covering index, or with a stale visibility map, falls back to a sequential scan or a heap-fetching index scan:

-- WITHOUT covering index:
Seq Scan on sensor_readings
  (cost=0.00..9840.00 rows=52 width=24) (actual time=12.3..198.4 rows=52 loops=1)
  Filter: ((geom && '...'::geometry) AND (status = 'active'))
  Rows Removed by Filter: 498721

-- WITH covering index, stale VM (Heap Fetches > 0):
Index Scan using idx_sensors_active_covering on sensor_readings
  (cost=0.42..19.01 rows=52 width=24) (actual time=0.231..1.04 rows=52 loops=1)
  Index Cond: (geom && '...'::geometry)
  Heap Fetches: 52
  Buffers: shared hit=14 read=52

The buffer read line (read=52) reveals heap pages being loaded from disk. After VACUUM:

Index Only Scan using idx_sensors_active_covering on sensor_readings
  Heap Fetches: 0
  Buffers: shared hit=14

Relevant GUC Settings

GUC	Recommended value	Effect on IOS
`random_page_cost`	1.1–2.0 (SSD)	Lower values make the planner prefer index paths
`effective_cache_size`	50–75% of RAM	Higher values increase planner confidence in cache hits
`autovacuum_vacuum_scale_factor`	0.01–0.02	Triggers VACUUM earlier, keeping VM bits fresh
`work_mem`	4MB–64MB per query	Affects sort/hash steps; IOS bypasses both

Apply index-path preference for a single session during benchmarking:

sql

SET random_page_cost = 1.1;
SET effective_cache_size = '12GB';

Statistics Freshness

The query planner uses table statistics to decide between scan strategies. After bulk inserts or deletes:

sql

ANALYZE sensor_readings;

Without fresh statistics the planner may underestimate or overestimate selectivity, choosing a sequential scan even when the covering index is present and the VM is clean.

Common Failure Modes and Fixes

Heap Fetches Greater Than Zero Despite Covering Index

Diagnosis: The visibility map has stale bits — dead tuples exist from writes that have not yet been vacuumed.

Fix:

sql

VACUUM (ANALYZE) sensor_readings;
-- Confirm VM bits restored:
SELECT n_dead_tup, last_vacuum FROM pg_stat_user_tables WHERE relname = 'sensor_readings';

If the table has very high write throughput, lower autovacuum_vacuum_scale_factor as shown in Step 3, or schedule explicit VACUUM during off-peak windows.

Planner Chooses Seq Scan Instead of Index Only Scan

Diagnosis: Statistics are stale, random_page_cost is too high, or the planner estimates the bounding box covers a large fraction of the table.

Diagnosis SQL:

sql

EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id, reading_type, recorded_at
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(-122.5, 37.7, -122.4, 37.8, 4326);

Look for a high rows estimate in the Seq Scan node — if the planner thinks the filter returns 30%+ of rows it will prefer a sequential scan regardless of index quality.

Fix:

sql

ANALYZE sensor_readings;
-- Or force the issue in development:
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS) ...;  -- verify IOS is structurally possible
RESET enable_seqscan;            -- always reset; never leave off in production

ORM Silently Adds Extra Columns

Diagnosis: Generated SQL requests columns outside the INCLUDE list. Capture the query in psycopg’s logging or SQLAlchemy’s echo mode:

python

engine = create_engine("postgresql+psycopg://...", echo=True)

Look for unexpected columns in the SELECT clause — primary key surrogates, relationship FK columns, or __version__ columns added by versioning extensions.

Fix: Switch the IOS-critical endpoint to SQLAlchemy Core select() with an explicit column list, or use raw parameterised SQL. Never use Model.query.all() or session.get(Model, id) on IOS paths.

SRID Mismatch Forces Implicit ST_Transform

When the bounding-box literal uses a different SRID than the indexed geometry column, PostgreSQL must transform one or the other. This wraps the column in a function call, preventing index use entirely.

Diagnosis:

sql

SELECT Find_SRID('public', 'sensor_readings', 'geom');

Fix: Always pass spatial inputs in the same SRID as the column. If your application receives WGS-84 coordinates but the table uses a projected CRS such as EPSG:32633, transform at the boundary:

sql

WHERE geom && ST_Transform(
    ST_MakeEnvelope(-122.5, 37.7, -122.4, 37.8, 4326),
    32633
)

Index Bloat After Bulk Inserts

Bulk writes that create many dead-tuple pages force VACUUM to work harder and temporarily degrade VM bit coverage. After large ingestion jobs:

sql

-- Check index bloat via pg_stat_user_indexes
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'sensor_readings';

-- Reclaim bloat without locking:
REINDEX INDEX CONCURRENTLY idx_sensors_active_covering;

Verification

Run this checklist after deployment to confirm the IOS path is live:

sql

-- 1. Confirm index exists with INCLUDE columns
SELECT indexdef FROM pg_indexes
WHERE tablename = 'sensor_readings'
  AND indexdef ILIKE '%include%';

-- 2. Confirm visibility map has no stale pages
SELECT n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'sensor_readings';

-- 3. Confirm execution plan shows Heap Fetches: 0
EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id, reading_type, recorded_at
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(-122.5, 37.7, -122.4, 37.8, 4326)
  AND status = 'active';

Expected output of Step 3: Index Only Scan ... Heap Fetches: 0.

Timing comparison — a simple before/after using EXPLAIN (ANALYZE) execution time is the most direct signal:

sql

-- Baseline (before IOS): record actual time from Seq Scan plan
-- After IOS: expect 5–20x reduction in execution time for selective spatial filters

Monitoring query to track IOS usage over time:

sql

SELECT
    indexrelname,
    idx_scan        AS total_scans,
    idx_tup_read    AS tuples_read_from_index,
    idx_tup_fetch   AS tuples_fetched_from_heap
FROM pg_stat_user_indexes
WHERE relname = 'sensor_readings'
  AND indexrelname = 'idx_sensors_active_covering';

When idx_tup_fetch is zero (or very close to zero relative to idx_tup_read), the index is serving queries without heap access.

Frequently Asked Questions

Why does my GiST index not produce Index Only Scans even with INCLUDE columns?

The most common cause is a stale visibility map. PostgreSQL must confirm every tuple on a page is visible to all transactions before it can skip the heap. Run VACUUM (ANALYZE) on the table and recheck EXPLAIN output for Heap Fetches: 0.

Can I include a geometry column in INCLUDE to serve geometry from the index?

Technically yes, but geometry values are large and variable-length. Including them balloons the index to the point where it no longer fits in shared_buffers, negating the I/O benefit. Keep INCLUDE columns narrow (integers, timestamps, short text) and fetch geometry from the heap only when required.

How does an Index-Only Scan interact with HOT updates in PostgreSQL?

Heap-Only Tuple (HOT) updates store new tuple versions on the same heap page without creating a new index entry, which preserves visibility map bits for other pages. When HOT updates dominate your write pattern — i.e. you only update non-indexed columns — autovacuum can keep the visibility map clean far more easily, making sustained Index-Only Scans realistic.

Advanced GiST Indexing & Optimization — parent reference covering the full GiST design space
Leveraging Index-Only Scans for Point Data — narrow implementation for point geometry tables
Composite Spatial Indexes — multi-attribute GiST indexes for geometry + timestamp filtering
Partial GiST Indexes — reducing index footprint by excluding inactive rows
Query Plan Analysis with EXPLAIN — reading cost estimates and buffer counts in spatial query plans