Why does selecting ST_X(geom) break an index-only scan?

GiST stores only the 2D bounding box, not the full geometry. ST_X() needs the real geometry payload, which lives in the heap. Store X/Y as separate DOUBLE PRECISION INCLUDE columns instead.

What does Heap Fetches > 0 mean in EXPLAIN output?

A non-zero Heap Fetches count means the visibility map was stale for at least some index entries, forcing PostgreSQL to verify tuple visibility in the heap. Run VACUUM on the table to restore the all-visible flag and bring Heap Fetches back to 0.

Can I use this pattern with asyncpg or SQLAlchemy?

Yes. With asyncpg, use explicit column lists in your query string and a server-side cursor. With SQLAlchemy 2.0, call select() with explicit column attributes rather than the mapped class, and set yield_per to avoid loading all rows into memory.

Leveraging Index-Only Scans for Point Data

Point data workloads — sensor telemetry, GPS tracks, IoT device readings — are among the highest-frequency spatial queries in production. The narrow scenario this page addresses, within the broader Index-Only Scan Strategies workflow, is the case where every query filters on a geometry column but only ever projects a handful of lightweight scalar attributes: a device ID, a timestamp, a reading value. If the database must visit the heap for each matching row, I/O costs scale linearly with result set size. A properly configured index-only scan eliminates that heap traffic entirely, serving results straight from the GiST index leaf pages.

Why the Naive Approach Fails

The common mistake is treating the geometry column as both the filter predicate and a projected output. It looks harmless:

python

# Broken: selects geom directly, forces heap fetch for every row
cur.execute("""
    SELECT sensor_id, geom
    FROM sensor_readings
    WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
""", (lon_min, lat_min, lon_max, lat_max))

Even with a GiST index on geom, this query cannot use an index-only scan. PostGIS GiST indexes store only the 2D bounding box of each geometry — for polygon and linestring data that bounding box is an approximation, but for point data the bounding box is mathematically identical to the coordinate pair. The full geometry object still lives in the heap (possibly in a TOAST page), so any query that projects geom — or applies functions such as ST_X(), ST_AsText(), or ST_Distance() to it in the SELECT list — forces PostgreSQL to fetch every matching row from the heap. For a query returning 50,000 sensor readings per second, that is 50,000 random-access heap fetches that could have been avoided.

The same failure happens silently when an ORM adds the geometry column to the SELECT list because it is part of the mapped model class:

python

# Broken: SQLAlchemy loads the full SensorReading object, including geom
rows = session.query(SensorReading).filter(
    SensorReading.geom.ST_Intersects(bbox_geom)
).all()

Production-Ready Implementation

The fix has two parts: a covering GiST index that stores the projected scalar columns inside the index structure, and an explicit-projection query that never touches geom in the output list.

Step 1 — Create the Covering Index

sql

-- Run once; can be built CONCURRENTLY on a live table.
-- Store only narrow, frequently projected columns in INCLUDE.
-- Keep total INCLUDE size under ~200 bytes to maximise leaf-page density.
CREATE INDEX CONCURRENTLY idx_sensor_readings_ios
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, recorded_at, reading_value);

For tables that mix active and archived rows, pair this with a WHERE clause to build a partial GiST index that excludes cold data:

sql

CREATE INDEX CONCURRENTLY idx_sensor_readings_active_ios
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, recorded_at, reading_value)
WHERE archived_at IS NULL;

Step 2 — Query with Explicit Scalar Projection

python

import psycopg2
import psycopg2.extras

def fetch_sensors_in_bbox(
    conn,
    lon_min: float, lat_min: float,
    lon_max: float, lat_max: float,
    verify_ios: bool = True,
) -> list[tuple]:
    """
    Fetch sensor_id, recorded_at, reading_value for all readings in the
    given bounding box. Projects only INCLUDE-d columns so PostgreSQL can
    serve results directly from the GiST index leaf pages.

    Args:
        conn: psycopg2 connection (Read Committed isolation recommended).
        lon_min, lat_min, lon_max, lat_max: bounding box in EPSG:4326.
        verify_ios: if True, runs EXPLAIN first and raises if heap fetches > 0.
    Returns:
        List of (sensor_id, recorded_at, reading_value) tuples.
    """
    bbox_params = (lon_min, lat_min, lon_max, lat_max)

    data_sql = """
        SELECT sensor_id, recorded_at, reading_value
        FROM sensor_readings
        WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
          AND archived_at IS NULL
        ORDER BY recorded_at DESC;
    """

    if verify_ios:
        explain_sql = "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) " + data_sql
        with conn.cursor() as cur:
            cur.execute(explain_sql, bbox_params)
            plan = cur.fetchone()[0]
            heap_fetches = plan[0].get("Plan", {}).get("Heap Fetches", -1)
            if heap_fetches != 0:
                raise RuntimeError(
                    f"Index-only scan degraded: Heap Fetches={heap_fetches}. "
                    "Run VACUUM on sensor_readings or check query projection."
                )

    with conn.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor) as cur:
        # Use a server-side cursor to stream large result sets without
        # loading all rows into Python memory at once.
        cur.itersize = 2000  # fetch 2000 rows per round-trip
        cur.execute(data_sql, bbox_params)
        return cur.fetchall()

SQLAlchemy 2.0 equivalent — select individual column attributes, not the full mapped object:

python

from sqlalchemy import select, func, text
from sqlalchemy.orm import Session

def fetch_sensors_in_bbox_sa(session: Session, lon_min, lat_min, lon_max, lat_max):
    stmt = (
        select(
            SensorReading.sensor_id,
            SensorReading.recorded_at,
            SensorReading.reading_value,
        )
        .where(
            SensorReading.geom.op("&&")(
                func.ST_MakeEnvelope(lon_min, lat_min, lon_max, lat_max, 4326)
            ),
            SensorReading.archived_at.is_(None),
        )
        .order_by(SensorReading.recorded_at.desc())
        .execution_options(yield_per=2000)
    )
    return session.execute(stmt).all()

The key constraint in both versions: geom never appears in the SELECT list. The && bounding-box operator uses the GiST index for spatial filtering but the selected columns (sensor_id, recorded_at, reading_value) come from the INCLUDE payload — no heap visit required.

Configuration and Tuning Knobs

Visibility map maintenance is the most common reason a correctly structured query falls back to heap fetches after initial deployment. Each UPDATE or DELETE invalidates the all-visible flag for affected pages; autovacuum must reclaim those pages before the executor will trust the visibility map again.

Per-table autovacuum overrides for a high-write spatial table:

sql

ALTER TABLE sensor_readings SET (
    autovacuum_vacuum_scale_factor   = 0.01,   -- vacuum at 1% dead tuples (default 20%)
    autovacuum_vacuum_cost_delay     = 0,       -- no throttling; run at full speed
    autovacuum_vacuum_threshold      = 100      -- always vacuum once 100 dead tuples exist
);

Transaction isolation matters too. Serializable and Repeatable Read isolation levels require the executor to re-verify tuple visibility even when the visibility map is current, eliminating the IOS shortcut. For bulk point lookups where IOS is critical, use Read Committed:

python

conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED)

GUC settings that affect index-only scan selection:

Parameter	Recommended value	Effect
`random_page_cost`	`1.1` (SSD) / `4.0` (HDD)	Lower values make index paths more attractive to the planner
`effective_cache_size`	50–75% of total RAM	Higher values tell the planner more of the index fits in OS cache
`work_mem`	`4MB`–`16MB` per connection	Relevant for sort nodes above an IOS; does not affect IOS itself

For further guidance on planner cost parameters and interpreting scan-type decisions, see query plan analysis with EXPLAIN.

Verification Steps

Run this query after deploying the covering index to confirm the plan is correct:

sql

EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id, recorded_at, reading_value
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(-122.5, 37.70, -122.4, 37.80, 4326)
  AND archived_at IS NULL;

A healthy output looks like this:

Index Only Scan using idx_sensor_readings_active_ios on sensor_readings
  (cost=0.42..18.73 rows=312 width=28)
  (actual time=0.08..1.42 rows=312 loops=1)
  Index Cond: (geom && '...'::geometry)
  Heap Fetches: 0
  Buffers: shared hit=14
Planning Time: 0.3 ms
Execution Time: 1.6 ms

Three things to verify:

The plan node reads Index Only Scan — not Index Scan or Bitmap Index Scan.
Heap Fetches: 0 — any non-zero value means the visibility map has stale pages.
Buffers: shared hit=N with no shared read lines — all data came from shared buffers (the index was already cached).

Track degradation over time with pg_stat_user_indexes:

sql

SELECT
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch          -- non-zero when heap fallback occurred
FROM pg_stat_user_indexes
WHERE relname = 'sensor_readings'
  AND indexrelname = 'idx_sensor_readings_active_ios';

A widening gap between idx_tup_read and idx_tup_fetch is a signal to run VACUUM ANALYZE sensor_readings immediately.

Gotchas Checklist

Geometry column in SELECT * — any SELECT * expands to include geom, immediately forcing heap access. Always enumerate columns explicitly, even in ad-hoc queries during debugging.
Coordinate extraction breaks IOS — ST_X(geom), ST_Y(geom), and ST_AsText(geom) all require the full geometry from the heap. If you need coordinates as numbers, add longitude DOUBLE PRECISION and latitude DOUBLE PRECISION columns to the table and list them in INCLUDE instead.
ORM identity map expansion — SQLAlchemy’s session.query(SensorReading) fetches every mapped column (including geom) to build the Python object. Use select(SensorReading.sensor_id, ...) with individual column attributes, or switch to Core-style session.execute(stmt) with a select() construct.
Bulk insert resets all-visible bits — a COPY or large INSERT batch invalidates the visibility map for every touched page. Run VACUUM ANALYZE sensor_readings immediately after bulk loads if IOS throughput is latency-sensitive.
Index bloat after frequent updates — GiST INCLUDE indexes accumulate dead index entries just like heap pages accumulate dead tuples. Schedule REINDEX CONCURRENTLY idx_sensor_readings_active_ios during maintenance windows, or monitor pg_stat_user_indexes.idx_blks_hit for cache-miss anomalies that suggest bloat.

Index-Only Scan Strategies — parent page covering the full IOS implementation workflow, environment prerequisites, and Python driver patterns
Advanced GiST Indexing & Optimization — top-level reference for GiST architecture, index type selection, and production hardening
Partial GiST Indexes: Creating Partial Indexes for Active Map Regions — combine with INCLUDE to keep covering indexes lean by excluding archived rows
Query Plan Analysis with EXPLAIN — interpret cost estimates, buffer counts, and scan-type decisions to diagnose IOS regressions
Bounding-Box Filtering with the && Operator — the && operator used in every example here, with broader context on envelope-based spatial filtering