Point data workloads — sensor telemetry, GPS tracks, IoT device readings — are among the highest-frequency spatial queries in production. The narrow scenario this page addresses, within the broader Index-Only Scan Strategies workflow, is the case where every query filters on a geometry column but only ever projects a handful of lightweight scalar attributes: a device ID, a timestamp, a reading value. If the database must visit the heap for each matching row, I/O costs scale linearly with result set size. A properly configured index-only scan eliminates that heap traffic entirely, serving results straight from the GiST index leaf pages.

Index-only scan vs heap-fetch path for PostGIS point data Two query paths side by side. Left path (naive): query goes through GiST index, then fetches each row from the heap. Right path (optimised): query goes through covering GiST index with INCLUDE columns, result served directly from index with no heap access. SELECT geom, sensor_id FROM … GiST index (bounding box) Heap fetch (row-by-row) Result rows Naive approach SELECT sensor_id, recorded_at FROM … Covering GiST index INCLUDE (sensor_id, recorded_at) Result rows No heap access Index-only scan

Why the Naive Approach Fails

The common mistake is treating the geometry column as both the filter predicate and a projected output. It looks harmless:

python
# Broken: selects geom directly, forces heap fetch for every row
cur.execute("""
    SELECT sensor_id, geom
    FROM sensor_readings
    WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
""", (lon_min, lat_min, lon_max, lat_max))

Even with a GiST index on geom, this query cannot use an index-only scan. PostGIS GiST indexes store only the 2D bounding box of each geometry — for polygon and linestring data that bounding box is an approximation, but for point data the bounding box is mathematically identical to the coordinate pair. The full geometry object still lives in the heap (possibly in a TOAST page), so any query that projects geom — or applies functions such as ST_X(), ST_AsText(), or ST_Distance() to it in the SELECT list — forces PostgreSQL to fetch every matching row from the heap. For a query returning 50,000 sensor readings per second, that is 50,000 random-access heap fetches that could have been avoided.

The same failure happens silently when an ORM adds the geometry column to the SELECT list because it is part of the mapped model class:

python
# Broken: SQLAlchemy loads the full SensorReading object, including geom
rows = session.query(SensorReading).filter(
    SensorReading.geom.ST_Intersects(bbox_geom)
).all()

Production-Ready Implementation

The fix has two parts: a covering GiST index that stores the projected scalar columns inside the index structure, and an explicit-projection query that never touches geom in the output list.

Step 1 — Create the Covering Index

sql
-- Run once; can be built CONCURRENTLY on a live table.
-- Store only narrow, frequently projected columns in INCLUDE.
-- Keep total INCLUDE size under ~200 bytes to maximise leaf-page density.
CREATE INDEX CONCURRENTLY idx_sensor_readings_ios
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, recorded_at, reading_value);

For tables that mix active and archived rows, pair this with a WHERE clause to build a partial GiST index that excludes cold data:

sql
CREATE INDEX CONCURRENTLY idx_sensor_readings_active_ios
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, recorded_at, reading_value)
WHERE archived_at IS NULL;

Step 2 — Query with Explicit Scalar Projection

python
import psycopg2
import psycopg2.extras

def fetch_sensors_in_bbox(
    conn,
    lon_min: float, lat_min: float,
    lon_max: float, lat_max: float,
    verify_ios: bool = True,
) -> list[tuple]:
    """
    Fetch sensor_id, recorded_at, reading_value for all readings in the
    given bounding box. Projects only INCLUDE-d columns so PostgreSQL can
    serve results directly from the GiST index leaf pages.

    Args:
        conn: psycopg2 connection (Read Committed isolation recommended).
        lon_min, lat_min, lon_max, lat_max: bounding box in EPSG:4326.
        verify_ios: if True, runs EXPLAIN first and raises if heap fetches > 0.
    Returns:
        List of (sensor_id, recorded_at, reading_value) tuples.
    """
    bbox_params = (lon_min, lat_min, lon_max, lat_max)

    data_sql = """
        SELECT sensor_id, recorded_at, reading_value
        FROM sensor_readings
        WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
          AND archived_at IS NULL
        ORDER BY recorded_at DESC;
    """

    if verify_ios:
        explain_sql = "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) " + data_sql
        with conn.cursor() as cur:
            cur.execute(explain_sql, bbox_params)
            plan = cur.fetchone()[0]
            heap_fetches = plan[0].get("Plan", {}).get("Heap Fetches", -1)
            if heap_fetches != 0:
                raise RuntimeError(
                    f"Index-only scan degraded: Heap Fetches={heap_fetches}. "
                    "Run VACUUM on sensor_readings or check query projection."
                )

    with conn.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor) as cur:
        # Use a server-side cursor to stream large result sets without
        # loading all rows into Python memory at once.
        cur.itersize = 2000  # fetch 2000 rows per round-trip
        cur.execute(data_sql, bbox_params)
        return cur.fetchall()

SQLAlchemy 2.0 equivalent — select individual column attributes, not the full mapped object:

python
from sqlalchemy import select, func, text
from sqlalchemy.orm import Session

def fetch_sensors_in_bbox_sa(session: Session, lon_min, lat_min, lon_max, lat_max):
    stmt = (
        select(
            SensorReading.sensor_id,
            SensorReading.recorded_at,
            SensorReading.reading_value,
        )
        .where(
            SensorReading.geom.op("&&")(
                func.ST_MakeEnvelope(lon_min, lat_min, lon_max, lat_max, 4326)
            ),
            SensorReading.archived_at.is_(None),
        )
        .order_by(SensorReading.recorded_at.desc())
        .execution_options(yield_per=2000)
    )
    return session.execute(stmt).all()

The key constraint in both versions: geom never appears in the SELECT list. The && bounding-box operator uses the GiST index for spatial filtering but the selected columns (sensor_id, recorded_at, reading_value) come from the INCLUDE payload — no heap visit required.

Configuration and Tuning Knobs

Visibility map maintenance is the most common reason a correctly structured query falls back to heap fetches after initial deployment. Each UPDATE or DELETE invalidates the all-visible flag for affected pages; autovacuum must reclaim those pages before the executor will trust the visibility map again.

Per-table autovacuum overrides for a high-write spatial table:

sql
ALTER TABLE sensor_readings SET (
    autovacuum_vacuum_scale_factor   = 0.01,   -- vacuum at 1% dead tuples (default 20%)
    autovacuum_vacuum_cost_delay     = 0,       -- no throttling; run at full speed
    autovacuum_vacuum_threshold      = 100      -- always vacuum once 100 dead tuples exist
);

Transaction isolation matters too. Serializable and Repeatable Read isolation levels require the executor to re-verify tuple visibility even when the visibility map is current, eliminating the IOS shortcut. For bulk point lookups where IOS is critical, use Read Committed:

python
conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED)

GUC settings that affect index-only scan selection:

Parameter Recommended value Effect
random_page_cost 1.1 (SSD) / 4.0 (HDD) Lower values make index paths more attractive to the planner
effective_cache_size 50–75% of total RAM Higher values tell the planner more of the index fits in OS cache
work_mem 4MB16MB per connection Relevant for sort nodes above an IOS; does not affect IOS itself

For further guidance on planner cost parameters and interpreting scan-type decisions, see query plan analysis with EXPLAIN.

Verification Steps

Run this query after deploying the covering index to confirm the plan is correct:

sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id, recorded_at, reading_value
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(-122.5, 37.70, -122.4, 37.80, 4326)
  AND archived_at IS NULL;

A healthy output looks like this:

Index Only Scan using idx_sensor_readings_active_ios on sensor_readings
  (cost=0.42..18.73 rows=312 width=28)
  (actual time=0.08..1.42 rows=312 loops=1)
  Index Cond: (geom && '...'::geometry)
  Heap Fetches: 0
  Buffers: shared hit=14
Planning Time: 0.3 ms
Execution Time: 1.6 ms

Three things to verify:

  1. The plan node reads Index Only Scan — not Index Scan or Bitmap Index Scan.
  2. Heap Fetches: 0 — any non-zero value means the visibility map has stale pages.
  3. Buffers: shared hit=N with no shared read lines — all data came from shared buffers (the index was already cached).

Track degradation over time with pg_stat_user_indexes:

sql
SELECT
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch          -- non-zero when heap fallback occurred
FROM pg_stat_user_indexes
WHERE relname = 'sensor_readings'
  AND indexrelname = 'idx_sensor_readings_active_ios';

A widening gap between idx_tup_read and idx_tup_fetch is a signal to run VACUUM ANALYZE sensor_readings immediately.

Gotchas Checklist

  • Geometry column in SELECT * — any SELECT * expands to include geom, immediately forcing heap access. Always enumerate columns explicitly, even in ad-hoc queries during debugging.
  • Coordinate extraction breaks IOSST_X(geom), ST_Y(geom), and ST_AsText(geom) all require the full geometry from the heap. If you need coordinates as numbers, add longitude DOUBLE PRECISION and latitude DOUBLE PRECISION columns to the table and list them in INCLUDE instead.
  • ORM identity map expansion — SQLAlchemy’s session.query(SensorReading) fetches every mapped column (including geom) to build the Python object. Use select(SensorReading.sensor_id, ...) with individual column attributes, or switch to Core-style session.execute(stmt) with a select() construct.
  • Bulk insert resets all-visible bits — a COPY or large INSERT batch invalidates the visibility map for every touched page. Run VACUUM ANALYZE sensor_readings immediately after bulk loads if IOS throughput is latency-sensitive.
  • Index bloat after frequent updates — GiST INCLUDE indexes accumulate dead index entries just like heap pages accumulate dead tuples. Schedule REINDEX CONCURRENTLY idx_sensor_readings_active_ios during maintenance windows, or monitor pg_stat_user_indexes.idx_blks_hit for cache-miss anomalies that suggest bloat.