Point data workloads — sensor telemetry, GPS tracks, IoT device readings — are among the highest-frequency spatial queries in production. The narrow scenario this page addresses, within the broader Index-Only Scan Strategies workflow, is the case where every query filters on a geometry column but only ever projects a handful of lightweight scalar attributes: a device ID, a timestamp, a reading value. If the database must visit the heap for each matching row, I/O costs scale linearly with result set size. A properly configured index-only scan eliminates that heap traffic entirely, serving results straight from the GiST index leaf pages.
Why the Naive Approach Fails
The common mistake is treating the geometry column as both the filter predicate and a projected output. It looks harmless:
# Broken: selects geom directly, forces heap fetch for every row
cur.execute("""
SELECT sensor_id, geom
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
""", (lon_min, lat_min, lon_max, lat_max))Even with a GiST index on geom, this query cannot use an index-only scan. PostGIS GiST indexes store only the 2D bounding box of each geometry — for polygon and linestring data that bounding box is an approximation, but for point data the bounding box is mathematically identical to the coordinate pair. The full geometry object still lives in the heap (possibly in a TOAST page), so any query that projects geom — or applies functions such as ST_X(), ST_AsText(), or ST_Distance() to it in the SELECT list — forces PostgreSQL to fetch every matching row from the heap. For a query returning 50,000 sensor readings per second, that is 50,000 random-access heap fetches that could have been avoided.
The same failure happens silently when an ORM adds the geometry column to the SELECT list because it is part of the mapped model class:
# Broken: SQLAlchemy loads the full SensorReading object, including geom
rows = session.query(SensorReading).filter(
SensorReading.geom.ST_Intersects(bbox_geom)
).all()Production-Ready Implementation
The fix has two parts: a covering GiST index that stores the projected scalar columns inside the index structure, and an explicit-projection query that never touches geom in the output list.
Step 1 — Create the Covering Index
-- Run once; can be built CONCURRENTLY on a live table.
-- Store only narrow, frequently projected columns in INCLUDE.
-- Keep total INCLUDE size under ~200 bytes to maximise leaf-page density.
CREATE INDEX CONCURRENTLY idx_sensor_readings_ios
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, recorded_at, reading_value);For tables that mix active and archived rows, pair this with a WHERE clause to build a partial GiST index that excludes cold data:
CREATE INDEX CONCURRENTLY idx_sensor_readings_active_ios
ON sensor_readings USING GIST (geom)
INCLUDE (sensor_id, recorded_at, reading_value)
WHERE archived_at IS NULL;Step 2 — Query with Explicit Scalar Projection
import psycopg2
import psycopg2.extras
def fetch_sensors_in_bbox(
conn,
lon_min: float, lat_min: float,
lon_max: float, lat_max: float,
verify_ios: bool = True,
) -> list[tuple]:
"""
Fetch sensor_id, recorded_at, reading_value for all readings in the
given bounding box. Projects only INCLUDE-d columns so PostgreSQL can
serve results directly from the GiST index leaf pages.
Args:
conn: psycopg2 connection (Read Committed isolation recommended).
lon_min, lat_min, lon_max, lat_max: bounding box in EPSG:4326.
verify_ios: if True, runs EXPLAIN first and raises if heap fetches > 0.
Returns:
List of (sensor_id, recorded_at, reading_value) tuples.
"""
bbox_params = (lon_min, lat_min, lon_max, lat_max)
data_sql = """
SELECT sensor_id, recorded_at, reading_value
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
AND archived_at IS NULL
ORDER BY recorded_at DESC;
"""
if verify_ios:
explain_sql = "EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) " + data_sql
with conn.cursor() as cur:
cur.execute(explain_sql, bbox_params)
plan = cur.fetchone()[0]
heap_fetches = plan[0].get("Plan", {}).get("Heap Fetches", -1)
if heap_fetches != 0:
raise RuntimeError(
f"Index-only scan degraded: Heap Fetches={heap_fetches}. "
"Run VACUUM on sensor_readings or check query projection."
)
with conn.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor) as cur:
# Use a server-side cursor to stream large result sets without
# loading all rows into Python memory at once.
cur.itersize = 2000 # fetch 2000 rows per round-trip
cur.execute(data_sql, bbox_params)
return cur.fetchall()SQLAlchemy 2.0 equivalent — select individual column attributes, not the full mapped object:
from sqlalchemy import select, func, text
from sqlalchemy.orm import Session
def fetch_sensors_in_bbox_sa(session: Session, lon_min, lat_min, lon_max, lat_max):
stmt = (
select(
SensorReading.sensor_id,
SensorReading.recorded_at,
SensorReading.reading_value,
)
.where(
SensorReading.geom.op("&&")(
func.ST_MakeEnvelope(lon_min, lat_min, lon_max, lat_max, 4326)
),
SensorReading.archived_at.is_(None),
)
.order_by(SensorReading.recorded_at.desc())
.execution_options(yield_per=2000)
)
return session.execute(stmt).all()The key constraint in both versions: geom never appears in the SELECT list. The && bounding-box operator uses the GiST index for spatial filtering but the selected columns (sensor_id, recorded_at, reading_value) come from the INCLUDE payload — no heap visit required.
Configuration and Tuning Knobs
Visibility map maintenance is the most common reason a correctly structured query falls back to heap fetches after initial deployment. Each UPDATE or DELETE invalidates the all-visible flag for affected pages; autovacuum must reclaim those pages before the executor will trust the visibility map again.
Per-table autovacuum overrides for a high-write spatial table:
ALTER TABLE sensor_readings SET (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum at 1% dead tuples (default 20%)
autovacuum_vacuum_cost_delay = 0, -- no throttling; run at full speed
autovacuum_vacuum_threshold = 100 -- always vacuum once 100 dead tuples exist
);Transaction isolation matters too. Serializable and Repeatable Read isolation levels require the executor to re-verify tuple visibility even when the visibility map is current, eliminating the IOS shortcut. For bulk point lookups where IOS is critical, use Read Committed:
conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_READ_COMMITTED)GUC settings that affect index-only scan selection:
| Parameter | Recommended value | Effect |
|---|---|---|
random_page_cost |
1.1 (SSD) / 4.0 (HDD) |
Lower values make index paths more attractive to the planner |
effective_cache_size |
50–75% of total RAM | Higher values tell the planner more of the index fits in OS cache |
work_mem |
4MB–16MB per connection |
Relevant for sort nodes above an IOS; does not affect IOS itself |
For further guidance on planner cost parameters and interpreting scan-type decisions, see query plan analysis with EXPLAIN.
Verification Steps
Run this query after deploying the covering index to confirm the plan is correct:
EXPLAIN (ANALYZE, BUFFERS)
SELECT sensor_id, recorded_at, reading_value
FROM sensor_readings
WHERE geom && ST_MakeEnvelope(-122.5, 37.70, -122.4, 37.80, 4326)
AND archived_at IS NULL;A healthy output looks like this:
Index Only Scan using idx_sensor_readings_active_ios on sensor_readings
(cost=0.42..18.73 rows=312 width=28)
(actual time=0.08..1.42 rows=312 loops=1)
Index Cond: (geom && '...'::geometry)
Heap Fetches: 0
Buffers: shared hit=14
Planning Time: 0.3 ms
Execution Time: 1.6 ms
Three things to verify:
- The plan node reads
Index Only Scan— notIndex ScanorBitmap Index Scan. Heap Fetches: 0— any non-zero value means the visibility map has stale pages.Buffers: shared hit=Nwith noshared readlines — all data came from shared buffers (the index was already cached).
Track degradation over time with pg_stat_user_indexes:
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch -- non-zero when heap fallback occurred
FROM pg_stat_user_indexes
WHERE relname = 'sensor_readings'
AND indexrelname = 'idx_sensor_readings_active_ios';A widening gap between idx_tup_read and idx_tup_fetch is a signal to run VACUUM ANALYZE sensor_readings immediately.
Gotchas Checklist
- Geometry column in
SELECT *— anySELECT *expands to includegeom, immediately forcing heap access. Always enumerate columns explicitly, even in ad-hoc queries during debugging. - Coordinate extraction breaks IOS —
ST_X(geom),ST_Y(geom), andST_AsText(geom)all require the full geometry from the heap. If you need coordinates as numbers, addlongitude DOUBLE PRECISIONandlatitude DOUBLE PRECISIONcolumns to the table and list them inINCLUDEinstead. - ORM identity map expansion — SQLAlchemy’s
session.query(SensorReading)fetches every mapped column (includinggeom) to build the Python object. Useselect(SensorReading.sensor_id, ...)with individual column attributes, or switch to Core-stylesession.execute(stmt)with aselect()construct. - Bulk insert resets all-visible bits — a
COPYor largeINSERTbatch invalidates the visibility map for every touched page. RunVACUUM ANALYZE sensor_readingsimmediately after bulk loads if IOS throughput is latency-sensitive. - Index bloat after frequent updates — GiST
INCLUDEindexes accumulate dead index entries just like heap pages accumulate dead tuples. ScheduleREINDEX CONCURRENTLY idx_sensor_readings_active_iosduring maintenance windows, or monitorpg_stat_user_indexes.idx_blks_hitfor cache-miss anomalies that suggest bloat.
Related Topics
- Index-Only Scan Strategies — parent page covering the full IOS implementation workflow, environment prerequisites, and Python driver patterns
- Advanced GiST Indexing & Optimization — top-level reference for GiST architecture, index type selection, and production hardening
- Partial GiST Indexes: Creating Partial Indexes for Active Map Regions — combine with
INCLUDEto keep covering indexes lean by excluding archived rows - Query Plan Analysis with EXPLAIN — interpret cost estimates, buffer counts, and scan-type decisions to diagnose IOS regressions
- Bounding-Box Filtering with the && Operator — the
&&operator used in every example here, with broader context on envelope-based spatial filtering