Problem Statement
Spatial-temporal queries — filtering a table by both a geometry bounding box and a time range — are a daily reality for IoT dashboards, fleet tracking APIs, and geospatial analytics pipelines. The challenge, covered in depth under composite spatial indexes, is that PostgreSQL cannot combine two separate single-column indexes into one coordinated scan; instead it bitmap-ANDs their results, loading far more pages than necessary. This page shows exactly how to collapse both predicates into a single composite GiST index using the btree_gist extension, and how to drive that index from Python using SQLAlchemy 2.0 async and asyncpg.
Why the Naive Approach Fails
The obvious first attempt is two independent indexes — one GiST on the geometry column and one B-tree on the timestamp column — and letting PostgreSQL combine them at runtime with a Bitmap AND. Here is what that looks like:
-- Naive: two separate indexes
CREATE INDEX idx_sensor_geom ON sensor_events USING GIST (location_geom);
CREATE INDEX idx_sensor_ts ON sensor_events USING BTREE (recorded_at);# Python: naively fetching all rows that match, then filtering in Python
import psycopg2
from shapely.geometry import box
from shapely import wkb
conn = psycopg2.connect(dsn="postgresql://user:pass@localhost/gisdb")
cur = conn.cursor()
# Fetches every geometry hit; timestamp filter happens client-side
cur.execute(
"SELECT id, location_geom, recorded_at FROM sensor_events "
"WHERE location_geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)",
(-74.1, 40.6, -73.7, 40.9)
)
rows = cur.fetchall() # may pull millions of rows across the network
filtered = [r for r in rows if start <= r[2] <= end] # in-memory filterThis has two compounding problems. First, the Bitmap AND path requires PostgreSQL to read both indexes independently, build two in-memory bitmaps, AND them, and then re-check heap tuples — adding several hundred milliseconds on a 50 M-row table. Second, the Python fetchall() pulls the full geometry-matching set across the wire before any temporal filter runs, burning memory and network bandwidth proportional to the spatial hit count, not the final result count. On a dense urban sensor network covering a major city, a bounding-box query might match 800 000 rows before the time filter reduces them to 2 000.
Production-Ready Implementation
Step 1: Enable btree_gist
PostgreSQL’s GiST framework natively handles spatial bounding boxes but has no built-in operators for scalar types. The btree_gist extension injects B-tree comparison semantics — =, <, >, <=, >=, <>, and range operators — into the GiST operator class, enabling the database to store timestamp extents alongside geometry extents in the same R-tree-like hierarchy.
-- Requires superuser or pg_extension_owner privilege
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- Verify installation
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'btree_gist';Step 2: Create the Composite Index
Use CONCURRENTLY to build without holding an AccessExclusiveLock on the table. The build will take longer but the table remains fully writable during construction — essential for any production system.
-- Create composite GiST index: geometry column first, then timestamp
-- Geometry-first is optimal when spatial predicates are more selective
CREATE INDEX CONCURRENTLY idx_events_geom_ts
ON public.sensor_events
USING GIST (location_geom, recorded_at);
-- Alternatively, timestamp-first for narrow time-window workloads
-- (e.g. IoT ingestion where queries always target the last 15 minutes)
-- CREATE INDEX CONCURRENTLY idx_events_ts_geom
-- ON public.sensor_events
-- USING GIST (recorded_at, location_geom);Step 3: Query the Index from Python
Below is a single, copy-paste-ready module covering both SQLAlchemy 2.0 async (for FastAPI / async services) and raw asyncpg (for high-throughput pipeline code). Both patterns use parameterised queries that the planner can cache, and neither materialises more rows than needed.
"""
composite_spatial_temporal.py
------------------------------
Query sensor_events using a composite GiST index on (location_geom, recorded_at).
Requires: sqlalchemy>=2.0, asyncpg, geoalchemy2, shapely
"""
from __future__ import annotations
from datetime import datetime
from typing import AsyncIterator
import asyncpg
from geoalchemy2 import Geometry
from geoalchemy2.functions import ST_MakeEnvelope
from sqlalchemy import Column, DateTime, Integer, Float, select, and_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase
# ---------------------------------------------------------------------------
# ORM model
# ---------------------------------------------------------------------------
class Base(DeclarativeBase):
pass
class SensorEvent(Base):
__tablename__ = "sensor_events"
id = Column(Integer, primary_key=True)
location_geom = Column(Geometry(geometry_type="POINT", srid=4326))
recorded_at = Column(DateTime(timezone=True), nullable=False)
value = Column(Float)
# ---------------------------------------------------------------------------
# SQLAlchemy 2.0 async — for FastAPI / Starlette services
# ---------------------------------------------------------------------------
async def fetch_events_sqlalchemy(
session: AsyncSession,
min_lon: float,
min_lat: float,
max_lon: float,
max_lat: float,
start_ts: datetime,
end_ts: datetime,
limit: int = 500,
) -> list[SensorEvent]:
"""
Uses the composite GiST index on (location_geom, recorded_at).
ST_MakeEnvelope generates the bounding box with explicit SRID=4326
so the planner avoids an implicit cast that would bypass the index.
"""
stmt = (
select(SensorEvent)
.where(
and_(
SensorEvent.location_geom.op("&&")(
ST_MakeEnvelope(min_lon, min_lat, max_lon, max_lat, 4326)
),
SensorEvent.recorded_at.between(start_ts, end_ts),
)
)
.order_by(SensorEvent.recorded_at.desc())
.limit(limit)
)
result = await session.execute(stmt)
return list(result.scalars().all())
# ---------------------------------------------------------------------------
# asyncpg — for pipeline / ETL code that needs maximum throughput
# Use a server-side cursor (cursor()) to avoid materialising the full result
# ---------------------------------------------------------------------------
async def stream_events_asyncpg(
pool: asyncpg.Pool,
min_lon: float,
min_lat: float,
max_lon: float,
max_lat: float,
start_ts: datetime,
end_ts: datetime,
prefetch: int = 200,
) -> AsyncIterator[asyncpg.Record]:
"""
Streams rows via a server-side cursor; never pulls the full result set
into memory. prefetch controls the wire batch size per round-trip.
"""
sql = """
SELECT id, location_geom::text AS geom_wkt, recorded_at, value
FROM sensor_events
WHERE location_geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)
AND recorded_at BETWEEN $5 AND $6
ORDER BY recorded_at DESC
"""
async with pool.acquire() as conn:
async with conn.transaction():
async for record in conn.cursor(
sql,
min_lon, min_lat, max_lon, max_lat,
start_ts, end_ts,
prefetch=prefetch,
):
yield recordConfiguration and Tuning Knobs
| Setting | Recommended value | Why it matters |
|---|---|---|
work_mem |
16MB–64MB |
Spatial-temporal queries often sort by recorded_at DESC. Enough work_mem keeps the sort in RAM; disk spillage negates the index speed gain. |
random_page_cost |
1.1 (SSD) / 4.0 (HDD) |
GiST index scans involve random I/O. If random_page_cost is too high the planner switches to sequential scans even when the composite index would be faster. |
effective_cache_size |
50–75% of total RAM | Tells the planner how much of the index fits in the OS page cache. Too low and it underestimates the benefit of index scans. |
statement_timeout |
5000 (ms) |
Guards against runaway queries when a user supplies an enormous bounding box that escapes the temporal filter. |
Apply these session-level in Python when you need per-query overrides without touching postgresql.conf:
async with pool.acquire() as conn:
await conn.execute("SET work_mem = '32MB'")
await conn.execute("SET random_page_cost = 1.1")
rows = await conn.fetch(
"SELECT id, recorded_at FROM sensor_events "
"WHERE location_geom && ST_MakeEnvelope($1,$2,$3,$4,4326) "
" AND recorded_at BETWEEN $5 AND $6",
min_lon, min_lat, max_lon, max_lat, start_ts, end_ts,
)Verification Steps
Run this immediately after creating the index to confirm PostgreSQL is using it:
-- 1. Check the index exists and has been scanned
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'sensor_events'
AND indexrelname = 'idx_events_geom_ts';
-- 2. Verify the planner chooses an Index Scan (not Bitmap Heap Scan or Seq Scan)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, recorded_at
FROM sensor_events
WHERE location_geom && ST_MakeEnvelope(-74.05, 40.70, -73.90, 40.80, 4326)
AND recorded_at BETWEEN '2024-01-01'::timestamptz
AND '2024-01-02'::timestamptz;A healthy output shows Index Scan using idx_events_geom_ts with Buffers: shared hit=N read=M where N dominates (most data served from cache). If you see Bitmap Heap Scan instead, the planner estimated a large result set — narrow your bounding box or time window, or check that ANALYZE has been run recently.
For deeper interpretation of EXPLAIN output, see reading EXPLAIN ANALYZE output for spatial joins.
# Python timing comparison: before vs. after the composite index
import time, asyncpg, asyncio
async def benchmark(pool: asyncpg.Pool) -> None:
sql = (
"SELECT COUNT(*) FROM sensor_events "
"WHERE location_geom && ST_MakeEnvelope(-74.05,40.70,-73.90,40.80,4326) "
" AND recorded_at BETWEEN '2024-01-01' AND '2024-01-02'"
)
t0 = time.perf_counter()
async with pool.acquire() as conn:
count = await conn.fetchval(sql)
elapsed = (time.perf_counter() - t0) * 1000
print(f"Rows: {count} | Query time: {elapsed:.1f} ms")Gotchas Checklist
-
SRID mismatch bypasses the index. If
location_geomis stored as SRID 4326 but you passST_MakeEnvelope(...)without an explicit SRID, PostgreSQL inserts an implicitST_Transformcast, which prevents index use. Always pass the SRID as the fifth argument toST_MakeEnvelope. -
timestampvstimestamptzoperator class. The composite index must use the same type as the column. Ifrecorded_atistimestamptz, cast your Pythondatetimeobjects to timezone-aware before binding them; a naivedatetimemay coerce silently and produce wrong results in non-UTC sessions. -
Stale statistics after bulk loads. A large
COPYor batch insert can shift column statistics dramatically. RunANALYZE sensor_events;after any load of more than ~5% of total row count, or the planner may stick with sequential scans based on outdated histograms. -
GiST page bloat accumulates faster than B-tree bloat. Heavy
UPDATEorDELETEworkloads expand bounding box entries without reclaiming space.VACUUMdoes not compact GiST pages; you needREINDEX CONCURRENTLY idx_events_geom_tsto rebuild the tree. Schedule this monthly, or whenpgstattuplereports bloat above 20%. -
CREATE INDEX CONCURRENTLYcannot run inside a transaction block. If your migration framework wraps DDL in a transaction, the concurrent build will fail. Run index creation outside transaction scope, or use your migration tool’sdisable_ddl_transactionflag.
Related Topics
- Composite Spatial Indexes — parent page covering the full design space of multi-column GiST indexes
- Advanced GiST Indexing & Optimization — top-level guide to GiST operator classes, partial indexes, and planner cost model
- Partial GiST Indexes for Active Map Regions — narrow the index to only the rows your queries actually touch
- Reading EXPLAIN ANALYZE Output for Spatial Joins — interpret the planner output produced by the verification query above
- Leveraging Index-Only Scans for Point Data — eliminate heap fetches entirely for point-geometry workloads