Problem Statement

Spatial-temporal queries — filtering a table by both a geometry bounding box and a time range — are a daily reality for IoT dashboards, fleet tracking APIs, and geospatial analytics pipelines. The challenge, covered in depth under composite spatial indexes, is that PostgreSQL cannot combine two separate single-column indexes into one coordinated scan; instead it bitmap-ANDs their results, loading far more pages than necessary. This page shows exactly how to collapse both predicates into a single composite GiST index using the btree_gist extension, and how to drive that index from Python using SQLAlchemy 2.0 async and asyncpg.


Two single-column indexes vs. one composite GiST index Left side shows a geometry index and a timestamp index being bitmap-ANDed at query time, loading two sets of pages. Right side shows a single composite GiST index that prunes both dimensions in a single traversal. Two separate single-column indexes idx_location_geom GiST (geometry) idx_recorded_at B-tree (timestamp) bitmap scan bitmap scan Bitmap AND Heap fetch (many pages) 2 index reads + heap re-check One composite GiST index idx_events_geom_ts GiST (geometry, recorded_at) via btree_gist Single traversal: prune bbox + time range Heap fetch (fewer pages) 1 index read, tighter candidate set

Why the Naive Approach Fails

The obvious first attempt is two independent indexes — one GiST on the geometry column and one B-tree on the timestamp column — and letting PostgreSQL combine them at runtime with a Bitmap AND. Here is what that looks like:

sql
-- Naive: two separate indexes
CREATE INDEX idx_sensor_geom ON sensor_events USING GIST (location_geom);
CREATE INDEX idx_sensor_ts   ON sensor_events USING BTREE (recorded_at);
python
# Python: naively fetching all rows that match, then filtering in Python
import psycopg2
from shapely.geometry import box
from shapely import wkb

conn = psycopg2.connect(dsn="postgresql://user:pass@localhost/gisdb")
cur = conn.cursor()

# Fetches every geometry hit; timestamp filter happens client-side
cur.execute(
    "SELECT id, location_geom, recorded_at FROM sensor_events "
    "WHERE location_geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)",
    (-74.1, 40.6, -73.7, 40.9)
)
rows = cur.fetchall()  # may pull millions of rows across the network
filtered = [r for r in rows if start <= r[2] <= end]  # in-memory filter

This has two compounding problems. First, the Bitmap AND path requires PostgreSQL to read both indexes independently, build two in-memory bitmaps, AND them, and then re-check heap tuples — adding several hundred milliseconds on a 50 M-row table. Second, the Python fetchall() pulls the full geometry-matching set across the wire before any temporal filter runs, burning memory and network bandwidth proportional to the spatial hit count, not the final result count. On a dense urban sensor network covering a major city, a bounding-box query might match 800 000 rows before the time filter reduces them to 2 000.

Production-Ready Implementation

Step 1: Enable btree_gist

PostgreSQL’s GiST framework natively handles spatial bounding boxes but has no built-in operators for scalar types. The btree_gist extension injects B-tree comparison semantics — =, <, >, <=, >=, <>, and range operators — into the GiST operator class, enabling the database to store timestamp extents alongside geometry extents in the same R-tree-like hierarchy.

sql
-- Requires superuser or pg_extension_owner privilege
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Verify installation
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'btree_gist';

Step 2: Create the Composite Index

Use CONCURRENTLY to build without holding an AccessExclusiveLock on the table. The build will take longer but the table remains fully writable during construction — essential for any production system.

sql
-- Create composite GiST index: geometry column first, then timestamp
-- Geometry-first is optimal when spatial predicates are more selective
CREATE INDEX CONCURRENTLY idx_events_geom_ts
ON public.sensor_events
USING GIST (location_geom, recorded_at);

-- Alternatively, timestamp-first for narrow time-window workloads
-- (e.g. IoT ingestion where queries always target the last 15 minutes)
-- CREATE INDEX CONCURRENTLY idx_events_ts_geom
-- ON public.sensor_events
-- USING GIST (recorded_at, location_geom);

Step 3: Query the Index from Python

Below is a single, copy-paste-ready module covering both SQLAlchemy 2.0 async (for FastAPI / async services) and raw asyncpg (for high-throughput pipeline code). Both patterns use parameterised queries that the planner can cache, and neither materialises more rows than needed.

python
"""
composite_spatial_temporal.py
------------------------------
Query sensor_events using a composite GiST index on (location_geom, recorded_at).
Requires: sqlalchemy>=2.0, asyncpg, geoalchemy2, shapely
"""
from __future__ import annotations

from datetime import datetime
from typing import AsyncIterator

import asyncpg
from geoalchemy2 import Geometry
from geoalchemy2.functions import ST_MakeEnvelope
from sqlalchemy import Column, DateTime, Integer, Float, select, and_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase


# ---------------------------------------------------------------------------
# ORM model
# ---------------------------------------------------------------------------

class Base(DeclarativeBase):
    pass


class SensorEvent(Base):
    __tablename__ = "sensor_events"

    id           = Column(Integer, primary_key=True)
    location_geom = Column(Geometry(geometry_type="POINT", srid=4326))
    recorded_at  = Column(DateTime(timezone=True), nullable=False)
    value        = Column(Float)


# ---------------------------------------------------------------------------
# SQLAlchemy 2.0 async — for FastAPI / Starlette services
# ---------------------------------------------------------------------------

async def fetch_events_sqlalchemy(
    session: AsyncSession,
    min_lon: float,
    min_lat: float,
    max_lon: float,
    max_lat: float,
    start_ts: datetime,
    end_ts: datetime,
    limit: int = 500,
) -> list[SensorEvent]:
    """
    Uses the composite GiST index on (location_geom, recorded_at).
    ST_MakeEnvelope generates the bounding box with explicit SRID=4326
    so the planner avoids an implicit cast that would bypass the index.
    """
    stmt = (
        select(SensorEvent)
        .where(
            and_(
                SensorEvent.location_geom.op("&&")(
                    ST_MakeEnvelope(min_lon, min_lat, max_lon, max_lat, 4326)
                ),
                SensorEvent.recorded_at.between(start_ts, end_ts),
            )
        )
        .order_by(SensorEvent.recorded_at.desc())
        .limit(limit)
    )
    result = await session.execute(stmt)
    return list(result.scalars().all())


# ---------------------------------------------------------------------------
# asyncpg — for pipeline / ETL code that needs maximum throughput
# Use a server-side cursor (cursor()) to avoid materialising the full result
# ---------------------------------------------------------------------------

async def stream_events_asyncpg(
    pool: asyncpg.Pool,
    min_lon: float,
    min_lat: float,
    max_lon: float,
    max_lat: float,
    start_ts: datetime,
    end_ts: datetime,
    prefetch: int = 200,
) -> AsyncIterator[asyncpg.Record]:
    """
    Streams rows via a server-side cursor; never pulls the full result set
    into memory. prefetch controls the wire batch size per round-trip.
    """
    sql = """
        SELECT id, location_geom::text AS geom_wkt, recorded_at, value
        FROM sensor_events
        WHERE location_geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)
          AND recorded_at BETWEEN $5 AND $6
        ORDER BY recorded_at DESC
    """
    async with pool.acquire() as conn:
        async with conn.transaction():
            async for record in conn.cursor(
                sql,
                min_lon, min_lat, max_lon, max_lat,
                start_ts, end_ts,
                prefetch=prefetch,
            ):
                yield record

Configuration and Tuning Knobs

Setting Recommended value Why it matters
work_mem 16MB64MB Spatial-temporal queries often sort by recorded_at DESC. Enough work_mem keeps the sort in RAM; disk spillage negates the index speed gain.
random_page_cost 1.1 (SSD) / 4.0 (HDD) GiST index scans involve random I/O. If random_page_cost is too high the planner switches to sequential scans even when the composite index would be faster.
effective_cache_size 50–75% of total RAM Tells the planner how much of the index fits in the OS page cache. Too low and it underestimates the benefit of index scans.
statement_timeout 5000 (ms) Guards against runaway queries when a user supplies an enormous bounding box that escapes the temporal filter.

Apply these session-level in Python when you need per-query overrides without touching postgresql.conf:

python
async with pool.acquire() as conn:
    await conn.execute("SET work_mem = '32MB'")
    await conn.execute("SET random_page_cost = 1.1")
    rows = await conn.fetch(
        "SELECT id, recorded_at FROM sensor_events "
        "WHERE location_geom && ST_MakeEnvelope($1,$2,$3,$4,4326) "
        "  AND recorded_at BETWEEN $5 AND $6",
        min_lon, min_lat, max_lon, max_lat, start_ts, end_ts,
    )

Verification Steps

Run this immediately after creating the index to confirm PostgreSQL is using it:

sql
-- 1. Check the index exists and has been scanned
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'sensor_events'
  AND indexrelname = 'idx_events_geom_ts';

-- 2. Verify the planner chooses an Index Scan (not Bitmap Heap Scan or Seq Scan)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, recorded_at
FROM sensor_events
WHERE location_geom && ST_MakeEnvelope(-74.05, 40.70, -73.90, 40.80, 4326)
  AND recorded_at BETWEEN '2024-01-01'::timestamptz
                      AND '2024-01-02'::timestamptz;

A healthy output shows Index Scan using idx_events_geom_ts with Buffers: shared hit=N read=M where N dominates (most data served from cache). If you see Bitmap Heap Scan instead, the planner estimated a large result set — narrow your bounding box or time window, or check that ANALYZE has been run recently.

For deeper interpretation of EXPLAIN output, see reading EXPLAIN ANALYZE output for spatial joins.

python
# Python timing comparison: before vs. after the composite index
import time, asyncpg, asyncio

async def benchmark(pool: asyncpg.Pool) -> None:
    sql = (
        "SELECT COUNT(*) FROM sensor_events "
        "WHERE location_geom && ST_MakeEnvelope(-74.05,40.70,-73.90,40.80,4326) "
        "  AND recorded_at BETWEEN '2024-01-01' AND '2024-01-02'"
    )
    t0 = time.perf_counter()
    async with pool.acquire() as conn:
        count = await conn.fetchval(sql)
    elapsed = (time.perf_counter() - t0) * 1000
    print(f"Rows: {count}  |  Query time: {elapsed:.1f} ms")

Gotchas Checklist

  • SRID mismatch bypasses the index. If location_geom is stored as SRID 4326 but you pass ST_MakeEnvelope(...) without an explicit SRID, PostgreSQL inserts an implicit ST_Transform cast, which prevents index use. Always pass the SRID as the fifth argument to ST_MakeEnvelope.

  • timestamp vs timestamptz operator class. The composite index must use the same type as the column. If recorded_at is timestamptz, cast your Python datetime objects to timezone-aware before binding them; a naive datetime may coerce silently and produce wrong results in non-UTC sessions.

  • Stale statistics after bulk loads. A large COPY or batch insert can shift column statistics dramatically. Run ANALYZE sensor_events; after any load of more than ~5% of total row count, or the planner may stick with sequential scans based on outdated histograms.

  • GiST page bloat accumulates faster than B-tree bloat. Heavy UPDATE or DELETE workloads expand bounding box entries without reclaiming space. VACUUM does not compact GiST pages; you need REINDEX CONCURRENTLY idx_events_geom_ts to rebuild the tree. Schedule this monthly, or when pgstattuple reports bloat above 20%.

  • CREATE INDEX CONCURRENTLY cannot run inside a transaction block. If your migration framework wraps DDL in a transaction, the concurrent build will fail. Run index creation outside transaction scope, or use your migration tool’s disable_ddl_transaction flag.