Does column order in a composite GiST index matter?

Yes. PostgreSQL evaluates the left-most column first when pruning index branches. Place the column whose predicate eliminates more rows as the first column — geometry first for map-tile workloads, timestamp first for narrow time-window IoT queries.

Why does the query planner ignore my composite GiST index?

The planner skips the index when both columns are not referenced with supported operators, when table statistics are stale (run ANALYZE), when random_page_cost is set too high relative to seq_page_cost, or when the result set is large enough that a sequential scan is cheaper.

Can I combine geometry with timestamptz instead of timestamp?

Yes. btree_gist supports both timestamp and timestamptz. Use timestamptz for any data that crosses time zones — it avoids silent offset errors when comparing values stored in different sessions with different TimeZone settings.

Composite GiST Indexes for Geometry and Timestamp Columns

Problem Statement

Spatial-temporal queries — filtering a table by both a geometry bounding box and a time range — are a daily reality for IoT dashboards, fleet tracking APIs, and geospatial analytics pipelines. The challenge, covered in depth under composite spatial indexes, is that PostgreSQL cannot combine two separate single-column indexes into one coordinated scan; instead it bitmap-ANDs their results, loading far more pages than necessary. This page shows exactly how to collapse both predicates into a single composite GiST index using the btree_gist extension, and how to drive that index from Python using SQLAlchemy 2.0 async and asyncpg.

Why the Naive Approach Fails

The obvious first attempt is two independent indexes — one GiST on the geometry column and one B-tree on the timestamp column — and letting PostgreSQL combine them at runtime with a Bitmap AND. Here is what that looks like:

sql

-- Naive: two separate indexes
CREATE INDEX idx_sensor_geom ON sensor_events USING GIST (location_geom);
CREATE INDEX idx_sensor_ts   ON sensor_events USING BTREE (recorded_at);

python

# Python: naively fetching all rows that match, then filtering in Python
import psycopg2
from shapely.geometry import box
from shapely import wkb

conn = psycopg2.connect(dsn="postgresql://user:pass@localhost/gisdb")
cur = conn.cursor()

# Fetches every geometry hit; timestamp filter happens client-side
cur.execute(
    "SELECT id, location_geom, recorded_at FROM sensor_events "
    "WHERE location_geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)",
    (-74.1, 40.6, -73.7, 40.9)
)
rows = cur.fetchall()  # may pull millions of rows across the network
filtered = [r for r in rows if start <= r[2] <= end]  # in-memory filter

This has two compounding problems. First, the Bitmap AND path requires PostgreSQL to read both indexes independently, build two in-memory bitmaps, AND them, and then re-check heap tuples — adding several hundred milliseconds on a 50 M-row table. Second, the Python fetchall() pulls the full geometry-matching set across the wire before any temporal filter runs, burning memory and network bandwidth proportional to the spatial hit count, not the final result count. On a dense urban sensor network covering a major city, a bounding-box query might match 800 000 rows before the time filter reduces them to 2 000.

Production-Ready Implementation

Step 1: Enable `btree_gist`

PostgreSQL’s GiST framework natively handles spatial bounding boxes but has no built-in operators for scalar types. The btree_gist extension injects B-tree comparison semantics — =, <, >, <=, >=, <>, and range operators — into the GiST operator class, enabling the database to store timestamp extents alongside geometry extents in the same R-tree-like hierarchy.

sql

-- Requires superuser or pg_extension_owner privilege
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Verify installation
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'btree_gist';

Step 2: Create the Composite Index

Use CONCURRENTLY to build without holding an AccessExclusiveLock on the table. The build will take longer but the table remains fully writable during construction — essential for any production system.

sql

-- Create composite GiST index: geometry column first, then timestamp
-- Geometry-first is optimal when spatial predicates are more selective
CREATE INDEX CONCURRENTLY idx_events_geom_ts
ON public.sensor_events
USING GIST (location_geom, recorded_at);

-- Alternatively, timestamp-first for narrow time-window workloads
-- (e.g. IoT ingestion where queries always target the last 15 minutes)
-- CREATE INDEX CONCURRENTLY idx_events_ts_geom
-- ON public.sensor_events
-- USING GIST (recorded_at, location_geom);

Step 3: Query the Index from Python

Below is a single, copy-paste-ready module covering both SQLAlchemy 2.0 async (for FastAPI / async services) and raw asyncpg (for high-throughput pipeline code). Both patterns use parameterised queries that the planner can cache, and neither materialises more rows than needed.

python

"""
composite_spatial_temporal.py
------------------------------
Query sensor_events using a composite GiST index on (location_geom, recorded_at).
Requires: sqlalchemy>=2.0, asyncpg, geoalchemy2, shapely
"""
from __future__ import annotations

from datetime import datetime
from typing import AsyncIterator

import asyncpg
from geoalchemy2 import Geometry
from geoalchemy2.functions import ST_MakeEnvelope
from sqlalchemy import Column, DateTime, Integer, Float, select, and_
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase


# ---------------------------------------------------------------------------
# ORM model
# ---------------------------------------------------------------------------

class Base(DeclarativeBase):
    pass


class SensorEvent(Base):
    __tablename__ = "sensor_events"

    id           = Column(Integer, primary_key=True)
    location_geom = Column(Geometry(geometry_type="POINT", srid=4326))
    recorded_at  = Column(DateTime(timezone=True), nullable=False)
    value        = Column(Float)


# ---------------------------------------------------------------------------
# SQLAlchemy 2.0 async — for FastAPI / Starlette services
# ---------------------------------------------------------------------------

async def fetch_events_sqlalchemy(
    session: AsyncSession,
    min_lon: float,
    min_lat: float,
    max_lon: float,
    max_lat: float,
    start_ts: datetime,
    end_ts: datetime,
    limit: int = 500,
) -> list[SensorEvent]:
    """
    Uses the composite GiST index on (location_geom, recorded_at).
    ST_MakeEnvelope generates the bounding box with explicit SRID=4326
    so the planner avoids an implicit cast that would bypass the index.
    """
    stmt = (
        select(SensorEvent)
        .where(
            and_(
                SensorEvent.location_geom.op("&&")(
                    ST_MakeEnvelope(min_lon, min_lat, max_lon, max_lat, 4326)
                ),
                SensorEvent.recorded_at.between(start_ts, end_ts),
            )
        )
        .order_by(SensorEvent.recorded_at.desc())
        .limit(limit)
    )
    result = await session.execute(stmt)
    return list(result.scalars().all())


# ---------------------------------------------------------------------------
# asyncpg — for pipeline / ETL code that needs maximum throughput
# Use a server-side cursor (cursor()) to avoid materialising the full result
# ---------------------------------------------------------------------------

async def stream_events_asyncpg(
    pool: asyncpg.Pool,
    min_lon: float,
    min_lat: float,
    max_lon: float,
    max_lat: float,
    start_ts: datetime,
    end_ts: datetime,
    prefetch: int = 200,
) -> AsyncIterator[asyncpg.Record]:
    """
    Streams rows via a server-side cursor; never pulls the full result set
    into memory. prefetch controls the wire batch size per round-trip.
    """
    sql = """
        SELECT id, location_geom::text AS geom_wkt, recorded_at, value
        FROM sensor_events
        WHERE location_geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)
          AND recorded_at BETWEEN $5 AND $6
        ORDER BY recorded_at DESC
    """
    async with pool.acquire() as conn:
        async with conn.transaction():
            async for record in conn.cursor(
                sql,
                min_lon, min_lat, max_lon, max_lat,
                start_ts, end_ts,
                prefetch=prefetch,
            ):
                yield record

Configuration and Tuning Knobs

Setting	Recommended value	Why it matters
`work_mem`	`16MB`–`64MB`	Spatial-temporal queries often sort by `recorded_at DESC`. Enough `work_mem` keeps the sort in RAM; disk spillage negates the index speed gain.
`random_page_cost`	`1.1` (SSD) / `4.0` (HDD)	GiST index scans involve random I/O. If `random_page_cost` is too high the planner switches to sequential scans even when the composite index would be faster.
`effective_cache_size`	50–75% of total RAM	Tells the planner how much of the index fits in the OS page cache. Too low and it underestimates the benefit of index scans.
`statement_timeout`	`5000` (ms)	Guards against runaway queries when a user supplies an enormous bounding box that escapes the temporal filter.

Apply these session-level in Python when you need per-query overrides without touching postgresql.conf:

python

async with pool.acquire() as conn:
    await conn.execute("SET work_mem = '32MB'")
    await conn.execute("SET random_page_cost = 1.1")
    rows = await conn.fetch(
        "SELECT id, recorded_at FROM sensor_events "
        "WHERE location_geom && ST_MakeEnvelope($1,$2,$3,$4,4326) "
        "  AND recorded_at BETWEEN $5 AND $6",
        min_lon, min_lat, max_lon, max_lat, start_ts, end_ts,
    )

Verification Steps

Run this immediately after creating the index to confirm PostgreSQL is using it:

sql

-- 1. Check the index exists and has been scanned
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'sensor_events'
  AND indexrelname = 'idx_events_geom_ts';

-- 2. Verify the planner chooses an Index Scan (not Bitmap Heap Scan or Seq Scan)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, recorded_at
FROM sensor_events
WHERE location_geom && ST_MakeEnvelope(-74.05, 40.70, -73.90, 40.80, 4326)
  AND recorded_at BETWEEN '2024-01-01'::timestamptz
                      AND '2024-01-02'::timestamptz;

A healthy output shows Index Scan using idx_events_geom_ts with Buffers: shared hit=N read=M where N dominates (most data served from cache). If you see Bitmap Heap Scan instead, the planner estimated a large result set — narrow your bounding box or time window, or check that ANALYZE has been run recently.

For deeper interpretation of EXPLAIN output, see reading EXPLAIN ANALYZE output for spatial joins.

python

# Python timing comparison: before vs. after the composite index
import time, asyncpg, asyncio

async def benchmark(pool: asyncpg.Pool) -> None:
    sql = (
        "SELECT COUNT(*) FROM sensor_events "
        "WHERE location_geom && ST_MakeEnvelope(-74.05,40.70,-73.90,40.80,4326) "
        "  AND recorded_at BETWEEN '2024-01-01' AND '2024-01-02'"
    )
    t0 = time.perf_counter()
    async with pool.acquire() as conn:
        count = await conn.fetchval(sql)
    elapsed = (time.perf_counter() - t0) * 1000
    print(f"Rows: {count}  |  Query time: {elapsed:.1f} ms")

Gotchas Checklist

SRID mismatch bypasses the index. If location_geom is stored as SRID 4326 but you pass ST_MakeEnvelope(...) without an explicit SRID, PostgreSQL inserts an implicit ST_Transform cast, which prevents index use. Always pass the SRID as the fifth argument to ST_MakeEnvelope.
timestamp vs timestamptz operator class. The composite index must use the same type as the column. If recorded_at is timestamptz, cast your Python datetime objects to timezone-aware before binding them; a naive datetime may coerce silently and produce wrong results in non-UTC sessions.
Stale statistics after bulk loads. A large COPY or batch insert can shift column statistics dramatically. Run ANALYZE sensor_events; after any load of more than ~5% of total row count, or the planner may stick with sequential scans based on outdated histograms.
GiST page bloat accumulates faster than B-tree bloat. Heavy UPDATE or DELETE workloads expand bounding box entries without reclaiming space. VACUUM does not compact GiST pages; you need REINDEX CONCURRENTLY idx_events_geom_ts to rebuild the tree. Schedule this monthly, or when pgstattuple reports bloat above 20%.
CREATE INDEX CONCURRENTLY cannot run inside a transaction block. If your migration framework wraps DDL in a transaction, the concurrent build will fail. Run index creation outside transaction scope, or use your migration tool’s disable_ddl_transaction flag.

Composite Spatial Indexes — parent page covering the full design space of multi-column GiST indexes
Advanced GiST Indexing & Optimization — top-level guide to GiST operator classes, partial indexes, and planner cost model
Partial GiST Indexes for Active Map Regions — narrow the index to only the rows your queries actually touch
Reading EXPLAIN ANALYZE Output for Spatial Joins — interpret the planner output produced by the verification query above
Leveraging Index-Only Scans for Point Data — eliminate heap fetches entirely for point-geometry workloads