Problem Statement

This page covers a narrow but high-impact scenario: your map_pois table (or equivalent spatial table) holds millions of geometry rows, but your application only ever queries a geographically or administratively bounded subset — the “active” region. Without targeted index design, every viewport query either scans the full GiST index or, worse, falls back to a sequential scan. The fix lives in partial GiST indexes, which limit index coverage to rows matching a fixed WHERE predicate, shrinking the index tree and reducing both query latency and write overhead for the rows that matter.

Why the Naive Approach Fails

A common first attempt is to create a full GiST index and rely on the query planner to “figure it out”:

sql
-- Full GiST index — works, but indexes every row including archived/inactive ones
CREATE INDEX idx_map_pois_geom ON map_pois USING GIST (geom);

Then from Python:

python
# Naive: fetchall pulls the entire result into memory before iteration
cur.execute("""
    SELECT id, ST_AsText(geom), status
    FROM map_pois
    WHERE status = 'active'
      AND ST_Intersects(geom, ST_MakeEnvelope(%s, %s, %s, %s, 4326))
""", (minx, miny, maxx, maxy))
rows = cur.fetchall()   # OOM risk on large active sets

Two problems here. First, the full GiST index contains every geometry — including hundreds of thousands of archived, draft, and soft-deleted rows that will never appear in your active viewport. The index tree is larger than necessary, consumes more shared buffers, and takes longer to scan. Second, fetchall() pulls the entire matching set into process memory in one go, which becomes an OOM risk once the active region contains tens of thousands of geometries. The sections below fix both problems.

Production-Ready Implementation

Step 1 — Prerequisites and Validation

Confirm that PostGIS is installed and that the geometry column carries an explicit SRID. Mismatched or missing SRIDs are the single most common cause of index non-use.

sql
-- Confirm PostGIS is installed
SELECT postgis_full_version();

-- Confirm SRID on the geometry column (must not be 0 or -1)
SELECT f_table_name, f_geometry_column, srid, type
FROM geometry_columns
WHERE f_table_name = 'map_pois';

-- If SRID is missing, fix it before creating the index
ALTER TABLE map_pois
  ALTER COLUMN geom TYPE geometry(Point, 4326)
  USING ST_SetSRID(geom, 4326);

Step 2 — Index Creation

A status-only predicate is the simplest and most flexible form: any query that filters status = 'active' can use it, regardless of the spatial envelope being queried.

sql
-- Partial GiST index covering only active rows
CREATE INDEX CONCURRENTLY idx_map_pois_active
ON map_pois USING GIST (geom)
WHERE status = 'active';

CONCURRENTLY avoids an AccessShareLock on the table, so you can build the index in production without blocking reads or writes. It does require two table scans and cannot run inside a transaction block.

To further restrict by a fixed geographic region — for example, a metropolitan area that your application always scopes queries to — add a spatial predicate to the WHERE clause:

sql
-- Partial index scoped to active POIs within the San Francisco metro bounding box
CREATE INDEX CONCURRENTLY idx_map_pois_active_sf
ON map_pois USING GIST (geom)
WHERE status = 'active'
  AND geom && ST_MakeEnvelope(-122.52, 37.60, -122.33, 37.82, 4326);

Keep the envelope coordinates literal constants, not function calls or parameters. The planner caches index predicate statistics at CREATE INDEX time; a dynamic or parameterized WHERE clause would produce an index that the planner can never confidently select.

Step 3 — Python Query with Server-Side Cursor

The matching Python query must include the same predicate columns so the planner can apply logical implication. Use a named cursor (server-side) to stream results row-by-row instead of loading the full result set into memory.

python
import psycopg2
from typing import Iterator, Tuple, Any

def stream_active_region(
    conn: psycopg2.extensions.connection,
    minx: float,
    miny: float,
    maxx: float,
    maxy: float,
    srid: int = 4326,
    itersize: int = 500,
) -> Iterator[Tuple[Any, ...]]:
    """
    Stream active-region POIs through a server-side cursor.

    The WHERE clause mirrors the partial index predicate (status = 'active')
    so the planner selects idx_map_pois_active rather than doing a seq scan.

    The && pre-filter is listed explicitly before ST_Intersects so PostGIS
    applies the cheap bounding-box test first, then refines with the exact
    geometry check only for candidates that pass.
    """
    sql = """
        SELECT id, ST_AsGeoJSON(geom) AS geom_json, status
        FROM map_pois
        WHERE status = 'active'
          AND geom && ST_MakeEnvelope(%(minx)s, %(miny)s, %(maxx)s, %(maxy)s, %(srid)s)
          AND ST_Intersects(
                geom,
                ST_MakeEnvelope(%(minx)s, %(miny)s, %(maxx)s, %(maxy)s, %(srid)s)
              )
    """
    params = {"minx": minx, "miny": miny, "maxx": maxx, "maxy": maxy, "srid": srid}

    # Named cursor → server-side; itersize controls rows fetched per round-trip
    with conn.cursor("active_region_cursor") as cur:
        cur.itersize = itersize
        cur.execute(sql, params)
        for row in cur:
            yield row

Named psycopg2 cursors keep the result set on the server and fetch itersize rows per network round-trip. For viewport queries returning thousands of features, this keeps the Python process memory flat.

Step 4 — Full Worked Example (connect → index → query)

python
import psycopg2
import psycopg2.extras

DSN = "host=localhost dbname=gisdb user=gis password=secret"

def setup_index(conn: psycopg2.extensions.connection) -> None:
    """Idempotent: creates the partial index only if it does not already exist."""
    with conn.cursor() as cur:
        cur.execute("""
            SELECT 1
            FROM pg_indexes
            WHERE tablename = 'map_pois'
              AND indexname = 'idx_map_pois_active'
        """)
        if cur.fetchone() is None:
            # Must run outside a transaction block for CONCURRENTLY
            old_isolation = conn.isolation_level
            conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
            cur.execute("""
                CREATE INDEX CONCURRENTLY idx_map_pois_active
                ON map_pois USING GIST (geom)
                WHERE status = 'active'
            """)
            conn.set_isolation_level(old_isolation)


def main() -> None:
    conn = psycopg2.connect(DSN)

    setup_index(conn)

    # San Francisco downtown viewport
    results = list(stream_active_region(
        conn,
        minx=-122.42, miny=37.77,
        maxx=-122.39, maxy=37.80,
    ))
    print(f"Features returned: {len(results)}")

    conn.close()


if __name__ == "__main__":
    main()

How the Query Planner Selects the Index

PostgreSQL uses logical implication, not string matching, to decide whether a partial index applies. If the index predicate is status = 'active', any query whose WHERE clause logically implies that condition (whether alone or combined with additional filters) can use the index. A query that does not filter on status at all cannot use it.

The diagram below shows the two-phase evaluation path PostGIS follows when both the partial index and the && bounding-box operator are in play:

Two-phase spatial query evaluation flow A flowchart showing how PostgreSQL evaluates a spatial query: first the partial GiST index applies the status predicate and bounding-box filter, then ST_Intersects performs the exact geometry check on surviving candidates. Phase 1 — GiST index scan Predicate: status = 'active' Operator: geom && envelope (bounding-box intersection) candidates Phase 2 — exact geometry check Function: ST_Intersects(geom, env) Recheck eliminates false positives from bounding-box overlap Result rows streamed via server-side cursor Index predicate and query predicate must share status = 'active' for planner to select the partial index. Rows outside the partial index predicate are never read from the index tree.

Configuration and Tuning Knobs

Setting Recommended value Reason
work_mem 16MB64MB per query Allows sort and hash steps in spatial joins to avoid spilling to disk
random_page_cost 1.12.0 on SSD Lower value encourages index scans over sequential scans; default 4.0 is tuned for spinning disk
effective_cache_size 50–75% of total RAM Signals to the planner how much of the index can be expected in OS cache
cursor.itersize (psycopg2) 2001000 Controls rows fetched per round-trip in server-side cursors; tune to balance latency and memory

Apply these in postgresql.conf (or via ALTER SYSTEM) and reload with SELECT pg_reload_conf();. Do not set enable_seqscan = off in production — it masks the root cause of planner regressions.

Verification Steps

After creating the index and running a representative query, confirm the planner is using it:

sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, ST_AsGeoJSON(geom) AS geom_json
FROM map_pois
WHERE status = 'active'
  AND geom && ST_MakeEnvelope(-122.42, 37.77, -122.39, 37.80, 4326)
  AND ST_Intersects(geom, ST_MakeEnvelope(-122.42, 37.77, -122.39, 37.80, 4326));

Look for these signals in the output:

  • Index Scan using idx_map_pois_active on map_pois — the partial index is selected.
  • Index Cond: ((geom && '...'::geometry)) — bounding-box filter applied at index level.
  • Filter: (status = 'active') appearing here instead of in Index Cond means the planner is treating it as a filter on the full index rather than using the partial one. In that case, run ANALYZE map_pois; to refresh statistics, then re-check.
  • Rows Removed by Index Recheck: 0 or a low number — tight envelope alignment, few false positives.

For a quick row-count sanity check from Python:

python
def verify_index_usage(conn: psycopg2.extensions.connection) -> None:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT idx_scan, idx_tup_read, idx_tup_fetch
            FROM pg_stat_user_indexes
            WHERE indexname = 'idx_map_pois_active'
        """)
        row = cur.fetchone()
        if row:
            print(f"Index scans: {row[0]}, tuples read: {row[1]}, tuples fetched: {row[2]}")
        else:
            print("Index not found in pg_stat_user_indexes — check index name.")

Run the spatial query a few times, then call verify_index_usage. idx_scan should increment with each query. If it stays at zero, the planner is not selecting the index.

Gotchas Checklist

  • SRID mismatch breaks index usage. If ST_MakeEnvelope uses SRID 4326 but the column stores geometries in a projected CRS such as 3857, PostgreSQL inserts an implicit ST_Transform call, which prevents the index from being used. Keep SRIDs consistent throughout, or use bounding-box filtering with an explicit cast.
  • CREATE INDEX CONCURRENTLY cannot run in a transaction block. If you wrap the DDL in a BEGIN/COMMIT block, PostgreSQL will raise an error. Set AUTOCOMMIT mode on the connection before issuing the statement (see the setup_index function above).
  • Predicate column updates cause index thrashing. If rows toggle between status = 'active' and status = 'archived' frequently — for example, during a bulk status migration — every update triggers an index insert and delete. Batch updates and schedule VACUUM ANALYZE immediately after.
  • The active subset must be a minority of the table. Once the indexed subset exceeds roughly 60% of all rows, a full GiST index is smaller and the planner will prefer it anyway. Verify selectivity with SELECT count(*) FILTER (WHERE status = 'active') * 100.0 / count(*) FROM map_pois;.
  • Index does not cover ORDER BY geom <-> point queries. The <-> KNN distance operator used in KNN nearest-neighbor queries requires a full GiST index scan without a bounding-box pre-filter. Keep a separate full GiST index for KNN workloads alongside the partial index.
  • REINDEX without CONCURRENTLY takes an exclusive lock. During maintenance windows, use REINDEX INDEX CONCURRENTLY idx_map_pois_active; to avoid blocking application queries. For query plan analysis after reindexing, run ANALYZE map_pois; before comparing EXPLAIN output.