Problem Statement
This page covers a narrow but high-impact scenario: your map_pois table (or equivalent spatial table) holds millions of geometry rows, but your application only ever queries a geographically or administratively bounded subset — the “active” region. Without targeted index design, every viewport query either scans the full GiST index or, worse, falls back to a sequential scan. The fix lives in partial GiST indexes, which limit index coverage to rows matching a fixed WHERE predicate, shrinking the index tree and reducing both query latency and write overhead for the rows that matter.
Why the Naive Approach Fails
A common first attempt is to create a full GiST index and rely on the query planner to “figure it out”:
-- Full GiST index — works, but indexes every row including archived/inactive ones
CREATE INDEX idx_map_pois_geom ON map_pois USING GIST (geom);Then from Python:
# Naive: fetchall pulls the entire result into memory before iteration
cur.execute("""
SELECT id, ST_AsText(geom), status
FROM map_pois
WHERE status = 'active'
AND ST_Intersects(geom, ST_MakeEnvelope(%s, %s, %s, %s, 4326))
""", (minx, miny, maxx, maxy))
rows = cur.fetchall() # OOM risk on large active setsTwo problems here. First, the full GiST index contains every geometry — including hundreds of thousands of archived, draft, and soft-deleted rows that will never appear in your active viewport. The index tree is larger than necessary, consumes more shared buffers, and takes longer to scan. Second, fetchall() pulls the entire matching set into process memory in one go, which becomes an OOM risk once the active region contains tens of thousands of geometries. The sections below fix both problems.
Production-Ready Implementation
Step 1 — Prerequisites and Validation
Confirm that PostGIS is installed and that the geometry column carries an explicit SRID. Mismatched or missing SRIDs are the single most common cause of index non-use.
-- Confirm PostGIS is installed
SELECT postgis_full_version();
-- Confirm SRID on the geometry column (must not be 0 or -1)
SELECT f_table_name, f_geometry_column, srid, type
FROM geometry_columns
WHERE f_table_name = 'map_pois';
-- If SRID is missing, fix it before creating the index
ALTER TABLE map_pois
ALTER COLUMN geom TYPE geometry(Point, 4326)
USING ST_SetSRID(geom, 4326);Step 2 — Index Creation
A status-only predicate is the simplest and most flexible form: any query that filters status = 'active' can use it, regardless of the spatial envelope being queried.
-- Partial GiST index covering only active rows
CREATE INDEX CONCURRENTLY idx_map_pois_active
ON map_pois USING GIST (geom)
WHERE status = 'active';CONCURRENTLY avoids an AccessShareLock on the table, so you can build the index in production without blocking reads or writes. It does require two table scans and cannot run inside a transaction block.
To further restrict by a fixed geographic region — for example, a metropolitan area that your application always scopes queries to — add a spatial predicate to the WHERE clause:
-- Partial index scoped to active POIs within the San Francisco metro bounding box
CREATE INDEX CONCURRENTLY idx_map_pois_active_sf
ON map_pois USING GIST (geom)
WHERE status = 'active'
AND geom && ST_MakeEnvelope(-122.52, 37.60, -122.33, 37.82, 4326);Keep the envelope coordinates literal constants, not function calls or parameters. The planner caches index predicate statistics at CREATE INDEX time; a dynamic or parameterized WHERE clause would produce an index that the planner can never confidently select.
Step 3 — Python Query with Server-Side Cursor
The matching Python query must include the same predicate columns so the planner can apply logical implication. Use a named cursor (server-side) to stream results row-by-row instead of loading the full result set into memory.
import psycopg2
from typing import Iterator, Tuple, Any
def stream_active_region(
conn: psycopg2.extensions.connection,
minx: float,
miny: float,
maxx: float,
maxy: float,
srid: int = 4326,
itersize: int = 500,
) -> Iterator[Tuple[Any, ...]]:
"""
Stream active-region POIs through a server-side cursor.
The WHERE clause mirrors the partial index predicate (status = 'active')
so the planner selects idx_map_pois_active rather than doing a seq scan.
The && pre-filter is listed explicitly before ST_Intersects so PostGIS
applies the cheap bounding-box test first, then refines with the exact
geometry check only for candidates that pass.
"""
sql = """
SELECT id, ST_AsGeoJSON(geom) AS geom_json, status
FROM map_pois
WHERE status = 'active'
AND geom && ST_MakeEnvelope(%(minx)s, %(miny)s, %(maxx)s, %(maxy)s, %(srid)s)
AND ST_Intersects(
geom,
ST_MakeEnvelope(%(minx)s, %(miny)s, %(maxx)s, %(maxy)s, %(srid)s)
)
"""
params = {"minx": minx, "miny": miny, "maxx": maxx, "maxy": maxy, "srid": srid}
# Named cursor → server-side; itersize controls rows fetched per round-trip
with conn.cursor("active_region_cursor") as cur:
cur.itersize = itersize
cur.execute(sql, params)
for row in cur:
yield rowNamed psycopg2 cursors keep the result set on the server and fetch itersize rows per network round-trip. For viewport queries returning thousands of features, this keeps the Python process memory flat.
Step 4 — Full Worked Example (connect → index → query)
import psycopg2
import psycopg2.extras
DSN = "host=localhost dbname=gisdb user=gis password=secret"
def setup_index(conn: psycopg2.extensions.connection) -> None:
"""Idempotent: creates the partial index only if it does not already exist."""
with conn.cursor() as cur:
cur.execute("""
SELECT 1
FROM pg_indexes
WHERE tablename = 'map_pois'
AND indexname = 'idx_map_pois_active'
""")
if cur.fetchone() is None:
# Must run outside a transaction block for CONCURRENTLY
old_isolation = conn.isolation_level
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cur.execute("""
CREATE INDEX CONCURRENTLY idx_map_pois_active
ON map_pois USING GIST (geom)
WHERE status = 'active'
""")
conn.set_isolation_level(old_isolation)
def main() -> None:
conn = psycopg2.connect(DSN)
setup_index(conn)
# San Francisco downtown viewport
results = list(stream_active_region(
conn,
minx=-122.42, miny=37.77,
maxx=-122.39, maxy=37.80,
))
print(f"Features returned: {len(results)}")
conn.close()
if __name__ == "__main__":
main()How the Query Planner Selects the Index
PostgreSQL uses logical implication, not string matching, to decide whether a partial index applies. If the index predicate is status = 'active', any query whose WHERE clause logically implies that condition (whether alone or combined with additional filters) can use the index. A query that does not filter on status at all cannot use it.
The diagram below shows the two-phase evaluation path PostGIS follows when both the partial index and the && bounding-box operator are in play:
Configuration and Tuning Knobs
| Setting | Recommended value | Reason |
|---|---|---|
work_mem |
16MB–64MB per query |
Allows sort and hash steps in spatial joins to avoid spilling to disk |
random_page_cost |
1.1–2.0 on SSD |
Lower value encourages index scans over sequential scans; default 4.0 is tuned for spinning disk |
effective_cache_size |
50–75% of total RAM | Signals to the planner how much of the index can be expected in OS cache |
cursor.itersize (psycopg2) |
200–1000 |
Controls rows fetched per round-trip in server-side cursors; tune to balance latency and memory |
Apply these in postgresql.conf (or via ALTER SYSTEM) and reload with SELECT pg_reload_conf();. Do not set enable_seqscan = off in production — it masks the root cause of planner regressions.
Verification Steps
After creating the index and running a representative query, confirm the planner is using it:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, ST_AsGeoJSON(geom) AS geom_json
FROM map_pois
WHERE status = 'active'
AND geom && ST_MakeEnvelope(-122.42, 37.77, -122.39, 37.80, 4326)
AND ST_Intersects(geom, ST_MakeEnvelope(-122.42, 37.77, -122.39, 37.80, 4326));Look for these signals in the output:
Index Scan using idx_map_pois_active on map_pois— the partial index is selected.Index Cond: ((geom && '...'::geometry))— bounding-box filter applied at index level.Filter: (status = 'active')appearing here instead of inIndex Condmeans the planner is treating it as a filter on the full index rather than using the partial one. In that case, runANALYZE map_pois;to refresh statistics, then re-check.Rows Removed by Index Recheck: 0or a low number — tight envelope alignment, few false positives.
For a quick row-count sanity check from Python:
def verify_index_usage(conn: psycopg2.extensions.connection) -> None:
with conn.cursor() as cur:
cur.execute("""
SELECT idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexname = 'idx_map_pois_active'
""")
row = cur.fetchone()
if row:
print(f"Index scans: {row[0]}, tuples read: {row[1]}, tuples fetched: {row[2]}")
else:
print("Index not found in pg_stat_user_indexes — check index name.")Run the spatial query a few times, then call verify_index_usage. idx_scan should increment with each query. If it stays at zero, the planner is not selecting the index.
Gotchas Checklist
- SRID mismatch breaks index usage. If
ST_MakeEnvelopeuses SRID4326but the column stores geometries in a projected CRS such as3857, PostgreSQL inserts an implicitST_Transformcall, which prevents the index from being used. Keep SRIDs consistent throughout, or use bounding-box filtering with an explicit cast. CREATE INDEX CONCURRENTLYcannot run in a transaction block. If you wrap the DDL in aBEGIN/COMMITblock, PostgreSQL will raise an error. SetAUTOCOMMITmode on the connection before issuing the statement (see thesetup_indexfunction above).- Predicate column updates cause index thrashing. If rows toggle between
status = 'active'andstatus = 'archived'frequently — for example, during a bulk status migration — every update triggers an index insert and delete. Batch updates and scheduleVACUUM ANALYZEimmediately after. - The active subset must be a minority of the table. Once the indexed subset exceeds roughly 60% of all rows, a full GiST index is smaller and the planner will prefer it anyway. Verify selectivity with
SELECT count(*) FILTER (WHERE status = 'active') * 100.0 / count(*) FROM map_pois;. - Index does not cover
ORDER BY geom <-> pointqueries. The<->KNN distance operator used in KNN nearest-neighbor queries requires a full GiST index scan without a bounding-box pre-filter. Keep a separate full GiST index for KNN workloads alongside the partial index. REINDEXwithoutCONCURRENTLYtakes an exclusive lock. During maintenance windows, useREINDEX INDEX CONCURRENTLY idx_map_pois_active;to avoid blocking application queries. For query plan analysis after reindexing, runANALYZE map_pois;before comparing EXPLAIN output.
Related Topics
- Partial GiST Indexes — parent page covering the full range of partial index strategies for PostGIS
- Advanced GiST Indexing & Optimization — the broader indexing context this technique fits into
- Query Plan Analysis with EXPLAIN — how to read EXPLAIN ANALYZE output for spatial queries
- Bounding-Box Filtering — complementary approach using the
&&operator directly in high-throughput viewport queries - Composite Spatial Indexes — when you need to index geometry alongside a non-spatial column such as a timestamp