Bounding box filtering is the foundational spatial pre-filtering technique in Mastering Core Spatial Query Patterns: it prunes millions of candidate rows using fast, index-backed envelope comparisons before any expensive geometric predicate runs. In production pipelines handling millions of features, evaluating precise topology against every row is computationally prohibitive. Instead, PostGIS compares axis-aligned minimum bounding rectangles (MBRs) using the && operator, which maps directly to the GiST index and returns results in milliseconds. This page walks through every step — from index validation to exact post-filtering — and explains the failure modes that silently eliminate index usage.


Two-phase spatial filtering pipeline Diagram showing all rows entering a GiST index bounding box filter (&&), producing a candidate set, which then passes through an exact predicate (ST_Intersects) to yield the final result set. All rows N features in table GiST Index && operator MBR overlap check ST_MakeEnvelope index-only scan Candidates small subset may include false positives Exact filter ST_Intersects or ST_DWithin final result set Phase 1 — fast prune Phase 2 — exact match

Prerequisites and Infrastructure Validation

Skipping any of these steps typically causes the query planner to fall back to a full sequential scan, silent coordinate mismatches, or unpredictable cardinality estimates. Verify each item before writing queries.

Required extensions

sql
-- Confirm PostGIS is installed (requires 3.2+)
SELECT postgis_full_version();

-- If missing:
CREATE EXTENSION IF NOT EXISTS postgis;

GiST index on the geometry column

A spatial index is mandatory for bounding box pruning. Verify it exists, then create it if not:

sql
-- Check for an existing GiST index on the target table
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'spatial_features'
  AND indexdef ILIKE '%gist%';

-- Create if absent (CONCURRENTLY avoids a table lock on live data)
CREATE INDEX CONCURRENTLY idx_features_geom_gist
ON spatial_features USING GIST (geom);

For deeper strategies — including partial indexes that cover only active regions or composite indexes that combine a geometry column with a timestamp — see Advanced GiST Indexing and Optimization.

Python packages

bash
pip install psycopg2-binary "shapely>=2.0" geojson

Use asyncpg in place of psycopg2-binary for high-concurrency async workloads. shapely>=2.0 is required for the vectorised geometry operations used in post-filtering.

SRID consistency check

sql
-- Verify the stored SRID of the geometry column
SELECT Find_SRID('public', 'spatial_features', 'geom');

All query envelopes must match this SRID. Mixing 4326 (WGS84) with 3857 (Web Mercator) without an explicit ST_Transform disables index scans and forces costly row-level reprojections.


Core Execution Workflow

The bounding box filtering workflow separates into two phases: a fast, index-backed envelope prune and a precise topology check on the reduced candidate set.

Step 1 — Validate the Index Exists and is Active

Before executing any spatial query, confirm the index is present and has been scanned at least once:

sql
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_features_geom_gist';

A fresh index shows idx_scan = 0 until the first query runs. If idx_scan remains zero after queries, the planner is bypassing the index — investigate statistics freshness first.

Step 2 — Define the Search Envelope in Python

Construct a rectangular search area from your coordinates. For web mapping applications, derive bounds directly from the frontend viewport:

python
from shapely.geometry import box

# Viewport bounds from a map library (e.g., Mapbox or Leaflet)
minx, miny, maxx, maxy = -122.45, 37.72, -122.38, 37.78

# Validate — swapped coordinates produce invalid envelopes
assert minx < maxx, "minx must be less than maxx"
assert miny < maxy, "miny must be less than maxy"

search_envelope = box(minx, miny, maxx, maxy)

Always validate that minx < maxx and miny < maxy. Swapped coordinates silently produce degenerate envelopes that either bypass the index or return zero rows.

Step 3 — Construct the Index-Aware SQL Query

The && operator compares MBRs and is the only operator that maps directly to the GiST operator class. ST_MakeEnvelope constructs the query rectangle with an explicit SRID, which the planner requires for index selectivity estimation:

sql
SELECT id, name, geom
FROM spatial_features
WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326);

For full details on operator class selection, index-only scans, and operator class tuning for this query pattern, see Optimizing Bounding Box Queries with the && Operator.

The && operator intentionally returns false positives: geometries whose envelopes overlap the search rectangle but do not geometrically intersect it. This is by design — the next step eliminates them.

Step 4 — Execute with Parameterized Queries

Never interpolate raw coordinates into SQL strings. Parameterized queries prevent injection, enable query plan reuse, and ensure the driver sends coordinates as the correct numeric type:

python
import psycopg2
from psycopg2.extras import RealDictCursor

conn = psycopg2.connect(dsn="postgresql://user:pass@localhost/gisdb")

query = """
    SELECT id, name, ST_AsText(geom) AS geom_wkt
    FROM spatial_features
    WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326);
"""

with conn.cursor(cursor_factory=RealDictCursor) as cursor:
    cursor.execute(query, (minx, miny, maxx, maxy))
    candidates = cursor.fetchall()

For large result sets (beyond ~10,000 rows), replace fetchall() with a server-side cursor to avoid loading the full candidate set into Python memory at once:

python
with conn.cursor(name="bbox_cursor", cursor_factory=RealDictCursor) as cursor:
    cursor.itersize = 500
    cursor.execute(query, (minx, miny, maxx, maxy))
    for row in cursor:
        process(row)

Step 5 — Apply Exact Post-Filtering

&& returns the candidate set. Apply a precise spatial predicate to discard false positives. You can do this in Python using Shapely or push it back to PostGIS in a single query:

Option A — push exact filter to PostGIS (preferred for large candidate sets):

sql
SELECT id, name, geom
FROM spatial_features
WHERE geom && ST_MakeEnvelope(%s, %s, %s, %s, 4326)
  AND ST_Intersects(geom, ST_MakeEnvelope(%s, %s, %s, %s, 4326));

PostGIS evaluates ST_Intersects only on the rows already pruned by &&, so the exact predicate touches a small fraction of the table. This pattern scales naturally into spatial joins, where bounding box pre-filtering prevents O(N×M) topology explosions.

Option B — post-filter in Python using Shapely:

python
from shapely import wkt

exact_matches = []
for row in candidates:
    geom = wkt.loads(row["geom_wkt"])
    if geom.is_valid and search_envelope.intersects(geom):
        exact_matches.append(row)

Use Option B when you need to run additional Python-side business logic on each feature before deciding whether to include it, avoiding a second round-trip to the database.

Step 6 — Verify the Execution Plan

After writing the query, confirm the planner uses the index rather than a sequential scan:

sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name, geom
FROM spatial_features
WHERE geom && ST_MakeEnvelope(-122.45, 37.72, -122.38, 37.78, 4326);

Expected output pattern:

text
Index Scan using idx_features_geom_gist on spatial_features
  (cost=0.28..8.30 rows=12 width=120) (actual time=0.041..0.198 rows=9 loops=1)
  Index Cond: (geom && '0103000020E6100000...'::geometry)
  Buffers: shared hit=15
Planning Time: 0.312 ms
Execution Time: 0.231 ms

Key signals: Index Scan using idx_features_geom_gist is present, Buffers: shared hit is low relative to table size, and actual rows is a small fraction of total rows. For a deep walkthrough of reading EXPLAIN (ANALYZE, BUFFERS) output for spatial queries, see Reading EXPLAIN ANALYZE Output for Spatial Joins.


Performance Considerations

Statistics Freshness and Planner Cost Estimation

The query planner uses column statistics to estimate how selective && will be. After bulk loads or significant data changes, statistics go stale and the planner may underestimate index selectivity, choosing a sequential scan instead. Refresh statistics explicitly:

sql
ANALYZE spatial_features;

For geometry-heavy tables, the default default_statistics_target = 100 is often insufficient. Increase it per-column for better selectivity estimates:

sql
ALTER TABLE spatial_features
  ALTER COLUMN geom SET STATISTICS 500;
ANALYZE spatial_features;

Relevant GUC Settings

GUC Typical value Effect on spatial queries
random_page_cost 1.1 (SSD) Lower values favor index scans over sequential scans
effective_cache_size 75% RAM Higher values tell the planner more pages are cached, favoring index paths
work_mem 64MB–256MB Affects sort and hash operations in spatial joins

Set these per-session for spatial-heavy workloads if you cannot change them cluster-wide:

sql
SET random_page_cost = 1.1;
SET effective_cache_size = '8GB';

Large Geometry Bloat

Complex polygon boundaries — detailed coastlines, parcel datasets with thousands of vertices — produce bounding boxes that cover large areas. This degrades prune efficiency because many geometries whose MBRs overlap the search box do not actually intersect it. Mitigate with:

sql
-- Break large polygons into smaller index-friendly chunks
INSERT INTO spatial_features_subdivided (geom, parent_id)
SELECT ST_Subdivide(geom, 256), id
FROM spatial_features
WHERE ST_NPoints(geom) > 1000;

-- Rebuild the index on the subdivided table
CREATE INDEX idx_subdivided_geom_gist
ON spatial_features_subdivided USING GIST (geom);

Common Failure Modes and Fixes

Sequential Scan Instead of Index Scan

Diagnosis: EXPLAIN output shows Seq Scan on spatial_features instead of Index Scan using idx_features_geom_gist.

Causes and fixes:

  1. Missing index. Run the index creation SQL in the Prerequisites section.
  2. Stale statistics. Run ANALYZE spatial_features;.
  3. SRID mismatch. The SRID in ST_MakeEnvelope does not match the column’s declared SRID. The planner cannot use the index when it must apply an implicit cast. Fix by matching SRIDs or using ST_Transform.
  4. Very small table. The planner correctly chooses a sequential scan when the table has fewer than a few hundred rows — the index overhead exceeds the scan cost.
  5. Low random_page_cost. On spinning disks, the planner may still prefer a sequential scan. Lower random_page_cost to reflect SSD storage: SET random_page_cost = 1.1;.

SRID Mismatch Producing Zero Results or Wrong Rows

Diagnosis: Query returns zero rows despite features clearly existing in the viewport, or returns rows far outside the expected area.

sql
-- Check column SRID
SELECT Find_SRID('public', 'spatial_features', 'geom');

-- Check SRID of a sample geometry
SELECT ST_SRID(geom) FROM spatial_features LIMIT 1;

If your application receives coordinates in EPSG:4326 but the column stores EPSG:3857, transform the envelope once rather than each row:

sql
SELECT id, name, geom
FROM spatial_features
WHERE geom && ST_Transform(ST_MakeEnvelope(%s, %s, %s, %s, 4326), 3857)
  AND ST_Intersects(geom, ST_Transform(ST_MakeEnvelope(%s, %s, %s, %s, 4326), 3857));

Python Memory Exhaustion on Large Candidate Sets

Diagnosis: Python process memory climbs steeply during query execution; MemoryError or garbage collection pauses on large viewports.

Fix: Use a named server-side cursor with itersize set to control how many rows are fetched per network round-trip:

python
conn.autocommit = False  # server-side cursors require an open transaction

with conn.cursor(name="bbox_stream") as cursor:
    cursor.itersize = 500
    cursor.execute(query, params)
    for row in cursor:
        process(row)

Server-side cursors keep the result set on the PostgreSQL server and stream rows in batches, capping Python’s peak memory to roughly itersize × row_size.

Index Not Used After Bulk Insert

After large bulk inserts, the index may become bloated and the statistics may be severely out of date. Detect bloat and refresh:

sql
-- Check index size vs table size
SELECT
    pg_size_pretty(pg_relation_size('idx_features_geom_gist')) AS index_size,
    pg_size_pretty(pg_total_relation_size('spatial_features')) AS table_size;

-- Refresh statistics
ANALYZE spatial_features;

-- Rebuild index if bloat is significant (non-blocking on PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_features_geom_gist;

Verification

Confirm the implementation is working correctly with these checks:

sql
-- 1. Row count sanity check: candidates should be a small fraction of total rows
SELECT COUNT(*) AS total FROM spatial_features;

SELECT COUNT(*) AS candidates
FROM spatial_features
WHERE geom && ST_MakeEnvelope(-122.45, 37.72, -122.38, 37.78, 4326);

-- 2. Timing comparison: index scan vs forced sequential scan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id FROM spatial_features
WHERE geom && ST_MakeEnvelope(-122.45, 37.72, -122.38, 37.78, 4326);

-- Force a sequential scan to measure baseline (never do this in production)
SET enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM spatial_features
WHERE geom && ST_MakeEnvelope(-122.45, 37.72, -122.38, 37.78, 4326);
RESET enable_indexscan;

A healthy implementation shows the indexed query executing 10x–1000x faster than the forced sequential scan, depending on table size and data distribution. The index scan’s actual rows should be close to the final result count after exact filtering — large discrepancies indicate that envelope overlap is high and ST_Subdivide may be worth applying.

For proximity-based use cases that build directly on this pattern, see ST_DWithin Radius Searches, which adds distance-based pruning on top of the same bounding box foundation. When applying bounding box filtering across two tables simultaneously, the workflow extends into Spatial Joins.