Spatial workloads introduce execution characteristics that standard relational queries rarely surface. When PostGIS evaluates geometric predicates, the query planner must weigh bounding box intersections against exact geometry recheck overhead, balance index selectivity against sequential I/O cost, and adjust for statistics that age rapidly under bulk spatial loads. This page, part of the Advanced GiST Indexing & Optimization section, establishes a repeatable workflow for capturing, parsing, and acting on execution plans in PostGIS environments — treating plan output as structured telemetry rather than opaque terminal text.
Without inspecting the actual execution tree, engineers frequently misattribute latency to network overhead or Python serialisation when the real bottleneck is a missing GiST index, stale planner statistics, or an ORM that silently bypasses indexable operators. The techniques here give you the deterministic visibility needed to validate index utilisation, enforce performance SLAs in CI/CD pipelines, and resolve spatial query regressions before they reach production.
Prerequisites and Infrastructure Validation
Before capturing execution plans programmatically, confirm these baseline requirements.
Required extensions and versions:
- PostgreSQL 14+ with PostGIS 3.2+. Later versions include improved planner cost models for spatial operators and more accurate selectivity estimates for geometry columns.
pg_stat_statementsenabled (optional but strongly recommended). It surfaces baseline query frequency and total execution time, letting you triage which spatial queries most deserve targetedEXPLAINruns.
Verify extensions:
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('postgis', 'pg_stat_statements')
ORDER BY name;Python packages:
psycopg2-binary>=2.9 # or psycopg>=3.1 for async workloads
Binary drivers are strongly preferred for spatial workloads — they avoid serialisation overhead when handling large coordinate arrays returned from ST_AsText() or ST_AsBinary().
Verify GiST index existence before analysis:
-- Check that spatial indexes exist on the target table before running EXPLAIN
SELECT
indexname,
indexdef,
pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'parcels'
AND indexdef ILIKE '%gist%'
ORDER BY indexname;If no GiST index is listed, the planner has nothing to consider for spatial acceleration — sequential scans are guaranteed. Create the index before proceeding:
CREATE INDEX CONCURRENTLY idx_parcels_geom
ON public.parcels
USING GIST (geom);Core Execution Workflow
Step 1: Capture a Baseline Plan
Start with EXPLAIN (FORMAT JSON) against your target query. JSON output eliminates formatting inconsistencies across PostgreSQL versions and is directly consumable by Python’s json module without text parsing.
EXPLAIN (FORMAT JSON)
SELECT p.parcel_id, z.zone_code
FROM public.parcels p
JOIN public.zoning z
ON ST_Intersects(p.geom, z.geom)
WHERE p.status = 'active'
AND z.zone_type = 'RESIDENTIAL';This returns the estimated plan — no query execution occurs. Use it to confirm that the planner intends to use a GiST index before spending time on ANALYZE runs.
Step 2: Capture Real Execution Metrics
Run EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) to force full execution and capture real timing, actual row counts, and buffer hit/miss ratios.
import json
import psycopg2
from psycopg2.extras import RealDictCursor
SPATIAL_QUERY = """
SELECT p.parcel_id, z.zone_code
FROM public.parcels p
JOIN public.zoning z
ON p.geom && z.geom
AND ST_Intersects(p.geom, z.geom)
WHERE p.status = %s
AND z.zone_type = %s
"""
def capture_explain(conn_str: str, query: str, params: tuple) -> dict:
"""
Runs EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) and returns the parsed plan.
Always run on a representative dataset — empty tables produce misleading cost estimates.
"""
explain_sql = f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"
with psycopg2.connect(conn_str) as conn:
with conn.cursor(cursor_factory=RealDictCursor) as cur:
cur.execute(explain_sql, params)
raw = cur.fetchone()
if not raw or "QUERY PLAN" not in raw:
raise ValueError("Unexpected EXPLAIN output structure")
# PostgreSQL returns the plan inside a list
return raw["QUERY PLAN"][0]Step 3: Parse the Plan Tree in Python
EXPLAIN JSON is a recursive tree of plan nodes. A recursive parser lets you flatten it into a list of metrics regardless of join depth.
def extract_nodes(node: dict, path: str = "root") -> list[dict]:
"""
Recursively extracts execution metrics from every node in a PostgreSQL
EXPLAIN ANALYZE JSON plan, preserving the node path for traceability.
"""
node_type = node.get("Node Type", "Unknown")
record = {
"path": path,
"node_type": node_type,
"startup_cost": node.get("Startup Cost"),
"total_cost": node.get("Total Cost"),
"plan_rows": node.get("Plan Rows"),
"actual_rows": node.get("Actual Rows"),
"actual_time_ms": node.get("Actual Total Time"),
"loops": node.get("Loops", 1),
"index_name": node.get("Index Name"),
"rows_removed_by_filter": node.get("Rows Removed by Filter", 0),
"rows_removed_by_recheck": node.get("Rows Removed by Index Recheck", 0),
"shared_hit_blocks": node.get("Shared Hit Blocks", 0),
"shared_read_blocks": node.get("Shared Read Blocks", 0),
}
results = [record]
for child in node.get("Plans", []):
child_path = f"{path} → {child.get('Node Type', '?')}"
results.extend(extract_nodes(child, child_path))
return results
def analyse_spatial_query(conn_str: str, query: str, params: tuple) -> list[dict]:
plan = capture_explain(conn_str, query, params)
return extract_nodes(plan["Plan"])Step 4: Identify Index Usage and Bottlenecks
With node metrics in hand, apply targeted heuristics for spatial workloads:
def audit_plan(nodes: list[dict]) -> None:
"""
Prints actionable warnings for common spatial plan anti-patterns.
"""
for n in nodes:
node_type = n["node_type"]
path = n["path"]
# Sequential scan on what may be a large table
if node_type == "Seq Scan":
print(f"[WARN] Seq Scan at '{path}' — verify GiST index exists "
"and that random_page_cost reflects your storage medium.")
# Actual row count deviates significantly from planner estimate
if n["plan_rows"] and n["actual_rows"] is not None:
ratio = n["actual_rows"] / max(n["plan_rows"], 1)
if ratio > 10 or ratio < 0.1:
print(f"[WARN] Row estimate mismatch at '{path}': "
f"planned={n['plan_rows']}, actual={n['actual_rows']}. "
"Run ANALYZE to refresh statistics.")
# High recheck count signals poor bounding-box selectivity
if n["rows_removed_by_recheck"] > 500:
print(f"[WARN] High recheck at '{path}': "
f"{n['rows_removed_by_recheck']} rows removed after index scan. "
"Consider a partial GiST index or tighter spatial predicate.")
# Expensive filter outside the index
if n["rows_removed_by_filter"] > 1000:
print(f"[WARN] High filter removal at '{path}': "
f"{n['rows_removed_by_filter']} rows discarded post-scan. "
"A composite or partial index may help.")Step 5: Verify the Corrected Plan
After applying index adjustments or statistics refreshes, re-run the analysis and assert that the plan now uses a GiST index and that row estimates have improved:
def assert_index_used(nodes: list[dict], expected_index: str) -> bool:
"""
Confirms that the expected GiST index appears in at least one plan node.
Raises AssertionError if the planner fell back to a sequential scan instead.
"""
used = [n["index_name"] for n in nodes if n["index_name"]]
assert expected_index in used, (
f"Index '{expected_index}' not used. "
f"Indexes found: {used or 'none'}. "
"Run ANALYZE and verify the query predicate matches the index definition."
)
return TruePerformance Considerations
Reading EXPLAIN Output: Index Scan vs Sequential Scan
A well-optimised spatial query plan looks like this (abbreviated text format for illustration):
Nested Loop (cost=0.28..312.44 rows=12 width=40)
(actual time=0.241..4.312 rows=9 loops=1)
-> Index Scan using idx_zoning_geom on zoning z
Index Cond: (geom && p.geom)
Filter: ((zone_type)::text = 'RESIDENTIAL'::text)
-> Index Scan using idx_parcels_geom on parcels p
Index Cond: (geom && z.geom)
Filter: (status = 'active'::text)
Rows Removed by Filter: 2
Planning Time: 1.2 ms
Execution Time: 4.7 ms
Key indicators:
Index Scan using idx_parcels_geom— GiST index is active.Rows Removed by Index Recheck: 0— bounding box filter is selective enough that no false positives escaped to exact predicate evaluation.actual rows=9 / planned rows=12— planner estimate is within an acceptable range; statistics are current.
A degraded plan shows Seq Scan instead of Index Scan, a high Rows Removed by Index Recheck count, or a large gap between planned and actual rows.
Relevant GUC Settings
| Setting | Relevance | Typical adjustment |
|---|---|---|
random_page_cost |
Controls planner’s cost of a random disk page. Lower values favour index scans. | 1.1 on SSD, 4.0 (default) on HDD |
effective_cache_size |
Tells the planner how much OS page cache is available. Larger values make index scans more attractive. | Set to ~75% of total RAM |
default_statistics_target |
Controls histogram bucket count for ANALYZE. Geometry columns benefit from higher values. |
200–500 for spatial columns |
work_mem |
Memory available for sort and hash operations within a spatial join. | 64MB–256MB per connection for join-heavy workloads |
Set statistics target per column rather than globally to avoid inflating planning time for simple queries:
ALTER TABLE public.parcels
ALTER COLUMN geom SET STATISTICS 400;
ANALYZE public.parcels;Common Failure Modes and Fixes
Stale Statistics Causing Row Estimate Skew
Symptom: plan_rows is orders of magnitude off from actual_rows. The planner may choose a hash join where a nested loop is faster, or vice versa.
Diagnosis:
SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'parcels';Fix:
ANALYZE public.parcels;
-- Or increase autovacuum sensitivity for high-churn tables:
ALTER TABLE public.parcels SET (autovacuum_analyze_scale_factor = 0.02);Planner Choosing Sequential Scan Despite GiST Index
Symptom: Seq Scan appears in the plan even though \d parcels shows the GiST index exists.
Diagnosis:
-- Force index use to compare estimated costs (development only)
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT ... FROM public.parcels WHERE geom && ST_MakeEnvelope(-74.1, 40.6, -73.9, 40.8, 4326);
RESET enable_seqscan;If the forced-index plan is dramatically faster, the statistics are misleading the planner. Run ANALYZE and lower random_page_cost.
Common causes and fixes:
- Low selectivity query: The planner correctly estimates that more than ~15–20% of the table will match, making sequential I/O cheaper. Narrow the spatial predicate or add a non-spatial filter to reduce the estimated row count.
random_page_costtoo high for SSD: Setrandom_page_cost = 1.1inpostgresql.confor per-session.- Index bloat after heavy DML: Run
REINDEX INDEX CONCURRENTLY idx_parcels_geomto rebuild the index without a table lock.
High Recheck Overhead on GiST Bitmap Scans
Symptom: Rows Removed by Index Recheck is large relative to actual_rows. The plan shows Bitmap Index Scan on the GiST index followed by a Recheck Cond in a Bitmap Heap Scan.
What is happening: GiST indexes store bounding boxes, not exact geometries. When the planner uses a Bitmap Index Scan (common for larger result sets), it marks pages containing candidate rows. During the Bitmap Heap Scan, each candidate must be rechecked against the exact geometry predicate. Densely packed or irregularly shaped geometries cause many false positives.
Fix:
-- Use a tighter bounding box pre-filter before the exact predicate
SELECT p.parcel_id, z.zone_code
FROM public.parcels p
JOIN public.zoning z
ON p.geom && z.geom -- bounding box filter (uses GiST index)
AND ST_Intersects(p.geom, z.geom) -- exact predicate (recheck)
WHERE p.status = 'active';For workloads where recheck overhead consistently dominates, partial GiST indexes that pre-filter on the status column reduce the candidate set before the bounding box stage.
ORM-Generated SQL Bypassing Indexable Operators
Symptom: Your Python ORM generates a query that looks correct, but EXPLAIN shows a sequential scan. The SQL contains ST_Distance(geom, ?) < threshold instead of ST_DWithin(geom, ?, threshold).
Diagnosis:
# Log the raw SQL generated by SQLAlchemy
from sqlalchemy import event
from sqlalchemy.engine import Engine
import logging
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)Fix: Replace ST_Distance comparisons with ST_DWithin, which is explicitly index-aware and maps to the <-> operator path in the planner. Review ST_DWithin radius searches for the canonical implementation. Similarly, check that your ORM or query builder does not wrap predicates in COALESCE() or CASE expressions that prevent the planner from recognising the indexable operator.
SRID Mismatch Causing Implicit Cast Overhead
Symptom: The plan includes a ST_Transform call that was not in your original SQL, adding significant execution time.
Diagnosis:
SELECT ST_SRID(geom), COUNT(*)
FROM public.parcels
GROUP BY ST_SRID(geom);If geometries carry mixed SRIDs (e.g., 4326 and 32618), PostGIS inserts implicit transforms. Standardise to a single SRID at load time:
-- Reproject all geometries to SRID 4326 in place
UPDATE public.parcels
SET geom = ST_Transform(geom, 4326)
WHERE ST_SRID(geom) <> 4326;
-- Update the geometry column type constraint
ALTER TABLE public.parcels
ALTER COLUMN geom TYPE geometry(Geometry, 4326)
USING ST_SetSRID(geom, 4326);Verification
After applying fixes, run the full analysis pipeline and confirm all assertions pass:
import psycopg2
CONN = "host=localhost dbname=gisdb user=gis password=secret"
QUERY = """
SELECT p.parcel_id, z.zone_code
FROM public.parcels p
JOIN public.zoning z
ON p.geom && z.geom
AND ST_Intersects(p.geom, z.geom)
WHERE p.status = %s
AND z.zone_type = %s
"""
nodes = analyse_spatial_query(CONN, QUERY, ("active", "RESIDENTIAL"))
# 1. No sequential scans
seq_scans = [n for n in nodes if n["node_type"] == "Seq Scan"]
assert not seq_scans, f"Unexpected Seq Scan nodes: {seq_scans}"
# 2. Expected GiST index is in use
assert_index_used(nodes, "idx_parcels_geom")
# 3. Row estimate quality: actual/planned ratio within 5×
for n in nodes:
if n["plan_rows"] and n["actual_rows"] is not None:
ratio = n["actual_rows"] / max(n["plan_rows"], 1)
assert 0.2 <= ratio <= 5.0, (
f"Row estimate off at '{n['path']}': "
f"planned={n['plan_rows']}, actual={n['actual_rows']}"
)
# 4. Total execution time under threshold
root = nodes[0]
assert root["actual_time_ms"] < 50, (
f"Query too slow: {root['actual_time_ms']:.1f} ms (budget: 50 ms)"
)
print("All spatial plan assertions passed.")For deeper inspection of how join node ordering affects recheck cost in multi-table spatial queries, see Reading EXPLAIN ANALYZE Output for Spatial Joins.
Frequently Asked Questions
Why does PostgreSQL choose a sequential scan even when a GiST index exists?
The planner compares estimated cost of an index scan against a sequential scan. If table statistics are stale or row estimates are wildly off, the planner may incorrectly believe a full scan is cheaper. Run ANALYZE on the table and, if needed, lower random_page_cost to reflect SSD storage. You can also temporarily set SET enable_seqscan = off; in a development session to force the index plan and compare actual execution times directly.
What does "Rows Removed by Index Recheck" mean in a PostGIS plan?
GiST indexes operate on bounding boxes. After the index returns candidate rows, PostgreSQL verifies each one against the exact geometry predicate. A high Recheck count means many bounding boxes overlapped without the geometries actually satisfying the predicate. Consider tighter spatial predicates, spatial clustering (CLUSTER), or partial GiST indexes to reduce the candidate set.
When should I use EXPLAIN versus EXPLAIN ANALYZE?
EXPLAIN shows the estimated plan without executing the query — use it for quick plan inspection and to check which join strategy the planner intends to use. EXPLAIN ANALYZE BUFFERS runs the query in full and reports real timing, actual row counts, and buffer statistics alongside the estimates. Use the latter on a staging dataset that mirrors production data volume; empty or synthetic tables produce misleading cost estimates.
Scaling to CI/CD and Observability Pipelines
Programmatic EXPLAIN analysis becomes most valuable when integrated into automated workflows:
-
Regression testing in CI: Run
EXPLAIN (ANALYZE, FORMAT JSON)against a staging snapshot during deployment. Fail the build if any plan node showsSeq Scanon tables exceeding a row threshold, or if total execution time regresses by more than 20%. -
Plan caching and diffing: Store hashed plan signatures alongside schema migration records. A change in plan structure (e.g., switching from
Index ScantoBitmap Index Scan) after a minor PostgreSQL upgrade can be detected automatically. -
Statistics scheduling: Use
pg_cronto triggerANALYZEon geometry-heavy tables after bulk spatial data loads, ensuring the planner never acts on significantly stale histograms. -
ORM audit pipeline: Periodically extract raw SQL from ORM query logs and pass them through the analysis pipeline. ORMs that generate
ST_Distancecomparisons, implicit casts, or unboundedSELECT *projections can silently degrade spatial query performance — thepg_stat_statementsview surfaces these by total execution cost. Pair this with composite spatial indexes to handle the mixed spatial and non-spatial filter patterns that ORMs commonly emit.
By treating execution plans as first-class telemetry rather than one-off debugging output, platform teams can enforce spatial performance SLAs, reduce infrastructure costs, and eliminate guesswork during incident response.
Related Topics
- Advanced GiST Indexing & Optimization — parent section covering the full GiST index lifecycle
- Partial GiST Indexes — reduce recheck overhead by restricting index entries to a deterministic row subset
- Composite Spatial Indexes — combine spatial and attribute filters into a single index scan
- Index-Only Scan Strategies — eliminate heap fetches for point-data queries
- Reading EXPLAIN ANALYZE Output for Spatial Joins — detailed walkthrough of join node ordering and recheck cost in multi-table plans