Query Plan Analysis with EXPLAIN

Q: When should I use EXPLAIN versus EXPLAIN ANALYZE?

EXPLAIN shows the estimated plan without executing the query. EXPLAIN ANALYZE actually runs the query and reports real timing and row counts alongside estimates. Use EXPLAIN for quick plan inspection and EXPLAIN ANALYZE BUFFERS on a staging dataset to measure real performance and buffer utilization.

Spatial workloads introduce execution characteristics that standard relational queries rarely surface. When PostGIS evaluates geometric predicates, the query planner must weigh bounding box intersections against exact geometry recheck overhead, balance index selectivity against sequential I/O cost, and adjust for statistics that age rapidly under bulk spatial loads. This page, part of the Advanced GiST Indexing & Optimization section, establishes a repeatable workflow for capturing, parsing, and acting on execution plans in PostGIS environments — treating plan output as structured telemetry rather than opaque terminal text.

Without inspecting the actual execution tree, engineers frequently misattribute latency to network overhead or Python serialisation when the real bottleneck is a missing GiST index, stale planner statistics, or an ORM that silently bypasses indexable operators. The techniques here give you the deterministic visibility needed to validate index utilisation, enforce performance SLAs in CI/CD pipelines, and resolve spatial query regressions before they reach production.

Prerequisites and Infrastructure Validation

Before capturing execution plans programmatically, confirm these baseline requirements.

Required extensions and versions:

PostgreSQL 14+ with PostGIS 3.2+. Later versions include improved planner cost models for spatial operators and more accurate selectivity estimates for geometry columns.
pg_stat_statements enabled (optional but strongly recommended). It surfaces baseline query frequency and total execution time, letting you triage which spatial queries most deserve targeted EXPLAIN runs.

Verify extensions:

sql

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('postgis', 'pg_stat_statements')
ORDER BY name;

Python packages:

psycopg2-binary>=2.9     # or psycopg>=3.1 for async workloads

Binary drivers are strongly preferred for spatial workloads — they avoid serialisation overhead when handling large coordinate arrays returned from ST_AsText() or ST_AsBinary().

Verify GiST index existence before analysis:

sql

-- Check that spatial indexes exist on the target table before running EXPLAIN
SELECT
    indexname,
    indexdef,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'parcels'
  AND indexdef ILIKE '%gist%'
ORDER BY indexname;

If no GiST index is listed, the planner has nothing to consider for spatial acceleration — sequential scans are guaranteed. Create the index before proceeding:

sql

CREATE INDEX CONCURRENTLY idx_parcels_geom
ON public.parcels
USING GIST (geom);

Core Execution Workflow

Step 1: Capture a Baseline Plan

Start with EXPLAIN (FORMAT JSON) against your target query. JSON output eliminates formatting inconsistencies across PostgreSQL versions and is directly consumable by Python’s json module without text parsing.

sql

EXPLAIN (FORMAT JSON)
SELECT p.parcel_id, z.zone_code
FROM public.parcels p
JOIN public.zoning z
  ON ST_Intersects(p.geom, z.geom)
WHERE p.status = 'active'
  AND z.zone_type = 'RESIDENTIAL';

This returns the estimated plan — no query execution occurs. Use it to confirm that the planner intends to use a GiST index before spending time on ANALYZE runs.

Step 2: Capture Real Execution Metrics

Run EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) to force full execution and capture real timing, actual row counts, and buffer hit/miss ratios.

python

import json
import psycopg2
from psycopg2.extras import RealDictCursor

SPATIAL_QUERY = """
    SELECT p.parcel_id, z.zone_code
    FROM public.parcels p
    JOIN public.zoning z
      ON p.geom && z.geom
     AND ST_Intersects(p.geom, z.geom)
    WHERE p.status = %s
      AND z.zone_type = %s
"""

def capture_explain(conn_str: str, query: str, params: tuple) -> dict:
    """
    Runs EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) and returns the parsed plan.
    Always run on a representative dataset — empty tables produce misleading cost estimates.
    """
    explain_sql = f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"
    with psycopg2.connect(conn_str) as conn:
        with conn.cursor(cursor_factory=RealDictCursor) as cur:
            cur.execute(explain_sql, params)
            raw = cur.fetchone()
    if not raw or "QUERY PLAN" not in raw:
        raise ValueError("Unexpected EXPLAIN output structure")
    # PostgreSQL returns the plan inside a list
    return raw["QUERY PLAN"][0]

Step 3: Parse the Plan Tree in Python

EXPLAIN JSON is a recursive tree of plan nodes. A recursive parser lets you flatten it into a list of metrics regardless of join depth.

python

def extract_nodes(node: dict, path: str = "root") -> list[dict]:
    """
    Recursively extracts execution metrics from every node in a PostgreSQL
    EXPLAIN ANALYZE JSON plan, preserving the node path for traceability.
    """
    node_type = node.get("Node Type", "Unknown")
    record = {
        "path":                    path,
        "node_type":               node_type,
        "startup_cost":            node.get("Startup Cost"),
        "total_cost":              node.get("Total Cost"),
        "plan_rows":               node.get("Plan Rows"),
        "actual_rows":             node.get("Actual Rows"),
        "actual_time_ms":          node.get("Actual Total Time"),
        "loops":                   node.get("Loops", 1),
        "index_name":              node.get("Index Name"),
        "rows_removed_by_filter":  node.get("Rows Removed by Filter", 0),
        "rows_removed_by_recheck": node.get("Rows Removed by Index Recheck", 0),
        "shared_hit_blocks":       node.get("Shared Hit Blocks", 0),
        "shared_read_blocks":      node.get("Shared Read Blocks", 0),
    }
    results = [record]
    for child in node.get("Plans", []):
        child_path = f"{path} → {child.get('Node Type', '?')}"
        results.extend(extract_nodes(child, child_path))
    return results


def analyse_spatial_query(conn_str: str, query: str, params: tuple) -> list[dict]:
    plan = capture_explain(conn_str, query, params)
    return extract_nodes(plan["Plan"])

Step 4: Identify Index Usage and Bottlenecks

With node metrics in hand, apply targeted heuristics for spatial workloads:

python

def audit_plan(nodes: list[dict]) -> None:
    """
    Prints actionable warnings for common spatial plan anti-patterns.
    """
    for n in nodes:
        node_type = n["node_type"]
        path      = n["path"]

        # Sequential scan on what may be a large table
        if node_type == "Seq Scan":
            print(f"[WARN] Seq Scan at '{path}' — verify GiST index exists "
                  "and that random_page_cost reflects your storage medium.")

        # Actual row count deviates significantly from planner estimate
        if n["plan_rows"] and n["actual_rows"] is not None:
            ratio = n["actual_rows"] / max(n["plan_rows"], 1)
            if ratio > 10 or ratio < 0.1:
                print(f"[WARN] Row estimate mismatch at '{path}': "
                      f"planned={n['plan_rows']}, actual={n['actual_rows']}. "
                      "Run ANALYZE to refresh statistics.")

        # High recheck count signals poor bounding-box selectivity
        if n["rows_removed_by_recheck"] > 500:
            print(f"[WARN] High recheck at '{path}': "
                  f"{n['rows_removed_by_recheck']} rows removed after index scan. "
                  "Consider a partial GiST index or tighter spatial predicate.")

        # Expensive filter outside the index
        if n["rows_removed_by_filter"] > 1000:
            print(f"[WARN] High filter removal at '{path}': "
                  f"{n['rows_removed_by_filter']} rows discarded post-scan. "
                  "A composite or partial index may help.")

Step 5: Verify the Corrected Plan

After applying index adjustments or statistics refreshes, re-run the analysis and assert that the plan now uses a GiST index and that row estimates have improved:

python

def assert_index_used(nodes: list[dict], expected_index: str) -> bool:
    """
    Confirms that the expected GiST index appears in at least one plan node.
    Raises AssertionError if the planner fell back to a sequential scan instead.
    """
    used = [n["index_name"] for n in nodes if n["index_name"]]
    assert expected_index in used, (
        f"Index '{expected_index}' not used. "
        f"Indexes found: {used or 'none'}. "
        "Run ANALYZE and verify the query predicate matches the index definition."
    )
    return True

Performance Considerations

Reading EXPLAIN Output: Index Scan vs Sequential Scan

A well-optimised spatial query plan looks like this (abbreviated text format for illustration):

Nested Loop  (cost=0.28..312.44 rows=12 width=40)
             (actual time=0.241..4.312 rows=9 loops=1)
  ->  Index Scan using idx_zoning_geom on zoning z
        Index Cond: (geom && p.geom)
        Filter: ((zone_type)::text = 'RESIDENTIAL'::text)
  ->  Index Scan using idx_parcels_geom on parcels p
        Index Cond: (geom && z.geom)
        Filter: (status = 'active'::text)
        Rows Removed by Filter: 2
Planning Time: 1.2 ms
Execution Time: 4.7 ms

Key indicators:

Index Scan using idx_parcels_geom — GiST index is active.
Rows Removed by Index Recheck: 0 — bounding box filter is selective enough that no false positives escaped to exact predicate evaluation.
actual rows=9 / planned rows=12 — planner estimate is within an acceptable range; statistics are current.

A degraded plan shows Seq Scan instead of Index Scan, a high Rows Removed by Index Recheck count, or a large gap between planned and actual rows.

Relevant GUC Settings

Setting	Relevance	Typical adjustment
`random_page_cost`	Controls planner’s cost of a random disk page. Lower values favour index scans.	`1.1` on SSD, `4.0` (default) on HDD
`effective_cache_size`	Tells the planner how much OS page cache is available. Larger values make index scans more attractive.	Set to ~75% of total RAM
`default_statistics_target`	Controls histogram bucket count for `ANALYZE`. Geometry columns benefit from higher values.	`200–500` for spatial columns
`work_mem`	Memory available for sort and hash operations within a spatial join.	`64MB–256MB` per connection for join-heavy workloads

Set statistics target per column rather than globally to avoid inflating planning time for simple queries:

sql

ALTER TABLE public.parcels
  ALTER COLUMN geom SET STATISTICS 400;

ANALYZE public.parcels;

Common Failure Modes and Fixes

Stale Statistics Causing Row Estimate Skew

Symptom: plan_rows is orders of magnitude off from actual_rows. The planner may choose a hash join where a nested loop is faster, or vice versa.

Diagnosis:

sql

SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'parcels';

Fix:

sql

ANALYZE public.parcels;
-- Or increase autovacuum sensitivity for high-churn tables:
ALTER TABLE public.parcels SET (autovacuum_analyze_scale_factor = 0.02);

Planner Choosing Sequential Scan Despite GiST Index

Symptom: Seq Scan appears in the plan even though \d parcels shows the GiST index exists.

Diagnosis:

sql

-- Force index use to compare estimated costs (development only)
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT ... FROM public.parcels WHERE geom && ST_MakeEnvelope(-74.1, 40.6, -73.9, 40.8, 4326);
RESET enable_seqscan;

If the forced-index plan is dramatically faster, the statistics are misleading the planner. Run ANALYZE and lower random_page_cost.

Common causes and fixes:

Low selectivity query: The planner correctly estimates that more than ~15–20% of the table will match, making sequential I/O cheaper. Narrow the spatial predicate or add a non-spatial filter to reduce the estimated row count.
random_page_cost too high for SSD: Set random_page_cost = 1.1 in postgresql.conf or per-session.
Index bloat after heavy DML: Run REINDEX INDEX CONCURRENTLY idx_parcels_geom to rebuild the index without a table lock.

High Recheck Overhead on GiST Bitmap Scans

Symptom: Rows Removed by Index Recheck is large relative to actual_rows. The plan shows Bitmap Index Scan on the GiST index followed by a Recheck Cond in a Bitmap Heap Scan.

What is happening: GiST indexes store bounding boxes, not exact geometries. When the planner uses a Bitmap Index Scan (common for larger result sets), it marks pages containing candidate rows. During the Bitmap Heap Scan, each candidate must be rechecked against the exact geometry predicate. Densely packed or irregularly shaped geometries cause many false positives.

Fix:

sql

-- Use a tighter bounding box pre-filter before the exact predicate
SELECT p.parcel_id, z.zone_code
FROM public.parcels p
JOIN public.zoning z
  ON p.geom && z.geom              -- bounding box filter (uses GiST index)
 AND ST_Intersects(p.geom, z.geom) -- exact predicate (recheck)
WHERE p.status = 'active';

For workloads where recheck overhead consistently dominates, partial GiST indexes that pre-filter on the status column reduce the candidate set before the bounding box stage.

ORM-Generated SQL Bypassing Indexable Operators

Symptom: Your Python ORM generates a query that looks correct, but EXPLAIN shows a sequential scan. The SQL contains ST_Distance(geom, ?) < threshold instead of ST_DWithin(geom, ?, threshold).

Diagnosis:

python

# Log the raw SQL generated by SQLAlchemy
from sqlalchemy import event
from sqlalchemy.engine import Engine
import logging

logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

Fix: Replace ST_Distance comparisons with ST_DWithin, which is explicitly index-aware and maps to the <-> operator path in the planner. Review ST_DWithin radius searches for the canonical implementation. Similarly, check that your ORM or query builder does not wrap predicates in COALESCE() or CASE expressions that prevent the planner from recognising the indexable operator.

SRID Mismatch Causing Implicit Cast Overhead

Symptom: The plan includes a ST_Transform call that was not in your original SQL, adding significant execution time.

Diagnosis:

sql

SELECT ST_SRID(geom), COUNT(*)
FROM public.parcels
GROUP BY ST_SRID(geom);

If geometries carry mixed SRIDs (e.g., 4326 and 32618), PostGIS inserts implicit transforms. Standardise to a single SRID at load time:

sql

-- Reproject all geometries to SRID 4326 in place
UPDATE public.parcels
SET geom = ST_Transform(geom, 4326)
WHERE ST_SRID(geom) <> 4326;

-- Update the geometry column type constraint
ALTER TABLE public.parcels
  ALTER COLUMN geom TYPE geometry(Geometry, 4326)
  USING ST_SetSRID(geom, 4326);

Verification

After applying fixes, run the full analysis pipeline and confirm all assertions pass:

python

import psycopg2

CONN = "host=localhost dbname=gisdb user=gis password=secret"

QUERY = """
    SELECT p.parcel_id, z.zone_code
    FROM public.parcels p
    JOIN public.zoning z
      ON p.geom && z.geom
     AND ST_Intersects(p.geom, z.geom)
    WHERE p.status = %s
      AND z.zone_type = %s
"""

nodes = analyse_spatial_query(CONN, QUERY, ("active", "RESIDENTIAL"))

# 1. No sequential scans
seq_scans = [n for n in nodes if n["node_type"] == "Seq Scan"]
assert not seq_scans, f"Unexpected Seq Scan nodes: {seq_scans}"

# 2. Expected GiST index is in use
assert_index_used(nodes, "idx_parcels_geom")

# 3. Row estimate quality: actual/planned ratio within 5×
for n in nodes:
    if n["plan_rows"] and n["actual_rows"] is not None:
        ratio = n["actual_rows"] / max(n["plan_rows"], 1)
        assert 0.2 <= ratio <= 5.0, (
            f"Row estimate off at '{n['path']}': "
            f"planned={n['plan_rows']}, actual={n['actual_rows']}"
        )

# 4. Total execution time under threshold
root = nodes[0]
assert root["actual_time_ms"] < 50, (
    f"Query too slow: {root['actual_time_ms']:.1f} ms (budget: 50 ms)"
)
print("All spatial plan assertions passed.")

For deeper inspection of how join node ordering affects recheck cost in multi-table spatial queries, see Reading EXPLAIN ANALYZE Output for Spatial Joins.

Frequently Asked Questions

Why does PostgreSQL choose a sequential scan even when a GiST index exists?

The planner compares estimated cost of an index scan against a sequential scan. If table statistics are stale or row estimates are wildly off, the planner may incorrectly believe a full scan is cheaper. Run ANALYZE on the table and, if needed, lower random_page_cost to reflect SSD storage. You can also temporarily set SET enable_seqscan = off; in a development session to force the index plan and compare actual execution times directly.

What does "Rows Removed by Index Recheck" mean in a PostGIS plan?

GiST indexes operate on bounding boxes. After the index returns candidate rows, PostgreSQL verifies each one against the exact geometry predicate. A high Recheck count means many bounding boxes overlapped without the geometries actually satisfying the predicate. Consider tighter spatial predicates, spatial clustering (CLUSTER), or partial GiST indexes to reduce the candidate set.

When should I use EXPLAIN versus EXPLAIN ANALYZE?

EXPLAIN shows the estimated plan without executing the query — use it for quick plan inspection and to check which join strategy the planner intends to use. EXPLAIN ANALYZE BUFFERS runs the query in full and reports real timing, actual row counts, and buffer statistics alongside the estimates. Use the latter on a staging dataset that mirrors production data volume; empty or synthetic tables produce misleading cost estimates.

Scaling to CI/CD and Observability Pipelines

Programmatic EXPLAIN analysis becomes most valuable when integrated into automated workflows:

Regression testing in CI: Run EXPLAIN (ANALYZE, FORMAT JSON) against a staging snapshot during deployment. Fail the build if any plan node shows Seq Scan on tables exceeding a row threshold, or if total execution time regresses by more than 20%.
Plan caching and diffing: Store hashed plan signatures alongside schema migration records. A change in plan structure (e.g., switching from Index Scan to Bitmap Index Scan) after a minor PostgreSQL upgrade can be detected automatically.
Statistics scheduling: Use pg_cron to trigger ANALYZE on geometry-heavy tables after bulk spatial data loads, ensuring the planner never acts on significantly stale histograms.
ORM audit pipeline: Periodically extract raw SQL from ORM query logs and pass them through the analysis pipeline. ORMs that generate ST_Distance comparisons, implicit casts, or unbounded SELECT * projections can silently degrade spatial query performance — the pg_stat_statements view surfaces these by total execution cost. Pair this with composite spatial indexes to handle the mixed spatial and non-spatial filter patterns that ORMs commonly emit.

By treating execution plans as first-class telemetry rather than one-off debugging output, platform teams can enforce spatial performance SLAs, reduce infrastructure costs, and eliminate guesswork during incident response.

Advanced GiST Indexing & Optimization — parent section covering the full GiST index lifecycle
Partial GiST Indexes — reduce recheck overhead by restricting index entries to a deterministic row subset
Composite Spatial Indexes — combine spatial and attribute filters into a single index scan
Index-Only Scan Strategies — eliminate heap fetches for point-data queries
Reading EXPLAIN ANALYZE Output for Spatial Joins — detailed walkthrough of join node ordering and recheck cost in multi-table plans