Radius-based proximity queries are a core capability in Mastering Core Spatial Query Patterns, powering store locators, geofencing alerts, asset tracking, and service-area lookups. ST_DWithin is PostGIS’s dedicated predicate for this work: rather than computing a distance for every row and filtering afterwards, it cooperates with the GiST spatial index to prune the search space before the exact distance calculation runs. This two-phase filtering is what separates a millisecond lookup from a multi-second sequential scan.

The diagram below shows the data flow from Python through the index and back.

ST_DWithin Query Flow A flowchart showing how a Python psycopg call passes lon, lat, and radius_m parameters to PostgreSQL, which uses the GiST index to narrow bounding-box candidates, then applies the exact ST_DWithin distance predicate, and returns matching rows to Python. Python psycopg / pool GiST Index bounding-box prune ST_DWithin exact distance check Result rows dict / model lon,lat,r candidates matches skips rows outside bounding box

Prerequisites and Infrastructure Validation

Before writing a single line of query code, confirm that the runtime environment is correctly configured.

Required extensions

sql
-- Confirm PostGIS is active in the target database
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('postgis', 'postgis_topology');

Both rows should be present. If postgis is missing, run CREATE EXTENSION postgis; as a superuser.

Python packages

psycopg>=3.1          # async-capable; use psycopg2-binary if you need the legacy API
sqlalchemy>=2.0       # optional ORM layer
geoalchemy2>=0.14     # geometry column mapping for SQLAlchemy — see the GeoAlchemy2 integration guide
pyproj>=3.5           # optional: client-side coordinate transforms

Confirm your column’s SRID

sql
SELECT f_table_name, f_geometry_column, srid, type
FROM geometry_columns
WHERE f_table_name = 'locations';

Note the srid value. If it is 4326 and the type is geometry (not geography), the distance parameter to ST_DWithin is in degrees unless you cast to geography — a critical distinction covered in Step 1.

Verify the spatial index exists

sql
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'locations'
  AND indexdef ILIKE '%gist%';

If no row is returned, create the index before proceeding (Step 2).

Step 1: Validate SRID Alignment and Distance Units

Distance semantics depend entirely on the column type and SRID.

Column type Distance unit Suited for
geometry (SRID 4326) degrees Never use for real-world radius
geometry (projected, e.g. EPSG:32618) metres Single country / metro area
geography (SRID 4326) metres (spheroidal) Global datasets

For data stored as geometry(Point, 4326), cast to geography at query time so the radius is interpreted in metres:

sql
-- Correct: geography cast makes the radius parameter metres
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography,
    $3          -- metres
)

Casting adds a small per-row overhead for the spheroidal calculation. If your data is confined to a single metropolitan area, storing coordinates in a local UTM projection (e.g. EPSG:32618 for New York) and querying with plain geometry can be 2–4× faster with negligible accuracy loss. Use pyproj to transform the incoming lat/lon to the projected CRS before the query:

python
from pyproj import Transformer

# Transform WGS 84 → UTM zone 18N once, outside the hot path
_to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32618", always_xy=True)

def to_utm(lon: float, lat: float) -> tuple[float, float]:
    return _to_utm.transform(lon, lat)

Mismatched SRIDs between the column and the query point either raise ERROR: Operation on mixed SRID geometries or silently return wrong results if PostGIS cannot detect the mismatch. Always call ST_SetSRID on literal points.

Step 2: Provision and Maintain the GiST Index

Without a spatial index, ST_DWithin degrades to a full table scan. Create the index immediately after initial data load:

sql
-- Standard GiST index on geometry column
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);

-- For geography columns use the same syntax; PostGIS selects the geography operator class automatically
CREATE INDEX idx_locations_geom_geo ON locations USING GIST (geom::geography);

-- Refresh planner statistics immediately after bulk loads
ANALYZE locations;

For partial GiST indexes — e.g. indexing only active = true rows — add a WHERE clause to the CREATE INDEX statement. Partial indexes are smaller and faster to scan when the query predicate matches.

For tables with concurrent high-write traffic, use CREATE INDEX CONCURRENTLY to avoid locking the table. Monitor bloat via pg_stat_user_indexes:

sql
SELECT indexrelname,
       idx_scan,
       idx_tup_read,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'locations';

A high ratio of idx_tup_read to idx_scan with a large index_size is a signal of bloat. Rebuild with REINDEX INDEX CONCURRENTLY idx_locations_geom during a maintenance window, or use pg_repack for zero-downtime rebuilds.

Step 3: Construct Parameterised Queries

Hard-coded coordinate literals prevent plan reuse and open SQL-injection vectors. Always use %s placeholders with psycopg or bound parameters with SQLAlchemy.

psycopg3 (sync and async)

python
import psycopg
from psycopg.rows import dict_row

def find_nearby(
    conn: psycopg.Connection,
    lon: float,
    lat: float,
    radius_m: float,
    limit: int = 50,
) -> list[dict]:
    """Return up to `limit` locations within `radius_m` metres of (lon, lat).

    The geography cast ensures the radius is interpreted in metres
    regardless of the underlying geometry SRID.
    """
    query = """
        SELECT
            id,
            name,
            ST_AsGeoJSON(geom)::json AS geojson,
            ST_Distance(geom::geography,
                        ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography) AS distance_m
        FROM locations
        WHERE ST_DWithin(
            geom::geography,
            ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
            %s
        )
        ORDER BY distance_m
        LIMIT %s;
    """
    with conn.cursor(row_factory=dict_row) as cur:
        cur.execute(query, (lon, lat, lon, lat, radius_m, limit))
        return cur.fetchall()

Note that lon and lat appear twice in the parameter tuple — once for ST_Distance and once for ST_DWithin. If you need to avoid repeating them, use a CTE:

sql
WITH origin AS (
    SELECT ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography AS pt
)
SELECT id, name,
       ST_Distance(geom::geography, origin.pt) AS distance_m
FROM locations, origin
WHERE ST_DWithin(geom::geography, origin.pt, %s)
ORDER BY distance_m
LIMIT %s;

GeoAlchemy2 / SQLAlchemy 2

For ORM-mapped models, use GeoAlchemy2 column mapping to avoid raw WKB handling:

python
from sqlalchemy import select, func
from sqlalchemy.orm import Session
from geoalchemy2 import Geography
from geoalchemy2.functions import ST_DWithin, ST_Distance, ST_MakePoint, ST_SetSRID

def find_nearby_orm(
    session: Session,
    lon: float,
    lat: float,
    radius_m: float,
    limit: int = 50,
) -> list:
    origin = func.ST_SetSRID(func.ST_MakePoint(lon, lat), 4326).cast(Geography)
    stmt = (
        select(Location)
        .where(ST_DWithin(Location.geom.cast(Geography), origin, radius_m))
        .order_by(ST_Distance(Location.geom.cast(Geography), origin))
        .limit(limit)
    )
    return session.scalars(stmt).all()

Step 4: Handle Results and Connection Management

Connection pooling

Open one connection per query in a web server context and you will saturate the database within seconds. Use a pool:

python
from psycopg_pool import ConnectionPool

pool = ConnectionPool(
    conninfo="postgresql://user:pass@localhost/gisdb",
    min_size=2,
    max_size=10,
    kwargs={"row_factory": dict_row},
)

def find_nearby_pooled(lon: float, lat: float, radius_m: float) -> list[dict]:
    with pool.connection() as conn:
        return find_nearby(conn, lon, lat, radius_m)

Server-side cursors for large result sets

A 5 km radius in a dense city can return tens of thousands of rows. Fetching all into memory with fetchall() causes OOM errors under concurrent load. Stream with a named cursor instead:

python
def stream_nearby(conn, lon, lat, radius_m, batch_size=500):
    query = """
        SELECT id, name, geom
        FROM locations
        WHERE ST_DWithin(
            geom::geography,
            ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
            %s
        )
        ORDER BY geom::geography <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography;
    """
    with conn.cursor(name="nearby_cursor", row_factory=dict_row) as cur:
        cur.itersize = batch_size
        cur.execute(query, (lon, lat, radius_m, lon, lat))
        for row in cur:
            yield row

The KNN operator <-> in the ORDER BY uses the index for distance sorting without computing full distances for all candidates — a pattern explored in depth on the KNN nearest-neighbour queries page.

Performance Considerations

Reading EXPLAIN (ANALYZE, BUFFERS) output

Always validate a new query on representative data before deploying. The plan below is from a 2-million-row locations table with a GiST index:

sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name
FROM locations
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
    5000
);

Expected plan — index path:

Index Scan using idx_locations_geom on locations
  (cost=0.41..312.75 rows=87 width=52)
  (actual time=0.412..3.841 rows=94 loops=1)
  Index Cond: (geom && ...)
  Filter: ST_DWithin(geom::geography, ..., 5000)
  Buffers: shared hit=23 read=8
Planning Time: 0.382 ms
Execution Time: 3.921 ms

Key signals to look for:

  • Index Scan using idx_locations_geom — the index is active.
  • Index Cond: (geom && ...) — the bounding-box filter (&&) ran inside the index.
  • Filter: line after Index Cond — the exact ST_DWithin distance predicate ran on the candidates the index returned.
  • Low Buffers: shared read count — few pages loaded from disk.

For a detailed walkthrough of interpreting spatial EXPLAIN output, see Query Plan Analysis with EXPLAIN.

Relevant GUC settings

Parameter Recommended starting value Effect
random_page_cost 1.1 (SSD) Lowers the planner’s index-access cost estimate; favours index scans
effective_cache_size 50–75% of RAM Tells the planner how much OS page cache is available
work_mem 16MB64MB Affects sort memory for ORDER BY distance_m; set per session to avoid global impact
sql
-- Per-session tuning (safe; reverts at session end)
SET work_mem = '32MB';
SET random_page_cost = 1.1;

Common Failure Modes and Fixes

SRID mismatch

Symptom: ERROR: Operation on mixed SRID geometries or results that are obviously wrong (empty set, or the entire table returned).

Diagnosis:

sql
SELECT ST_SRID(geom) FROM locations LIMIT 5;

Fix: Wrap the query point in ST_SetSRID(ST_MakePoint($1, $2), <target_srid>) matching the column’s SRID. For cross-CRS queries, use ST_Transform to reproject the point before comparison.

Missing or invalid index

Symptom: Seq Scan on locations in the EXPLAIN output; query time scales linearly with table size.

Diagnosis:

sql
SELECT indexname FROM pg_indexes
WHERE tablename = 'locations' AND indexdef ILIKE '%gist%';

Fix: Create the index (CREATE INDEX CONCURRENTLY ...) and run ANALYZE locations;. The planner cannot use an index it does not know about.

Planner chooses sequential scan despite a valid index

Symptom: Index exists and statistics are current, but EXPLAIN still shows Seq Scan.

Cause: The search radius covers more than roughly 30% of the table’s spatial extent. The planner estimates a sequential scan is cheaper than random index lookups for a large fraction of the table.

Diagnosis:

sql
SELECT reltuples::bigint AS estimated_rows,
       (SELECT count(*) FROM locations
        WHERE ST_DWithin(geom::geography,
                         ST_SetSRID(ST_MakePoint(-122.4, 37.7), 4326)::geography,
                         50000)) AS actual_matches;

If actual_matches / estimated_rows > 0.3, the planner is right. Solutions: narrow the radius, add a secondary non-spatial filter (e.g. AND category = 'restaurant'), or use a partial GiST index to reduce the indexed row count.

OOM on large result sets

Symptom: Python process memory spikes; MemoryError under concurrent load.

Fix: Replace fetchall() with a named server-side cursor (see Step 4) and process rows in batches. Also add an explicit LIMIT to the query when the use case allows it.

Stale planner statistics after bulk load

Symptom: The planner significantly over- or under-estimates row counts; plan degrades after a large import.

Fix:

sql
ANALYZE locations;
-- For heavily bloated tables, run VACUUM first
VACUUM ANALYZE locations;

Autoanalyze triggers at autovacuum_analyze_scale_factor (default 20%) of changed rows. After bulk imports that exceed this threshold in a single transaction, run ANALYZE manually.

Verification

After deploying a change to the query or index, confirm correctness and performance with this checklist:

sql
-- 1. Confirm index is used
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM locations
WHERE ST_DWithin(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
    1000
);
-- Expected: "Index Scan using idx_..." — not "Seq Scan"

-- 2. Cross-check row count against ST_Distance
SELECT count(*) FROM locations
WHERE ST_Distance(
    geom::geography,
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography
) <= 1000;
-- Must match ST_DWithin result count exactly

-- 3. Confirm distance units are metres
SELECT ST_Distance(
    ST_SetSRID(ST_MakePoint(-122.4194, 37.7749), 4326)::geography,
    ST_SetSRID(ST_MakePoint(-122.4094, 37.7749), 4326)::geography
) AS dist_metres;
-- Should return ~870 metres for this ~0.01° longitude offset near San Francisco

In Python, a lightweight integration test:

python
def test_find_nearby_returns_metre_scale_results(conn):
    # San Francisco City Hall → Civic Center (~400 m apart)
    results = find_nearby(conn, lon=-122.4194, lat=37.7749, radius_m=500)
    assert len(results) > 0, "Expected at least one result within 500 m"
    for row in results:
        assert row["distance_m"] <= 500, f"Row {row['id']} outside radius"

Frequently Asked Questions

Why does ST_DWithin with my geometry column return wrong distances?

When the column type is geometry with SRID 4326, the distance parameter is in degrees, not metres. A value of 500 means 500 degrees — effectively the entire planet. Cast to geography (geom::geography) or store data in a metre-based projected CRS so the radius is meaningful.

How do I confirm ST_DWithin is using the spatial index?

Run EXPLAIN (ANALYZE, BUFFERS) and look for Index Scan using idx_locations_geom. A Seq Scan means either the index is missing, statistics are stale, or the radius is so large that the planner prefers a full scan.

When should I use geography vs a projected geometry CRS?

Use geography (or cast at query time) for global datasets or whenever sub-metre accuracy across large extents matters. For data confined to a single city or region, projecting to a local UTM zone and using geometry is 2–4× faster with negligible accuracy loss. Benchmark both on your actual data distribution before committing.

Can I combine ST_DWithin with other filters?

Yes, and you should. Adding a non-spatial equality or range filter (e.g. AND category = 'restaurant' AND active = true) before ST_DWithin in the WHERE clause gives the planner more options and can reduce the index scan’s candidate set. A partial GiST index on a frequently filtered subset is often the most effective combined optimisation.