Spatial joins with DuckDB
August 15, 2025
One of the interesting tasks I’ve had recently, was to assign an IATA location identifiers of the closest airports to the cities on the dosh.at website.
I’ve already had a table with cities, which was based on the database taken from https://simplemaps.com/data/world-cities. For the sake of this post we will download a database from their website:

Same story with airports, we can download the database from https://ourairports.com/data/ website (airports.csv).
Now the first nice thing about the DuckDB, is that it supports loading CSV files, so we can easily create tables from those files by running:
CREATE TABLE cities AS
FROM read_csv('~/Downloads/worldcities.csv');
CREATE TABLE airports AS
FROM read_csv('~/Downloads/airports.csv');Running those queries resulted in two tables, airports and cities:

Both tables contain geographical coordinates that we can use to perform geospatial joins. We will need to compute the distance between each city and airport to find the closest combinations. To compute a distance between 2 points, we can either compute an Euclidean distance manually, or even better option would be to use the spatialextension, provided by DuckDB:
INSTALL spatial;
LOAD spatial;Spatial extension provides ST_Distance function, allowing computing the distance between 2 points. We can then multiply it by (roughly) 111 to convert degrees to kilometers.
Here is the SQL-query:
SELECT
c.city,
c.country,
c.lat as city_lat,
c.lng as city_lng,
a.name as airport_name,
a.latitude_deg as airport_lat,
a.longitude_deg as airport_lng,
a.iata_code,
ST_Distance (ST_Point (city_lng, city_lat), ST_Point (airport_lng, airport_lat)) * 111 as distance_km
FROM
cities c
CROSS JOIN airports a
WHERE
c.capital = 'primary'
AND a.iata_code IS NOT NULL
QUALIFY
ROW_NUMBER() OVER (
PARTITION BY
city
ORDER BY
ST_Distance (ST_Point (city_lng, city_lat), ST_Point (airport_lng, airport_lat))
) = 1
ORDER BY city;What happens under the hood:
ST_Distanceis used to compute the distances between 2 points- We use the technique called CROSS JOIN — it joins each row of the source table with the each row in the destination table, producing
m*ncombinations. - Then QUALIFY is used to reduce those combinations. We use
cityas a partitioning key, and distance between cities and airports as ORDER BY clause, taking the first row from each partition.
Here is the results of the query:

Do you like how simple this complex task turned out to be?
That is for me a sign of a good technology, to make seemingly hard tasks simple. Which DuckDB definitely is. Thank you for reading, please write in the comments what improvements you’d also make in the flow.
P.S.: Installing and running DuckDB is as easy as executing:
> brew install duckdb
> duckdb -ui