The Downside:
I wanted to generate a transect each 10 toes throughout each avenue in New York Metropolis, and I needed to do it in a New York Minute. The Generate Transects Alongside Strains software constructed into ArcGIS Professional was going to take a number of days to run, so I needed to provide you with one thing at the very least barely sooner.
To resolve this drawback, I turned to 2 of my favourite bears, Geopandas and Polars, to make use of spatial algorithms and a touch of linear algebra to make 4.2 million transects in round 60 seconds.
Why Transects? On this circumstance, I’d go on to make use of these transects to separate up roadway traces into equal intervals for storm injury evaluation. It’s simpler to separate a line with one other line than it’s to separate a line with a degree, since you’ll be able to assure they are going to intersect and also you don’t have to fret about floating level imprecision. However, transects are super-useful for plenty of software, equivalent to sampling river depth profiles and creating an everyday grid from a curvilinear line.
The Course of:
First, I generated factors alongside every line at 10 ft intervals utilizing Geopandas built-in interpolate operate. As soon as I had these factors, I swapped them over to Polars, the place I used to be in a position to make the most of Polars’ built-in parallelism and intuitive syntax to shortly convert these factors into (roughly) perpendicular transects. Lastly, I put the transects right into a GeoDataFrame for additional spatial enjoyable.
It could sound annoying to trip between Geopandas and Polars, however every has its personal benefits — GeoPandas has spatial operations, and Polars is depraved quick and I strongly choose the syntax. The Polars .to_pandas() and .from_pandas() strategies work so effectively that I barely discover the swap. Whereas placing a panda bear and a polar bear into the identical zoo enclosure could have disasterous penalties, on this circumstance they’re able to play collectively fairly properly.
Necessities: This code was developed in a mamba surroundings with the next packages and has not been examined with different variations
geopandas==1.0.1
pandas==2.2.3
polars==1.18.0
Shapely==2.0.6
The information used on this article comes from NYC Open Knowledge, and is on the market for obtain here. On this instance, I’m utilizing the December 24, 2024 launch of the NYC Road Centerlines.
Step 1: Generate factors alongside traces each 10 toes with Geopandas
As a way to create transects alongside every line, we first want to seek out the centerpoint of every transect. Since we wish a transect each 10 ft, we are going to seize a degree each 10 ft alongside each linestring within the dataset. We’re additionally seize a further level barely offset from every transect level; it will enable us to findthe orientation wanted to make the transect perpendicular to the unique line. Earlier than we get to all that, we have to load the info.
import geopandas as gpdROADS_LOC = r"NYC Road CenterlineNYC_Roads.shp"
ROADS_GDF = (
gpd.read_file(ROADS_LOC)
.reset_index(names="unique_id")
.to_crs("epsg:6539")
)
Along with loading the info, we additionally assign a unique_id to every highway based mostly on its index, and make sure the GeoDataFrame is in a projected coordinate system with toes as its base unit — since these highway traces are in New York Metropolis, we use New York — Long Island State Plane because the coordinate reference system. The unique_id column will enable us to hyperlink our generated transects again to the unique highway section.
As soon as the info has been loaded, we have to pattern alongside each line at 10 ft intervals. We will accomplish this utilizing the GeoPandas interpolate operate, as beneath:
import pandas as pd
from typing import Tuple, Uniondef interpolate_points(
gdf: gpd.GeoDataFrame, distance: Union[float, int], is_transect: bool
) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
gdf = gdf.loc[gdf.length > distance]
new_points = gdf.interpolate(distance)
new_gdf = gpd.GeoDataFrame(
geometry=new_points,
knowledge={
"unique_id": gdf["unique_id"],
"distance": distance,
"is_transect": is_transect,
},
crs=gdf.crs,
)
return gdf, new_gdf
def get_road_points(roads: gpd.GeoDataFrame, segment_length: int)
-> gpd.GeoDataFrame:
working_roads = roads.copy()
road_points = []
for segment_distance in vary(
segment_length, int(roads.size.max()) + 1, segment_length
):
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance - 1, False
)
road_points.append(new_gdf)
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance, True
)
road_points.append(new_gdf)
return pd.concat(road_points, ignore_index=True)
The operate get_road_points() loops by all of the section distances within the roads GeoDataFrame, given a section size s. Each iteration i, we choose solely roads that are longer than s * i. We pattern a degree alongside every line at s * i distance. These factors would be the centerpoints of our new transects. Since we wish to orient our new transects perpendicular to the road from which they spawned, we additionally pattern a degree at distance (s * i) -1. This provides us the approximate orientation of the road, which we are going to use afterward when establishing the transects. As this methodology outputs a GeoSeries, we construct a brand new GeoDataFrame, conserving the space of the purpose alongside the road, whether or not the purpose shall be used as a transect, and our beforehand created unique_id column. Lastly, we concatenate all these GeoDataframes collectively, giving us one thing that appears like this:
Lets strive it out with our roads dataset and see if it really works:
import matplotlib.pyplot as pltpoints_10ft = get_road_points(ROADS_GDF, 10)
fig, ax = plt.subplots(figsize=(10, 12))
ROADS_GDF.loc[ROADS_GDF['unique_id'] == 100].plot(ax=ax, shade='blue')
points_10ft.loc[(points_10ft['unique_id'] == 100) &
(points_10ft['is_transect'])].plot(ax=ax, shade='crimson')
Step 2: Use Polars and rotation matrices to show these factors into transects
Now that we’ve got the factors we want, we are able to use Polar’s from_pandas() operate to transform our GeoDataFrame to Polars. Earlier than we do this, we wish to save the knowledge from the geometry column — our x and y places — and drop the geometry column.
import polars as pl def roads_to_polars(road_points: gpd.GeoDataFrame) -> pl.DataFrame:
road_points.loc[:, "x"] = road_points.geometry.x
road_points.loc[:, "y"] = road_points.geometry.y
pl_points = pl.from_pandas(road_points.drop("geometry", axis=1))
return pl_points
Lastly, we’ve got every part we have to assemble our transects. In essence, all we’ve got to do is use the angle of every transect, assemble a transect of the requested size, and rotate and translate the transect so it’s correctly positioned in area. Which will sound like a doozy, but when we break it down into steps we’re not doing something overly sophisticated.
- Align every transect level with the purpose previous to it. It will enable us to orient every transect perpendicular to the unique line by pivoting across the prior level. Actually, lets name the prior level the ‘pivot level’ to make this extra clear.
- Translate the transect level/pivot level pair in order that the pivot level is at 0, 0. By doing this, we are able to simply rotate the transect level in order that it sits at y = 0.
- Rotate the transect level in order that it sits at y = 0. By doing this, the maths to assemble new factors which can type our transect turns into easy addition.
- Assemble new factors above and beneath the rotated transect level in order that we’ve got a brand new line with distance equal to line size.
- Rotate the brand new factors again to the unique orientation relative to the pivot level.
- Translate the brand new factors in order that the pivot level returns to its authentic place.
Lets check out the code to do that. On my outdated laptop computer this code takes about 3 seconds to generate 4.2 million transects.
def make_lines(
pl_points: pl.DataFrame, line_length: Union[float, int])
-> pl.DataFrame:
return (
pl_points.type("unique_id", "distance")
# Pair every transect level with its previous pivot level
# over() teams by distinctive id, and shift(1) strikes
# every column down by 1 row
.with_columns(
origin_x=pl.col("x").shift(1).over(pl.col("unique_id")),
origin_y=pl.col("y").shift(1).over(pl.col("unique_id")),
)
# Take away pivot level rows
.filter(pl.col("is_transect"))
# Translate every level so the pivot is now (0, 0)
.with_columns(
translated_x=pl.col("x") - pl.col("origin_x"),
translated_y=pl.col("y") - pl.col("origin_y"),
)
# Calculate angle theta between translated transect level and x-axis
.with_columns(theta=pl.arctan2(
pl.col("translated_y"), pl.col("translated_x")
)
)
# Calculate phrases in rotation matrix
.with_columns(
theta_cos=pl.col("theta").cos(),
theta_sin=pl.col("theta").sin()
)
# Add y-coordinates above and beneath x axis so we've got a line of desired
# size. Since we all know that x = 1 (the space between the transect
# and pivot factors), we do not have to explicitly set that time period.
.with_columns(
new_y1=-line_length / 2,
new_y2=line_length / 2,
)
# Apply rotation matrix to factors (1, new_y1) and (1, new_y2)
# and translate again to the unique location
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
# Assemble a Nicely-Identified Textual content illustration of the transects
# for straightforward conversion to GeoPandas
.with_columns(
wkt=pl.concat_str(
pl.lit("LINESTRING ("),
pl.col("new_x1"),
pl.col("new_y1"),
pl.lit(","),
pl.col("new_x2"),
pl.col("new_y2"),
pl.lit(")"),
separator=" ",
)
)
)
How are we truly conducting the rotation?
We rotate the transect level in regards to the origin utilizing a rotation matrix. It is a 2×2 matrix which, when utilized to some extent in a 2nd-plane, rotates the purpose θ levels across the origin (0, 0).
To make use of it, all we have to do is calculate θ, then multiply the purpose (x, y) and the rotation matrix. This provides us the components new_x = x*cos(θ)-y*sin(θ), new_y = x*sin(θ) + y*cos(θ). We calculate the outcome utilizing this code:
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
We’re in a position to simplify the calculation, since we all know that x is all the time 1 for our factors (x, y). As an alternative of calculating x*cos(θ)-y*sin(θ) to seek out the x worth of a rotated level, we are able to simply do cos(θ)-y*sin(θ), saving us just a little little bit of computation. After all, this gained’t work in case your pivot level and transect level aren’t 1 unit aside, however for our functions it’s simply fantastic. As soon as we’ve got the rotation accomplished, all we’ve got to do is translate the newly rotated factors by the space between the pivot level and (0,0), and increase, we’re achieved. In case you are fascinated about studying extra about linear algebra and affine transformations, I extremely reccomend the 3Blue1Brown sequence Essence of Linear Algebra.
Utilizing the above features, we are able to take our GeoDataFrame, convert it to Polars, and calculate new transects like so:
points_10ft_pl = roads_to_polars(points_10ft)
transects_10ft = make_lines(points_10ft_pl, 10)
transects_10ft.choose(['unique_id', 'wkt'])
Step 3: Convert the transects again to Geopandas
Lastly! We’ve got the geometric coordinates of the transects we want, however we wish them in a GeoDataFrame so we are able to return to the consolation of GIS operations. Since we constructed our new transects as WKT Linestrings on the finish of the make_lines() operate, the code to take action is straightforward:
transects_gdf = gpd.GeoDataFrame(
crs=ROADS_GDF.crs,
knowledge={"unique_id": transects_10ft.choose("unique_id").to_series()},
geometry=gpd.GeoSeries.from_wkt(
transects_10ft.choose("wkt").to_series().to_list()
),
)
And thats it! We now have a GeoDataFrame with 4.2 million transects, every linked to their authentic roadway utilizing the unique_id column.
Lets take a look at our new geometry:
ROAD_ID = 98237
fig, ax = plt.subplots(figsize=(10, 12))
ROADS_GDF.loc[ROADS_GDF['unique_id'] == ROAD_ID].plot(ax=ax, shade='blue')
transects_gdf.loc[transects_gdf['unique_id'] == ROAD_ID].plot(ax=ax, shade='inexperienced')
ax.set_aspect("equal")
Transects spaced 10ft aside alongside a pedestrian pathLooks good! We’re shedding just a little little bit of precision with the WKT conversion, so there may be in all probability a greater method to do this, nevertheless it works effectively sufficient for me.
Placing all of it Collectively
On your comfort, I’ve mixed the code to generate the transects beneath. It will solely work with knowledge projected to a planar coordinate system, and if you wish to use a pivot level that’s additional than 1 unit from a transect level, you will have to switch the code.
from typing import Tuple, Unionimport geopandas as gpd
import pandas as pd
import polars as pl
def interpolate_points(
gdf: gpd.GeoDataFrame, distance: Union[float, int], is_transect: bool
) -> Tuple[gpd.GeoDataFrame, gpd.GeoDataFrame]:
gdf = gdf.loc[gdf.length > distance]
new_points = gdf.interpolate(distance)
new_gdf = gpd.GeoDataFrame(
geometry=new_points,
knowledge={
"unique_id": gdf["unique_id"],
"distance": distance,
"is_transect": is_transect,
},
crs=gdf.crs,
)
return gdf, new_gdf
def get_road_points(roads: gpd.GeoDataFrame, segment_length: int) -> gpd.GeoDataFrame:
working_roads = roads.copy()
road_points = []
for segment_distance in vary(
segment_length, int(roads.size.max()) + 1, segment_length
):
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance - 1, False
)
road_points.append(new_gdf)
working_roads, new_gdf = interpolate_points(
working_roads, segment_distance, True
)
road_points.append(new_gdf)
return pd.concat(road_points, ignore_index=True)
def roads_to_polars(road_points: gpd.GeoDataFrame) -> pl.DataFrame:
road_points.loc[:, "x"] = road_points.geometry.x
road_points.loc[:, "y"] = road_points.geometry.y
pl_points = pl.from_pandas(road_points.drop("geometry", axis=1))
return pl_points
def make_lines(pl_points: pl.DataFrame, line_length: Union[float, int]) -> pl.DataFrame:
return (
pl_points.type("unique_id", "distance")
.with_columns(
origin_x=pl.col("x").shift(1).over(pl.col("unique_id")),
origin_y=pl.col("y").shift(1).over(pl.col("unique_id")),
)
.filter(pl.col("is_transect"))
.with_columns(
translated_x=pl.col("x") - pl.col("origin_x"),
translated_y=pl.col("y") - pl.col("origin_y"),
)
.with_columns(theta=pl.arctan2(pl.col("translated_y"), pl.col("translated_x")))
.with_columns(theta_cos=pl.col("theta").cos(), theta_sin=pl.col("theta").sin())
.with_columns(
new_y1=-line_length / 2,
new_y2=line_length / 2,
)
.with_columns(
new_x1=pl.col("theta_cos")
- (pl.col("new_y1") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y1=pl.col("theta_sin")
+ (pl.col("new_y1") * pl.col("theta_cos"))
+ pl.col("origin_y"),
new_x2=pl.col("theta_cos")
- (pl.col("new_y2") * pl.col("theta_sin"))
+ pl.col("origin_x"),
new_y2=pl.col("theta_sin")
+ (pl.col("new_y2") * pl.col("theta_cos"))
+ pl.col("origin_y"),
)
.with_columns(
wkt=pl.concat_str(
pl.lit("LINESTRING ("),
pl.col("new_x1"),
pl.col("new_y1"),
pl.lit(","),
pl.col("new_x2"),
pl.col("new_y2"),
pl.lit(")"),
separator=" ",
)
)
)
ROADS_LOC = r"NYC Road CenterlineNYC_Roads.shp"
ROADS_GDF = gpd.read_file(ROADS_LOC).reset_index(names="unique_id").to_crs("epsg:6539")points_10ft = get_road_points(roads=ROADS_GDF, segment_length=10)
points_10ft_pl = roads_to_polars(road_points=points_10ft)
transects_10ft = make_lines(pl_points=points_10ft_pl, line_length=10)
transects_gdf = gpd.GeoDataFrame(
crs=ROADS_GDF.crs,
knowledge={"unique_id": transects_10ft.choose("unique_id").to_series()},
geometry=gpd.GeoSeries.from_wkt(transects_10ft.choose("wkt").to_series().to_list()),
)
Thanks for studying! There are in all probability much more methods this code might be optimized or made extra versatile, nevertheless it solved an enormous drawback for me and reduce down compute time from days to seconds. With just a little little bit of concept and realizing when to make use of what software, issues that beforehand appear insurmountable grow to be shockingly straightforward. It solely took me about half-hour and just a little wikipedia linear algebra referesher to put in writing the code for this put up, saving me tons of time and, extra importantly, defending me from having to clarify to a consumer why a venture was pointlessly delayed. Let me know if this was useful to you!
Except in any other case famous, all pictures are by the writer.