- Notifications
You must be signed in to change notification settings - Fork0
ad-freiburg/spatialjoin
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Compute a spatial self-join (on intersects, contains, covers, touches, crosses, overlaps and equals) on line-separated WKT geometries read from stdin. Also supports within-distance joins for arbitrary meter distances if the input geometries are given as WGS84 coordinates.
Relations are written to stdout (or to a BZ2/GZ/plain file specified with-o
).
Can handle massive amounts of input data. For example, the full self-join on the complete ~1.5 B geometries of OpenStreetMap can be computed (excluding the time required for input parsing and output writing) in around 90 minutes on an AMDRyzen 9 7950X machine with 16 physical and 32 virtual cores, 128 GB of RAM (DDR5), and 7.7 TB of disk space (NVMe SSD).
Additional materials required to do a full evaluation of our tool and a comparison against PostgreSQL+PostGIS can be foundhere.
cmake
gcc >= 5.0
(orclang >= 3.9
)libbz2
Fetch this repository and init submodules:
git clone --recurse-submodules https://github.com/ad-freiburg/spatialjoin
mkdir build && cd buildcmake ..make -j
To install, type
make install
$ cat example.txtPOLYGON((0 0, 10 0 ,10 10, 0 10, 0 0))POLYGON((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 9 1, 9 9, 1 9, 1 1))MULTIPOLYGON(((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 9 1, 9 9, 1 9, 1 1)))POLYGON((4 4, 5 4, 5 5, 4 5, 4 4))POLYGON((4 4, 5 4, 5 11, 4 11, 4 4))LINESTRING(1 1, 1 2)LINESTRING(0.5 1.5, 1.5 1.5)LINESTRING(-10 1, 100 1)POINT(0.5 0.5)
$ spatialjoin < example.txt1 contains 99 intersects 1[...]
You may specify a custom geometry string ID, outputted instead of the line number, before the WKT, separated by a tab:
$ cat example.txtpolygon1POLYGON((0 0, 10 0 ,10 10, 0 10, 0 0))polygon2POLYGON((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 9 1, 9 9, 1 9, 1 1))multipolygon3MULTIPOLYGON(((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 9 1, 9 9, 1 9, 1 1)))polygon4POLYGON((4 4, 5 4, 5 5, 4 5, 4 4))polygon5POLYGON((4 4, 5 4, 5 11, 4 11, 4 4))linestring6LINESTRING(1 1, 1 2)linestring7LINESTRING(0.5 1.5, 1.5 1.5)linestring8LINESTRING(-10 1, 100 1)point9POINT(0.5 0.5)
$ spatialjoin < example.txtpolygon1 contains point9point9 intersects polygon1[...]
You may specify a "side" (either 0 or 1) per geometry, as an additional tab-separated field after the custom geometry ID. If sides are defined, only geometries from different sides are compared. Note that a custom geometry IDmust be given, otherwise the side will be interpreted as the custom geometry ID. The default side is 0.
$ cat example.txtpolygon10POLYGON((0 0, 10 0 ,10 10, 0 10, 0 0))polygon20POLYGON((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 9 1, 9 9, 1 9, 1 1))multipolygon30MULTIPOLYGON(((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 9 1, 9 9, 1 9, 1 1)))polygon41POLYGON((4 4, 5 4, 5 5, 4 5, 4 4))polygon51POLYGON((4 4, 5 4, 5 11, 4 11, 4 4))linestring61LINESTRING(1 1, 1 2)linestring71LINESTRING(0.5 1.5, 1.5 1.5)linestring80LINESTRING(-10 1, 100 1)point91POINT(0.5 0.5)
One use case ofspatialjoin
is to add triples for the relationscontains
andintersects
to an RDF dataset with WKT literals. The following example showsthe process for the OSM data for germany.
NAME=osm-germanywget -O ${NAME}.pbf https://download.geofabrik.de/europe/germany-latest.osm.pbfosm2rdf ${NAME}.pbf -o ${NAME}.ttl --simplify-wkt 0 --write-ogc-geo-triples none
Note:osm2rdf
by default computes and outputs the predicatesogc:sfContains
andogc:sfIntersects
. The--write-ogc-geo-triples none
option disablesthis. To have both theosm2rdf
predicatesand thespatiajoin
predicates(for comparison or debugging), just omit the option.
time lbzcat -n 2 ${NAME}.ttl.bz2 | \grep "^osm2rdf" | sed -En 's/^osm2rdf(geom)?:(osm_)?(node|rel|way)[a-z]*_([0-9]+) geo:asWKT "([^\"]+)".*/osm\3:\4\t\5/p' | spatialjoin --contains ' ogc:sfContains ' --intersects ' ogc:sfIntersects ' --suffix $' .\n' -o ${NAME}.spatialjoin-triples.ttl.bz2
Note: This reconstructs the OSM IDs from osm2rdf's geo:asWKT triples, where thesubject is of one of these forms (note the very confusing inconsistency forways):osm2rdfgeom:osm_node_(\d+)
,osm2rdfgeom:osm_rel_(\d+)
,osm2rdf:way_(\d+)
,osm2rdfgeom:osm_wayarea_(\d+)
,osm2rdfgeom:osm_relarea_(\d+)
.
ulimit -Sn 1048576; bzcat ${NAME}.ttl.bz2 ${NAME}.spatialjoin-triples.ttl.bz2 | IndexBuilderMain -F ttl -f - -i ${NAME} -s ${NAME}.settings.json --stxxl-memory 10G | tee ${NAME}.index-log.txtServerMain -i ${NAME} -j 8 -p ${PORT} -m 20G -c 10G -e 3G -k 200 -s 300s
With the right QLeverfile (modify the one obtained viaqlever setup-config osm-planet
), it's simply:
qlever indexqlever start