If you work with fish telemetry data on the east coast of the USA, chances are that you’re now at least tangentially related to the Ocean Tracking Network (OTN).
The Ocean Tracking Network houses their data on a GeoServer, which often uses PostgreSQL/PostGIS behind the scenes. These databases store their spatial data in a format called “well-known binary” – as opposed to the human-readable “well-known text” you see in the output of an sf object.
OTN data extracts export the WKB in a column called “the_geom”; it looks like a long string of numbers and letters. To investigate this, I’ll use the data set from Trudel 2018.
# Download toy datatd <-file.path(tempdir(), 'otn_files')dir.create(td)download.file('https://members.oceantrack.org/data/repository/pbsm/detection-extracts/pbsm_qualified_detections_2018.zip',destfile =file.path(td, 'pbsm_qualified_detections_2018.zip'))unzip(file.path(td, 'pbsm_qualified_detections_2018.zip'),exdir = td)# Read in dataotn <-read.csv(file.path(td, 'pbsm_qualified_detections_2018.csv'))head(otn[, c('latitude', 'longitude', 'the_geom')])
The main difference here is that the_geom can contain all of the information we may need, like coordinates, geometry type (points? polygons? multipoints? MULTIPOLYGONS???), and coordinate reference system. The latitude and longitude columns are just text: we need to infer/assume all of the other information.
In this particular case, that’s pretty easy. The latitude/longitude combinations are representing deployed receivers (points) and the are almost certainly in WGS84 (EPSG 4326) as that’s the system most-commonly used by a handheld GPS. We can provide this information directly and convert the CSV into an sf object.
library(sf)
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
Simple feature collection with 6 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -66.93253 ymin: 45.03992 xmax: -66.89643 ymax: 45.06803
Geodetic CRS: WGS 84
the_geom geometry
1 0101000000CF49EF1B5FB950C0693A3B191C854640 POINT (-66.89643 45.03992)
2 01010000007D224F92AEBB50C00E677E3507884640 POINT (-66.93253 45.06272)
3 0101000000DAC9E02879BB50C0CD920035B5884640 POINT (-66.92927 45.06803)
4 0101000000DAC9E02879BB50C0CD920035B5884640 POINT (-66.92927 45.06803)
5 01010000007D224F92AEBB50C00E677E3507884640 POINT (-66.93253 45.06272)
6 0101000000DAC9E02879BB50C0CD920035B5884640 POINT (-66.92927 45.06803)
The information we provided (coordinates and a coordinate reference system) helped fill out the metadata in the header and the well-known text (WKT) representation of the points in the “geometry” column. The analogous well-known binary (WKB) is contained within “the_geom” column. At this point, the WKB are just character strings.
We can convert the WKB as well, but it necessitates us jumping through some strange hoops. First we need to make “the_geom” have a “WKB” class.
otn_wkb <-structure(otn$the_geom, class ='WKB')attributes(otn_wkb)
$class
[1] "WKB"
We can then convert this to a simple features collection via st_as_sfc. Note that you may have to pass the EWKB = T argument if you come across some WKB in the wild, as PostGIS can create two types of WKB: Extended WKB and ISO WKB. EWKB allows other dimensions (like depth) and embedding a spatial reference identifier (SRID). ISO WKB also allows other identifiers, but no SRID. OTN seems to use ISO WKB as there is no CRS associated with the data.
# ISO WKB, no CRSst_crs(st_as_sfc(otn_wkb, EWKB = F))
Coordinate Reference System: NA
# Would have a CRS if EWKBst_crs(st_as_sfc(otn_wkb, EWKB = T))
Coordinate Reference System: NA
To complete the cycle, we will convert the column to a simple features collection, then set it as the geometry of the original dataset.
Simple feature collection with 622 features and 0 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -67.09352 ymin: 44.93334 xmax: -61.62142 ymax: 46.54271
CRS: NA
First 10 features:
geometry
1 POINT (-66.89643 45.03992)
2 POINT (-66.93253 45.06272)
3 POINT (-66.92927 45.06803)
4 POINT (-66.92927 45.06803)
5 POINT (-66.93253 45.06272)
6 POINT (-66.92927 45.06803)
7 POINT (-66.9235 45.07047)
8 POINT (-66.92927 45.06803)
9 POINT (-66.9235 45.07047)
10 POINT (-66.92927 45.06803)
So, is there any advantage to jumping through these hoops? Let’s benchmark it.
Unit: microseconds
expr min lq mean median uq max neval
from_binary 2478.8 2559.90 3096.142 2642.40 3558.20 7430.1 100
from_coord 955.6 994.35 1214.427 1029.45 1320.25 2386.9 100
Sure doesn’t seem like it. After 100 iterations, parsing the binary is about four times slower than using st_as_sf. This result, combined with the fact that the code is more confusing and a PostGIS database is likely not being utilized by OTN’s end-users, suggests that the column may not get much use. Converting to EWKB may provide more use via adding a CRS, but the changes in the back end to make this happen probably make it so “the juice ain’t worth the squeeze.”
References
Trudel, Marc. “A Pilot Study to Investigate the Migration of Atlantic Salmon Post-Smolts and Their Interactions with Aquaculture in Passamaquoddy Bay, New Brunswick, Canada.” Ocean Tracking Network, 2018. https://members.oceantrack.org/project?ccode=PBSM.