For a better version of the stars vignettes seehttps://r-spatial.github.io/stars/articles/
This vignette explains the data model ofstars objects,illustrated using artificial and real datasets.
stars objects consist of
dim) attributedimensions of classdimensions that carries dimension metadatastarsAdimensions object is a named list ofdimension elements, each describing the semantics adimension of the data arrays (space, time, type etc). In addition tothat, adimensions object has an attribute calledraster of classstars_raster, which is a namedlist with three elements:
dimensions length 2 character; the dimension names thatconstitute a spatial raster (or NA)affine length 2 numeric; the two affine parameters ofthe geotransform (or NA)curvilinear a boolean indicating whether a raster is acurvilinear raster (or NA)Theaffine andcurvilinear values are onlyrelevant in case of raster data, indicated bydimensions tohave non-NA values.
Adimension object describes asingledimension; it is a list with named elements
from: (numeric length 1): the start index of thearrayto: (numeric length 1): the end index of the arrayoffset: (numeric length 1): the start coordinate (ortime) value of the first pixel (i.e., a pixel/cell boundary)delta: (numeric length 1): the increment, or cellsizerefsys: (character, orcrs): objectdescribing the reference system; e.g. the PROJ string, or stringPOSIXct orPCICt (for 360 and 365 days/yearcalendars), or object of classcrs (containing both EPSGcode and proj4string)point: (logical length 1): boolean indicating whethercells/pixels refer to areas/periods, or to points/instances (may beNA)values: one ofNULL (missing),POSIXct,PCICt, orsfc),intervals (a list with two vectors,start andend, with interval start- andend-values), orfrom andto will usually be 1 and thedimension size, butfrom may be larger than 1 in case asub-grid got was selected (or cropped).
offset anddelta only apply toregularly discretized dimensions, and areNA ifthis is not the case. If they areNA, dimension values maybe held in thevalues field. Rectilinear and curvilineargrids need grid values invalues that can be either:
Alternatively,values can contains a set of spatialgeometries encoded in ansfc vector (“list-column”), inwhich case we have avector datacube.
With a very simple file created from a\(4\times 5\) matrix
suppressPackageStartupMessages(library(stars))m=matrix(1:20,nrow =5,ncol =4)dim(m)=c(x =5,y =4)# named dim(s =st_as_stars(m))## stars object with 2 dimensions and 1 attribute## attribute(s):## Min. 1st Qu. Median Mean 3rd Qu. Max.## A1 1 5.75 10.5 10.5 15.25 20## dimension(s):## from to offset delta point x/y## x 1 5 0 1 FALSE [x]## y 1 4 0 1 FALSE [y]we see that
from andto fields of each dimensiondefine a range that corresponds to the array dimension:When we plot this object, using theimage method forstars objects,
we see that\((0,0)\) is the originof the grid (grid corner), and\(1\)the coordinate value increase from one index (row, col) to the next. Itmeans that consecutive matrix columns represent grid lines, going fromsouth to north. Grids defined this way areregular:grid cell size is constant everywhere.
Many actual grid datasets have y coordinates (grid rows) going fromNorth to South (top to bottom); this is realised with a negative valuefordelta. We see that the grid origin\((0,0)\) did not change:
An example is the GeoTIFF carried in the package, which, as probablyall data sources read through GDAL, has a negativedeltafor they-coordinate:
Dimension tables ofstars objects carry araster attribute:
str(attr(st_dimensions(s),"raster"))## List of 4## $ affine : num [1:2] 0 0## $ dimensions : chr [1:2] "x" "y"## $ curvilinear: logi FALSE## $ blocksizes : NULL## - attr(*, "class")= chr "stars_raster"which is a list that holds
dimensions: character, the names of raster dimensions(if any), as opposed to e.g. spectral, temporal or other dimensionsaffine: numeric, the affine parameterscurvilinear: a logical indicating whether the raster iscurvilinearThese fields are needed at this level, because they describeproperties of the array at a higher level than individual dimensions do:a pair of dimensions forms a raster, bothaffine andcurvilinear describe how x and yas a pair arederived from grid indexes (see below) when this cannot be done on aper-dimension basis.
With two affine parameters\(a_1\)and\(a_2\),\(x\) and\(y\) coordinates are derived from (1-based)grid indexes\(i\) and\(j\), grid offset values\(o_x\) and\(o_y\), and grid cell sizes\(d_x\) and\(d_y\) by
\[x = o_x + (i-1) d_x + (j-1)a_1\]
\[y = o_y + (i-1) a_2 + (j-1) d_y\]Clearly, when\(a_1=a_2=0\),\(x\) and\(y\) are entirely derived from theirrespective index, offset and cellsize.
Note that for integer indexes, the coordinates are that of thestarting edge of a grid cell; to get the grid cell center of the topleft grid cell (in case of a negative\(d_y\)), use\(i=1.5\) and\(j=1.5\).
We can rotate grids by setting\(a_1\) and\(a_2\) to a non-zero value:
attr(attr(s,"dimensions"),"raster")$affine=c(0.1,0.1)plot(st_as_sf(s,as_points =FALSE),axes =TRUE,nbreaks =20)The rotation angle, in degrees, is
Sheared grids are obtained when the two rotation coefficients,\(a_1\) and\(a_2\), are unequal:
attr(attr(s,"dimensions"),"raster")$affine=c(0.1,0.2)plot(st_as_sf(s,as_points =FALSE),axes =TRUE,nbreaks =20)Now, the y-axis and x-axis have different rotation in degrees ofrespectively
Rectilineargrids have orthogonal axes, but do not have congruent (equally sizedand shaped) cells: each axis has its own irregular subdivision.
We can define a rectilinear grid by specifying the cellboundaries, meaning for every dimension we specifyonemore value than the dimension size:
x=c(0,0.5,1,2,4,5)# 6 numbers: boundaries!y=c(0.3,0.5,1,2,2.2)# 5 numbers: boundaries!(r =st_as_stars(list(m = m),dimensions =st_dimensions(x = x,y = y)))## stars object with 2 dimensions and 1 attribute## attribute(s):## Min. 1st Qu. Median Mean 3rd Qu. Max.## m 1 5.75 10.5 10.5 15.25 20## dimension(s):## from to point values x/y## x 1 5 FALSE [0,0.5),...,[4,5) [x]## y 1 4 FALSE [0.3,0.5),...,[2,2.2) [y]st_bbox(r)## xmin ymin xmax ymax## 0.0 0.3 5.0 2.2image(r,axes =TRUE,col =grey((1:20)/20))Would we leave out the last value, thanstars may comeup with adifferent cell boundary for the last cell, as this isnow derived from the width of the one-but-last cell:
x=c(0,0.5,1,2,4)# 5 numbers: offsets only!y=c(0.3,0.5,1,2)# 4 numbers: offsets only!(r =st_as_stars(list(m = m),dimensions =st_dimensions(x = x,y = y)))## stars object with 2 dimensions and 1 attribute## attribute(s):## Min. 1st Qu. Median Mean 3rd Qu. Max.## m 1 5.75 10.5 10.5 15.25 20## dimension(s):## from to point values x/y## x 1 5 FALSE [0,0.5),...,[4,6) [x]## y 1 4 FALSE [0.3,0.5),...,[2,3) [y]st_bbox(r)## xmin ymin xmax ymax## 0.0 0.3 6.0 3.0This is not problematic if cells have a constant width, in which casethe boundaries are reduced to anoffset anddelta value, irrespective whether an upper boundary isgiven:
x=c(0,1,2,3,4)# 5 numbers: offsets only!y=c(0.5,1,1.5,2)# 4 numbers: offsets only!(r =st_as_stars(list(m = m),dimensions =st_dimensions(x = x,y = y)))## stars object with 2 dimensions and 1 attribute## attribute(s):## Min. 1st Qu. Median Mean 3rd Qu. Max.## m 1 5.75 10.5 10.5 15.25 20## dimension(s):## from to offset delta point x/y## x 1 5 0 1 FALSE [x]## y 1 4 0.5 0.5 FALSE [y]st_bbox(r)## xmin ymin xmax ymax## 0.0 0.5 5.0 2.5Alternatively, one can also set thecell midpoints byspecifying argumentscell_midpoints to thest_dimensions call:
When the dimension is regular, this results inoffsetbeing shifted back with half adelta, or else in intervalsderived from the distances between cell centers. This should obviouslynot be done when cell boundaries are specified.
Curvilinear grids are grids whose grid lines are not straight. Ratherthan describing the curvature parametrically, the typical (HDF5 orNetCDF) files in which they are found have two raster layers with thelongitudes and latitudes for every corresponding pixel of remaininglayers.
As an example, we will use a Sentinel 5P dataset available frompackagestarsdata; this package can be installed with
The dataset is found here:
(s5p =system.file("sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc",package ="starsdata"))## [1] "/home/edzer/R/x86_64-pc-linux-gnu-library/4.4/starsdata/sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc"We can construct the curvilinearstars raster by callingread_stars on the right sub-array:
subs=gdal_subdatasets(s5p)subs[[6]]## [1] "NETCDF:\"/home/edzer/R/x86_64-pc-linux-gnu-library/4.4/starsdata/sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc\":/PRODUCT/nitrogendioxide_tropospheric_column"For this array, we can see the GDAL metadata under itemGEOLOCATION:
gdal_metadata(subs[[6]],"GEOLOCATION")## $SRS## [1] "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]"#### $X_DATASET## [1] "NETCDF:\"/home/edzer/R/x86_64-pc-linux-gnu-library/4.4/starsdata/sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc\":/PRODUCT/longitude"#### $X_BAND## [1] "1"#### $Y_DATASET## [1] "NETCDF:\"/home/edzer/R/x86_64-pc-linux-gnu-library/4.4/starsdata/sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc\":/PRODUCT/latitude"#### $Y_BAND## [1] "1"#### $PIXEL_OFFSET## [1] "0"#### $PIXEL_STEP## [1] "1"#### $LINE_OFFSET## [1] "0"#### $LINE_STEP## [1] "1"#### $GEOREFERENCING_CONVENTION## [1] "PIXEL_CENTER"#### attr(,"class")## [1] "gdal_metadata"which reveals where, in this dataset, the longitude and latitudearrays are kept.
nit.c=read_stars(subs[[6]])## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.## Warning in CPL_read_gdal(as.character(x), as.character(options),## as.character(driver), : GDAL Message 1: The dataset has several variables that## could be identified as vector fields, but not all share the same primary## dimension. Consequently they will be ignored.threshold= units::set_units(9e+36, mol/m^2)nit.c[[1]][nit.c[[1]]> threshold]=NAnit.c## stars object with 3 dimensions and 1 attribute## attribute(s):## Min. 1st Qu.## nitrogendioxide_tropospheri... [mol/m^2] -3.301083e-05 1.868205e-05## Median Mean 3rd Qu.## nitrogendioxide_tropospheri... [mol/m^2] 2.622178e-05 2.898976e-05 3.629641e-05## Max. NA's## nitrogendioxide_tropospheri... [mol/m^2] 0.0003924858 330## dimension(s):## from to offset refsys values## x 1 450 NA WGS 84 (CRS84) [450x278] -5.811 [°],...,30.95 [°]## y 1 278 NA WGS 84 (CRS84) [450x278] 28.36 [°],...,51.47 [°]## time 1 1 2018-07-17 UTC POSIXct NULL## x/y## x [x]## y [y]## time## curvilinear gridThe curvilinear array has the actual arrays (raster layers, matrices)with longitude and latitude values read in its dimension table. We canplot this file:
plot(nit.c,breaks ="equal",reset =FALSE,axes =TRUE,as_points =TRUE,pch =16,logz =TRUE,key.length =1)## Warning in NextMethod(): NaNs produced## Warning in plot.sf(x, pal = col, ...): NaNs producedmaps::map('world',add =TRUE,col ='red')plot(nit.c,breaks ="equal",reset =FALSE,axes =TRUE,as_points =FALSE,border =NA,logz =TRUE,key.length =1)## Warning in NextMethod(): NaNs produced## Warning in plot.sf(x, pal = col, ...): NaNs producedmaps::map('world',add =TRUE,col ='red')We can downsample the data by
(nit.c_ds = stars:::st_downsample(nit.c,8))## stars object with 3 dimensions and 1 attribute## attribute(s):## Min. 1st Qu. Median## nitrogendioxide_tropospheri... [mol/m^2] -1.847503e-05 1.85778e-05 2.700901e-05## Mean 3rd Qu. Max.## nitrogendioxide_tropospheri... [mol/m^2] 2.9113e-05 3.642568e-05 0.0001363282## NA's## nitrogendioxide_tropospheri... [mol/m^2] 32## dimension(s):## from to offset refsys values x/y## x 1 50 NA WGS 84 (CRS84) [50x31] -5.811 [°],...,30.14 [°] [x]## y 1 31 NA WGS 84 (CRS84) [50x31] 28.78 [°],...,51.47 [°] [y]## time 1 1 2018-07-17 UTC POSIXct NULL## curvilinear gridplot(nit.c_ds,breaks ="equal",reset =FALSE,axes =TRUE,as_points =TRUE,pch =16,logz =TRUE,key.length =1)## Warning in NextMethod(): NaNs produced## Warning in plot.sf(x, pal = col, ...): NaNs producedmaps::map('world',add =TRUE,col ='red')which doesn’t look nice, but plotting the cells as polygons looksbetter:
plot(nit.c_ds,breaks ="equal",reset =FALSE,axes =TRUE,as_points =FALSE,border =NA,logz =TRUE,key.length =1)## Warning in NextMethod(): NaNs produced## Warning in plot.sf(x, pal = col, ...): NaNs producedmaps::map('world',add =TRUE,col ='red')Another approach would be to warp the curvilinear grid to a regulargrid, e.g. by
w=st_warp(nit.c,crs =4326,cellsize =0.25)## Warning in transform_grid_grid(st_as_stars(src), st_dimensions(dest),## threshold): using Euclidean distance measures on geodetic coordinates## threshold set to 0.108545 : set a larger value if you see missing values where there shouldn't beplot(w)