An R package to analyze, summarize, and visualize daily streamflow data 💧

fasstr

The Flow Analysis Summary Statistics Tool for R (‘fasstr’) is a set of R functions to tidy, summarize, analyze, trend, and visualize streamflow data. This package summarizes continuous daily mean streamflow data into various daily, monthly, annual, and long-term statistics, and completes annual trend and frequency analyses, with results in both table and plot formats.

Reference

fasstr package 📦 home page and reference guide

Features

This package provides functions for streamflow data analysis, including:

data tidying (to prepare data for analyses; add_* and fill_* functions),
data screening (to identify data range, outliers and missing data; screen_* functions),
calculating summary statistics (long-term, annual, monthly and daily statistics; calc_* functions),
computing analyses (volume frequency analyses and annual trending; compute_* functions), and,
visualizing (plotting the various statistics; plot_* functions).

Useful features of functions include:

the integration of the tidyhydat package to pull streamflow data from a Water Survey of Canada HYDAT database for analyses;
arguments for filtering of years and months in analyses and plotting;
choosing the start month of your water year;
selecting for rolling day averages (e.g. 7-day rolling average); and,
choosing how missing dates are handled, amongst others.

This package is maintained by the Water Management Branch of the British Columbia Ministry of Water, Land and Resource Stewardship.

Installation

You can install fasstr directly from CRAN:

install.packages("fasstr")

To install the development version from GitHub, use the remotes package to install the fasstr package:

if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("bcgov/fasstr")

To use the station_number argument and pull data directly from a Water Survey of Canada HYDAT database into fasstr functions, download a HYDAT file using the following code:

tidyhydat::download_hydat()

Using fasstr

There are several vignettes and a cheatsheet to provide more information on the usage of fasstr functions and how to customize various argument options.

Cheatsheet

Data Input

All functions in fasstr require a daily mean streamflow data set from one or more hydrometric stations. Long-term and continuous data sets are preferred for most analyses, but seasonal and partial data can be used. Other daily time series data, like temperature, precipitation or water levels, may also be used, but with caution, as some calculations/conversions are based on units of streamflow (cubic metres per second). Data is provided to each function using either the data argument as a data frame of flow values, or the station_number argument as a list of Water Survey of Canada HYDAT station numbers.

When using the data option, supply a data frame of daily data containing columns of dates (YYYY-MM-DD in date format), values (mean daily discharge in cubic metres per second in numeric format), and, optionally, grouping identifiers (character strings of station names or numbers). By default the functions will look for columns named ‘Date’, ‘Value’, and ‘STATION_NUMBER’, respectively, to be compatible with the ‘tidyhydat’ defaults, but columns with different names can be identified using the dates, values, and groups column arguments (e.g. values = Yield_mm). The following is an example of an appropriate data frame (STATION_NUMBER not required):

#>   STATION_NUMBER       Date Value
#> 1        08NM116 1949-04-01  1.13
#> 2        08NM116 1949-04-02  1.53
#> 3        08NM116 1949-04-03  2.07
#> 4        08NM116 1949-04-04  2.07
#> 5        08NM116 1949-04-05  2.21
#> 6        08NM116 1949-04-06  2.21
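As a rough illustration of the layout described above, the following sketch builds a small synthetic data frame with the default column names. The station identifiers ("STN_A", "STN_B") and the flow values are made up for the example and do not come from HYDAT.

```r
# Synthetic daily flows for two hypothetical stations (illustrative values only).
set.seed(1)
dates <- seq(as.Date("2001-01-01"), as.Date("2001-12-31"), by = "day")

flow_data <- data.frame(
  STATION_NUMBER = rep(c("STN_A", "STN_B"), each = length(dates)),  # grouping identifier
  Date = rep(dates, times = 2),                                     # Date class, YYYY-MM-DD
  Value = round(runif(2 * length(dates), min = 0.5, max = 25), 2),  # discharge in m^3/s
  stringsAsFactors = FALSE
)

# Inspect the structure: character groups, Date dates, numeric values.
str(flow_data)
head(flow_data)
```

A frame like this uses the default column names, so no dates, values or groups arguments should be needed; a frame with other column names would name them in the style of the values = Yield_mm example above.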

Alternatively, you can pull a flow data set directly from a HYDAT database (if installed) by providing a list of station numbers in the station_number argument (e.g. station_number = "08NM116" or station_number = c("08NM116", "08NM242")) while leaving the data arguments blank. A data frame of daily streamflow data for all stations listed will be extracted using tidyhydat, and the fasstr function will then compute its results from that data.

This package allows for multiple stations (or other groupings) to be analyzed in many of the functions, provided identifiers are supplied using the groups column argument (defaults to STATION_NUMBER). If the grouping column doesn’t exist or is improperly named, then all values listed in the values column will be summarized together.

Function Types

Tidying

These functions, which start with either add_* or fill_*, add columns and rows, respectively, to streamflow data frames to help set up your data for further analysis. Examples include adding rolling means, adding date variables (WaterYear, Month, DayofYear, etc.), adding basin areas, adding columns of volumetric discharge and water yields, and filling dates with missing flow values with NA.
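The idea behind filling missing dates and adding date variables can be sketched in base R as below. This is only an illustration of the concept, not the package's implementation; the fasstr functions may differ in detail (for example in how far they extend the date range or how they name columns).

```r
# Daily record with a two-day gap (2001-01-03 and 2001-01-04 are absent).
flow_data <- data.frame(
  Date = as.Date(c("2001-01-01", "2001-01-02", "2001-01-05")),
  Value = c(1.2, 1.4, 1.1)
)

# Build the complete daily sequence and left-join the observations onto it,
# so that dates without an observation get a Value of NA.
all_days <- data.frame(
  Date = seq(min(flow_data$Date), max(flow_data$Date), by = "day")
)
filled <- merge(all_days, flow_data, by = "Date", all.x = TRUE)

# Add a few simple date variables as extra columns.
filled$Year <- as.integer(format(filled$Date, "%Y"))
filled$Month <- as.integer(format(filled$Date, "%m"))
filled$DayofYear <- as.integer(format(filled$Date, "%j"))

filled
```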

Analysis

The analysis functions summarize your discharge values into various statistics. screen_* functions summarize annual data for outliers and missing dates. calc_* functions calculate daily, monthly, annual, and long-term statistics (e.g. mean, median, maximum, minimum, percentiles, amongst others) of daily, rolling-day, and cumulative flow data. compute_* functions also analyze data but produce more in-depth analyses, like frequency and trending analysis, and may produce multiple plots and tables as a result. All tables are in tibble data frame format. You can use write_flow_data() or write_results() to customize saving tibbles to a local drive.

Visualization

The visualization functions, which begin with plot_*, plot the various summary statistics and analyses as a way to visualize the data. While most plotting function statistics can be customized, some come pre-set with statistics that cannot be changed. Plots can be further modified by the user using the ggplot2 package and its functions. All plotting functions produce lists of plots (even if only one is produced). You can use write_plots() to customize saving the lists of plots to a local drive (within folders or PDF documents).

Function Options

Daily Rolling Means

If certain n-day rolling mean statistics are to be analyzed (e.g. 3- or 7-day rolling means), some functions provide the ability to select them with function arguments (e.g. roll_days = 7 and roll_align = "right"). The rolling day alignment is the placement of the date within the n-day window, where “right” averages the day-of and the previous n-1 days, “centre” places the date in the middle of the window, and “left” averages the day-of and the following n-1 days. For your own analyses you can add rolling means to your data set using the add_rolling_means() function.
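The three alignments can be illustrated with a small base R helper. The function rolling_mean() and its arguments n and align are hypothetical names for this sketch only, and the handling of even window sizes for the centred case is an arbitrary choice made here.

```r
# Rolling mean of x over an n-value window, with the current position placed
# at the right edge, the centre, or the left edge of the window.
rolling_mean <- function(x, n, align = c("right", "center", "left")) {
  align <- match.arg(align)
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    idx <- switch(align,
      right  = (i - n + 1):i,                       # day-of and previous n-1 days
      center = (i - (n - 1) %/% 2):(i + n %/% 2),   # day-of in the middle (odd n)
      left   = i:(i + n - 1)                        # day-of and following n-1 days
    )
    # Only compute a mean when the whole window lies inside the record.
    if (min(idx) >= 1 && max(idx) <= length(x)) out[i] <- mean(x[idx])
  }
  out
}

flows <- c(1, 2, 3, 4, 5)
rolling_mean(flows, 3, "right")   # NA NA  2  3  4
rolling_mean(flows, 3, "center")  # NA  2  3  4 NA
rolling_mean(flows, 3, "left")    #  2  3  4 NA NA
```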

Year and Month Filtering

To customize your analyses for specific time periods, you can designate the start and end years of your analysis using the start_year and end_year arguments and remove any unwanted years (for partial data sets for example) by listing them in the exclude_years argument (e.g. exclude_years = c(1990, 1992:1994)). Alternatively, some functions have an argument called complete_years that summarizes data from just those years which have complete flow records. Some functions will also allow you to select the months of a year to analyze, using the months argument, as opposed to all months (if you want just summer low-flows, for example). Leaving these arguments blank will result in the summary/analysis of all years and months of the provided data set.
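The effect of year and month filtering can be sketched with a plain base R helper. Here filter_flows() and its parameter drop_years are hypothetical names for this illustration; they are not part of fasstr, and the flow values are placeholders.

```r
# Keep only rows inside the requested years and months (calendar years here).
filter_flows <- function(data, start_year = NULL, end_year = NULL,
                         drop_years = NULL, months = 1:12) {
  yr <- as.integer(format(data$Date, "%Y"))
  mon <- as.integer(format(data$Date, "%m"))
  keep <- mon %in% months
  if (!is.null(start_year)) keep <- keep & yr >= start_year
  if (!is.null(end_year)) keep <- keep & yr <= end_year
  if (!is.null(drop_years)) keep <- keep & !(yr %in% drop_years)
  data[keep, , drop = FALSE]
}

dates <- seq(as.Date("1989-01-01"), as.Date("1995-12-31"), by = "day")
flow_data <- data.frame(Date = dates, Value = seq_along(dates))

# Summer months of 1990-1995, leaving out 1990 and 1992-1994.
summer <- filter_flows(flow_data, start_year = 1990, end_year = 1995,
                       drop_years = c(1990, 1992:1994), months = 7:9)
unique(format(summer$Date, "%Y"))  # "1991" "1995"
```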

To group analyses by water, or hydrologic, years instead of calendar years, you can set the water_year_start argument (numeric month) within most functions to a month other than 1 (January). A water year can be defined as a 12-month period that comprises a complete hydrologic cycle (wet seasons can typically cross calendar years), typically starting with the month with minimum flows (the start of a new water recharge cycle). The water year identifier is designated by the year it ends in (e.g. a water year from Oct 1, 1999 to Sep 30, 2000 is designated as 2000). Start, end and excluded years will be based on the specified water year.

For your own analyses, you can add date variables to your data set using the add_date_variables() or add_seasons() functions.
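The water year convention described above (the identifier is the year in which the water year ends) can be written as a short base R helper. The function name water_year() is hypothetical and used only for this sketch.

```r
# Assign each date to a water year that starts in month `water_year_start`
# and is labelled by the calendar year in which it ends.
water_year <- function(dates, water_year_start = 10) {
  yr <- as.integer(format(dates, "%Y"))
  mon <- as.integer(format(dates, "%m"))
  if (water_year_start == 1) return(yr)  # water year equals calendar year
  ifelse(mon >= water_year_start, yr + 1L, yr)
}

d <- as.Date(c("1999-09-30", "1999-10-01", "2000-09-30", "2000-10-01"))
water_year(d, water_year_start = 10)  # 1999 2000 2000 2001
water_year(d, water_year_start = 1)   # 1999 1999 2000 2000
```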

Drainage Basin Area

Yield runoff statistics (in millimetres) calculated in some of the functions require an upstream drainage basin area (in sq. km) using the basin_area argument. If no basin areas are supplied, all yield results will be NA. To apply a basin area (10 sq. km for example) to all daily observations, set the argument as basin_area = 10. If there are multiple stations or groups (identified using the groups argument) that need different basin areas, set them individually using a named vector: basin_area = c("08NM116" = 795, "08NM242" = 22). If a STATION_NUMBER column exists with HYDAT station numbers, the function will automatically use the basin areas provided in HYDAT, if available, so basin_area is not required. For your own analyses, you can add basin areas to your data set using the add_basin_area() function.
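The unit conversion behind a daily yield can be sketched as follows: a discharge in cubic metres per second is turned into a daily volume, then spread as a depth over the basin area. The helper yield_mm() and the flow values are assumptions for illustration; only the two basin areas are taken from the example above.

```r
# Daily yield (mm) from daily mean discharge (m^3/s) and basin area (km^2):
#   volume per day = Q * 86400 s;  1 km^2 = 1e6 m^2;  1 m = 1000 mm
yield_mm <- function(discharge_cms, basin_area_km2) {
  discharge_cms * 86400 / (basin_area_km2 * 1e6) * 1000
}

# Named vector of basin areas, one per station, as in the text.
basin_area <- c("08NM116" = 795, "08NM242" = 22)

flows <- data.frame(
  STATION_NUMBER = c("08NM116", "08NM242", "UNKNOWN"),
  Value = c(23.6, 0.5, 1.0)  # illustrative discharges in m^3/s
)

# Look up each station's area; a station with no supplied area gets NA,
# and therefore an NA yield.
flows$Basin_Area_sqkm <- unname(basin_area[flows$STATION_NUMBER])
flows$Yield_mm <- yield_mm(flows$Value, flows$Basin_Area_sqkm)
flows
```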

Handling Missing Dates

With the use of the ignore_missing argument in most functions, you can decide how to handle dates with missing flow values in calculations. When you set ignore_missing = TRUE, a statistic will be calculated for a given year, all years, or month regardless of whether there are missing flow values. When ignore_missing = FALSE, the returned value for the period will be NA if there are missing values. To allow some missing dates and still calculate statistics, some functions also include the allowed_missing argument, where you provide the percentage (0 to 100) of missing days allowed per time period.
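The behaviour described above can be mimicked for a single period with a small base R helper. The function summarise_period() is hypothetical, and the way it ties the two arguments together (treating ignore_missing = TRUE as allowing 100% missing and FALSE as allowing 0%) is an assumption of this sketch rather than a statement about the package internals.

```r
# Mean flow for one period, returning NA when too many values are missing.
summarise_period <- function(values, ignore_missing = FALSE,
                             allowed_missing = if (ignore_missing) 100 else 0) {
  pct_missing <- 100 * mean(is.na(values))
  if (pct_missing > allowed_missing) return(NA_real_)
  mean(values, na.rm = TRUE)
}

flows <- c(1, 2, NA, 3)  # 25% of the period is missing

summarise_period(flows, ignore_missing = FALSE)  # NA
summarise_period(flows, ignore_missing = TRUE)   # 2
summarise_period(flows, allowed_missing = 30)    # 2  (25% <= 30%)
summarise_period(flows, allowed_missing = 10)    # NA (25% > 10%)
```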

Some functions have an argument called complete_years which, when set to TRUE, filters out years that have partial data sets (for seasonal or other reasons) so that only years with full data are used to calculate statistics.

Examples

Summary statistics example: long-term statistics

To determine the long-term summary statistics of daily data for each month (mean, median, maximum, minimum, and some percentiles) you can use the calc_longterm_daily_stats() function. If the ‘Mission Creek near East Kelowna’ hydrometric station is of interest, you can list the station number in the station_number argument to obtain the data (if tidyhydat and HYDAT are installed). Statistics over several months can also be calculated using the custom_months and custom_months_label arguments. See the summer statistics (from July to September) in this example.

calc_longterm_daily_stats(station_number = "08NM116",
                          start_year = 1981,
                          end_year = 2010,
                          custom_months = 7:9,
                          custom_months_label = "Summer")
#> # A tibble: 14 × 8
#>    STATION_NUMBER Month      Mean Median Maximum Minimum   P10   P90
#>    <chr>          <fct>     <dbl>  <dbl>   <dbl>   <dbl> <dbl> <dbl>
#>  1 08NM116        Jan        1.22  1        9.5    0.160 0.540  1.85
#>  2 08NM116        Feb        1.16  0.970    4.41   0.140 0.474  1.99
#>  3 08NM116        Mar        1.85  1.40     9.86   0.380 0.705  3.80
#>  4 08NM116        Apr        8.32  6.26    37.9    0.505 1.63  17.5
#>  5 08NM116        May       23.6  20.8     74.4    3.83  9.33  41.2
#>  6 08NM116        Jun       21.5  19.5     84.5    0.450 6.10  38.9
#>  7 08NM116        Jul        6.48  3.90    54.5    0.332 1.02  15
#>  8 08NM116        Aug        2.13  1.57    13.3    0.427 0.775  4.29
#>  9 08NM116        Sep        2.19  1.58    14.6    0.364 0.735  4.35
#> 10 08NM116        Oct        2.10  1.60    15.2    0.267 0.794  3.98
#> 11 08NM116        Nov        2.04  1.73    11.7    0.260 0.560  3.90
#> 12 08NM116        Dec        1.30  1.05     7.30   0.342 0.5    2.33
#> 13 08NM116        Long-term  6.17  1.89    84.5    0.140 0.680 19.3
#> 14 08NM116        Summer     3.61  1.98    54.5    0.332 0.799  7.64

Plotting example: daily summary statistics

To visualize the daily streamflow patterns on an annual basis, the plot_daily_stats() function will plot out various summary statistics for each day of the year. Data can also be filtered for certain years of interest (a 1981-2010 normals period for this example) using the start_year and end_year arguments. We can also compare an individual year against the statistics using the add_year argument, as below.

plot_daily_stats(station_number = "08NM116",
                 start_year = 1981,
                 end_year = 2010,
                 log_discharge = TRUE,
                 add_year = 1991)
#> $Daily_Statistics

Plotting example: flow duration curves

Flow duration curves can be produced using the plot_flow_duration() function.

plot_flow_duration(station_number = "08NM116",
                   start_year = 1981,
                   end_year = 2010)
#> $Flow_Duration
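For readers unfamiliar with the concept, a flow duration curve shows the percentage of time a given flow is equalled or exceeded. A generic base R sketch is given below; the helper flow_duration() is hypothetical, the rank-based plotting position is an assumption of this example, and it is not necessarily how plot_flow_duration() derives its curve.

```r
# Empirical flow duration table: flows sorted from high to low with the
# percentage of time each flow is equalled or exceeded.
flow_duration <- function(values) {
  values <- sort(values[!is.na(values)], decreasing = TRUE)
  n <- length(values)
  data.frame(
    Value = values,
    Exceedance_pct = 100 * seq_len(n) / (n + 1)  # rank / (n + 1), an assumption
  )
}

fdc <- flow_duration(c(5, 1, 3, NA))
fdc
#>   Value Exceedance_pct
#> 1     5             25
#> 2     3             50
#> 3     1             75
```

Plotting Value against Exceedance_pct, usually with a logarithmic flow axis, gives the familiar downward-sloping curve.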

Analysis example: low-flow frequency analysis

This package also provides a function, compute_annual_frequencies(), to complete a volume frequency analysis by fitting annual minimums or maximums to Log-Pearson Type III or Weibull probability distributions. See the volume frequency analyses documentation for more information. For this example, the 7-day low-flow quantiles are calculated for the Mission Creek hydrometric station using the Log-Pearson Type III distribution and the method of moments fitting method (both defaults). With this, several low-flow indicators can be determined (e.g. 7Q5, 7Q10).

freq_results <- compute_annual_frequencies(station_number = "08NM116",
                                           start_year = 1981,
                                           end_year = 2010,
                                           roll_days = 7,
                                           fit_distr = "PIII",
                                           fit_distr_method = "MOM")
freq_results$Freq_Fitted_Quantiles
#> # A tibble: 11 × 4
#>    Distribution Probability `Return Period` `7-Day`
#>    <chr>              <dbl>           <dbl>   <dbl>
#>  1 PIII               0.01           100      0.193
#>  2 PIII               0.05            20      0.277
#>  3 PIII               0.1             10      0.332
#>  4 PIII               0.2              5      0.408
#>  5 PIII               0.5              2      0.588
#>  6 PIII               0.8              1.25   0.812
#>  7 PIII               0.9              1.11   0.946
#>  8 PIII               0.95             1.05   1.07
#>  9 PIII               0.975            1.03   1.17
#> 10 PIII               0.98             1.02   1.21
#> 11 PIII               0.99             1.01   1.31
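Reading indicators such as 7Q5 and 7Q10 off the fitted quantiles amounts to selecting the rows with a 5-year and 10-year return period. The sketch below re-enters the first five rows of the table above by hand, with simplified column names (Return_Period, Q7_Day) and a hypothetical helper low_flow_indicator(), so that it runs without HYDAT.

```r
# First five rows of the fitted-quantiles output shown above, typed in by hand.
quantiles <- data.frame(
  Probability = c(0.01, 0.05, 0.1, 0.2, 0.5),
  Return_Period = c(100, 20, 10, 5, 2),
  Q7_Day = c(0.193, 0.277, 0.332, 0.408, 0.588)
)

# In this table the return period is the reciprocal of the probability.
all.equal(quantiles$Return_Period, 1 / quantiles$Probability)

# 7QT: the 7-day low flow with a T-year return period.
low_flow_indicator <- function(tbl, return_period) {
  tbl$Q7_Day[tbl$Return_Period == return_period]
}

q7_5 <- low_flow_indicator(quantiles, 5)    # 0.408
q7_10 <- low_flow_indicator(quantiles, 10)  # 0.332
```

With the real tibble, the columns are named `Return Period` and `7-Day`, so they need backticks when referenced.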

The probability of observed extreme events can also be plotted (using a selected plotting position) along with the computed quantiles curve for comparison.

freq_results <- compute_annual_frequencies(station_number = "08NM116",
                                           start_year = 1981,
                                           end_year = 2010,
                                           roll_days = c(1, 3, 7, 30))
freq_results$Freq_Plot

Project Status

This package is set for delivery. This package is maintained by the Water Management Branch of the British Columbia Ministry of Water, Land and Resource Stewardship.

Getting Help or Reporting an Issue

To report bugs/issues/feature requests, please file an issue.

How to Contribute

If you would like to contribute to the package, please see our CONTRIBUTING guidelines.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

    Copyright 2023 Province of British Columbia

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.