| Title: | Processing Agro-Environmental Data |
| Version: | 0.2.0 |
| Description: | A set of tools for processing and analyzing data developed in the context of the "Who Has Eaten the Planet" (WHEP) project, funded by the European Research Council (ERC). For more details on the multi-regional input–output model "Food and Agriculture Biomass Input–Output" (FABIO) see Bruckner et al. (2019) <doi:10.1021/acs.est.9b03554>. |
| License: | MIT + file LICENSE |
| Imports: | cli, dplyr, fs, FAOSTAT, httr, mipfp, nanoparquet, pins, purrr, readr, rlang, stringr, tidyr, withr, yaml, zoo |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Suggests: | ggplot2, googlesheets4, here, knitr, pointblank, rmarkdown, testthat (≥ 3.0.0), tibble |
| Config/testthat/edition: | 3 |
| VignetteBuilder: | knitr |
| URL: | https://eduaguilera.github.io/whep/, https://github.com/eduaguilera/whep |
| BugReports: | https://github.com/eduaguilera/whep/issues |
| Depends: | R (≥ 4.2.0) |
| LazyData: | true |
| NeedsCompilation: | no |
| Packaged: | 2025-10-15 13:57:15 UTC; catalin |
| Author: | Catalin Covaci |
| Maintainer: | Catalin Covaci <catalin.covaci@csic.es> |
| Repository: | CRAN |
| Date/Publication: | 2025-10-15 15:20:02 UTC |
whep: Processing Agro-Environmental Data
Description

A set of tools for processing and analyzing data developed in the context of the "Who Has Eaten the Planet" (WHEP) project, funded by the European Research Council (ERC). For more details on the multi-regional input–output model "Food and Agriculture Biomass Input–Output" (FABIO) see Bruckner et al. (2019) <doi:10.1021/acs.est.9b03554>.
Author(s)
Maintainer: Catalin Covaci <catalin.covaci@csic.es> (ORCID)
Authors:
Eduardo Aguilera <eduardo.aguilera@csic.es> (ORCID) [copyright holder]
Other contributors:
João Serra <jserra@agro.au.dk> (ORCID) [contributor]
European Research Council [funder]
See Also
Useful links:
Report bugs at https://github.com/eduaguilera/whep/issues
Get area codes from area names
Description
Add a new column to an existing tibble with the corresponding code for each name. The codes are assumed to be from those defined by the FABIO model.
Usage
add_area_code(table, name_column = "area_name", code_column = "area_code")
Arguments
table | The table that will be modified with a new column. |
name_column | The name of the column in table containing the area names. |
code_column | The name of the output column containing the codes. |
Value
A tibble with all the contents of table and an extra column named code_column, which contains the codes. If there is no code match, an NA is included.
Examples
table <- tibble::tibble(
  area_name = c("Armenia", "Afghanistan", "Dummy Country", "Albania")
)
add_area_code(table)
table |>
  dplyr::rename(my_area_name = area_name) |>
  add_area_code(name_column = "my_area_name")
add_area_code(table, code_column = "my_custom_code")
Get area names from area codes
Description
Add a new column to an existing tibble with the corresponding name for each code. The codes are assumed to be from those defined by the FABIO model, which themselves come from FAOSTAT internal codes. Equivalences with ISO 3166-1 numeric can be found in the Area Codes CSV from the zip file that can be downloaded from FAOSTAT. TODO: Think about this; it would be nice to use ISO3 codes but they won't be enough for our periods.
Usage
add_area_name(table, code_column = "area_code", name_column = "area_name")
Arguments
table | The table that will be modified with a new column. |
code_column | The name of the column in table containing the area codes. |
name_column | The name of the output column containing the names. |
Value
A tibble with all the contents of table and an extra column named name_column, which contains the names. If there is no name match, an NA is included.
Examples
table <- tibble::tibble(area_code = c(1, 2, 4444, 3))
add_area_name(table)
table |>
  dplyr::rename(my_area_code = area_code) |>
  add_area_name(code_column = "my_area_code")
add_area_name(table, name_column = "my_custom_name")
Get commodity balance sheet item codes from item names
Description
Add a new column to an existing tibble with the corresponding code for each commodity balance sheet item name. The codes are assumed to be from those defined by FAOSTAT.
Usage
add_item_cbs_code(
  table,
  name_column = "item_cbs_name",
  code_column = "item_cbs_code"
)
Arguments
table | The table that will be modified with a new column. |
name_column | The name of the column in table containing the item names. |
code_column | The name of the output column containing the codes. |
Value
A tibble with all the contents of table and an extra column named code_column, which contains the codes. If there is no code match, an NA is included.
Examples
table <- tibble::tibble(
  item_cbs_name = c("Cottonseed", "Eggs", "Dummy Item")
)
add_item_cbs_code(table)
table |>
  dplyr::rename(my_item_cbs_name = item_cbs_name) |>
  add_item_cbs_code(name_column = "my_item_cbs_name")
add_item_cbs_code(table, code_column = "my_custom_code")
Get commodity balance sheet item names from item codes
Description
Add a new column to an existing tibble with the corresponding name for each commodity balance sheet item code. The codes are assumed to be from those defined by FAOSTAT.
Usage
add_item_cbs_name(
  table,
  code_column = "item_cbs_code",
  name_column = "item_cbs_name"
)
Arguments
table | The table that will be modified with a new column. |
code_column | The name of the column in table containing the item codes. |
name_column | The name of the output column containing the names. |
Value
A tibble with all the contents of table and an extra column named name_column, which contains the names. If there is no name match, an NA is included.
Examples
table <- tibble::tibble(item_cbs_code = c(2559, 2744, 9876))
add_item_cbs_name(table)
table |>
  dplyr::rename(my_item_cbs_code = item_cbs_code) |>
  add_item_cbs_name(code_column = "my_item_cbs_code")
add_item_cbs_name(table, name_column = "my_custom_name")
Get production item codes from item names
Description
Add a new column to an existing tibble with the corresponding code for each production item name. The codes are assumed to be from those defined by FAOSTAT.
Usage
add_item_prod_code(
  table,
  name_column = "item_prod_name",
  code_column = "item_prod_code"
)
Arguments
table | The table that will be modified with a new column. |
name_column | The name of the column in table containing the item names. |
code_column | The name of the output column containing the codes. |
Value
A tibble with all the contents of table and an extra column named code_column, which contains the codes. If there is no code match, an NA is included.
Examples
table <- tibble::tibble(
  item_prod_name = c("Rice", "Cabbages", "Dummy Item")
)
add_item_prod_code(table)
table |>
  dplyr::rename(my_item_prod_name = item_prod_name) |>
  add_item_prod_code(name_column = "my_item_prod_name")
add_item_prod_code(table, code_column = "my_custom_code")
Get production item names from item codes
Description
Add a new column to an existing tibble with the corresponding name for each production item code. The codes are assumed to be from those defined by FAOSTAT.
Usage
add_item_prod_name(
  table,
  code_column = "item_prod_code",
  name_column = "item_prod_name"
)
Arguments
table | The table that will be modified with a new column. |
code_column | The name of the column in table containing the item codes. |
name_column | The name of the output column containing the names. |
Value
A tibble with all the contents of table and an extra column named name_column, which contains the names. If there is no name match, an NA is included.
Examples
table <- tibble::tibble(item_prod_code = c(27, 358, 12345))
add_item_prod_name(table)
table |>
  dplyr::rename(my_item_prod_code = item_prod_code) |>
  add_item_prod_name(code_column = "my_item_prod_code")
add_item_prod_name(table, name_column = "my_custom_name")
Supply and use tables
Description
Create a table with processes, their inputs (use) and their outputs (supply).
Usage
build_supply_use(
  cbs_version = NULL,
  feed_intake_version = NULL,
  primary_prod_version = NULL,
  primary_residues_version = NULL,
  processing_coefs_version = NULL
)
Arguments
cbs_version | File version passed to get_wide_cbs(). |
feed_intake_version | File version passed to get_feed_intake(). |
primary_prod_version | File version passed to get_primary_production(). |
primary_residues_version | File version passed to get_primary_residues(). |
processing_coefs_version | File version passed to get_processing_coefs(). |
Value
A tibble with the supply and use data for processes. It contains the following columns:
year: The year in which the recorded event occurred.
area_code: The code of the country where the data is from. For code details see e.g. add_area_name().
proc_group: The type of process taking place. It can be one of:
  crop_production: Production of crops and their residues, e.g. rice production, coconut production, etc.
  husbandry: Animal husbandry, e.g. dairy cattle husbandry, non-dairy cattle husbandry, layers chickens farming, etc.
  processing: Derived subproducts obtained from processing other items. The items used as inputs are those that have a non-zero processing use in the commodity balance sheet. See get_wide_cbs() for more details. In each process there is a single input. In some processes like olive oil extraction or soyabean oil extraction this might make sense. Others like alcohol production need multiple inputs (e.g. multiple crops work), so in this data there would not be a process like alcohol production but rather a virtual process like 'Wheat and products processing', giving all its possible outputs. This is a constraint because of how the data was obtained and might be improved in the future. See get_processing_coefs() for more details.
proc_cbs_code: The code of the main item in the process taking place. Together with proc_group, these two columns uniquely represent a process. The main item is predictable depending on the value of proc_group:
  crop_production: The code is from the item for which seed usage (if any) is reported in the commodity balance sheet (see get_wide_cbs() for more). For example, the rice code for a rice production process or the cottonseed code for the cotton production one.
  husbandry: The code of the farmed animal, e.g. bees for beekeeping, non-dairy cattle for non-dairy cattle husbandry, etc.
  processing: The code of the item that is used as input, i.e., the one that is processed to get other derived products. This uniquely defines a process within the group because of the nature of the data that was used, which you can see in get_processing_coefs().
For code details see e.g. add_item_cbs_name().
item_cbs_code: The code of the item produced or used in the process. Note that this might be the same value as proc_cbs_code, e.g., in a rice production process for the row defining the amount of rice produced or the amount of rice seed as input, but it might also have a different value, e.g. for the row defining the amount of straw residue from rice production. For code details see e.g. add_item_cbs_name().
type: Can have two values:
  use: The given item is an input of the process.
  supply: The given item is an output of the process.
value: Quantity in tonnes.
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default versions (i.e. no arguments).
build_supply_use(
  cbs_version = "example",
  feed_intake_version = "example",
  primary_prod_version = "example",
  primary_residues_version = "example",
  processing_coefs_version = "example"
)
Trade data sources
Description
Convert a dataframe where each row has a year range into one where each row is a single year, effectively 'expanding' the whole year range.
Usage
expand_trade_sources(trade_sources)
Arguments
trade_sources | A tibble dataframe where each row contains the year range. |
Value
A tibble dataframe where each row corresponds to a single year fora given source.
Examples
trade_sources <- tibble::tibble(
  Name = c("a", "b", "c"),
  Trade = c("t1", "t2", "t3"),
  Info_Format = c("year", "partial_series", "year"),
  Timeline_Start = c(1, 1, 2),
  Timeline_End = c(3, 4, 5),
  Timeline_Freq = c(1, 1, 2),
  `Imp/Exp` = "Imp",
  SACO_link = NA
)
expand_trade_sources(trade_sources)
Bilateral trade data
Description
Reports trade between pairs of countries in given years.
Usage
get_bilateral_trade(trade_version = NULL, cbs_version = NULL)Arguments
trade_version | File version used for bilateral trade input. See whep_inputs for version details. |
cbs_version | File version passed to get_wide_cbs(). |
Value
A tibble with the reported trade between countries. For efficient memory usage, the tibble is not exactly in tidy format. It contains the following columns:
year: The year in which the recorded event occurred.
item_cbs_code: FAOSTAT internal code for the item that is being traded. For code details see e.g. add_item_cbs_name().
bilateral_trade: Square matrix of NxN dimensions, where N is the total number of countries being considered. The matrix row and column names are exactly equal and they represent country codes:
  Row name: The code of the country where the data is from (the exporter). For code details see e.g. add_area_name().
  Column name: FAOSTAT internal code for the country that is importing the item. See row name explanation above.
If m is the matrix, the value at m["A", "B"] is the trade in tonnes from country "A" to country "B", for the corresponding year and item. The matrix can be considered balanced. This means:
  The sum of all values from row "A", where "A" is any country, should match the total exports from country "A" reported in the commodity balance sheet (which is considered more accurate for totals).
  The sum of all values from column "A", where "A" is any country, should match the total imports into country "A" reported in the commodity balance sheet (which is considered more accurate for totals).
The sums may not be exactly the expected values because of precision issues and/or the iterative proportional fitting algorithm not converging fast enough, but they should be very close to the desired totals.
The step by step approach to obtain this data tries to follow the FABIO model and is explained below. All the steps are performed separately for each group of year and item.
From the FAOSTAT reported bilateral trade, there are sometimes two values for one trade flow: the exported amount claimed by the reporter country and the import amount claimed by the partner country. Here, the export data was preferred, i.e., if country "A" says it exported X tonnes to country "B" but country "B" claims they got Y tonnes from country "A", we trust the export data X. This choice is only needed if there exists a reported amount from both sides. Otherwise, the single existing report is chosen.
Complete the country data, that is, add any missing combinations of country trade with NAs, which will be estimated later. In matrix form, this doesn't increase the memory usage since we had to build a matrix anyway (for the balancing algorithm), and the empty parts also take up memory. This is also done for total imports/exports from the commodity balance sheet, but these are directly filled with 0s instead.
The total imports and exports from the commodity balance sheet are balanced by downscaling the larger of the two to match the smaller. This is done in the following way:
If total_imports > total_exports: set import as total_exports * import / total_imports.
If total_exports > total_imports: set export as total_imports * export / total_exports.
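The downscaling step above can be sketched in a few lines of base R (a minimal illustration with hypothetical numbers, not the package's actual implementation):

```r
# Hypothetical per-country totals (tonnes) for one year and item.
imports <- c(A = 30, B = 50, C = 20)  # total 100
exports <- c(A = 40, B = 20, C = 20)  # total 80

total_imports <- sum(imports)
total_exports <- sum(exports)

if (total_imports > total_exports) {
  # Downscale each import proportionally so both sides share the same total.
  imports <- total_exports * imports / total_imports
} else if (total_exports > total_imports) {
  exports <- total_imports * exports / total_exports
}

all.equal(sum(imports), sum(exports))  # TRUE: both sides now total 80
```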
The missing data in the matrix must be estimated. It's done like this:
For each pair of exporter i and importer j, we estimate a bilateral trade m[i, j] using the export shares of i and import shares of j from the commodity balance sheet:
  est_1 <- exports[i] * imports[j] / sum(imports), i.e., total exports of country i spread among other countries' import shares.
  est_2 <- imports[j] * exports[i] / sum(exports), i.e., total imports of country j spread among other countries' export shares.
  est <- (est_1 + est_2) / 2, i.e., the mean of both estimates.
In the above computations, exports and imports are the original values before they were balanced.
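The two share-based estimates and their mean can be illustrated with hypothetical numbers in base R (the real computation runs over full matrices):

```r
# Original (unbalanced) totals from the commodity balance sheet, in tonnes.
exports <- c(A = 40, B = 20, C = 20)
imports <- c(A = 30, B = 50, C = 20)

# Estimate the flow from exporter i to importer j.
estimate_flow <- function(i, j) {
  # i's total exports spread among other countries' import shares.
  est_1 <- exports[[i]] * imports[[j]] / sum(imports)
  # j's total imports spread among other countries' export shares.
  est_2 <- imports[[j]] * exports[[i]] / sum(exports)
  (est_1 + est_2) / 2
}

estimate_flow("A", "B")  # (20 + 25) / 2 = 22.5
```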
The estimates for data that already existed (i.e. non-NA) are discarded. For the ones left, for each row (i.e. exporter country), we get the difference between its balanced total export and the sum of original non-estimated data. The result is the gap we can actually fill with estimates, so as to not get past the reported total export. If the sum of non-discarded estimates is larger, it must be downscaled and spread by computing gap * non_discarded_estimate / sum(non_discarded_estimates). The estimates are scaled by a trust factor, in the sense that we don't rely on the whole value, thinking that a non-present value might actually be because that specific trade was 0, so we don't overestimate too much. The chosen factor is 10%, so only 10% of the estimate's value is actually used to fill the NA from the original bilateral trade matrix.
The matrix is balanced, as mentioned before, using the iterative proportional fitting algorithm. The target sums for rows and columns are respectively the balanced exports and imports computed from the commodity balance sheet. The algorithm is performed directly using the mipfp R package.
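A minimal base R sketch of iterative proportional fitting on a small hypothetical matrix (the package itself delegates this step to mipfp):

```r
# Starting trade matrix (rows = exporters, columns = importers).
m <- matrix(c(0, 10, 5,
              8,  0, 4,
              3,  6, 0), nrow = 3, byrow = TRUE)
row_targets <- c(18, 10, 12)  # balanced total exports per country
col_targets <- c(11, 17, 12)  # balanced total imports per country

for (iter in 1:100) {
  m <- m * row_targets / rowSums(m)        # scale rows to export targets
  m <- t(t(m) * col_targets / colSums(m))  # scale columns to import targets
}

# Row and column sums now (approximately) match the targets.
round(rowSums(m), 6)
round(colSums(m), 6)
```

Note how the zero diagonal (no self-trade) is preserved, since scaling never turns a zero cell into a positive flow.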
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default versions (i.e. no arguments).
get_bilateral_trade(
  trade_version = "example",
  cbs_version = "example"
)
Scrapes activity_data from FAOSTAT and slightly post-processes it
Description
Important: Dynamically allows for the introduction of subsets as "...".
Note: there is some overhead from individually scraping FAOSTAT code QCL for crop data; it's fine.
Usage
get_faostat_data(activity_data, ...)
Arguments
activity_data | Activity data required from FAOSTAT; needs to be one of the supported dataset names (e.g. "livestock"). |
... | Can be whichever column name from the FAOSTAT dataset is used for subsetting, e.g. year or area. |
Value
A data.frame of FAOSTAT data for activity_data; the default is for all years and countries.
Examples
get_faostat_data("livestock", year = 2010, area = "Portugal")
Livestock feed intake
Description
Get amount of items used for feeding livestock.
Usage
get_feed_intake(version = NULL)Arguments
version | File version to use as input. See whep_inputs for details. |
Value
A tibble with the feed intake data. It contains the following columns:
year: The year in which the recorded event occurred.
area_code: The code of the country where the data is from. For code details see e.g. add_area_name().
live_anim_code: Commodity balance sheet code for the type of livestock that is fed. For code details see e.g. add_item_cbs_name().
item_cbs_code: The code of the item that is used for feeding the animal. For code details see e.g. add_item_cbs_name().
feed_type: The type of item that is being fed. It can be one of:
  animals: Livestock product, e.g. Bovine Meat, Butter, Ghee, etc.
  crops: Crop product, e.g. Vegetables, Other, Oats, etc.
  residues: Crop residue, e.g. Straw, Fodder legumes, etc.
  grass: Grass, e.g. Grassland, Temporary grassland, etc.
  scavenging: Other residues. Single Scavenging item.
supply: The computed amount in tonnes of this item that should be fed to this animal, when sharing the total item feed use from the Commodity Balance Sheet among all livestock.
intake: The actual amount in tonnes that the animal needs, which can be less than the theoretical used amount from supply.
intake_dry_matter: The amount specified by intake but only considering dry matter, so it should be less than intake.
loss: The amount that is not used for feed. This is supply - intake.
loss_share: The percent that is lost. This is loss / supply.
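The relationship between the last four columns can be sketched with hypothetical numbers (base R, not actual package output):

```r
# Hypothetical feed rows, all amounts in tonnes.
feed <- data.frame(
  supply = c(100, 80),  # amount allocated from the CBS feed use
  intake = c(90, 80)    # amount the animal actually needs
)
feed$loss <- feed$supply - feed$intake        # 10 and 0
feed$loss_share <- feed$loss / feed$supply    # 0.1 and 0
feed
```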
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_feed_intake(version = "example")
Primary items production
Description
Get amount of crops, livestock and livestock products.
Usage
get_primary_production(version = NULL)Arguments
version | File version to use as input. See whep_inputs for details. |
Value
A tibble with the item production data. It contains the following columns:
year: The year in which the recorded event occurred.
area_code: The code of the country where the data is from. For code details see e.g. add_area_name().
item_prod_code: FAOSTAT internal code for each produced item.
item_cbs_code: FAOSTAT internal code for each commodity balance sheet item. The commodity balance sheet contains an aggregated version of production items. This field is the code for the corresponding aggregated item.
live_anim_code: Commodity balance sheet code for the type of livestock that produces the livestock product. It can be:
  NA: The entry is not a livestock product.
  Non-NA: The code for the livestock type. The name can also be retrieved by using add_item_cbs_name().
unit: Measurement unit for the data. Here, keep in mind three groups of items: crops (e.g. Apples and products, Beans...), livestock (e.g. Cattle, dairy, Goats...) and livestock products (e.g. Poultry Meat, Offals, Edible...). Then the unit can be one of:
  tonnes: Available for crops and livestock products.
  ha: Hectares, available for crops.
  t_ha: Tonnes per hectare, available for crops.
  heads: Number of animals, available for livestock.
  LU: Standard Livestock Unit measure, available for livestock.
  t_head: Tonnes per head, available for livestock products.
  t_LU: Tonnes per Livestock Unit, available for livestock products.
value: The amount of item produced, measured in unit.
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_primary_production(version = "example")
Crop residue items
Description
Get type and amount of residue produced for each crop production item.
Usage
get_primary_residues(version = NULL)Arguments
version | File version to use as input. See whep_inputs for details. |
Value
A tibble with the crop residue data. It contains the following columns:
year: The year in which the recorded event occurred.
area_code: The code of the country where the data is from. For code details see e.g. add_area_name().
item_cbs_code_crop: FAOSTAT internal code for each commodity balance sheet item. This is the crop that is generating the residue.
item_cbs_code_residue: FAOSTAT internal code for each commodity balance sheet item. This is the obtained residue. In the commodity balance sheet, this can be three different items right now:
  2105: Straw
  2106: Other crop residues
  2107: Firewood
These are actually not FAOSTAT defined items, but custom defined by us. When necessary, FAOSTAT codes are extended for our needs.
value: The amount of residue produced, measured in tonnes.
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_primary_residues(version = "example")
Processed products share factors
Description
Reports quantities of commodity balance sheet items used for processing and quantities of their corresponding processed output items.
Usage
get_processing_coefs(version = NULL)Arguments
version | File version to use as input. See whep_inputs for details. |
Value
A tibble with the quantities for each processed product. It contains the following columns:
year: The year in which the recorded event occurred.
area_code: The code of the country where the data is from. For code details see e.g. add_area_name().
item_cbs_code_to_process: FAOSTAT internal code for each one of the items that are being processed and will give other subproduct items. For code details see e.g. add_item_cbs_name().
value_to_process: Tonnes of this item that are being processed. It matches the amount found in the processing column from the data obtained by get_wide_cbs().
item_cbs_code_processed: FAOSTAT internal code for each one of the subproduct items that are obtained when processing. For code details see e.g. add_item_cbs_name().
initial_conversion_factor: Estimate for the number of tonnes of item_cbs_code_processed obtained for each tonne of item_cbs_code_to_process. It will be used to compute the final_conversion_factor, which leaves everything balanced. TODO: explain how it's computed.
initial_value_processed: First estimate for the number of tonnes of item_cbs_code_processed obtained from item_cbs_code_to_process. It is computed as value_to_process * initial_conversion_factor.
conversion_factor_scaling: Computed scaling needed to adapt initial_conversion_factor so as to get a final balanced total of subproduct quantities. TODO: explain how it's computed.
final_conversion_factor: Final used estimate for the number of tonnes of item_cbs_code_processed obtained for each tonne of item_cbs_code_to_process. It is computed as initial_conversion_factor * conversion_factor_scaling.
final_value_processed: Final estimate for the number of tonnes of item_cbs_code_processed obtained from item_cbs_code_to_process. It is computed as value_to_process * final_conversion_factor (equivalently, initial_value_processed * conversion_factor_scaling).
For the final data obtained, the quantities final_value_processed are balanced in the following sense: the total sum of final_value_processed for each unique tuple of (year, area_code, item_cbs_code_processed) should be exactly the quantity reported for that year, country and item_cbs_code_processed item in the production column obtained from get_wide_cbs(). This is because they are not primary products, so the amount from 'production' is actually the amount of subproduct obtained. TODO: Fix few data where this doesn't hold.
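A worked illustration of how the conversion-factor columns fit together, using hypothetical numbers for a single year, country, input item and subproduct. The scaling here is derived naively from one assumed reported production value; the package's actual computation of the scaling is more involved (see the TODOs above):

```r
value_to_process <- 1000           # tonnes of the input item sent to processing
initial_conversion_factor <- 0.4   # first estimate: t of subproduct per t processed
initial_value_processed <- value_to_process * initial_conversion_factor  # 400

# Suppose the CBS reports 360 t of the subproduct actually produced;
# scale the factor so the totals balance.
reported_production <- 360
conversion_factor_scaling <- reported_production / initial_value_processed  # 0.9
final_conversion_factor <- initial_conversion_factor * conversion_factor_scaling  # 0.36
final_value_processed <- value_to_process * final_conversion_factor

final_value_processed  # 360, matching the reported production
```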
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_processing_coefs(version = "example")
Commodity balance sheet data
Description
States supply and use parts for each commodity balance sheet (CBS) item.
Usage
get_wide_cbs(version = NULL)Arguments
version | File version to use as input. See whep_inputs for details. |
Value
A tibble with the commodity balance sheet data in wide format. It contains the following columns:
year: The year in which the recorded event occurred.
area_code: The code of the country where the data is from. For code details see e.g. add_area_name().
item_cbs_code: FAOSTAT internal code for each item. For code details see e.g. add_item_cbs_name().
The other columns are quantities (measured in tonnes), where total supplyand total use should be balanced.
For supply:
production: Produced locally.
import: Obtained from importing from other countries.
stock_retrieval: Available as net stock from previous years. For ease, only one stock column is included here as supply. If the value is positive, there is a stock quantity available as supply. Otherwise, it means a larger quantity was stored for later years and cannot be used as supply, having to deduct it from total supply. Since in this case it is negative, the total supply is still computed as the sum of all of these.
For use:
food: Food for humans.
feed: Food for animals.
export: Released as export for other countries.
seed: Intended for new production.
processing: The product will be used to obtain other subproducts.
other_uses: Any other use not included in the above ones.
There is an additional column domestic_supply which is computed as the total use excluding export.
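The supply/use balance and the derived domestic_supply column can be illustrated with one hypothetical row (base R, not actual package output):

```r
# One hypothetical CBS row, all quantities in tonnes.
supply <- c(production = 70, import = 40, stock_retrieval = -10)
use <- c(food = 50, feed = 20, export = 15, seed = 5,
         processing = 5, other_uses = 5)

total_supply <- sum(supply)  # stock_retrieval enters the sum even when negative
total_use <- sum(use)
domestic_supply <- total_use - use[["export"]]

# total_supply and total_use are both 100; domestic_supply is 85.
c(total_supply = total_supply, total_use = total_use,
  domestic_supply = domestic_supply)
```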
Examples
# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_wide_cbs(version = "example")
Commodity balance sheet items
Description
Defines name/code correspondences for commodity balance sheet (CBS) items.
Usage
items_cbs
Format
A tibble where each row corresponds to one CBS item. It contains the following columns:
item_cbs_code: A numeric code used to refer to the CBS item.
item_cbs_name: A natural language name for the item.
item_type: An ad-hoc grouping of items. This is a work in progress evolving depending on our needs, so for now it only has two possible values:
  livestock: The CBS item represents a live animal.
  other: Not any of the previous groups.
Source
Inspired by FAOSTAT data.
Primary production items
Description
Defines name/code correspondences for production items.
Usage
items_prod
Format
A tibble where each row corresponds to one production item. It contains the following columns:
item_prod_code: A numeric code used to refer to the item.
item_prod_name: A natural language name for the item.
item_type: An ad-hoc grouping of items. This is a work in progress evolving depending on our needs, so for now it only has two possible values:
  crop_product: The production item represents a crop product.
  other: Not any of the previous groups.
Source
Inspired by FAOSTAT data.
Fill gaps by linear interpolation, or carrying forward or backward.
Description
Fills gaps (NA values) in a time-dependent variable by linear interpolation between two points, or by carrying forward or backward the last or initial values, respectively. It also creates a new variable indicating the source of the filled values.
Usage
linear_fill(
  df,
  var,
  time_index,
  interpolate = TRUE,
  fill_forward = TRUE,
  fill_backward = TRUE,
  .by = NULL
)
Arguments
df | A tibble data frame containing one observation per row. |
var | The variable of df containing gaps to be filled. |
time_index | The time index variable (usually year). |
interpolate | Logical. If TRUE, gaps between two observed values are filled by linear interpolation. |
fill_forward | Logical. If TRUE, gaps after the last observed value are filled by carrying it forward. |
fill_backward | Logical. If TRUE, gaps before the first observed value are filled by carrying it backward. |
.by | A character vector with the grouping variables (optional). |
Value
A tibble data frame (ungrouped) where gaps in var have been filled, and a new "source" variable has been created indicating if the value is original or, in case it has been estimated, the gapfilling method that has been used.
Examples
sample_tibble <- tibble::tibble(
  category = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  year = c(
    "2015", "2016", "2017", "2018", "2019", "2020",
    "2015", "2016", "2017", "2018", "2019", "2020"
  ),
  value = c(NA, 3, NA, NA, 0, NA, 1, NA, NA, NA, 5, NA)
)
linear_fill(sample_tibble, value, year, .by = c("category"))
linear_fill(
  sample_tibble,
  value,
  year,
  interpolate = FALSE,
  .by = c("category")
)
Polities
Description
Defines name/code correspondences for polities (political entities).
Usage
polities
Format
A tibble where each row corresponds to one polity. It contains the following columns: TODO: On polities Pull Request, coming soon.
Fill gaps using a proxy variable
Description
Fills gaps in a variable based on changes in a proxy variable, using ratios between the filled variable and the proxy variable, and labels output accordingly.
Usage
proxy_fill(df, var, proxy_var, time_index, ...)
Arguments
df | A tibble data frame containing one observation per row. |
var | The variable of df containing gaps to be filled. |
proxy_var | The variable to be used as proxy. |
time_index | The time index variable (usually year). |
... | Optionally, additional arguments that will be passed to linear_fill(), e.g. .by. |
Value
A tibble dataframe (ungrouped) where gaps in var have been filled, a new proxy_ratio variable has been created, and a new "source" variable has been created indicating if the value is original or, in case it has been estimated, the gapfilling method that has been used.
Examples
sample_tibble <- tibble::tibble(
  category = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  year = c(
    "2015", "2016", "2017", "2018", "2019", "2020",
    "2015", "2016", "2017", "2018", "2019", "2020"
  ),
  value = c(NA, 3, NA, NA, 0, NA, 1, NA, NA, NA, 5, NA),
  proxy_variable = c(1, 2, 2, 2, 2, 2, 1, 2, 3, 4, 5, 6)
)
proxy_fill(sample_tibble, value, proxy_variable, year, .by = c("category"))
Fill gaps summing the previous value of a variable to the value of another variable
Description
Fills gaps in a variable with the sum of its previous value and the value of another variable. When a gap has multiple observations, the values are accumulated along the series. When there is a gap at the start of the series, it can either remain unfilled or assume an invisible 0 value before the first observation and start filling with cumulative sum.
Usage
sum_fill(df, var, change_var, start_with_zero = TRUE, .by = NULL)
Arguments
df | A tibble data frame containing one observation per row. |
var | The variable of df containing gaps to be filled. |
change_var | The variable whose values will be used to fill the gaps. |
start_with_zero | Logical. If TRUE, assumes an invisible 0 value before the first observation and fills with cumulative sum starting from the first change_var value. If FALSE (default), starting NA values remain unfilled. |
.by | A character vector with the grouping variables (optional). |
Value
A tibble dataframe (ungrouped) where gaps in var have been filled, and a new "source" variable has been created indicating if the value is original or, in case it has been estimated, the gapfilling method that has been used.
Examples
sample_tibble <- tibble::tibble(
  category = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"),
  year = c(
    "2015", "2016", "2017", "2018", "2019", "2020",
    "2015", "2016", "2017", "2018", "2019", "2020"
  ),
  value = c(NA, 3, NA, NA, 0, NA, 1, NA, NA, NA, 5, NA),
  change_variable = c(1, 2, 3, 4, 1, 1, 0, 0, 0, 0, 0, 1)
)
sum_fill(
  sample_tibble,
  value,
  change_variable,
  start_with_zero = FALSE,
  .by = c("category")
)
sum_fill(
  sample_tibble,
  value,
  change_variable,
  start_with_zero = TRUE,
  .by = c("category")
)
External inputs
Description
The information needed for accessing external datasets used as inputs in our modeling.
Usage
whep_inputs
Format
A tibble where each row corresponds to one external input dataset. It contains the following columns:
alias: An internal name used to refer to this dataset, which is the expected name when trying to get the dataset with whep_read_file().
board_url: The public static URL where the data is found, following the concept of a board from the pins package, which is what we use for storing these input datasets.
version: The specific version of the dataset, as defined by the pins package. The version is a string similar to "20250714T123343Z-114b5". This version is the one used by default if no version is specified when calling whep_read_file(). If you want to use a different one, you can find the available versions of a file by using whep_list_file_versions().
Source
Created by the package authors.
Input file versions
Description
Lists all existing versions of an input file from whep_inputs.
Usage
whep_list_file_versions(file_alias)
Arguments
file_alias | Internal name of the requested file. You can find the possible values in the alias column of whep_inputs. |
Value
A tibble where each row is a version. For details about its format, see pins::pin_versions().
Examples
whep_list_file_versions("read_example")
Download, cache and read files
Description
Used to fetch input files that are needed for the package's functions and that were built in external sources and are too large to include directly. This is a public function for transparency purposes, so that users can inspect the original inputs of this package that were not directly processed here.
If the requested file doesn't exist locally, it is downloaded from a public link and cached before reading it. This is all implemented using the pins package. It supports multiple file formats and file versioning.
Usage
whep_read_file(file_alias, type = "parquet", version = NULL)
Arguments
file_alias | Internal name of the requested file. You can find the possible values in the alias column of whep_inputs. |
type | The extension of the file that must be read. Possible values are "parquet" (default) and "csv". Saving each file in both formats is for transparency and accessibility purposes, e.g., having to share the data with non-programmers who can easily import a CSV into a spreadsheet. You will most likely never have to set this option manually unless for some reason a file could not be supplied in e.g. parquet format. |
version | The version of the file that must be read. Possible values are NULL (default, which uses the version pinned in whep_inputs), "latest", or a specific version string as returned by whep_list_file_versions(). |
Value
A tibble with the dataset. Some information about each dataset can be found in the code where it's used as input for further processing.
Examples
whep_read_file("read_example")
whep_read_file("read_example", type = "parquet", version = "latest")
whep_read_file(
  "read_example",
  type = "csv",
  version = "20250721T152646Z-ce61b"
)