Title:Customisable Ranking of Numerical and Categorical Data
Version:0.2.0
Description:Provides a flexible alternative to the built-in rank() function called smartrank(). Optionally rank categorical variables by frequency (instead of in alphabetical order), and control whether ranking is based on descending/ascending order. smartrank() is suitable for both numerical and categorical data.
License:MIT + file LICENSE
Suggests:covr, dplyr, knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition:2
Encoding:UTF-8
RoxygenNote:7.3.3
URL:https://github.com/selkamand/rank,https://selkamand.github.io/rank/
BugReports:https://github.com/selkamand/rank/issues
VignetteBuilder:knitr
NeedsCompilation:no
Packaged:2025-12-01 09:17:11 UTC; selkamand
Author:Sam El-Kamand [aut, cre, cph]
Maintainer:Sam El-Kamand <sam.elkamand@gmail.com>
Repository:CRAN
Date/Publication:2025-12-01 09:40:02 UTC

Rank a character vector based on supplied priority values

Description

Rank a character vector based on supplied priority values

Usage

rank_by_priority(x, priority_values, ties.method = "average")

Arguments

x

A character vector.

priority_values

A character vector describing "priority" values. Elements of x matching priority_values will be ranked based on their order of appearance in priority_values.

ties.method

a character string specifying how ties are treated, see ‘Details’; can be abbreviated.

Value

A vector of ranks describing x such that x[order(ranks)] will move priority_values to the front of the vector.

Examples

x <- c("A", "B", "C", "D", "E")
rank_by_priority(x, c("C", "A"))
#> "2" "4" "1" "4" "4"
rank_by_priority(1:6, c(4, 2, 7))
#>  4 2 1 3 5 6
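The ranking described in this topic can be approximated in base R with match(). The following is a minimal sketch, not the package's implementation: the helper name priority_rank_sketch is hypothetical, and it assumes that all elements absent from priority_values tie with each other and rank after every priority value.

```r
# Hypothetical helper (not part of the rank package): a base-R sketch of
# priority-based ranking.
priority_rank_sketch <- function(x, priority_values, ties.method = "average") {
  # Position of each element of x in priority_values (NA when absent)
  pos <- match(x, priority_values)
  # Assumption: absent elements share one key placed after all priorities
  pos[is.na(pos)] <- length(priority_values) + 1L
  # Rank the keys; tied keys are resolved by ties.method, as in base::rank()
  rank(pos, ties.method = ties.method)
}

x <- c("A", "B", "C", "D", "E")
ranks <- priority_rank_sketch(x, c("C", "A"))
ranks
#> [1] 2 4 1 4 4
x[order(ranks)]
#> [1] "C" "A" "B" "D" "E"
```

Because order() leaves tied ranks in their original order, the non-priority elements keep their relative positions after the priority values.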

Stratified hierarchical ranking across multiple variables

Description

rank_stratified() computes a single, combined rank for each row of a data frame using stratified hierarchical ranking. The first variable is ranked globally; each subsequent variable is then ranked within strata defined by all previous variables.

Usage

rank_stratified(
  data,
  cols = NULL,
  sort_by = "frequency",
  desc = FALSE,
  ties.method = "average",
  na.last = TRUE,
  freq_tiebreak = "match_desc",
  verbose = TRUE
)

Arguments

data

A data frame. Each selected column represents one level of the stratified hierarchy, in the order given by cols.

cols

Optional column specification indicating which variables in data to use for ranking, and in what order. Can be:

  • NULL (default): use all columns of data in their existing order.

  • A character vector of column names.

  • An integer vector of column positions.

sort_by

Character scalar or vector specifying how to rank each non-numeric column. Each element must be either "alphabetical" or "frequency", matching the behaviour of smartrank(). If a single value is supplied, it is recycled for all columns. For numeric columns, sort_by is ignored and ranking is always based on numeric order.

desc

Logical scalar or vector indicating whether to rank each column in descending order. If a single value is supplied, it is recycled for all columns.

ties.method

Passed to base::rank() when resolving ties at each level; must be one of "average", "first", "last", "random", "max", or "min". See base::rank() for details.

na.last

Logical, controlling the treatment of missing values, as in base::rank(). If TRUE, NAs are given the largest ranks; if FALSE, the smallest. Unlike base::rank() or smartrank(), na.last cannot be set to NA in rank_stratified(), because dropping rows would change group membership and break stratified ranking.

freq_tiebreak

Character scalar or vector controlling how alphabetical tie-breaking works when sort_by = "frequency" and the column is character/factor/logical. Each element must be one of:

  • "match_desc" (default): alphabetical tie-breaking follows desc for that column (ascending when desc = FALSE, descending when desc = TRUE).

  • "asc": ties are always broken by ascending alphabetical order.

  • "desc": ties are always broken by descending alphabetical order.

If a single value is supplied, it is recycled for all columns.

verbose

Logical; if TRUE, emit messages when sort_by is ignored (e.g. for numeric columns), mirroring the behaviour of smartrank().

Details

This is useful when you want a "truly hierarchical" ordering where, for example, rows are first grouped and ordered by the frequency of gender, and then within each gender group, ordered by the frequency of pet within that gender, rather than globally.

The result is a single rank vector that can be passed directly to base::order() to obtain a stratified, multi-level ordering.

Stratified ranking proceeds level by level:

  1. The first selected column is ranked globally, using sort_by[1] (for non-numeric) and desc[1].

  2. For the second column, ranks are computed separately within each distinct combination of values of all previous columns. Within each stratum, the second column is ranked using sort_by[2] / desc[2].

  3. This process continues for each subsequent column: at level k, ranking is done within strata defined by columns 1, 2, ..., k-1.

This yields a single composite rank per row that reflects a "true" hierarchical (i.e. stratified) ordering: earlier variables define strata, and later variables are only compared within those strata (for example, by within-stratum frequency).
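The level-by-level procedure can be sketched in base R for two character columns. This is an illustration under stated assumptions, not the package's implementation: the helper freq_rank is hypothetical, values are ranked most-frequent-first, and frequency ties are broken in ascending alphabetical order (which may differ from what freq_tiebreak = "match_desc" does when desc = TRUE).

```r
# Hypothetical helper: dense frequency rank of a character vector.
# Assumption: most frequent value gets rank 1; frequency ties are broken
# in ascending alphabetical order.
freq_rank <- function(v) {
  counts <- table(v)
  ordered_values <- names(counts)[order(-as.integer(counts), names(counts))]
  match(v, ordered_values)
}

data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat", "cat", "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)

# Level 1: rank gender globally
r1 <- freq_rank(data$gender)

# Level 2: rank pet separately within each gender stratum.
# ave() is applied to row indices so the result stays integer.
r2 <- ave(seq_len(nrow(data)), data$gender,
          FUN = function(i) freq_rank(data$pet[i]))

# Combine the levels: level 1 first, level 2 only compared within level 1
ord <- order(r1, r2)
ord
#> [1] 1 2 3 4 7 6 8 5
data[ord, ]
```

Within the male stratum, giraffe (one occurrence) sorts after cat and magpie (two each), even though giraffe and cat would be ordered differently if pet frequencies were counted across the whole data frame.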

Value

A numeric vector of length nrow(data), containing stratified ranks. Smaller values indicate "earlier" rows in the stratified hierarchy.

Examples

library(rank)
data <- data.frame(
  gender = c("male", "male", "male", "male", "female", "female", "male", "female"),
  pet    = c("cat",  "cat",  "magpie", "magpie", "giraffe", "cat", "giraffe", "cat")
)
# Stratified ranking: first by gender frequency, then within each gender
# by pet frequency *within that gender*
r <- rank_stratified(
  data,
  cols = c("gender", "pet"),
  sort_by = c("frequency", "frequency"),
  desc = TRUE
)
data[order(r), ]

Bring specified values in a vector to the front

Description

Reorders a vector so that any elements matching the values in priority_values appear first, in the order they appear in priority_values. All remaining elements are returned afterward, preserving their original order.

Usage

reorder_by_priority(x, priority_values)

Arguments

x

A character or numeric vector to reorder.

priority_values

A vector of “priority” values. Elements of x that match entries in priority_values are moved to the front in the order they appear in priority_values. Values not found in x are ignored.

Value

A reordered vector with priority values first, followed by all remaining elements in their original order.

Examples

reorder_by_priority(c("A", "B", "C", "D", "E"), c("C", "A"))
reorder_by_priority(1:6, c(4, 2, 7))
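For readers who want to see the documented behaviour spelled out, here is a base-R sketch of the same reordering. The helper name reorder_sketch is hypothetical and this is not claimed to be how the package implements reorder_by_priority().

```r
# Hypothetical helper (not the package's code): move priority values to the front
reorder_sketch <- function(x, priority_values) {
  is_priority <- x %in% priority_values
  # Matched elements, sorted by their position in priority_values
  front <- x[is_priority]
  front <- front[order(match(front, priority_values))]
  # Unmatched elements follow in their original order
  c(front, x[!is_priority])
}

reorder_sketch(c("A", "B", "C", "D", "E"), c("C", "A"))
#> [1] "C" "A" "B" "D" "E"
reorder_sketch(1:6, c(4, 2, 7))
#> [1] 4 2 1 3 5 6
```

The value 7 does not occur in 1:6, so it has no effect on the result.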

Rank a vector based on either alphabetical or frequency order

Description

This function acts as a drop-in replacement for the base rank() function with the added option to:

  1. Rank categorical factors based on frequency instead of alphabetically

  2. Rank in descending or ascending order

Usage

smartrank(
  x,
  sort_by = c("alphabetical", "frequency"),
  desc = FALSE,
  ties.method = "average",
  na.last = TRUE,
  freq_tiebreak = c("match_desc", "asc", "desc"),
  verbose = TRUE
)

Arguments

x

A numeric, character, or factor vector

sort_by

Sort ranking either by "alphabetical" or "frequency". Default is "alphabetical".

desc

A logical indicating whether the ranking should be in descending (TRUE) or ascending (FALSE) order. When input is numeric, ranking is always based on numeric order.

ties.method

a character string specifying how ties are treated, see ‘Details’; can be abbreviated.

na.last

a logical or character string controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.

freq_tiebreak

Controls how alphabetical tie-breaking works when sort_by = "frequency" and x is character/factor/logical. Must be one of:

  • "match_desc" (default): alphabetical tie-breaking direction follows desc (ascending when desc = FALSE, descending when desc = TRUE).

  • "asc": ties are always broken by ascending alphabetical order, regardless of desc.

  • "desc": ties are always broken by descending alphabetical order, regardless of desc.

verbose

Logical flag; if TRUE, emit a message when sort_by is ignored (e.g. for numeric input).

Details

If x includes ‘ties’ (equal values), the ties.method argument determines how the rank value is decided. Must be one of "average", "first", "last", "random", "max", or "min"; see base::rank() for details.

NA values are never considered to be equal: for na.last = TRUE and na.last = FALSE they are given distinct ranks in the order in which they occur in x.
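The tie and missing-value rules above are those of base::rank(), so they can be illustrated with base R alone. The snippet below only calls base::rank(); it assumes, as the Description states, that smartrank() follows the same semantics for these arguments.

```r
# Ties: 10 appears twice and occupies sorted positions 1 and 2
x <- c(10, 20, 10, 30)
rank(x, ties.method = "average")  #> [1] 1.5 3.0 1.5 4.0
rank(x, ties.method = "min")      #> [1] 1 3 1 4
rank(x, ties.method = "max")      #> [1] 2 3 2 4
rank(x, ties.method = "first")    #> [1] 1 3 2 4

# Missing values
y <- c(3, NA, 1)
rank(y, na.last = TRUE)    #> [1] 2 3 1
rank(y, na.last = FALSE)   #> [1] 3 1 2
rank(y, na.last = NA)      #> [1] 2 1
rank(y, na.last = "keep")  #> [1]  2 NA  1

# NAs are never tied with each other: they get distinct ranks in order of occurrence
z <- c(NA, 5, NA)
rank(z)                    #> [1] 2 1 3
```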

Value

The ranked vector

Note

When sort_by = "frequency", ties based on frequency are broken by alphabetical order of the terms. Use freq_tiebreak to control whether that alphabetical tie-breaking is ascending, descending, or follows desc.

When sort_by = "frequency" and input is character, ties.method is ignored. Each distinct element level gets its own rank, and each rank is 1 unit away from the next element, irrespective of how many duplicates there are.
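The idea of frequency ordering with an alphabetical tie-break can be sketched in base R. The variable names below are illustrative and the code is not the package's implementation; in particular, the dense ranks it produces illustrate the note above and are not claimed to reproduce smartrank()'s return values.

```r
fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")
counts <- table(fruits)
lvls <- names(counts)        # distinct values
n <- as.integer(counts)      # Apple = 2, Orange = 2, Pear = 1

# Most frequent first; the Apple/Orange frequency tie is broken in
# ascending alphabetical order
asc_ties <- lvls[order(-n, lvls)]
asc_ties
#> [1] "Apple"  "Orange" "Pear"

# Same, but the frequency tie is broken in descending alphabetical order
desc_ties <- lvls[order(-n, -rank(lvls))]
desc_ties
#> [1] "Orange" "Apple"  "Pear"

# One rank per distinct level, each 1 unit apart, regardless of duplicates
dense <- match(fruits, asc_ties)
dense
#> [1] 1 2 1 3 2
fruits[order(dense)]
#> [1] "Apple"  "Apple"  "Orange" "Orange" "Pear"
```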

Examples

# ------------------
## CATEGORICAL INPUT
# ------------------
fruits <- c("Apple", "Orange", "Apple", "Pear", "Orange")
# rank alphabetically
smartrank(fruits)
#> [1] 1.5 3.5 1.5 5.0 3.5
# rank based on frequency
smartrank(fruits, sort_by = "frequency")
#> [1] 2.5 4.5 2.5 1.0 4.5
# rank based on descending order of frequency
smartrank(fruits, sort_by = "frequency", desc = TRUE)
#> [1] 1.5 3.5 1.5 5.0 3.5
# sort fruits vector based on rank
ranks <- smartrank(fruits, sort_by = "frequency", desc = TRUE)
fruits[order(ranks)]
#> [1] "Apple"  "Apple"  "Orange" "Orange" "Pear"
# ------------------
## NUMERICAL INPUT
# ------------------
# rank numerically
smartrank(c(1, 3, 2))
#> [1] 1 3 2
# rank numerically based on descending order
smartrank(c(1, 3, 2), desc = TRUE)
#> [1] 3 1 2
# always rank numeric vectors based on values, irrespective of sort_by
smartrank(c(1, 3, 2), sort_by = "frequency")
#> smartrank: Sorting a non-categorical variable. Ignoring `sort_by` and sorting numerically
#> [1] 1 3 2
