Benchmark Results

Some of the methods provided in netrankr are computationally very expensive. Computing all rankings of a partial ranking, for instance, is an NP-hard problem, such that using the function exact_rank_prob() quickly becomes infeasible. This article provides some guidelines for when the use of this function is possible. Additionally, the quality of the approximation functions approx_* and of sampling rankings with mcmc_rank_prob() for expected ranks and relative rank probabilities is assessed.

(The data and code to replicate the benchmark results can be found in the data-raw folder on github.)


Runtimes exact probabilities

The below figure shows the runtime of exact_rank_prob() for a sample of 3,000 partial rankings with 10 to 20 nodes and varying degrees of completeness.

Not surprisingly, the runtime increases quickly with the number of nodes and the number of incomparable pairs in the partial ranking. As a very crude rule of thumb: as long as a partial ranking has fewer than 30 elements, it is always safe to run exact_rank_prob(). Beyond 30 elements, it is advisable to only use the function if a high fraction of pairs of elements is already comparable. The more elements, the higher this fraction should be.
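As a concrete illustration of what exact_rank_prob() computes, the following sketch builds a small partial ranking as a binary dominance matrix and evaluates it exactly. The matrix encoding (P[u, v] = 1 meaning u is ranked below v) and the result fields lin.ext, expected.rank and rank.prob follow the package help files; treat the exact field names as assumptions here.

```r
library(netrankr)

# Partial ranking on 5 elements as a (transitively closed) dominance
# matrix: P[u, v] = 1 means u is ranked below v.
# Relations: 1 < 2, 1 < 3, 2 < 4, 3 < 4 (and 1 < 4 by transitivity);
# element 5 is incomparable to everything.
P <- matrix(0, 5, 5)
P[1, 2] <- P[1, 3] <- P[1, 4] <- P[2, 4] <- P[3, 4] <- 1

res <- exact_rank_prob(P)
res$lin.ext        # number of rankings (linear extensions) of P
res$expected.rank  # exact expected rank of each element
res$rank.prob      # probability that element u has rank k
```

For this toy example there are 10 rankings: two linear orders of the diamond on {1, 2, 3, 4}, times five insertion points for the isolated element 5; by symmetry, the expected rank of element 5 is 3.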


Approximating expected ranks

netrankr implements five methods to approximate expected ranks, which are given by the functions approx_rank_expected() and mcmc_rank_prob(). The four methods implemented in the former are based solely on structural features of the partial ranking, while the latter is based on sampling rankings (almost) uniformly at random from the set of all rankings. Consult the help files for a more detailed description and references. The below figure shows the (averaged) mean absolute error compared to the exact expected ranks of the five methods on the set of 3,000 partial rankings from above. The number of drawn samples for the mcmc function is set to \(n^5\), where \(n\) is the number of elements in the partial ranking.

The basic local partial order model performs considerably worse than the other methods on almost all partial rankings. Its generalized version outperforms the methods based on the relative rank probabilities (loof1 and loof2) if the number of incomparable pairs is high. The mcmc method generally yields the best approximations, especially with an increasing number of elements. However, its performance seems to degrade when almost none of the elements are comparable. This issue is discussed in the section on choosing the number of samples further down.
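The comparison can be reproduced in miniature as follows. The method names "lpom", "glpom", "loof1" and "loof2" and the sample-size argument rp of mcmc_rank_prob() are taken from the help files and should be treated as assumptions; the partial ranking is a hypothetical toy example.

```r
library(netrankr)

# Toy partial ranking as a transitively closed dominance matrix
P <- matrix(0, 5, 5)
P[1, 2] <- P[1, 3] <- P[1, 4] <- P[2, 4] <- P[3, 4] <- 1

exact <- exact_rank_prob(P)$expected.rank

# Purely structural approximations of the expected ranks
for (m in c("lpom", "glpom", "loof1", "loof2")) {
  approx <- approx_rank_expected(P, method = m)
  cat(m, "MAE:", mean(abs(approx - exact)), "\n")
}

# Sampling-based approximation with n^5 sampled rankings
mcmc <- mcmc_rank_prob(P, rp = nrow(P)^5)
mean(abs(mcmc$expected.rank - exact))
```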


Approximating relative ranks

Relative ranks can either be approximated with the iterative function approx_rank_relative() or again via sampling rankings (almost) uniformly at random with mcmc_rank_prob(). The below figure shows the (averaged) mean absolute error compared to the exact relative rank probabilities on the set of 3,000 partial rankings from above. The number of drawn samples for the mcmc function is set to \(n^4\), where \(n\) is the number of elements in the partial ranking. The number of iterative steps in approx_rank_relative() is set to 1 (no iteration), 5, 10 and 15, respectively.

Clearly, the non-iterative approximation performs worst on all partial rankings. The more iterations, the better the approximation quality seems to be, yet the gain in quality going from 10 to 15 iterations is negligible. The mcmc based function again performs better, except in the region of low comparability.
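A minimal sketch of the two options, assuming the argument names iterative and num.iter of approx_rank_relative() as given in the help files; the partial ranking is again a hypothetical toy example.

```r
library(netrankr)

# Toy partial ranking as a transitively closed dominance matrix
P <- matrix(0, 5, 5)
P[1, 2] <- P[1, 3] <- P[1, 4] <- P[2, 4] <- P[3, 4] <- 1

exact <- exact_rank_prob(P)$relative.rank

# Relative rank probabilities without and with iteration
rel1  <- approx_rank_relative(P, iterative = FALSE)
rel10 <- approx_rank_relative(P, iterative = TRUE, num.iter = 10)

mean(abs(rel1 - exact))   # error of the non-iterative approximation
mean(abs(rel10 - exact))  # error after 10 iterations
```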


MCMC sampling of rankings

The results above have shown that approximating expected ranks and relative rank probabilities on the basis of a random sample generally gives the best results, except in the region of low comparability. This problem can be mitigated by increasing the number of samples. The below boxplot shows the mean absolute error for the expected ranks of an empty partial ranking with 10 elements when the number of samples is increased. For each sample size, 100 repetitions were done.

The same procedure is carried out for the relative ranks below.

That is, increasing the number of samples (quite naturally) leads to better approximations, but of course also comes with an increase in running time, especially for larger partial rankings.
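The experiment can be sketched as follows with a single run per sample size. The empty dominance matrix represents the partial ranking in which all pairs are incomparable, so by symmetry every exact expected rank equals \((n+1)/2\); the sample-size argument rp is an assumption from the help files.

```r
library(netrankr)

n <- 10
P <- matrix(0, n, n)          # empty partial ranking: nothing comparable
exact <- rep((n + 1) / 2, n)  # exact expected ranks by symmetry

# Mean absolute error for increasing numbers of sampled rankings
for (rp in c(n^3, n^4, n^5)) {
  est <- mcmc_rank_prob(P, rp = rp)$expected.rank
  cat("samples:", rp, "MAE:", mean(abs(est - exact)), "\n")
}
```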


Runtimes of approximation

The function mcmc_rank_prob() generally gives the best approximations. The larger the number of samples, the better. Since the number of samples should be at least cubic in the number of elements, it is limited to partial rankings with a couple of hundred elements.

Although the functions approx_rank_expected() and approx_rank_relative() performed the worst in the above tests, they are computationally the least expensive, with a time complexity of \(\mathcal{O}(n^2)\). Thus, they are able to give at least a rough approximation also for partial rankings beyond 1,000 elements.
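A rough way to see the difference in cost is to time both approaches on a larger random partial order. The generator below is a crude hypothetical construction (random relations over a fixed element order, followed by transitive closure) used for timing purposes only; method and argument names are assumptions from the help files.

```r
library(netrankr)

set.seed(42)
n <- 100
# Random relations over a fixed element order, then transitive closure
P <- matrix(0, n, n)
P[upper.tri(P)] <- rbinom(n * (n - 1) / 2, 1, 0.3)
for (k in seq_len(ceiling(log2(n)))) {
  P <- ((P %*% P > 0) | (P > 0)) * 1  # boolean squaring
}
diag(P) <- 0

system.time(approx_rank_expected(P, method = "glpom"))  # O(n^2), cheap
system.time(mcmc_rank_prob(P, rp = n^3))                # sampling-based
```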


Session info

## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.50        
##  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.2    tools_4.4.2       evaluate_1.0.3    bslib_0.8.0      
## [17] yaml_2.3.10       rlang_1.1.5       jsonlite_1.8.9
