
User-based k-nearest neighbors in rrecsys

Given a target user and her positively rated items, the algorithm identifies the \(k\) users most similar to the target user.
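The neighborhood formation step can be illustrated with a minimal base-R sketch. It assumes a dense user × item matrix in which 0 marks an unrated item, and it uses cosine similarity over the full rating vectors. The helper `k_nearest_users` is hypothetical and is not part of the rrecsys API. rrecsys's internal neighbor selection may differ.

```r
# Hypothetical helper (not part of rrecsys): indices of the k users most
# similar to `target`. Assumes a dense user x item matrix where 0 = unrated,
# and uses cosine similarity over the full rating vectors.
k_nearest_users <- function(ratings, target, k) {
  norms <- sqrt(rowSums(ratings^2))
  sims <- as.vector(ratings %*% ratings[target, ]) / (norms * norms[target])
  sims[is.nan(sims)] <- 0   # users without ratings get similarity 0
  sims[target] <- -Inf      # never select the target user itself
  order(sims, decreasing = TRUE)[seq_len(min(k, nrow(ratings) - 1))]
}

# Toy data: 5 users (rows) x 4 items (columns).
ratings <- matrix(c(5, 3, 0, 1,
                    4, 0, 0, 1,
                    1, 1, 0, 5,
                    0, 0, 5, 4,
                    5, 4, 0, 0), nrow = 5, byrow = TRUE)
k_nearest_users(ratings, target = 1, k = 2)   # users 5 and 2
```

User 5, who rated the first two items highly just like user 1, is ranked closest, followed by user 2.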

The choice of \(k\), the number of nearest neighbors used to form the neighborhood, involves a trade-off. A very small \(k\) yields few candidate items for recommendation, because few neighbors are available to support the predictions. Conversely, a very large \(k\) hurts precision, because the particularities of a user's preferences can be blunted by the large neighborhood. In most related work, \(k\) is set between 10 and 100, and the optimal \(k\) also depends on data characteristics such as sparsity.

Similarity is measured with one of two functions: cosine (simFunct = 'cos') or Pearson correlation (simFunct = 'Pearson').
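The two similarity functions can be sketched in base R as follows. The helpers `cos_sim`, `pearson_sim` and `co_rated` are hypothetical. Computing both measures over co-rated items only, with 0 meaning unrated, is an assumption; rrecsys may treat unrated items differently.

```r
# Hypothetical helpers (not rrecsys functions). Ratings are numeric vectors
# over the same items, 0 = unrated; both measures use co-rated items only.
co_rated <- function(u, v) which(u != 0 & v != 0)

cos_sim <- function(u, v) {
  idx <- co_rated(u, v)
  if (length(idx) == 0) return(0)
  sum(u[idx] * v[idx]) / (sqrt(sum(u[idx]^2)) * sqrt(sum(v[idx]^2)))
}

pearson_sim <- function(u, v) {
  idx <- co_rated(u, v)
  if (length(idx) < 2) return(0)
  du <- u[idx] - mean(u[idx])   # centre each user on their co-rated mean
  dv <- v[idx] - mean(v[idx])
  den <- sqrt(sum(du^2)) * sqrt(sum(dv^2))
  if (den == 0) return(0)       # constant ratings: correlation undefined
  sum(du * dv) / den
}

u <- c(5, 4, 0, 1)
v <- c(4, 5, 2, 2)
cos_sim(u, v)      # about 0.966
pearson_sim(u, v)  # about 0.839
```

Pearson correlation centers each user's ratings on their mean, so it is insensitive to one user rating systematically higher than another, whereas cosine similarity is not. The two measures can therefore rank neighbors differently.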

For the rating prediction task, training a model with this algorithm requires one additional argument: neigh, the neighborhood size.

library(rrecsys)
data("ml100k")                  # MovieLens 100k ratings shipped with rrecsys
d <- defineData(ml100k)
e <- evalModel(d, folds = 2)    # 2-fold cross-validation
evalPred(e, "ubknn", simFunct = "Pearson", neigh = 10)
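How a rating is predicted from the neighborhood is not specified here. One common formulation, assumed in the sketch below rather than taken from rrecsys, adds to the target user's mean rating the similarity-weighted average of the neighbors' mean-centered ratings for the item: \(\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v} s(u,v)\,(r_{vi} - \bar{r}_v)}{\sum_{v} |s(u,v)|}\), where \(v\) ranges over the \(k\) most similar users who rated item \(i\). The function `predict_ubknn` is hypothetical; it uses cosine similarity and treats 0 as unrated.

```r
# Hypothetical sketch (not rrecsys's implementation) of user-based kNN rating
# prediction: r_hat(u, i) = mean_u + sum_v s(u, v) * (r_vi - mean_v) / sum_v |s(u, v)|,
# where v ranges over the k most similar users (cosine) who rated item i.
# Assumes a dense user x item matrix with 0 = unrated.
predict_ubknn <- function(ratings, u, i, k) {
  user_mean <- function(x) if (any(x != 0)) mean(x[x != 0]) else 0
  means <- apply(ratings, 1, user_mean)
  norms <- sqrt(rowSums(ratings^2))
  sims <- as.vector(ratings %*% ratings[u, ]) / (norms * norms[u])
  sims[is.nan(sims)] <- 0
  sims[u] <- -Inf                                  # exclude the target user
  nb <- order(sims, decreasing = TRUE)[seq_len(min(k, nrow(ratings) - 1))]
  nb <- nb[ratings[nb, i] != 0]                    # keep neighbours who rated i
  w <- sims[nb]
  # Fall back to the user's mean when no neighbour rated the item.
  if (length(nb) == 0 || sum(abs(w)) == 0) return(means[u])
  means[u] + sum(w * (ratings[nb, i] - means[nb])) / sum(abs(w))
}

ratings <- matrix(c(5, 3, 0, 1,
                    4, 0, 0, 1,
                    1, 1, 0, 5,
                    0, 0, 5, 4,
                    5, 4, 0, 0), nrow = 5, byrow = TRUE)
predict_ubknn(ratings, u = 2, i = 2, k = 2)   # about 2.27
```

With k = 2, user 2's neighbors are users 1 and 5. User 1 rated item 2 exactly at her own mean, while user 5 rated it below his mean, so the prediction ends up slightly below user 2's mean of 2.5.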

For the item recommendation task, two additional arguments are required: positiveThreshold, the threshold for “positive” ratings, and topN, the number of recommended items.

library(rrecsys)
data("ml100k")                  # MovieLens 100k ratings shipped with rrecsys
d <- defineData(ml100k)
e <- evalModel(d, folds = 2)    # 2-fold cross-validation
evalRec(e, "ubknn", simFunct = "Pearson", neigh = 10, positiveThreshold = 3, topN = 3)
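For top-N recommendation, one simple scheme, assumed here and not necessarily what rrecsys does, scores each item the target user has not rated by summing the similarities of the neighbors who rated it at or above the positive threshold, then returns the topN highest-scoring items. The function `recommend_ubknn` and its arguments are hypothetical; as before, 0 marks an unrated item and cosine similarity is used.

```r
# Hypothetical sketch (not rrecsys's implementation) of user-based kNN top-N
# recommendation. Assumes a dense user x item matrix with 0 = unrated.
recommend_ubknn <- function(ratings, u, k, positive_threshold = 3, top_n = 10) {
  norms <- sqrt(rowSums(ratings^2))
  sims <- as.vector(ratings %*% ratings[u, ]) / (norms * norms[u])
  sims[is.nan(sims)] <- 0
  sims[u] <- -Inf                                  # exclude the target user
  nb <- order(sims, decreasing = TRUE)[seq_len(min(k, nrow(ratings) - 1))]
  # TRUE where a neighbour rated an item at or above the threshold.
  liked <- ratings[nb, , drop = FALSE] >= positive_threshold
  scores <- colSums(liked * sims[nb])              # similarity-weighted votes
  scores[ratings[u, ] != 0] <- -Inf                # skip already-rated items
  cand <- which(scores > 0)
  cand[order(scores[cand], decreasing = TRUE)][seq_len(min(top_n, length(cand)))]
}

ratings <- matrix(c(5, 3, 0, 1,
                    4, 0, 0, 1,
                    1, 1, 0, 5,
                    0, 0, 5, 4,
                    5, 4, 0, 0), nrow = 5, byrow = TRUE)
recommend_ubknn(ratings, u = 2, k = 4, positive_threshold = 3, top_n = 3)  # items 2 and 3
```

Binarizing the neighbors' ratings with the threshold means that only “positive” ratings contribute to an item's score, while the similarity weights let closer neighbors count more.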

The default values are neigh = 10, positiveThreshold = 3, and topN = 10.

The returned object is of class UBclass.

For more details about the slots, see the reference manual.
