Question about stringdist() #104

Open
@JackGuo15


Hi Mark,

I hope you're doing well. My name is Ruohan, and I'm a second-year PhD student at UCL. I'm currently using the stringdist package to measure linguistic distance, and I’ve found it incredibly useful. However, I’ve encountered a few issues that I’m hoping you can help clarify.

I’ve been working through the R manual for stringdist (https://cran.r-project.org/web/packages/stringdist/stringdist.pdf), which describes on page 20 how the different edits (deletion, insertion, substitution, transposition) can be weighted. For example, stringdist('ab', 'ba', weight=c(1,1,1,0.5)) returns 0.5, suggesting that a single transposition was performed.

Building on this example, I tried the following cases:

  1. stringdist('ab', 'a', weight=c(0.5, 1, 1, 1))
     I expected an output of 0.5 due to a weighted deletion, but the output was 1.
  2. stringdist('ab', 'a', weight=c(1, 0.5, 1, 1))
     This returned 0.5, which seems to indicate an insertion rather than a deletion.
  3. stringdist('a', 'ab', weight=c(0.5, 1, 1, 1))
     Here I received 0.5, indicating a weighted deletion.

Given these results, I’m wondering if I might have misunderstood the string distance calculation. Specifically, I assumed that stringdist('ab', 'a') would attempt to match 'ab' to 'a' by deleting a character, while stringdist('a', 'ab') would result in an insertion. Could you clarify how the algorithm determines whether to apply an insertion or deletion in these cases?
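To make the assumption above concrete, here is a minimal base-R sketch of a weighted edit distance that turns its first argument into its second. The function name weighted_lv is hypothetical, transpositions are left out, and this is not stringdist's implementation; it only illustrates how swapping the arguments swaps the roles of the deletion and insertion weights.

```r
# Hypothetical helper, not part of stringdist.
# Cost of the cheapest way to turn `from` into `to`.
# w = c(deletion, insertion, substitution); no transpositions.
weighted_lv <- function(from, to, w = c(1, 1, 1)) {
  x <- strsplit(from, "")[[1]]
  y <- strsplit(to, "")[[1]]
  n <- length(x)
  m <- length(y)
  # D[i + 1, j + 1]: cost of turning the first i chars of x
  # into the first j chars of y
  D <- matrix(0, n + 1, m + 1)
  D[, 1] <- (0:n) * w[1]  # delete every char of the x prefix
  D[1, ] <- (0:m) * w[2]  # insert every char of the y prefix
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      sub_cost <- if (x[i] == y[j]) 0 else w[3]
      D[i + 1, j + 1] <- min(
        D[i, j + 1] + w[1],     # delete x[i]
        D[i + 1, j] + w[2],     # insert y[j]
        D[i, j] + sub_cost      # match or substitute
      )
    }
  }
  D[n + 1, m + 1]
}

weighted_lv("ab", "a", w = c(0.5, 1, 1))  # 0.5: 'b' is deleted
weighted_lv("a", "ab", w = c(0.5, 1, 1))  # 1: 'b' is inserted
```

Under this "first into second" convention, a cheap deletion makes weighted_lv("ab", "a") the cheap call. The outputs reported in cases 1 to 3 match this sketch only when the two arguments are swapped, which may mean stringdist measures the edits in the other direction; that reading is an inference from the reported numbers, not something confirmed here.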

Additionally, when I tried stringdist('abc', 'ca', method = "dl", weight = c(1, 0.1, 0.01, 0.001)), I received an output of 0.002, which suggests that two transpositions were performed to match "abc" to "ca." Since the two strings differ in length, shouldn’t this also involve a deletion or insertion?
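Because the four weights in this call are distinct powers of ten, the returned distance can be read digit by digit as a count of each edit type. The sketch below assumes the distance is a plain sum of per-edit weights and that no edit type occurs ten or more times; decode_edits is a hypothetical helper, not part of stringdist.

```r
# Hypothetical helper: split a distance computed with
# weight = c(1, 0.1, 0.01, 0.001) into per-edit counts.
decode_edits <- function(d) {
  n <- round(d * 1000)  # work in units of the smallest weight
  c(deletion      = n %/% 1000,
    insertion     = (n %/% 100) %% 10,
    substitution  = (n %/% 10) %% 10,
    transposition = n %% 10)
}

decode_edits(0.002)
# deletion 0, insertion 0, substitution 0, transposition 2
```

Read this way, 0.002 contains no deletion or insertion term at all, which is the part that looks inconsistent with strings of different lengths.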

I look forward to your insights. Thank you very much for your time.

Best wishes,
Ruohan
