Movatterモバイル変換

[0]ホーム

Jump to content

Talk:String-searching algorithm

Page contents not supported in other languages.

From Wikipedia, the free encyclopedia

This

level-5 vital article is ratedStart-class on Wikipedia'scontent assessment scale.
It is of interest to the followingWikiProjects:

Computer scienceHigh‑importance

This article is within the scope ofWikiProject Computer science, a collaborative effort to improve the coverage ofComputer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can jointhe discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science

High

This article has been rated asHigh-importance on theproject's importance scale.

Things you can helpWikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (seeList of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles inCategory:Computer science and sub-categories with{{WikiProject Computer science}}

Untitled

[edit]

Moved from article

These descriptions are insufficient!

Couldm somebody expand this entry ?

The "Finite Automata" link is entirely unhelpful, redirection to a page about finite state machines that has no information on string searching. --Furrykef 05:53, 29 Jun 2004 (UTC)

Nasty encodings

[edit]

Some mention should be made of nasty character encodings that require special precautions when searching..........Plugwash 11:35, 21 March 2006 (UTC)[reply]

Matching times

[edit]

Are the matching times listed in the comparison table factual at all?I see most of them have been marked as Θ(n). Is it just a stub? References?--Bisqwit 11:46, 26 May 2006 (UTC)[reply]

Also, Θ() notation is used incorrectly, e.g. naive string searching is O(nm) -- when the two strings are the same, running time is Θ(n).

DFA search image

[edit]

I've just fixed up the DFA imageImage:DFA search mommy.svg so that it renders again. It should be useful here as a demonstration of the simple DFA-based algorithm, but I'm not sure where to put it. Please assist.Dcoetzee 03:57, 2 April 2007 (UTC)[reply]

inclusion of link for more algorithms on exact string search

[edit]

Hi,

  I would like to include the following link which has more algorithms for exact string searchEXACT STRING MATCHING ALGORITHMS.

  Any other thoughts also welcome.

Thanksczar— Precedingunsigned comment added byCrxz0193 (talk •contribs)06:37, 23 April 2012 (UTC)[reply]

O( (N/M + M) logM ) for alphabet size 2 and up, optimal average case.

[edit]

Alpha Skip Search was published at Combinatorial Pattern Matching 1998. The method appears to be optimal on average, for all alphabet sizes, and large M,N. Keys are sequences of logM letters. Preprocess the pattern into a lookup table with M-logM+1 keys of size logM (log base is alphabet size), encoding the key position in the pattern. Then skip through the text by length M-logM+1, building a logM sized key, and try to match the key in the lookup table. If a match is found, then try naively to match the pattern at the implied start position in the text. The expected run time of the algorithm is O((N/M + M)logM) for all alphabet sizes, including 2! Daniel Pehoushek24.33.70.144 (talk)18:45, 24 August 2014 (UTC)[reply]

Two-way string search algorithm

[edit]

This page seems to forget to mention the existence of thetwo-way string-matching algorithm, which provides a worst-case running time of O(n+m). This is the algorithm used by glibc's strstr() function, for example.EdSchouten (talk)10:18, 20 June 2016 (UTC)[reply]

Reference to "vector_space"

[edit]

Have just corrected where intro said strings are "formally" "vectors", and linked tovector_space. Perhaps someone merely accidentally linked to the wrong sense of "vector"; but it's incorrect as well as unhelpful. It was never mentioned again in the article, never had a connection described, and was unsourced). I changed it toarray, which is not without problems, but at least a common basic analogy, and I removed the "formal", because it the analogy does not approach formality (nor does it need to).

Character strings are sequences of characters, not numeric quantities. There is no intrinsic relationship between "A" and 1 (or 65), or between control characters, ligatures, diacritics, etc., etc. Character strings (not to mention DNA base-pair sequences) existed millennia before ASCII.

Even a *coded* character string in a particular coded character set, is a sequence of integers draw from an extremely small set (even for Unicode), whereas even the infinite set of integers do not constitute a [Field_(mathematics)|field], which is required to define a vector space. Characters do not support addition, subtraction, multiplication, and division:
- What is "C" + 1? Not "D" if it's EBCDIC; and the integer 1 is not a character anyway, so we could only refer to "C" + "1", which may be a totally different thing)
- What is "C" * "#"?
- What is "Z"/" "?

Strings do not behave like a space (or lattice) of numbers. One does not typically do linear algebra on strings. The dot product of two strings is an unusual notion (dot products are important in information theory, on arrays of words; but not for "string searching" per se). Affine transforms would be very strange way to do case-folding or space-normalizaiton (both highly important in string searching), if possible at all.

In strings, the length is highly variable, and the length in and of itself is of little importance; in linear algebra the length of a vector is an extremely important intrinsic property. This again shows they are quite different concepts, and one cannot trivially be a "formal" model for another.

Saying "Formally" suggested the introduction of a mathematical model which covers the important properties of a certain thing. No such model is in evidence. The mathematical properties of vector spaces are not even a basically correct model of strings, and do not approach being adequate, much less conceptually helpful.— Precedingunsigned comment added bySderose (talk •contribs)20:28, 31 August 2017 (UTC)[reply]

Retrieved from "https://en.wikipedia.org/w/index.php?title=Talk:String-searching_algorithm&oldid=1194062896"

Categories:

Hidden category:

All Wikipedia vital articles

[8]ページ先頭