This article has multiple issues. Please helpimprove it or discuss these issues on thetalk page.(Learn how and when to remove these messages) (Learn how and when to remove this message)
|
In computer science, theRaita algorithm is astring searching algorithm which improves the performance ofBoyer–Moore–Horspool algorithm. This algorithm preprocesses the string being searched for the pattern, which is similar toBoyer–Moore string-search algorithm. The searching pattern of particular sub-string in a given string is different from Boyer–Moore–Horspool algorithm. This algorithm was published by Timo Raita in 1991.[1]
Raita algorithm searches for a pattern "P" in a given text "T" by comparing each character of pattern in the given text. Searching will be done as follows. Window for a text "T" is defined as the length of "P".
If everything in the pre-check is successful, then the original comparison starts from the second character to last but one. If there is a mismatch at any stage in the algorithm, it performs the bad character shift function which was computed in pre-processing phase. Bad character shift function is identical to the one proposed in Boyer–Moore–Horspool algorithm.[1]
A modern formulation of a similar pre-check is found instd::string::find, a linear/quadratic string-matcher, in libc++ and libstdc++. Assuming a well-optimized version ofmemcmp, not skipping characters in the "original comparison" tends to be more efficient as the pattern is likely to be aligned.[2]
#include<limits.h>#include<stddef.h>#define ALPHABET_SIZE (1 << CHAR_BITS)/* typically 256 *//* Preprocessing: the BMH bad-match table. */staticinlinevoidpreBmBc(char*pat,size_tlpat,ptrdiff_tbmBc[]){size_ti;for(i=0;i<ALPHABET_SIZE;++i)bmBc[i]=lpat;for(i=0;i<lpat-1;++i)bmBc[pat[i]]=lpat-i-1;}voidRAITA(char*pat,size_tlpat,char*s,size_tn){ptrdiff_tbmBc[ALPHABET_SIZE];/* Quick edge cases. */if(lpat==0||lpat>n)return;if(lpat==1){char*match_ptr=s;while(match_ptr<s+n){match_ptr=memchr(match_ptr,pat[0],n-(match_ptr-s));if(match_ptr!=NULL){OUTPUT(match_ptr-s);match_ptr++;}elsereturn;}}preBmBc(pat,lpat,bmBc);/* The prematch-window. */charfirstCh=pat[0];charmiddleCh=pat[lpat/2];charlastCh=pat[lpat-1];/* Searching */ptrdiff_tj=0;while(j<=n-m){charc=s[j+lpat-1];/* This could harm data locality on long patterns. For these consider reducing * the number of pre-tests, or using more clustered indices. */if(lastCh==c&&middleCh==s[j+lpat/2]&&firstCh==s[j]&&memcmp(&pat[1],&s[j+1],lpat-2)==0)OUTPUT(j);j+=bmBc[c];}}
Pattern: abddb
Text:abbaabaabddbabadbb
Pre- Processing stage:
a b d 4 3 1
Attempt 1: abbaabaabddbabadbb ....b Shift by 4 (bmBc[a])
Comparison of last character of pattern to rightmost character in the window. It's a mismatch and shifted by 4 according to the value in pre-processing stage.
Attempt 2: abbaabaabddbabadbb A.d.B Shift by 3 (bmBc[b])
Here last and first character of the pattern are matched but middle character is a mismatch. So the pattern is shifted according to the pre-processing stage.
Attempt 3: abbaabaabddbabadbb ABDDB Shift by 3 (bmBc[b])
We found exact match here but the algorithm continues until it can't move further.
Attempt 4: abbaabaABDDBabadbb ....b Shift by 4 (bmBc[a])
At this stage, we need to shift by 4 and we can't move the pattern by 4. So, the algorithm terminates. Letters in capital letter are exact match of the pattern in the text.