Boyer-Moore
In practice, the Boyer-Moore algorithm is often faster than the Knuth-Morris-Pratt algorithm and similar methods when matching patterns in natural-language text such as English. It relies on two key heuristics:
- The looking-glass heuristic: the pattern P is compared with a substring of the text T from right to left, starting at P's last letter.
- The character-jump heuristic: if a mismatch occurs at T[i] = c, then
    - if P contains c, shift P to align the last occurrence of c in P with T[i];
    - otherwise, shift P to align P[0] with T[i+1].
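The two heuristics can be sketched as a pair of small helpers. This is only an illustrative sketch under stated assumptions: the names `firstMismatch`, `lastIndexOf` and `characterJump` are hypothetical and do not come from the original code. `firstMismatch` assumes the window ending at `end` lies entirely inside the text, and `characterJump` returns the text index at which the next right-to-left comparison would begin.

```cpp
#include <algorithm>
#include <string>

// Looking-glass heuristic (hypothetical helper): compare the pattern with the
// window of text that ends at index `end`, moving right to left.
// Returns the pattern index of the first mismatch, or -1 on a full match.
// Assumes pattern.size() - 1 <= end < text.size().
int firstMismatch(const std::string& text, const std::string& pattern, int end) {
    int m = static_cast<int>(pattern.size());
    for (int j = m - 1; j >= 0; j--) {
        if (text[end - (m - 1 - j)] != pattern[j]) {
            return j;
        }
    }
    return -1;
}

// Hypothetical helper: index of the last occurrence of c in pattern, or -1.
int lastIndexOf(const std::string& pattern, char c) {
    std::size_t pos = pattern.rfind(c);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// Character-jump heuristic (hypothetical helper): after a mismatch between
// text[i] == c and pattern[j], return the text index where the next
// comparison starts (aligned with the last letter of the shifted pattern).
int characterJump(const std::string& pattern, char c, int i, int j) {
    int m = static_cast<int>(pattern.size());
    int last = lastIndexOf(pattern, c);
    // c absent: last + 1 == 0, so the pattern moves completely past text[i].
    // c present to the left of j: pattern[last] lines up with text[i].
    // c present only to the right of j: the pattern moves by one position.
    return i + m - std::min(j, last + 1);
}
```

For the pattern `egef`, a mismatch against `h` at i = 3, j = 3 would give a new index of 7 (P[0] lines up with T[4]), while a mismatch against `g` at the same position would give 5 (P[1] lines up with T[3]).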
 
Before applying these heuristics, however, the algorithm preprocesses the pattern P and the alphabet Σ to build a last occurrence function L. This function maps each letter of the alphabet to the index of its last occurrence in P.
So if the last occurrence of the letter c in P is at index i, then L(c) = i. If the letter c does not occur in P, then L(c) = -1.
Example:
Σ = {e, f, g, h}, P = egef
| c | e | f | g | h |
|---|---|---|---|---|
| L(c) | 2 | 3 | 1 | -1 |
The last occurrence function can be stored as an array indexed by the numeric codes of the characters. This function can be calculated in O(m+s) time where m is the length of the pattern and s the size of the alphabet.
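As a quick check of the example table, the function can be sketched over an explicit alphabet. The helper name `lastOccurrenceTable` is an assumption made for illustration, and `std::map` is used here only for readability; an array indexed by character code, as described above, is what gives the O(m+s) bound.

```cpp
#include <map>
#include <string>

// Hypothetical helper: builds L(c) for every character c in `alphabet`.
std::map<char, int> lastOccurrenceTable(const std::string& alphabet,
                                        const std::string& pattern) {
    std::map<char, int> table;
    // Every letter of the alphabet starts at -1 (does not occur in the pattern).
    for (char c : alphabet) {
        table[c] = -1;
    }
    // Scan left to right so that later occurrences overwrite earlier ones.
    for (int i = 0; i < static_cast<int>(pattern.size()); i++) {
        auto it = table.find(pattern[i]);
        if (it != table.end()) {  // ignore characters outside the alphabet
            it->second = i;
        }
    }
    return table;
}
```

Calling `lastOccurrenceTable("efgh", "egef")` should reproduce the table above: 2 for e, 3 for f, 1 for g and -1 for h.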
Code of Boyer-Moore Algorithm in C++
#include <iostream>
#include <string>
#define MAXCHAR 256 //one table entry for every possible 8-bit character value
using namespace std;
int min(int num1, int num2) {
  if (num1 < num2) {
     return num1;
  }
  return num2;
}
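void makeLOFunction(const string& pattern, int LOFunction[MAXCHAR]) {
    int n = pattern.size();
    for (int c = 0; c < MAXCHAR; c++)
        LOFunction[c] = -1; //-1 marks characters that do not occur in the pattern
    for (int c = 0; c < n; c++)
        LOFunction[(unsigned char)pattern[c]] = c; //unsigned cast keeps the index in 0..255
}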
int boyerMoore(const string& pattern, const string& text) {
    int m = pattern.size();
    int n = text.size();
    if (m == 0 || m > n) { //an empty pattern, or one longer than the text, is treated as no match
        cout << "Match not found\n";
        return -1;
    }
    int LOFunction[MAXCHAR];
    makeLOFunction(pattern, LOFunction); //fill array of last occurrence function
    int i = m-1; //current index in text
    int j = m-1; //current index in pattern
    do {
       if (text[i] == pattern[j]) {
           if (j == 0) {
               cout << "Pattern starts at index " << i << "\n";
               return i; //match found at i
           } else {
               i--;
               j--; //move backwards through pattern
           }
       } else {
           int leap = LOFunction[(unsigned char)text[i]]; //character jump
           i = i + m - min(j, leap+1);
           j = m - 1; //restart comparison at the end of the pattern
       }
    }
    while (i <= n-1); //while not reached end of text
    cout << "Match not found\n";
    return -1; //no match found
}
int main() {
    string pattern = "abacab";
    string text = "abacaabadcabacabaabb";
    boyerMoore(pattern, text);
}
References
- Aliaga, D (2016) Strings and Pattern Matching, lecture notes, Data Structures and Algorithms CS251, Purdue University, delivered 9 April 2016.
- Christine, S (2018) Bad Character Heuristic [Source code]. https://www.tutorialspoint.com/Bad-Character-Heuristic
- Buerger, H (2012) Boyer-Moore Algorithm [Source code]. https://gist.github.com/obstschale/3060059