In practice, this algorithm has been shown to be faster than the Knuth-Morris-Pratt algorithm and other alternatives when matching patterns in natural-language text such as English. It relies on [two key heuristics][1]:
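* The looking-glass heuristic: compare the pattern P with a substring of the text T moving backwards, starting from P’s last character.
* The character-jump heuristic: on a mismatch at T[i] = c,
    - if P contains c, shift P so that the last occurrence of c in P lines up with T[i] (if that occurrence lies to the right of the mismatched position in P, lining it up would move P backwards, so shift P one position to the right instead);
    - otherwise, shift P so that P[0] lines up with T[i+1].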
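The two heuristics above can be sketched as a small Python matcher. This is a minimal illustration, assuming the simplified Boyer-Moore variant described here: it uses only the looking-glass and character-jump rules (the good-suffix rule of the full Boyer-Moore algorithm is omitted), and the function names `last_occurrence` and `boyer_moore_find` are hypothetical.

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """Map each character of the pattern to the index of its last occurrence."""
    last = {}
    for i, ch in enumerate(pattern):
        last[ch] = i  # later occurrences overwrite earlier ones
    return last


def boyer_moore_find(text: str, pattern: str) -> int:
    """Return the index of the first match of pattern in text, or -1.

    Sketch using only the looking-glass and character-jump heuristics.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    last = last_occurrence(pattern)
    i = m - 1  # position in the text
    j = m - 1  # position in the pattern (start at its last character)
    while i <= n - 1:
        if text[i] == pattern[j]:
            if j == 0:
                return i  # every character matched, right to left
            # Looking-glass: keep comparing backwards.
            i -= 1
            j -= 1
        else:
            # Character-jump: characters absent from P have L(c) = -1.
            lo = last.get(text[i], -1)
            # Align the last c in P with T[i]; if that c lies right of j,
            # min() makes this a shift of the pattern by one position.
            # If c is absent (lo = -1), P[0] ends up aligned with T[i+1].
            i += m - min(j, 1 + lo)
            j = m - 1
    return -1


print(boyer_moore_find("hello world", "world"))  # 6
```

Restarting `j` at `m - 1` after every jump is what makes each new alignment begin again from P’s last character.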
Before applying these heuristics, however, the algorithm preprocesses the pattern P and the alphabet Σ to build a last-occurrence function L, which maps each character of Σ to the index at which it last occurs in P.
If the last occurrence of a character c in P is at index i (that is, P[i] = c and c does not appear later in P), then L(c) = i. If c does not occur in P at all, L(c) = -1.
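The definition above translates directly into a dictionary over an explicit alphabet. This is a minimal sketch, assuming every character of P belongs to Σ; the name `last_occurrence_function` is hypothetical. It is applied here to the same Σ and P as the example that follows.

```python
def last_occurrence_function(pattern: str, alphabet: set[str]) -> dict[str, int]:
    # Every character of the alphabet starts at -1 ("does not occur in P").
    table = {c: -1 for c in alphabet}
    # Scan P left to right, so the value left behind is the last index of c.
    for i, c in enumerate(pattern):
        table[c] = i
    return table


L = last_occurrence_function("egef", {"e", "f", "g", "h"})
print(sorted(L.items()))  # [('e', 2), ('f', 3), ('g', 1), ('h', -1)]
```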
#### Example:
__Σ =__ {e, f, g, h}

__P =__ egef

| __c__ | e | f | g | h |
|--- |--- |--- |--- |--- |
| __L(c)__ | 2 | 3 | 1 | -1 |

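The last-occurrence function can be stored as an array indexed by the numeric codes of the characters, so each lookup takes constant time. It can be computed in O(m + s) time, where m is the length of the pattern and s is the size of the alphabet: O(s) to initialize every entry to -1, plus O(m) for a single left-to-right pass over P in which later occurrences overwrite earlier ones.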
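The array representation and its O(m + s) cost can be sketched as follows. This assumes the alphabet is the 128 ASCII codes, so `ord(ch)` is used directly as the array index (a character with a larger code would raise `IndexError`); the name `last_occurrence_array` and the `alphabet_size` parameter are hypothetical.

```python
def last_occurrence_array(pattern: str, alphabet_size: int = 128) -> list[int]:
    # O(s): one slot per character code, all initialized to -1.
    last = [-1] * alphabet_size
    # O(m): one pass over the pattern; later indices overwrite earlier ones.
    for i, ch in enumerate(pattern):
        last[ord(ch)] = i
    return last


last = last_occurrence_array("egef")
print(last[ord("e")], last[ord("f")], last[ord("g")], last[ord("h")])  # 2 3 1 -1
```

Compared with a dictionary, the array trades O(s) memory for lookups that are a single index operation, which matters because the matcher consults L on every mismatch.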