Naive string matcher
PROBLEM
The text is an array T.1,T.2,...,T.N and the pattern is an array P.1,P.2,...,P.M. The elements of P. and T. are characters drawn from a finite alphabet Sigma., Sigma. is an array of R elements. We say that pattern P. occurs with shift S in text T. if 0<=S<=N-M and if T.SpJ=P.J, where SpJ indicates S+J, for J=1,...,M. String-matching problem is the problem of finding all valid shifts with which a given pattern occurs in a given text.
ALGORITHM
The worst-case running time of naive string-matcher is O(N**2).
PRACTICE
When a text and a pattern are real strings then the fastest solution is by
REAL_STRING_MATCHER. In general case there is not an unambiguous winner. You have to test subroutines:
on your data. Example is following, where M respectively R is number of elements in P. respectively in Sigma. and S is number of valid shifts:
Running time in seconds for N=100000 |
Algorithm |
M=10,R=1999,S=50 |
M=10,R=4,S=10000 |
M=100,R=4,S=10000 |
Naive |
16 |
25 |
150 |
KMP |
21 |
22 |
23 |
Boyer-Moore |
4 |
15 |
138 |
IMPLEMENTATION
Unit: internal subroutine
Global variables: array T.1,...,T.N of strings, array P.1,...,P.M of strings
Parameters: a positive integer N - number of elements in T., a positive integer M - number of elements in P.
Output: displays on the screen Pattern occurs with shift S for all valid shift S
NAIVE_STRING_MATCHER:
procedure expose T. P.
parse arg M, N
do S = 0 to N - M + 1
do J = 1 to M
SpJ = S + J
if P.J <> T.SpJ then iterate S
end
say "Pattern occurs with shift" S
end
return
|
CONNECTIONS
Literature
Cormen T. H., Leiserson Ch. E., Rivest R. L. Introduction to Algorithms
The MIT Press, Cambridge, 1990