Knuth-Morris-Pratt matcher

PROBLEM
The text is an array T.1,T.2,...,T.N and the pattern is an array P.1,P.2,...,P.M. The elements of P. and T. are characters drawn from a finite alphabet Sigma., Sigma. is an array of R elements. We say that pattern P. occurs with shift S in text T. if 0<=S<=N-M and if T.SpJ=P.J, where SpJ indicates S+J, for J=1,...,M. String-matching problem is the problem of finding all valid shifts with which a given pattern occurs in a given text.

ALGORITHM
The Knuth-Morris-Pratt algorithm runs in time O(M+N), as well as in the worst case. It was invented independently by D. E. Knuth and V. R. Pratt and by J. H. Morris.

PRACTICE
When a text and a pattern are real strings then the fastest solution is by REAL_STRING_MATCHER. In general case there is not an unambiguous winner. You have to test subroutines:

NAIVE_STRING_MATCHER
KMP_MATCHER
BOYER_MOOR_MATCHER

on your data. Example, where M respectively R is number of elements in P. respectively in Sigma. and S is number of valid shifts:

Running time in seconds for N=100000
Algorithm M=10,R=1999,S=50 M=10,R=4,S=10000 M=100,R=4,S=10000
Naive 16  25  150 
KMP 21  22  23 
Boyer-Moore 15  138 

IMPLEMENTATION
Unit: internal subroutine
 
Global variables: array T.1,...,T.N of strings, array P.1,...,P.M of strings
 
Parameters: a positive integer N - number of elements in T., a positive integer M - number of elements in P.
 
Output: displays on the screen Pattern occurs with shift S for all valid shift S
 


KMP_MATCHER: procedure expose T. P.
parse arg M, N
call COMPUTE_PREFIX_FUNCTION M
Q = 0
do I = 1 to N
  Qp1=Q + 1
  do while Q > 0 & P.Qp1 <> T.I
    Q = Pi.Q; Qp1 = Q + 1
  end
  if P.Qp1 = T.I then Q = Q + 1
  if Q = M
    then do
      say "Pattern occurs with shift" I - M
      Q = Pi.Q
    end
end
return
 
COMPUTE_PREFIX_FUNCTION:
procedure expose Pi. P.
parse arg M
Pi.1 = 0; K = 0
do Q = 2 to M
  Kp1 = K + 1
  do while K > 0 & P.Kp1 <> P.Q
    K = Pi.K; Kp1 = K + 1
  end
  if P.Kp1 = P.Q then K = K + 1
  Pi.Q = K
end
return

 

CONNECTIONS
String-Matching Problem
     Matcher for real string
     Naive string matcher
     Boyer-Moore algorithm

Literature
Cormen T. H., Leiserson Ch. E., Rivest R. L. Introduction to Algorithms
The MIT Press, Cambridge, 1990


Cover Contents Index Main page Rexx page   Mail

last modified 3rd October 2001
Copyright © 2000-2001 Vladimir Zabrodsky
Czech Republic
 

1